1 Introduction

While neutrino oscillation experiments [1,2,3] have provided unambiguous evidence of non-zero neutrino masses, the absolute neutrino mass scale remains an open question. The primary objective of the Karlsruhe Tritium Neutrino (KATRIN) experiment is to probe this scale in a direct kinematic measurement at an unprecedented sensitivity of \(0.2\hbox { eV}/\hbox {c}^2\) (\(90 \%\,\hbox {C.L.}\)) [4]. The measurement principle is based on a shape analysis of the tritium \(\upbeta \)-decay spectrum by high-precision electron spectroscopy. A non-zero neutrino mass will cause a distortion in the observed spectrum, which is most pronounced close to the endpoint energy of 18.6 keV. This technique has been successfully established by the direct neutrino mass experiments in Mainz and Troitsk, which place the most stringent direct upper limit on the effective electron neutrino mass [5,6,7,8]:

$$\begin{aligned} m(\upnu _\text {e}) < 2\hbox { eV}/\hbox {c}^2 \quad (95 \%\,\hbox {C.L.}). \end{aligned}$$
(1)

Improving this limit in \(m(\upnu _\text {e})\) by a factor of 10 demands an enhancement in statistical and systematic precision of the effective observable \(m^2(\upnu _\text {e})\) by a factor of 100. This requires both an in-depth understanding of the theoretical electron \(\upbeta \)-decay spectrum and an accurate knowledge of the experimental response in measuring the spectral shape. In Sect. 3 we explain the KATRIN setup in more detail.

It is the goal of this work to provide a complete and up-to-date model of the experiment, such that it can be used as either a prescription or reference for upcoming analyses of tritium \(\upbeta \)-decay data observed with KATRIN. For established aspects of this model, we refer to the appropriate publications. For those not yet published at all or not in the required detail, we provide the necessary derivations. The latter will mostly be the case for the description of the experimental response function, which has been considerably refined during recent commissioning phases.

In this work we first present a detailed account of the theoretical \(\upbeta \)  spectrum of tritium, with an emphasis on molecular effects in \(\mathrm {T_2}\) (Sect. 2). We then outline the experimental configuration of KATRIN (Sect. 3), before we elaborate on the individual characteristics that define the response of our instrument in Sect. 4. The statistical techniques suited to determine the effective neutrino mass from a fit of the modeled \(\upbeta \)  spectrum to the measured data are treated in Sect. 5. A summary of this work is given in Sect. 6.

Throughout this article we use natural units (\(c = \hbar = 1\)) for better readability, except for Sects. 4.7 and 4.8 where we use SI units instead.

2 Theoretical description of the differential \(\upbeta \)-decay spectrum

In this section we compile a comprehensive analytical description of the differential \(\upbeta \)-decay spectrum, with specific focus on gaseous molecular tritium \(\mathrm {T_2}\), the \(\upbeta \)  emitter used by KATRIN. We will also evaluate the relevance of various theoretical correction terms on the neutrino mass analysis.

In the following, we use the shorthand notation \( {m_{\upnu }} = m(\upnu _\text {e})\) for better readability. Furthermore, we assume there is no difference between the masses of the neutrinos and the anti-neutrinos, i.e. \( {m_{\upnu }} = m(\upnu _\text {e}) = m({\bar{\upnu }}_\text {e})\).

In the \(\upbeta \)-decay of atomic tritium, the surplus energy Q is shared between the electron’s kinetic energy E, the total neutrino energy and the recoil energy \(E_\text {rec}\) of the much heavier daughter nucleus:

$$\begin{aligned} \mathrm {T} \, \longrightarrow \, \mathrm {^3He}^+ + \text {e}^- + {\bar{\upnu }}_\text {e} \, + \, Q(\mathrm {T}). \end{aligned}$$
(2)

In the case of a vanishing neutrino mass, the electron spectrum would terminate at the endpoint energy

$$\begin{aligned} E_0 = Q - E_\text {rec}. \end{aligned}$$
(3)

2.1 Fermi theory

The differential decay rate of a tritium nucleus can be described with Fermi’s Golden Rule as [9]

$$\begin{aligned} \left( \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E} \right) _\text {nuc}&= \frac{G_\text {F}^2 \, |V_\text {ud}|^2}{2\pi ^3} \; |M_\text {nuc}|^2 \; F(Z, E) \; p\, (E+ {m_\text {e}} ) \nonumber \\&\quad \cdot \sum _i \; |U_{\text {e}i}|^2 \; \epsilon \; \sqrt{ \epsilon ^2 - m_i^2} \; \varTheta (\epsilon - m_i). \end{aligned}$$
(4)

The Fermi coupling constant \(G_\text {F}\) is projected onto the (u, d) coupling by the Cabibbo angle \(\theta _\text {C}\) with \(|V_\text {ud}| = \cos {\theta _\text {C}} = 0.97425\pm 0.00022\) [8].

For tritium \(\upbeta \)-decay – a super-allowed transition – the nuclear transition matrix element \(M_\text {nuc}\) is independent of the electron energy. It can be divided into a vector (Fermi) part and an axial (Gamow–Teller) part

$$\begin{aligned} \left| M_\text {nuc}\right| ^2 = g_\text {V}^2 + 3 g_\text {A}^2, \end{aligned}$$
(5)

with the vector coupling constant \(g_\text {V} = 1\) and the axial-vector coupling constant defined by \(g_\text {A}/g_\text {V} = -1.2646\pm 0.0035\) in tritium [10].

The classical Fermi function \(F(Z, E)\) accounts for the Coulomb interaction between the outgoing electron and the daughter nucleus with atomic charge Z (here \(Z = 2\)):

$$\begin{aligned} F(Z, E) = \frac{2\pi \eta }{1-\exp (-2\pi \eta )} \end{aligned}$$
(6)

with the Sommerfeld parameter \(\eta = \alpha Z/\beta \); \(\alpha \) is the fine structure constant and \(\beta = v/c\) is the electron velocity relative to the speed of light. Here \(F(Z, E)\) is written in the non-relativistic approximation; the relativistic \(F(Z,E)_\text {rel}\) and its commonly used approximation are given in Appendix A.1.

The full spectrum is an incoherent sum over the three known neutrino mass eigenstates \(m_i\) (\(i=1,2,3\)) with the intensity of each component defined by the squared magnitude of the neutrino mixing matrix elements \(|U_{\text {e}i}|^2\) [11].

The phase-space factor of the outgoing electron with momentum p is given by the factor \(p\, (E+ {m_\text {e}} )\). The phase space of the emitted neutrino is the product of the neutrino energy \(\epsilon = E_0 - E\) and the neutrino momentum \(\sqrt{ \epsilon ^2 - m_i^2}\), which determines the shape of the \(\upbeta \)-electron spectrum near the tritium endpoint \(E_0\). The Heaviside step function \(\varTheta \) ensures that the neutrino kinetic energy \(\epsilon - m_i\) cannot become negative.
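To make the role of the individual factors in Eq. (4) concrete, the following minimal Python sketch evaluates the spectral shape for a single neutrino mass state. It is an illustration only: the constant prefactors (\(G_\text {F}\), \(|V_\text {ud}|\), \(|M_\text {nuc}|^2\)) are dropped, the function names are hypothetical, the numerical constants are standard values entered by hand, and the non-relativistic Fermi function of Eq. (6) is used, which requires \(E > 0\).

```python
import math

ALPHA = 1.0 / 137.035999   # fine structure constant
M_E = 510998.95            # electron rest mass in eV (natural units, c = 1)
E0 = 18574.0               # endpoint energy in eV, value quoted for T2 in the text
Z = 2                      # charge of the daughter nucleus


def fermi_function(E):
    """Non-relativistic Fermi function of Eq. (6) for kinetic energy E > 0 in eV."""
    gamma = E / M_E + 1.0
    beta = math.sqrt(1.0 - 1.0 / gamma**2)   # electron velocity in units of c
    eta = ALPHA * Z / beta                   # Sommerfeld parameter
    return 2.0 * math.pi * eta / (1.0 - math.exp(-2.0 * math.pi * eta))


def spectrum_shape(E, m_nu):
    """Shape of Eq. (4) for one mass state m_nu (eV); constant prefactors omitted."""
    eps = E0 - E                             # neutrino energy
    if eps <= m_nu:                          # Heaviside step function
        return 0.0
    p = math.sqrt(E * (E + 2.0 * M_E))       # electron momentum
    return fermi_function(E) * p * (E + M_E) * eps * math.sqrt(eps**2 - m_nu**2)


if __name__ == "__main__":
    for m_nu in (0.0, 1.0):
        print(m_nu, spectrum_shape(E0 - 5.0, m_nu))
```

Comparing the two printed values shows the suppression of the rate near the endpoint for a non-zero mass; within \(m_{\upnu }\) of the endpoint the function returns zero.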

The full \(\upbeta \)-decay spectrum is shown in Fig. 1. The dependence of the spectral shape on the effective neutrino mass close to the endpoint is depicted in Fig. 2.

Fig. 1

The differential \(\upbeta \)-electron energy spectrum for the \(\upbeta \)-decay of molecular tritium with the endpoint energy \(E_0\) of 18.574 keV. The given units correspond to the decay rate of a single tritium nucleus

Fig. 2

The differential \(\upbeta \)-electron energy spectrum near the endpoint for the decay of molecular tritium as given by Eq. (4), under the assumption of various neutrino masses \( {m_{\upnu }} \)

2.2 Neutrino mass eigenstate splittings

In the KATRIN sensitivity range we can simplify the analysis by considering the effective electron neutrino mass square \( {m_{\upnu }^2} \) of a quasi-degenerate model in Eq. (4), given by an incoherent sum as

$$\begin{aligned} {m_{\upnu }^2} = \sum _{i} |U_{\text {e}i}|^2 \, m_i^2. \end{aligned}$$
(7)

Calculations have shown this approximation of the \(\upbeta \)-decay spectrum to be valid, both for the normal and inverted mass hierarchies [12, 13].
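As a small numerical illustration of Eq. (7), the sketch below computes the effective mass for a quasi-degenerate spectrum. All inputs are assumptions chosen for illustration: the mixing weights and mass-squared splittings are rough, rounded oscillation values and are not taken from this work, and the lightest mass of 0.2 eV is simply set to the KATRIN sensitivity quoted above.

```python
import math


def effective_mass_squared(u_e_sq, masses):
    """Incoherent sum of Eq. (7): sum_i |U_ei|^2 * m_i^2 (masses in eV)."""
    return sum(u * m**2 for u, m in zip(u_e_sq, masses))


# Hypothetical, rounded inputs for illustration only (normal ordering).
U_E_SQ = [0.68, 0.30, 0.02]      # |U_e1|^2, |U_e2|^2, |U_e3|^2
DM21_SQ = 7.5e-5                 # eV^2, approximate solar splitting
DM31_SQ = 2.5e-3                 # eV^2, approximate atmospheric splitting
M_LIGHTEST = 0.2                 # eV, assumed lightest mass

masses = [
    M_LIGHTEST,
    math.sqrt(M_LIGHTEST**2 + DM21_SQ),
    math.sqrt(M_LIGHTEST**2 + DM31_SQ),
]
m_eff = math.sqrt(effective_mass_squared(U_E_SQ, masses))

if __name__ == "__main__":
    print(masses, m_eff)
```

With these inputs the three masses and the effective mass differ by far less than the filter width of the spectrometer, which is the sense in which the single-parameter description is adequate.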

2.3 Molecular tritium \(\mathrm {T_2}\)

When we consider the \(\upbeta \)-decay of gaseous molecular tritium \(\mathrm {T_2}\),

$$\begin{aligned} \mathrm {T_2} \, \longrightarrow \, \mathrm {^3HeT}^+ + \text {e}^- + {\bar{\upnu }}_\text {e} \, + \, Q(\mathrm {T_2}), \end{aligned}$$
(8)

the released energy Q has to be corrected for the differences in electronic binding energies between the atomic and actual molecular systems (see [9] for a detailed explanation). The nuclear recoil also excites a spectrum of rotational and vibrational final states in the daughter molecular system, and generates excitations of its electronic shell. The neutrino energy in Eq. (4) has to be corrected by

$$\begin{aligned} \epsilon \; \rightarrow \; \epsilon _f = E_0 - V_f - E, \end{aligned}$$
(9)

with the endpoint \(E_0(\mathrm {T_2}) = (18574.00\pm 0.07)\hbox { eV}\) for molecular tritium [9, 14]. The recoil energy reaches a maximum of \(E_\text {rec} = 1.72\hbox { eV}\) at the \(\upbeta \)-endpoint, which gives a fixed endpoint energy \(E_0(\mathrm {T_2}) = Q(\mathrm {T_2}) - E_\text {rec}\) [9]. The differential decay rate, with the additional summation over each final state f with energy \(V_f\) and weighting by the transition probability \(P_f\) to a state f in the daughter molecule, is then:

$$\begin{aligned} \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}&= \frac{G_\text {F}^2 \, |V_\text {ud}|^2}{2\pi ^3} \; |M_\text {nuc}|^2 \; F(Z, E) \cdot p\, (E+ {m_\text {e}} ) \nonumber \\&\quad \cdot \sum _f \; P_f \; \epsilon _f \; \sqrt{ \epsilon _f^2 - {m_{\upnu }^2} } \; \varTheta (\epsilon _f - {m_{\upnu }} ). \end{aligned}$$
(10)
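The final-state sum in Eq. (10) can be sketched in a few lines of Python. The example below is a toy illustration and not the distribution used in the analysis: the two-state list is hypothetical (it borrows the 57 % ground-state probability and 1.7 eV mean excitation energy quoted in Sect. 2.4 and lumps everything else into one assumed state at 25 eV), and only the neutrino phase-space part of Eq. (10) is evaluated.

```python
import math

E0 = 18574.0   # endpoint energy of T2 in eV, as quoted in the text


def neutrino_phase_space(E, m_nu, final_states):
    """Sum over final states in Eq. (10): P_f * eps_f * sqrt(eps_f^2 - m_nu^2)."""
    total = 0.0
    for V_f, P_f in final_states:
        eps_f = E0 - V_f - E                 # Eq. (9)
        if eps_f > m_nu:                     # Heaviside step function
            total += P_f * eps_f * math.sqrt(eps_f**2 - m_nu**2)
    return total


# Hypothetical two-state toy distribution: (excitation energy V_f in eV, P_f).
TOY_STATES = [(1.7, 0.57), (25.0, 0.43)]

if __name__ == "__main__":
    print(neutrino_phase_space(E0 - 10.0, 0.0, TOY_STATES))
```

Each entry contributes its own spectrum with an endpoint shifted down by \(V_f\); at 10 eV below \(E_0\) only the first toy state is energetically accessible.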

2.4 Excited molecular final states

Fig. 3

Comparison of molecular final-state distributions of \(\mathrm {HeT^+}\) and \(\mathrm {HeD^+}\). Sampled from [15, 16] with a 0.1 eV binning for excitation energies \(V_f \le 4\hbox { eV}\) and a 1.0 eV binning for \(V_f > 4\hbox { eV}\), summed over the initial angular momenta states \(0 \le J \le 2\) according to their population at a temperature of \(T = 30\hbox { K}\)

After the decay, the daughter molecular system is left in an excited rotational, vibrational and electronic state. According to theoretical calculations, about 57 % of all \(\mathrm {T_2}\) \(\upbeta \)-decays result in the rovibronically-broadened electronic ground state with an average excitation energy of about 1.7 eV, while the others go to the excited electronic states [17]. Each discrete final state effectively branches into its own \(\upbeta \)  spectrum with a distinct endpoint energy.

The accuracy of a neutrino mass measurement critically depends on the knowledge of the distribution of these final states, which have to be taken from theory. Precise calculations of the final state distributions of the hydrogen isotopologues (\(\mathrm {T_2 \rightarrow HeT^+}\), \(\mathrm {DT \rightarrow HeD^+}\) and \(\mathrm {HT \rightarrow HeH^+}\)) have been performed in the endpoint region [15, 16]. The discrete energy states and their transition probabilities have been determined below the dissociation threshold, while continuous distributions are available above the threshold. A comprehensive review of the theory of the tritium final-state spectrum and current validation efforts can be found in [18].

Figure 3 gives a comparison of the final-state distributions of \(\mathrm {HeT^+}\) and \(\mathrm {HeD^+}\). The differences in their distributions arise from the mass difference; thus, a precise knowledge of the source gas isotopological composition and its stabilization on the 0.1 % level are necessary. Laser Raman spectroscopy [19] provides two important input parameters for our source model: the tritium purity \(\epsilon _\mathrm {T}\) denoting the fraction of tritium nuclei (Footnote 1), and \(\kappa \) denoting the ratio of \(\mathrm {DT}\) versus \(\mathrm {HT}\).

In the calculations provided by Doss et al. [15, 16] and Saenz et al. [20], the higher recoil energies of the lighter isotopologues are incorporated into their respective energy spectra that are given relative to the recoil energy of \(\mathrm {HeT^+}\). That way, the final-state distributions of each isotopologue can be summed and weighted according to its abundance in the source gas. Furthermore, these calculations provide separate distributions for each initial quantum state of molecular angular momentum, denoted by the quantum number J. These must be weighted according to the population of their respective J states before the \(\upbeta \)-decay, which is given by a Boltzmann distribution

$$\begin{aligned} P_J(T) \propto g_s g_J \exp \left( - \frac{\varDelta E_J}{k_B T}\right) \; , \end{aligned}$$
(11)

where T is the local temperature of the source gas, \(k_B\) the Boltzmann constant and \(\varDelta E_J\) the energy to the electronic ground state. The rotational degeneracy of the distribution is given by the factor \(g_J = (2J+1)\), whereas \(g_s\) accounts for the spin degeneracy of the nuclei. It is \(g_s = 1\) for heteronuclear molecules (\(\mathrm {DT}\), \(\mathrm {HT}\)) without spin coupling. For \(\mathrm {T_2}\) as a homonuclear molecule, it is given by the fraction \(\lambda \) of molecules in an ortho state (parallel nuclear spins) or the fraction \(1-\lambda \) in a para state (anti-parallel nuclear spins). Hence, \(g_s = \lambda \) for ortho states with odd J and \(g_s = 1-\lambda \) for para states with even J [21]. In the KATRIN tritium circulation system the source gas is forced into thermal equilibrium at \(T = 300\hbox { K}\) by a permeator membrane (Footnote 2), resulting in \(\lambda \simeq 0.75\) [18].
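The weighting of Eq. (11) for \(\mathrm {T_2}\) can be sketched as follows. Several ingredients are assumptions made for illustration: \(\varDelta E_J\) is approximated by a rigid rotor, \(B\,J(J+1)\), with a hand-entered approximate rotational constant of about 2.5 meV; the proportionality in Eq. (11) is normalized over all J up to a cutoff; and the function name is hypothetical.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant in eV/K
B_ROT = 2.5e-3      # ASSUMED rotational constant of T2 in eV (rigid rotor, approximate)


def j_populations(T, lam, j_max=10):
    """Normalized P_J(T) of Eq. (11) for T2 with ortho fraction lam."""
    weights = []
    for J in range(j_max + 1):
        g_s = lam if J % 2 == 1 else 1.0 - lam   # ortho: odd J, para: even J
        g_J = 2 * J + 1                          # rotational degeneracy
        delta_E = B_ROT * J * (J + 1)            # assumed rigid-rotor energy
        weights.append(g_s * g_J * math.exp(-delta_E / (K_B * T)))
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    for J, p in enumerate(j_populations(30.0, 0.75)[:4]):
        print(J, round(p, 4))
```

Under these assumptions, at 30 K and \(\lambda = 0.75\) nearly all of the population sits in \(J \le 2\), which matches the range of initial states used for Fig. 3.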

2.5 Exact relativistic three-body calculation

The \(\upbeta \)  spectrum formalism outlined above contains approximations to the exact relativistic calculations of the three-body phase space density [22, 23]. In deriving Eq. (10), the dependence of the daughter molecule’s recoil energy \(E_\text {rec}\) on the neutrino mass \(m_i\) and the final-state spectrum \(V_f\) is neglected. This approximation results in a minute shift of the maximum electron energy, which is on the order of 0.1 meV [23], as depicted in Fig. 4. In the neutrino mass analysis, such a shift in the energy scale is compensated by the external constraint of the endpoint \(E_0\); thus, the effective two-body representation of Eq. (10) is an adequate approximation in the energy region of interest (also see Table 1). A summary of the energy-dependent, higher-order correction terms is given in Sect. 2.6.

Fig. 4

Comparison of the differential \(\upbeta \)-electron energy spectrum of atomic tritium for the full relativistic kinematic treatment and the non-relativistic approximation, assuming a neutrino mass of \( {m_{\upnu }} = 1\hbox { eV}\)

2.6 Additional correction terms

In addition to the Fermi function \(F(Z, E)\), correction factors arising from other nuclear and atomic physics effects must be evaluated and applied multiplicatively. The formulae and the references to these effects are given in Appendix A.1. The following is a synopsis.

  • Radiative corrections: In addition to the Coulomb interaction described by \(F(Z, E)\), electromagnetic effects involving contributions from virtual and real photons give rise to a correction factor \(G(E,E_0)\).

  • Screening: The unscreened \(F(Z, E)\), which describes the Coulomb interaction between the daughter nucleus and the departing \(\upbeta \)-electron, must be corrected by a factor \(S(Z, E)\) that accounts for the screening effect on the Coulomb field by the \(\text {1s}\)-orbital electrons left behind by the parent molecule.

  • Recoil effects: In the relativistic elementary particle treatment of the \(\upbeta \)-decay (see for instance [23, 24]), energy-dependent recoil effects on the order of \(1/M\) can be calculated, with M being the mass of \(\mathrm {^3He}\). These effects – spectrum shape modification due to a three-body phase space, weak magnetism and \(V-A\) interference – are typically combined into a common factor \(R(E, E_0, M)\).

  • Finite structure of the nucleus: Because the \(\mathrm {^3He^+}\) daughter nucleus is not a point-like object, the Coulomb field does not scale with an inverse-squared relationship within the radius, leading to a correction factor \(L(Z, E)\). A proper convolution of the electron and neutrino wave functions with the nucleonic wave function throughout the nuclear volume leads to another factor \(C(Z, E)\).

  • Recoiling Coulomb field: The departing electron does not propagate in the field of a stationary charge, but one which is itself recoiling from the electron emission. This effect introduces another correction factor \(Q(Z, E, E_0, M)\).

  • Orbital-electron interactions: A correction factor \(I(Z, E)\) is introduced to account for possible quantum mechanical interactions between the departing \(\upbeta \)-electron and the \(\text {1s}\)-orbital electrons.

The differential \(\upbeta \)  spectrum, including all the theoretical correction factors discussed above, can be written as follows:

$$\begin{aligned} \left( \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E} \right) _\text {C}&= \frac{G_\text {F}^2 \, |V_\text {ud}|^2}{2\pi ^3} \; (g_\text {V}^2 + 3 g_\text {A}^2) \; F_\text {rel}(Z, E) \nonumber \\&\quad \cdot p \, (E+ {m_\text {e}} ) \cdot S \, L \, C \, I \nonumber \\&\quad \cdot \sum _f \; G \, R \, Q \cdot P_f \, \epsilon _f \, \sqrt{\epsilon _f^2 - {m_{\upnu }^2} } \; \varTheta (\epsilon _f - {m_{\upnu }} ). \end{aligned}$$
(12)

The corrections connected to the recoil of the daughter nucleus, namely R and Q, and the radiative corrections G, depend on the endpoint energy and the phase space of a specific excited final state. This dependency is reflected in Eq. (12), as these factors are summed over the possible final states.

Fig. 5

Theoretical correction factors to the differential \(\upbeta \)-decay spectrum of \(\mathrm {T_2}\), evaluated in an interval \(30\,\text {eV}\) below the endpoint \(E_0\) and summed over possible final states

Fig. 6

The KATRIN experimental setup, 70 m in length. The monitoring and calibration section (a) residing at the rear of the high-luminosity windowless source (b) provides stable and precise monitoring of tritium gas properties. The transport system (c) magnetically guides the electrons further downstream and prevents tritiated gas from entering the spectrometer section, which features two spectrometers operating as MAC-E-filters. The smaller pre-spectrometer (d) acts as a pre-filter for low energy electrons, and the larger main spectrometer (e) is used for the energy analysis in the endpoint region. A segmented detector (f) acts as a counter for the transmitted signal electrons

In Fig. 5, a graphical overview of these correction factors in the energy interval 30 eV below the tritium endpoint is given. The radiative corrections have the most significant effect with a pronounced energy dependence, as they deplete the spectrum completely towards the endpoint. Most other corrections are negligible in the neutrino mass analysis, as further detailed in Sect. 4.12 and Table 1.

3 The KATRIN experiment

The experimental setup of KATRIN combines a high-luminosity windowless gaseous molecular tritium source (WGTS) with an integrating electrostatic spectrometer of MAC-E filter (magnetic adiabatic collimation with electrostatic filter) type [25,26,27], offering a narrow filter width and a wide solid-angle acceptance at the same time.

The apparatus depicted in Fig. 6 features several major subsystems. The isotopological composition, temperature, and density fluctuations of the tritium source are monitored by a set of calibration devices housed in the rear section (a). The windowless gaseous tritium source (b) contains a beam tube of length \(L = 10\hbox { m}\) and diameter \(d = 90\hbox { mm}\), residing in a nominal magnetic field of 3.6 T, where re-purified molecular tritium (\(\mathrm {T_2}\)) is continuously circulated by injection at the center and pumping at both ends through a closed loop system [28,29,30]. To prevent tritiated gas from entering the spectrometer section, the transport section (c) combines differential pumping with cryogenic pumping to reduce the tritium flow by 14 orders of magnitude [31, 32]. The \(\upbeta \)-electrons are guided through the entire beamline by a magnetic field [33] into the pre-spectrometer (d), which acts as a pre-filter that blocks the low-energy electrons of the \(\upbeta \)-spectrum [34]. The energy analysis around the endpoint region takes place in the main spectrometer (e), which is operated under ultra-high vacuum conditions [35] at a retarding voltage of about \(-18.6\hbox { kV}\). Both spectrometers are designed as MAC-E filters, and the main spectrometer achieves a very narrow filter width (\(\lesssim 1\hbox { eV}\)) [9] while providing high luminosity for the \(\upbeta \)-electrons. Electrons with sufficient energy pass both the MAC-E filters and are then counted at a segmented silicon PIN diode detector (f) [36] with 148 individual pixels. An integrated \(\upbeta \)-spectrum is recorded by scanning the retarding voltage in the endpoint region.

3.1 MAC-E filter principle

The electrons emitted isotropically from tritium \(\upbeta \)-decay in the gaseous source are guided adiabatically by magnetic fields. In the forward direction the \(\upbeta \)-electrons are confined in cyclotron motion along the magnetic field lines towards the MAC-E filter. Along their path to the analyzing plane (central plane) of the spectrometer, the magnetic field strength decreases by several orders of magnitude (Footnote 3). Due to the conservation of magnetic moment in a slowly varying field, most of the electrons’ transverse momentum is adiabatically transformed into longitudinal momentum. With a high negative potential (\(U \approx -18.6\hbox { kV}\), corresponding to the endpoint energy of tritium) at its center and most of the electron momentum being parallel to the magnetic field lines, the MAC-E filter acts as an electrostatic high-pass energy filter. Only electrons with positive longitudinal energy (the kinetic energy in direction of the magnetic field line) along their entire trajectory are transmitted, while the others are reflected and re-accelerated towards the entrance of the spectrometer.

The residual transverse energy, which cannot be analyzed by the filter, is defined by the ratio of the maximum \(B_\text {max}\) to the minimum magnetic field \(B_\mathrm {min} = B_\text {A}\). This key characteristic of the MAC-E filter is commonly called the filter width (or sometimes energy resolution)

$$\begin{aligned} \varDelta E = \frac{B_\text {A}}{B_\text {max}} \cdot E \, \frac{\gamma +1}{2}, \end{aligned}$$
(13)

with E being the electron kinetic energy and \(\gamma = \frac{E}{m_\text {e}} + 1\) the relativistic gamma factor with the electron rest mass \(m_\text {e}\).
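Eq. (13) is simple enough to evaluate directly. The sketch below does so for an electron near the tritium endpoint, assuming the field ratio \(B_\text {max}/B_\text {A} = 20{,}000\) quoted for the nominal configuration in the caption of Fig. 7; the function name and the hand-entered electron mass are illustrative choices.

```python
M_E = 510998.95   # electron rest mass in eV (natural units)


def filter_width(E, b_a_over_b_max):
    """Filter width of Eq. (13) in eV for kinetic energy E in eV."""
    gamma = E / M_E + 1.0                       # relativistic gamma factor
    return b_a_over_b_max * E * (gamma + 1.0) / 2.0


width = filter_width(18574.0, 1.0 / 20000.0)

if __name__ == "__main__":
    print(width)
```

The result lies slightly below 1 eV, in line with the filter width of \(\lesssim 1\hbox { eV}\) stated for the main spectrometer; the factor \((\gamma +1)/2\) contributes a relativistic correction of about 2 % at this energy.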

4 Response function of the KATRIN experiment

In the KATRIN experiment, the energy of the \(\upbeta \)-electrons is analyzed using the MAC-E filter technique as described in Sect. 3. For a specific electrostatic retardation potential U, the count rate of electrons at the detector can be calculated, given the probability of an electron with a starting energy E to traverse the whole apparatus and hit the detector. This probability is described by the so-called transmission function \(T(E, U)\). Additional modifications arise from energy loss and scattering in the source, and reflection of signal electrons propagating from their point of origin until detection. These effects are incorporated together with the transmission function into the response function \(R(E, U)\), which is vital for the neutrino mass analysis as it describes the propagation of signal electrons that contribute to the integrated \(\upbeta \)-spectrum.

For illustrative purposes, we first consider a source containing a given number of tritium nuclei (\(N_\mathrm {T}\)) that decay with an isotropic angular distribution (Footnote 4). The emitted electrons are guided by magnetic fields through the spectrometer. The detection rate at the detector for a given spectrometer potential U can be expressed as:

$$\begin{aligned} \dot{N}(U) = \frac{1}{2}\,N_\mathrm {T}\int \limits _{qU}^{E_0} \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}(E_0, m_\upnu ^2) \cdot R(E, U) \, \,\mathrm {d}E, \end{aligned}$$
(14)

where the factor of \(\frac{1}{2}\) incorporates the fact that the response function \(R(E, U)\) only considers electrons emitted in the forward direction.
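The structure of Eq. (14) can be illustrated with a short numerical integration. The sketch below is a toy model under strong simplifying assumptions: the differential spectrum is reduced to the neutrino phase-space factor \(\epsilon \sqrt{\epsilon ^2 - m_\upnu ^2}\) without prefactors or final states, the response is replaced by an idealized sharp step at \(E = qU\), and all function names are hypothetical.

```python
import math

E0 = 18574.0   # endpoint energy in eV, as quoted in the text


def toy_spectrum(E, m_nu_sq):
    """Toy differential spectrum: eps * sqrt(eps^2 - m_nu^2), no prefactors."""
    eps = E0 - E
    arg = eps**2 - m_nu_sq
    return eps * math.sqrt(arg) if eps > 0.0 and arg > 0.0 else 0.0


def step_response(E, qU):
    """Idealized placeholder response: every electron above qU is counted."""
    return 1.0 if E > qU else 0.0


def integrated_rate(qU, m_nu_sq, n_tritium=1.0, response=step_response, steps=20000):
    """Midpoint-rule evaluation of Eq. (14) between qU and E0."""
    h = (E0 - qU) / steps
    total = 0.0
    for i in range(steps):
        E = qU + (i + 0.5) * h
        total += toy_spectrum(E, m_nu_sq) * response(E, qU)
    return 0.5 * n_tritium * total * h


if __name__ == "__main__":
    print(integrated_rate(E0 - 30.0, 0.0), integrated_rate(E0 - 30.0, 1.0))
```

For a vanishing neutrino mass the toy integral reduces to \(\frac{1}{2} \cdot \frac{1}{3} (E_0 - qU)^3\), which provides a simple check of the numerics; a realistic `response` argument would replace the step by the function developed in the remainder of this section.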

In the following, an analytical description of the response function of the KATRIN experiment will be laid out. At first, we derive the transmission function of the MAC-E filter that is implemented by the main spectrometer (Sect. 4.1). In Sect. 4.2 we consider energy loss in the source and develop a first description of the response function. Inhomogeneities in the MAC-E filter (Sect. 4.3) and the source (Sect. 4.4) require an extension of the model by a segmentation of the source and spectrometer volume. Further modifications to the response function arise from considering the effective source column density which an individual \(\upbeta \)-electron traverses (Sect. 4.5), changes to the electron angular distribution (Sect. 4.6), thermal motion of the source gas (Sect. 4.7), and energy loss by cyclotron radiation (Sect. 4.8). After discussing these contributions, in Sect. 4.9 we arrive at a description of the integrated spectrum that is measured by the KATRIN experiment. We close the discussion with a general note on experimental energy uncertainties (Sect. 4.11) and give a quantitative overview of theoretical corrections and systematic effects (Sect. 4.12) on the neutrino mass analysis.

4.1 Transmission function of the MAC-E filter

The transmission of \(\upbeta \)-electrons through the MAC-E filter is an important characteristic of the measurement and a significant part of the response function. In the simplest case, one can assume that electrons enter the MAC-E filter with an isotropic angular distribution and propagate adiabatically towards the detector. In the discussion here we apply the adiabatic approximation (see Eq. (15) below), which is fulfilled in the case of KATRIN.

In general, an electron from the source will reach the detector if the momentum \(p_\parallel \) parallel to the magnetic field lines (or the corresponding fraction \(E_\parallel \) of the kinetic energy) is always positive. The transformation of transverse to parallel momentum and back in a slowly varying magnetic field B is governed by the following adiabatic invariant (which corresponds to the conserved orbital magnetic moment \(\mu = E_\perp / B\) in the non-relativistic limit):

$$\begin{aligned} \frac{p^2_{\perp }}{B} = \text {const.} \end{aligned}$$
(15)

In the following discussion we use the general relation between the transverse momentum \(p_\perp \) of an electron and its transverse kinetic energy \(E_\perp \):

$$\begin{aligned} p_\perp ^2 = E_\perp \; (\gamma + 1) \cdot {m_\text {e}} \end{aligned}$$
(16)

with the relativistic gamma factor \(\gamma = \frac{E}{ {m_\text {e}} } + 1\), and thereby define the transverse kinetic energy as:

$$\begin{aligned} E_\perp = E \; \sin ^2 \theta . \end{aligned}$$
(17)

Similarly, we define the longitudinal kinetic energy as \(E_\parallel = E \; \cos ^2 \theta \). The polar angle \(\theta = \angle (\mathbf {p},\mathbf {B})\) of an electron momentum to the magnetic field is called the pitch angle.
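The relation in Eq. (16) follows in two steps from the relativistic energy-momentum relation, written here in natural units as a short worked derivation. The intermediate symbol \(E_\text {tot}\) for the total electron energy is introduced only for this illustration.

$$\begin{aligned} p^2&= E_\text {tot}^2 - {m_\text {e}}^2 = (E + {m_\text {e}})^2 - {m_\text {e}}^2 = E \, (E + 2 {m_\text {e}}) = E \, (\gamma + 1) \, {m_\text {e}}, \nonumber \\ p_\perp ^2&= p^2 \sin ^2 \theta = E \, \sin ^2 \theta \cdot (\gamma + 1) \, {m_\text {e}} = E_\perp \, (\gamma + 1) \, {m_\text {e}}. \end{aligned}$$

The first line uses \(E + 2 {m_\text {e}} = {m_\text {e}} \, (E/{m_\text {e}} + 2) = {m_\text {e}} \, (\gamma + 1)\); the second uses \(p_\perp = p \sin \theta \) together with Eq. (17).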

We can now define the adiabatic transmission condition for an electron starting at the position \(z_\text {S}\) with a magnetic field \(B_\text {S} = B(z_\text {S})\), an electrostatic potential \(U_\text {S} = U(z_\text {S})\), a kinetic energy \(E = E(z_\text {S})\) with a corresponding gamma factor \(\gamma \), and a pitch angle \(\theta = \theta (z_\text {S})\). The transmission condition then reads for all longitudinal positions z:

$$\begin{aligned} 0&\le E_\parallel (z) \nonumber \\&= E + q U_\text {S} - E_\perp (z) - q U(z) \nonumber \\&= E + q U_\text {S} - E \; \sin ^2 \theta \cdot \frac{B(z)}{B_\text {S}} \; \frac{\gamma +1}{\gamma (z)+1} - qU(z), \end{aligned}$$
(18)

where \(\gamma (z)\) corresponds to the gamma factor at an arbitrary position z along the beam line where the electron has a kinetic energy \(E(z) = E_\parallel (z) + E_\perp (z)\) at a magnetic field B(z) and an electrostatic potential U(z).

Usually in a MAC-E filter, the highest retarding potential U and the smallest magnetic field \(B_\text {A}\) are both reached in the analyzing plane (located at \(z_\text {ap} = 0\) in our definition). In addition, we can assume the electric potential \(U_\text {S}\) at the start to be zero and, because the kinetic energy is minimal at the largest retardation, the relativistic factor in the analyzing plane to equal one, \(\gamma (z_\text {ap}) = 1\). Therefore the transmission condition in Eq. (18) simplifies to

$$\begin{aligned} 0 \le E - E \; \sin ^2 \theta \cdot \frac{B_\text {A}}{B_\text {S}} \; \frac{\gamma +1}{2} - qU. \end{aligned}$$
(19)

For a given electric potential and magnetic field configuration of the MAC-E filter, the transmission condition \({\mathcal {T}}\) is thus governed only by the starting energy E, the starting angle \(\theta \) and the retarding voltage U:

$$\begin{aligned} {\mathcal {T}}(E,\theta ,U) =\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\quad \displaystyle E \, \left( 1 - \sin ^2 \theta \cdot \frac{B_\text {A}}{B_\text {S}} \cdot \frac{\gamma +1}{2} \right) \\ &{}\qquad \qquad - qU > 0\\ 0 &{}\quad \text {else} \end{array} \right. \, . \end{aligned}$$
(20)

For an isotropically emitting electron source with angular distribution \(\omega (\theta ) \, \,\mathrm {d}\theta = \sin \theta \, \,\mathrm {d}\theta \), we can integrate \({\mathcal {T}}(E,\theta ,U)\) over the angle \(\theta \) and define a response or transmission function. From here on we associate the remaining energy in the analyzing plane of the MAC-E filter – the surplus energy – with the expression \({\mathcal {E}}= E - qU\).

In the KATRIN setup the maximum magnetic field \(B_\text {max}\) is larger than \(B_\text {S}\), so that \(\upbeta \)-electrons emitted at large pitch angles in the source are reflected magnetically before reaching the detector. The magnetic reflection occurs at the pinch magnet (with \(B = B_\text {max}\) and zero potential), and in the source the electric potential is zero. The maximum pitch angle of the transmitted electrons is therefore independent of the electron energy and given by:

$$\begin{aligned} \theta _\text {max} = \text {arcsin}\left( \sqrt{\frac{B_\text {S}}{B_\text {max}}}\right) \; . \end{aligned}$$
(21)

For the standard operating parameters of KATRIN (see Table 2), \(\theta _\text {max}\) evaluates to about \(50.8^{\circ }\). This reflection is desired by design, since \(\upbeta \)-electrons emitted with larger pitch angles have to traverse a longer effective column of source gas and are therefore more likely to scatter and undergo energy loss, as detailed in the following sections.

With this additional magnetic reflection after the analyzing plane, the transmission function is given by:

$$\begin{aligned} T(E,U)&= \int \limits _{\theta =0}^{\theta _\text {max}} \; {\mathcal {T}}(E,\theta ,U) \cdot \sin \theta \, \,\mathrm {d}\theta \nonumber \\&= \left\{ \begin{array}{ll} 0 &{}\quad \;\;{\mathcal {E}}<0 \\ 1 - \sqrt{1-\frac{{\mathcal {E}}}{E} \frac{B_\text {S}}{B_\text {A}} \frac{2}{\gamma +1}} &{}\quad 0\le {\mathcal {E}}\le \varDelta E \\ 1 - \sqrt{1-\frac{B_\text {S}}{B_\text {max}}} &{}\quad \;\;{\mathcal {E}}>\varDelta E \end{array} \right. \, , \end{aligned}$$
(22)

with the filter width \(\varDelta E\) from Eq. (13). In Fig. 7, the transmission function is shown for the nominal KATRIN operating parameters and for the case \(B_\text {S} = B_\text {max}\). The magnetic reflection imposes an upper limit on the pitch angle, which reduces the effective width of the transmission function. As indicated in Fig. 7, this improves the filter width of the spectrometer to 0.93 eV, compared with 1.55 eV for \(\theta _\mathrm {max} = 90 ^{\circ }\) without magnetic reflection.
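The analytic form of Eq. (22) can be cross-checked against a direct numerical integration of the transmission condition in Eq. (20) over the pitch angle. The Python sketch below does this under stated assumptions: the field values \(B_\text {S} = 3.6\hbox { T}\), \(B_\text {max} = 6.0\hbox { T}\) and \(B_\text {max}/B_\text {A} = 20{,}000\) are the reference values quoted in Sect. 3 and the caption of Fig. 7, the electron mass is entered by hand, and all function names are hypothetical.

```python
import math

M_E = 510998.95            # electron rest mass in eV (natural units)
B_S, B_MAX = 3.6, 6.0      # source and pinch magnetic field in tesla
B_A = B_MAX / 20000.0      # analyzing-plane field from the nominal ratio

THETA_MAX = math.asin(math.sqrt(B_S / B_MAX))   # Eq. (21), in radians


def transmits(E, theta, qU):
    """Transmission condition of Eq. (20) for energy E (eV) and pitch angle theta."""
    gamma = E / M_E + 1.0
    return E * (1.0 - math.sin(theta)**2 * (B_A / B_S) * (gamma + 1.0) / 2.0) - qU > 0.0


def transmission_analytic(E, qU):
    """Closed form of Eq. (22)."""
    gamma = E / M_E + 1.0
    surplus = E - qU
    width = (B_A / B_MAX) * E * (gamma + 1.0) / 2.0      # Eq. (13)
    if surplus < 0.0:
        return 0.0
    if surplus <= width:
        return 1.0 - math.sqrt(1.0 - (surplus / E) * (B_S / B_A) * 2.0 / (gamma + 1.0))
    return 1.0 - math.sqrt(1.0 - B_S / B_MAX)


def transmission_numeric(E, qU, steps=20000):
    """Midpoint-rule integral of T(E, theta, U) * sin(theta) from 0 to THETA_MAX."""
    h = THETA_MAX / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        if transmits(E, theta, qU):
            total += math.sin(theta) * h
    return total


if __name__ == "__main__":
    qU = 18545.0
    print(math.degrees(THETA_MAX))
    for surplus in (0.5, 5.0):
        print(surplus, transmission_analytic(qU + surplus, qU), transmission_numeric(qU + surplus, qU))
```

The two evaluations agree to within the discretization error of the angular grid, both on the rising edge and on the plateau. The plateau value \(1 - \sqrt{1 - B_\text {S}/B_\text {max}}\) is below one because electrons with pitch angles above \(\theta _\text {max}\) are reflected at the pinch magnet.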

Fig. 7

Transmission function T at a retarding potential of \(U = 18{,}545\hbox { V}\) with nominal magnetic field configuration (\(\frac{B_\text {max}}{B_\text {A}} = 20{,}000\)). The transmission condition in Eq. (20) relates the surplus energy to the pitch angle \(\theta \), as shown at the top of the figure. The solid red line shows the cut-off caused by a magnetic reflection of all electrons with high pitch angle in the strongest magnetic field at reference conditions \(\frac{B_\text {max}}{B_\text {S}} = \frac{6.0}{3.6}\). The dashed blue line shows the transmission function without magnetic reflection

4.2 Response function and energy loss

In the next step we consider the energy loss when the electron traverses the gaseous source. The dominant energy loss process is the scattering of electrons on gas molecules within the source. Because the pressure decreases rapidly outside the source, scattering processes in the transport section or thereafter are of no concern.

Two ingredients are required to treat electron scattering in the source appropriately. First, the energy loss function \({\tilde{f}}(\epsilon , \delta \vartheta )\) describes the probability for a certain energy loss \(\epsilon \) and scattering angle \(\delta \vartheta \) of the \(\upbeta \)-electrons to occur in a scattering process. Because the scattering angles \(\delta \vartheta \) are small (see footnote 5), we will neglect them in the following formulae and describe the scattering energy losses by the function \(f(\epsilon )\). Second, the scattering probability functions \(P_s(\theta )\) give the probability for an electron with pitch angle \(\theta \) to scatter s times before leaving the source. These scattering probabilities depend on \(\theta \), since electrons with a larger pitch angle must traverse a longer path, meaning a larger effective column density, and are thus likely to scatter more often. We do not consider a dependence of f or \(P_s\) on the incident kinetic energy E of the electrons: for the KATRIN experiment, the energy range of interest is a very narrow interval of a few times 10 eV below the tritium endpoint, where these functions can be considered independent of E.

With these considerations, the response function no longer comprises only the transmission function, but is modified as follows:

$$\begin{aligned} R(E, U)&= \int \limits _{\epsilon = 0}^{E-qU} \; \int \limits _{\theta =0}^{\theta _\text {max}} \; {\mathcal {T}}(E-\epsilon ,\theta ,U) \cdot \sin \theta \nonumber \\&\qquad \cdot \biggl [ \; P_0(\theta ) \, \delta (\epsilon ) \, + \, P_1(\theta ) \; f(\epsilon ) \; \biggr . \nonumber \\&\qquad \biggl . \; + \, P_2(\theta ) \; (f \otimes f)(\epsilon ) \, + \, \cdots \; \biggr ] \, \,\mathrm {d}\theta \, \,\mathrm {d}\epsilon \end{aligned}$$
(23)
$$\begin{aligned}&= \int \limits _{\epsilon = 0}^{E-qU} \; \int \limits _{\theta =0}^{\theta _\text {max}} \; {\mathcal {T}}(E-\epsilon ,\theta ,U) \cdot \sin \theta \nonumber \\&\qquad \cdot \, \sum _{s} \; P_s(\theta ) \; f_s(\epsilon ) \, \,\mathrm {d}\theta \, \,\mathrm {d}\epsilon . \end{aligned}$$
(24)

Electrons leaving the source without scattering \((s = 0)\) do not lose any energy, hence \(f_0(\epsilon ) = \delta (\epsilon )\). For s-fold scattering, \(f_s(\epsilon )\) is obtained by convolving the energy loss function \(f(\epsilon )\) s times with itself.
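The s-fold convolution can be sketched numerically as follows. This is a minimal illustration on a uniform energy grid starting at \(\epsilon = 0\); the helper name `n_fold_convolution` and the toy Gaussian single-scattering density are assumptions of this example and do not represent the measured energy loss function.

```python
import numpy as np


def n_fold_convolution(f_values, step, s_max):
    """Return [f_1, ..., f_s_max] sampled on the grid eps_k = k * step."""
    n = len(f_values)
    result = [np.asarray(f_values, dtype=float)]
    for _ in range(s_max - 1):
        # Discrete approximation of (f_s * f)(eps), truncated to the input grid
        conv = np.convolve(result[-1], result[0])[:n] * step
        result.append(conv)
    return result


# Toy single-scattering density (NOT the measured energy loss function)
step = 0.05                          # grid spacing in eV
eps = np.arange(0.0, 120.0, step)    # energy loss grid in eV, starting at zero
toy_f = np.exp(-0.5 * ((eps - 15.0) / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

f1, f2, f3 = n_fold_convolution(toy_f, step, 3)
mean_f2 = np.sum(eps * f2) * step    # mean energy loss after two scatterings
```

Because each convolution preserves the normalization and adds the mean energy losses, the mean of \(f_s\) in this toy case is s times the mean of \(f_1\), which provides a simple consistency check for a numerical implementation.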

The scattering cross section can be divided into an elastic and an inelastic component. The inelastic cross section and the energy loss function for electrons with kinetic energies of \(\approx 18.6\hbox { keV}\) scattering from tritium molecules have both been measured in [40, 41]. In that work, the inelastic scattering cross section was determined to be \(\sigma _\text {inel} = (3.40 \pm 0.07) \times 10^{-18}\,\hbox {cm}^2\) and an empirical model was fit to the energy loss spectrum.

The latter is parameterized by a low-energy Gaussian and a high-energy Lorentzian part:

$$\begin{aligned} f(\epsilon ) = \left\{ \begin{array}{ll} A_1 \cdot \, \exp \left( -2 \, \left( \dfrac{\epsilon - \epsilon _1}{\omega _1}\right) ^2 \right) &{}\;\;\epsilon < \epsilon _c \\ A_2 \cdot \, \dfrac{\omega _2^2}{\omega _2^2 + 4 (\epsilon - \epsilon _2)^2} &{}\;\;\epsilon \ge \epsilon _c \\ \end{array} \right. \, , \end{aligned}$$
(25)

with \(A_1 = (0.204 \pm 0.001)\,\text {eV}^{-1}\), \(A_2 = (0.0556 \pm 0.0003)\,\text {eV}^{-1}\), \(\omega _1 = (1.85 \pm 0.02)\,\text {eV}\), \(\omega _2 = (12.5 \pm 0.1)\,\text {eV}\), \(\epsilon _2 = (14.30 \pm 0.02)\,\text {eV}\) and a fixed \(\epsilon _1 = 12.6\,\text {eV}\). To obtain a continuous transition between the two parts of \(f(\epsilon )\), a value \(\epsilon _c = 14.09\,\text {eV}\) was chosen. The Gaussian part summarizes the energy loss due to (discrete) excitation processes, while the Lorentzian part describes the energy loss due to ionization of tritium molecules.
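The parameterization of Eq. (25) can be written down directly. The following sketch uses the central parameter values quoted above and ignores their uncertainties; the function and constant names are choices of this example. It also evaluates the relative mismatch of the two branches at \(\epsilon _c\) as a check on the continuity of the transition.

```python
import math

A1, A2 = 0.204, 0.0556       # amplitudes in 1/eV
W1, W2 = 1.85, 12.5          # widths in eV
EPS1, EPS2 = 12.6, 14.30     # peak positions in eV
EPS_C = 14.09                # transition energy in eV


def gaussian_part(eps):
    """Low-energy branch of Eq. (25): excitation processes."""
    return A1 * math.exp(-2.0 * ((eps - EPS1) / W1) ** 2)


def lorentzian_part(eps):
    """High-energy branch of Eq. (25): ionization."""
    return A2 * W2**2 / (W2**2 + 4.0 * (eps - EPS2) ** 2)


def energy_loss(eps):
    """Energy loss function f(eps) in 1/eV for an energy loss eps in eV."""
    return gaussian_part(eps) if eps < EPS_C else lorentzian_part(eps)


# Relative mismatch of the two branches at the transition energy
mismatch = abs(gaussian_part(EPS_C) - lorentzian_part(EPS_C)) / lorentzian_part(EPS_C)
```

With the rounded parameter values, the two branches agree at \(\epsilon _c\) to well below one percent.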

This parameterization of the energy loss function is used for the response model presented in this paper. However, the parameters are not precise enough for KATRIN to meet its physics goals. Dedicated electron gun measurements with the full experimental KATRIN setup have been planned for the determination of the inelastic scattering cross section and the energy loss function with higher precision; the analysis of these data will involve a sophisticated deconvolution technique [42].

At \(\sigma _\mathrm {el} = 0.29 \times 10^{-18}\,\mathrm {cm}^{2}\), the total cross section of elastic scattering of 18.6 keV electrons with molecular hydrogen isotopologues is smaller than that for inelastic scattering by an order of magnitude [43, 44]. In addition, the elastically scattered electrons are strongly forward peaked with a median scattering angle of \({{\overline{\theta }}}_\text {scat} = 2.1 ^{\circ }\) near the tritium endpoint energy. The energy loss due to elastic scattering is given by the relation

$$\begin{aligned} \varDelta E_\text {scat} = 2 \; \frac{ {m_\text {e}} }{M_\mathrm {T_2}} \; E \cdot \left( 1-\cos \theta _\text {scat} \right) . \end{aligned}$$
(26)

With an angular distribution for elastic scattering of molecular hydrogen by electron impact based on [45], the corresponding median energy loss amounts to \(\overline{\varDelta E} = 2.3\hbox { meV}\). The energy loss function, containing the elastic and inelastic components weighted by their individual cross section, is shown in Fig. 8.
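The quoted median energy loss can be cross-checked with Eq. (26). Since the energy loss increases monotonically with the scattering angle, the median angle maps onto the median energy loss. The sketch below assumes a molecular mass of about 6.032 u for \(\mathrm {T_2}\); this value and the function name are assumptions of this example.

```python
import math

M_E = 510998.95              # electron rest energy in eV
M_T2 = 6.0321 * 931.494e6    # approximate T2 rest energy in eV (assumed 6.0321 u)


def elastic_energy_loss(E, theta_scat_deg):
    """Energy loss of Eq. (26) in eV for kinetic energy E (eV) and angle (deg)."""
    theta = math.radians(theta_scat_deg)
    return 2.0 * (M_E / M_T2) * E * (1.0 - math.cos(theta))


# Energy loss at the median scattering angle of 2.1 degrees, in eV
loss_at_median_angle = elastic_energy_loss(18600.0, 2.1)
```

The result is a few meV, consistent with the value of 2.3 meV stated above.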

Fig. 8

Theoretical energy loss function for elastic and inelastic scattering processes, shown as a probability density function. The leftmost enlarged region (\(\epsilon \lesssim 0.01\hbox { eV}\)) is dominated by elastic scattering, and the region at higher energy is due to inelastic excitation and ionization, as parameterized by Aseev et al. [40]

The elastic energy loss component can be accurately calculated. Due to its narrow width and steep slope, \(\sim \hbox {meV}\) binning is required for incorporating it accurately in the response function, thereby increasing computational cost considerably. We will neglect the elastic scattering component in neutrino mass measurements, as the associated systematic error on \( {m_{\upnu }^2} \) is minute (\(\sim 5 \cdot 10^{-5}\hbox { eV}^2\), see Table 1).

4.3 Radial inhomogeneity of the electromagnetic field

To calculate the transmission and response functions of the KATRIN setup as explained in Sects. 4.1 and 4.2, it is in principle sufficient to only consider the axial position of an electron to identify the initial conditions such as electromagnetic fields or scattering probabilities. In the case of the main spectrometer, radial dependencies must be incorporated in the description of the magnetic field and the electrostatic potential in the analyzing plane. Additional radial dependencies in the source are discussed in Sect. 4.4; these are then incorporated into the model together with the spectrometer effects.

In order to achieve a MAC-E filter width in the eV-regime, a reduction of the magnetic field strength in the analyzing plane on the order of \(\frac{B_\text {A}}{B_\text {max}} \approx \frac{\varDelta E}{E} \approx 10^{-4}\) is required (see Eq. (13)). Consequently, the cross section A of the flux tube increases drastically due to the conservation of magnetic flux \(\varPhi = \mathrm {const} \approx B \cdot A\). When nominal field settings are applied (see Table 2), the projection of the detector surface with radius \(r_\mathrm {det} = 4.5\hbox { cm}\) has a radius of about 4 m in the analyzing plane. A larger (smaller) magnetic field in the analyzing plane \(B_\mathrm {A}\) shifts the transmission edge to a higher (lower) energy, see Eq. (20). This effect is even more pronounced for larger electron pitch angles. Consequently, the transmission function (see Eq. (22)) is also widened or narrowed. A set of magnetic field compensation coils around the spectrometer vessel, operated with an optimized current distribution, minimizes the spread of the radial inhomogeneity of the magnetic field to a few \({\upmu \hbox { T}}\) [37, 38]. The resulting variation in the filter width in the analyzing plane due to the magnetic field inhomogeneity is thus reduced to about 10 meV [46].

Table 1 Impact of individual theoretical and experimental model corrections on the measured squared neutrino mass \( {m_{\upnu }^2} \), if neglected or approximated. The analysis energy window is restricted to \([E_0 - 30\hbox { eV}; E_0 + 5\hbox { eV}]\). For \( {m_{\upnu }} \) a true value of 200 meV is assumed

In the case of the electrostatic potential, unavoidable radial variation arises from the design of the spectrometer. To fulfill the transmission condition in Eq. (19), the electrode segments at the entrance and exit are operated on a more positive potential than in the central region close to the analyzing plane (see footnote 6). Depending on the final potential setting, the radial potential variation in the analyzing plane is expected to be of order 1 V [39]. In comparison, azimuthal variations are negligible. It is possible to considerably reduce the radial potential inhomogeneity by operating the MAC-E filter at larger \(B_\mathrm {A}\). However, this would require better knowledge of the magnetic field in the analyzing plane [46] and also increase the filter width.

Table 2 Key operational and derived parameters of KATRIN as defined in the technical design report [4]
Fig. 9

The calculated radial inhomogeneity of the electrostatic potential and the magnetic field in the analyzing plane of the main spectrometer, for the standard setting of \(U = -18{,}600\hbox { V}\) and \(B_\text {A} = 0.3\hbox { mT}\). The plot shows the offset in the potential and the magnetic field values in the spectrometer center. The vertical dashed lines mark the corresponding outer radii of annuli mapped to the 13 detector rings

Even with these optimizations of the setup, the small radial variations in the electromagnetic fields at the analyzing plane, as shown in Fig. 9, cannot be neglected. The segmentation of the KATRIN main detector into annuli of pixels allows us to incorporate such radial variations in the response function model for each individual detector pixel. Because the tritium source also features radial variations of certain parameters, this segmentation is combined with a full segmentation of the source volume as described in Sect. 4.4. Dependencies of the electromagnetic field are typically averaged over the surface area of a pixel. The specific detector geometry with thinner annuli towards outer radii (each with equal surface area) helps minimize the potential variation within individual annuli, despite the increasing steepness of the potential.

4.4 Source volume segmentation and effects

In addition to radial dependencies of the analyzing plane parameters that govern the energy analysis of the \(\upbeta \)-electrons (Sect. 4.3), the tritium source also features radial and axial dependencies of its parameters. In the following, we will briefly outline the most relevant source parameters that are required to accurately model the differential \(\upbeta \)  spectrum and the response function. These parameters include the beam tube temperature \(T_\text {bt}\), the magnetic field strength \(B_\text {S}\), plasma potentials \(U_\text {P}\), the particle density \(\rho \) and the bulk velocity u of the gas, all of which may vary slightly in longitudinal, radial and azimuthal directions. The complex gas dynamic simulations, which are needed to calculate these local source parameters, are described in comprehensive detail in [47, 48].

In order to accurately model these effects for each individual detector pixel, the simulation source model is partitioned to match the detector geometry. It is partitioned longitudinally into \(N_L\) slices and segmented radially into \(N_R\) annuli (rings) of \(N_S\) segments each, resulting in a total of \(N_L \cdot N_R \cdot N_S\) segments (see Fig. 10). The geometry of these segments is chosen in such a way that a longitudinal stack of segments is magnetically projected (see footnote 7) onto a corresponding detector pixel. Note that all detector pixels have identical surface area, which leads to broader annuli at the center and thinner annuli towards larger radii. In the following, we index the longitudinal slices by the subscript i and radial/azimuthal segments with their corresponding detector pixel by the subscript j.

At a retarding potential U, the detection rate for a specific detector pixel j can then be stated as

$$\begin{aligned} \dot{N}_j(U) = \frac{1}{2} \, \sum _{i=0}^{N_L-1} \; N_{\text {T},i} \, \int \limits _{qU}^{E_0} \; \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}(E_0, m_\upnu ^2) \; R_{i,j}(E, U) \, \,\mathrm {d}E, \end{aligned}$$
(27)

where \(N_{\text {T},i}\) is the number of tritium nuclei (assuming that the gas density has no radial or azimuthal dependence). The response function \(R_{i,j}(E, U)\) depends on the index i (i.e. the axial position) and the index j (i.e. the radial/azimuthal position) of the source segment. With the indices i and j we can describe the dependence on local source parameters such as the magnetic field. The most significant effect on the response is caused by the scattering probabilities, as detailed in Sect. 4.2. The index j further describes non-uniformities of the retarding potential U and the magnetic field \(B_\text {A}\) in the spectrometer (see Fig. 9).

Fig. 10

In the numerical model, the source is partitioned in such a way that each radial/azimuthal segment (index j) in the source, consisting of stacked longitudinal slices (index i), corresponds to the part of the magnetic flux tube seen by the matching detector pixel (index j). (Diagram not drawn to scale)

4.5 Scattering probabilities

As discussed in Sect. 4.2, inelastic scattering results in an energy loss that directly affects the energy analysis of the signal electrons, and needs to be incorporated accurately into the analytical description. Changes to the angular distribution of the emitted electrons due to scattering processes, which also modify the response function, are discussed in Sect. 4.6.

The scattering probability for \(\upbeta \)-electrons is considerably different depending on their starting position in the 10 m long source beam tube, as visualized in Fig. 11. The longitudinal segmentation of the source volume in our model allows us to incorporate this behavior. The probability \(P_s\) for an electron to leave the source after scattering exactly s times depends on the total cross section \(\sigma \) and the effective column density \(\mathcal {N}_\text {eff}\) that the electron traverses. This effective column density depends not only on the electron’s starting position z inside the source and the axial density distribution \(\rho (z)\), but also on the starting pitch angle \(\theta \) in the source (Eq. (21)):

$$\begin{aligned} \mathcal {N}_\text {eff}(z,\theta ) = \frac{1}{\cos (\theta )} \cdot \int \limits _{z}^{L/2} \; \rho (z') \, \,\mathrm {d}z'. \end{aligned}$$
(28)

L denotes the length of the source beam tube with \(-L/2 \le z \le L/2\). The nominal column density is then given by \(\mathcal {N}= \mathcal {N}_\text {eff}(z=-L/2,\ \theta =0)\).

Because of the low probability to scatter off a single tritium molecule, the number of scatterings during propagation can be calculated according to a Poisson distribution:

$$\begin{aligned} P_s(z, \theta ) = \frac{(\, \mathcal {N}_\text {eff}(z,\theta ) \cdot \sigma \,)^s}{s!} \, \cdot \, \exp ( -\mathcal {N}_\text {eff}(z,\theta )\cdot \sigma ). \end{aligned}$$
(29)

The mean scattering probabilities for a specific position z can be calculated using the isotropic angular distribution \(\omega (\theta ) = \sin {\theta }\) and the maximum pitch angle \(\theta _\text {max}\):

$$\begin{aligned} P_s(z) = \frac{1}{1-\cos (\theta _\mathrm {max})} \; \int \limits _{\theta =0}^{\theta _\mathrm {max}} \; \sin (\theta ) \; P_s(z,\theta ) \, \,\mathrm {d}\theta . \end{aligned}$$
(30)

This integration assumes that the angular distribution is not significantly affected by the small angular change in the discussed scattering processes. A higher total column density \(\mathcal {N}\), as well as a larger \(\theta _\text {max}\), would provide a larger number of \(\upbeta \)-electrons at the exit of the source and at the detector. However, they also raise the proportion of scattered over unscattered electrons, thereby increasing the systematic uncertainties due to energy loss, and at some point, limiting the \(\upbeta \)-electron detection rate close to the endpoint. The optimal design values of \(\mathcal {N}= 5 \cdot 10^{17}\hbox { cm}^{-2}\) and \(\theta _\text {max} = 50.8 ^{\circ }\) [4] balance these effects.
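Equations (28) to (30) can be sketched in a few lines of Python. For illustration only, the example assumes a uniform longitudinal density profile, which is not the simulated profile of the actual source, and it uses only the inelastic cross section from Sect. 4.2. The function names and the fractional position variable `z_frac` are hypothetical.

```python
import math
import numpy as np

SIGMA_INEL = 3.40e-18        # cm^2, inelastic cross section from Sect. 4.2
COLUMN_DENSITY = 5.0e17      # cm^-2, nominal column density
THETA_MAX = math.radians(50.8)


def n_eff(z_frac, theta):
    """Eq. (28), ASSUMING a uniform density profile (illustration only).

    z_frac = (z + L/2) / L runs from 0 (rear end) to 1 (front end)."""
    return COLUMN_DENSITY * (1.0 - z_frac) / math.cos(theta)


def p_scatter(s, z_frac, theta, sigma=SIGMA_INEL):
    """Poisson probability of Eq. (29) for exactly s scatterings."""
    mu = n_eff(z_frac, theta) * sigma
    return mu**s / math.factorial(s) * math.exp(-mu)


def p_scatter_mean(s, z_frac, n_theta=2001):
    """Angle-averaged probability of Eq. (30), using the trapezoidal rule."""
    thetas = np.linspace(0.0, THETA_MAX, n_theta)
    vals = np.array([math.sin(t) * p_scatter(s, z_frac, t) for t in thetas])
    integral = np.sum((vals[1:] + vals[:-1]) * np.diff(thetas)) / 2.0
    return integral / (1.0 - math.cos(THETA_MAX))
```

In this simplified picture, an electron starting at the rear end with \(\theta = 0\) leaves the source unscattered with probability \(\exp (-\mathcal {N}\sigma ) = \exp (-1.7)\), and the angle-averaged value is lower because larger pitch angles traverse a larger effective column density.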

4.6 Response function for non-scattered electrons

The transmission function in Eq. (22) describes the transmission probability of isotropically emitted electrons. Even if we consider only non-scattered electrons, the \(\upbeta \)-electrons do not follow an isotropic angular distribution before entering the spectrometer due to the pitch angle dependence of the s-fold scattering probabilities \(P_s(z,\theta )\) in the source (see Sect. 4.5).

Fig. 11

The response function \(R(E, U)\) at a retarding energy of \(qU = 18{,}545\hbox { eV}\). The dash-dotted and dashed curves show the response function close to the front (spectrometer-facing, \(z=+\,4\hbox { m}\)) vs. rear (\(z=-\,4\hbox { m}\)) of the WGTS, which has a length of 10 m in total. An averaged version, weighted by the gas density in each source segment, is shown as the solid curve

Following the description in [39], the zero-scattering (\(s=0\)) transmission function needs to be modified to the form:

$$\begin{aligned} T^\star _{s=0}(E, U)&= \biggl . R(E,U) \biggr |_{\; {\mathcal {E}}\, < \, 10\hbox { eV}} \nonumber \\&= \int \limits _{\theta =0}^{\theta _\text {max}} \; {\mathcal {T}}(E,\theta ,U) \cdot \sin \theta \; P_0(\theta ) \, \,\mathrm {d}\theta . \end{aligned}$$
(31)

The zero-scattering probability \(P_0(\theta )\) is computed by averaging \(P_0(z,\theta )\) over z. Figure 12 illustrates the resulting difference in the response function. The surplus energy range \({\mathcal {E}}< 10\hbox { eV}\) corresponds to the steep increase in the response function at low energies as shown in Fig. 11, where energy loss from inelastic scattering does not contribute.

Fig. 12

The transmission edge of the response function. The dashed curve is calculated with an isotropic angular distribution, and the solid curve with a realistic angular distribution for unscattered electrons

4.7 Doppler effect

The thermal translational motion and the bulk gas flow of the \(\upbeta \)-emitting tritium molecules in the WGTS lead to a Doppler broadening of the electron energy spectrum, which further modifies the response function model that was derived in Sect. 4.2 and thereafter. These two effects can be expressed as a convolution of the differential spectrum \(\frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}\) with a broadening kernel g, denoted by the subscript \(\text {D}\):

$$\begin{aligned} \left( \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}\right) _\text {D}&= \left( g \otimes \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E} \right) (E_\text {lab}) \end{aligned}$$
(32)
$$\begin{aligned}&= \int \limits _{-\infty }^{+\infty } \; g(E_\text {cms}, E_\text {lab}) \; \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}(E_\text {cms}) \, \,\mathrm {d}E_\text {cms} \, , \end{aligned}$$
(33)

with \(E_\text {cms}\) being the electron kinetic energy in the \(\upbeta \)-emitter’s rest frame (which is approximately the center-of-mass system), and \(E_\text {lab}\) the electron energy in the laboratory frame.

The magnitude of the thermal tritium gas velocity follows a Maxwell-Boltzmann distribution. However, considering only the velocity component \(v_\text {M}\) that is parallel to the electron emission direction, the thermal velocity distribution of a tritium isotopologue with mass M is described by a Gaussian

$$\begin{aligned} g(v_\text {M}) = \frac{1}{\sqrt{2\pi }\sigma _v} \cdot \text {e}^{ -\frac{1}{2} \left( \frac{v_\text {M}}{\sigma _v}\right) ^2 }, \end{aligned}$$
(34)

which centers around \(v_\text {M}= 0\) with a standard deviation \(\sigma _v = \sqrt{k_\text {B}T_\text {bt} / M}\). For the component of the bulk gas velocity u that is parallel to the electron emission direction with pitch angle \(\theta \), the mean \(v_\text {M}\) is shifted by \(\cos \theta \cdot u\). Integrating over all emission directions up to \(\theta _\text {max}\), the expression expands to

$$\begin{aligned} g(v_\text {M})&= \frac{1}{(1-\cos \theta _\text {max})} \nonumber \\&\quad \cdot \! \int \limits _{\cos \theta _\text {max}}^1 \! \frac{1}{\sqrt{2\pi }\sigma _v} \cdot \text {e}^{ -\frac{1}{2} \left( \frac{v_\text {M}- \cos \theta \cdot u}{\sigma _v}\right) ^2 } \, \,\mathrm {d}\cos \theta . \end{aligned}$$
(35)

Using the Gaussian error function this expression can be rewritten as

$$\begin{aligned} g(v_\text {M})&= \frac{1}{(1-\cos \theta _\text {max})\cdot 2u} \nonumber \\&\quad \cdot \text {erf}\left( \frac{v_\text {M}- \cos \theta _\text {max} \cdot u}{\sqrt{2}\,\sigma _v}, \frac{v_\text {M}- u}{\sqrt{2}\,\sigma _v}\right) \; . \end{aligned}$$
(36)
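The step from Eq. (35) to Eq. (36) can be checked numerically. The sketch below integrates Eq. (35) with the trapezoidal rule and compares it with the closed form. It interprets the two-argument error function as the difference between the error functions of the first and second argument, which is the sign convention that yields a positive kernel for \(u > 0\); this interpretation and the function names are assumptions of this example.

```python
import math
import numpy as np


def kernel_numeric(v, u, sigma_v, cos_max, n=20001):
    """Eq. (35): shifted Gaussians averaged over cos(theta) in [cos_max, 1]."""
    c = np.linspace(cos_max, 1.0, n)
    norm = math.sqrt(2.0 * math.pi) * sigma_v
    gauss = np.exp(-0.5 * ((v - c * u) / sigma_v) ** 2) / norm
    integral = np.sum((gauss[1:] + gauss[:-1]) * np.diff(c)) / 2.0
    return integral / (1.0 - cos_max)


def kernel_closed(v, u, sigma_v, cos_max):
    """Eq. (36) with the two-argument erf written as a difference (u != 0)."""
    a = (v - cos_max * u) / (math.sqrt(2.0) * sigma_v)
    b = (v - u) / (math.sqrt(2.0) * sigma_v)
    return (math.erf(a) - math.erf(b)) / ((1.0 - cos_max) * 2.0 * u)


cos_max = math.cos(math.radians(50.8))
g_num = kernel_numeric(50.0, 13.0, 203.0, cos_max)   # velocities in m/s
g_cls = kernel_closed(50.0, 13.0, 203.0, cos_max)
```

For \(u \rightarrow 0\) the closed form becomes numerically ill-conditioned, and the plain Gaussian of Eq. (34) should be used instead.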

Finally, the tritium gas velocity distribution \(g(v_\text {M})\) can be translated into an electron energy distribution \(g(E_\text {cms}, E_\text {lab})\). Using the Lorentz factors and the electron velocities defined in the CMS and lab frames, we can write

$$\begin{aligned} g(E_\text {cms}, E_\text {lab}) = \frac{g(v_\text {M})}{\gamma _\text {cms}\, {m_\text {e}} \, v_{\text {e},\text {cms}}} \end{aligned}$$
(37)

with

$$\begin{aligned} v_\text {M}\approx \frac{v_{\text {e},\text {lab}}- v_{\text {e},\text {cms}}}{1 - v_{\text {e},\text {lab}}\cdot v_{\text {e},\text {cms}}/ c^2} \; . \end{aligned}$$

The standard deviation of this convolution kernel evaluates to

$$\begin{aligned} \sigma _E&= \sigma _v \; \gamma _\text {cms}\; {m_\text {e}} \; v_{\text {e},\text {cms}}\nonumber \\&= \sqrt{(E_\text {cms}+ 2 {m_\text {e}} ) \, E_\text {cms}\cdot k_\text {B}T_\text {bt} / M}. \end{aligned}$$
(38)
Fig. 13

Convolution kernels describing the Doppler broadening of the \(\upbeta \)  spectrum due to the thermal motion and bulk velocity u of the source gas. A temperature of \(T_\text {bt} = 30\hbox { K}\) is assumed, leading to a Gaussian broadening with \(\sigma _E \approx 94\hbox { meV}\) at \(E_\text {cms}= 18{,}575\hbox { eV}\)

With \(\sigma _v \approx 203\hbox { m/s}\) for \(\mathrm {T_2}\) molecules at \(T_\text {bt}=30\hbox { K}\) and the weighted mean bulk velocity at nominal source conditions being \({\bar{u}} \approx 13\hbox { m/s}\), thermal Doppler broadening is clearly the dominant effect. The standard deviation of the broadening function \(g(E_\text {cms}, E_\text {lab})\) at a fixed bulk velocity \(u = 0\) for \(T_\text {bt} = 30\hbox { K}\) and \(E \approx E_0\) evaluates to \(\sigma _E \approx 94\hbox { meV}\) (also see Fig. 13). This value can be interpreted as a significant smearing of the energy scale. Its implication for the neutrino mass measurement is shown in Table 1.
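The quoted values of \(\sigma _v\) and \(\sigma _E\) follow directly from Eqs. (34) and (38). The sketch below assumes a \(\mathrm {T_2}\) molecular mass of about 6.032 u and expresses \(k_\text {B}T_\text {bt}/M\) in natural units as \((\sigma _v / c)^2\); the function names are choices of this example.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant in J/K
U_KG = 1.66053907e-27    # atomic mass unit in kg
C = 299792458.0          # speed of light in m/s
M_E = 510998.95          # electron rest energy in eV


def sigma_v(T_bt, mass_u=6.0321):
    """Thermal velocity spread in m/s for temperature T_bt (K), mass in u."""
    return math.sqrt(K_B * T_bt / (mass_u * U_KG))


def sigma_E(E_cms, T_bt, mass_u=6.0321):
    """Doppler energy spread of Eq. (38) in eV, for E_cms in eV."""
    # k_B T / M in natural units equals (sigma_v / c)^2
    return math.sqrt((E_cms + 2.0 * M_E) * E_cms) * sigma_v(T_bt, mass_u) / C


sv = sigma_v(30.0)             # about 203 m/s
se = sigma_E(18575.0, 30.0)    # about 0.094 eV
```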

4.8 Cyclotron radiation

As electrons move from the source to the spectrometer section in KATRIN, they lose energy through cyclotron radiation. In contrast to energy loss due to scattering with tritium gas (Sect. 4.5), this energy loss process applies to the entire trajectory of an electron as it traverses the experimental beamline [49].

For a particle with kinetic energy E spending a time \(\varDelta t\) in a fixed magnetic field B, the cyclotron energy loss is (in SI units):

$$\begin{aligned} \varDelta E_\perp ^\text {cycl} = -\frac{q^4}{3 \pi c^3 \varepsilon _0 {m_\text {e}} ^3} \cdot B^2 \cdot E_\perp \; \frac{\gamma +1}{2} \cdot \varDelta t. \end{aligned}$$
(39)

In general, cyclotron radiation reduces the transverse momentum component of the particle (see footnote 8). Consequently, the losses are maximal for large pitch angles and vanish completely at \(\theta = 0 ^{\circ }\).
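To give a feeling for the magnitude of Eq. (39), the following sketch evaluates the prefactor in SI units and the resulting loss for a constant field. It assumes \(E_\perp = E \sin ^2\theta \) and a hypothetical dwell time of \(2 \times 10^{-7}\hbox { s}\) in a uniform 3.6 T field; both are simplifications of this example and do not replace a tracking simulation.

```python
import math

Q = 1.602176634e-19          # elementary charge in C
C = 299792458.0              # speed of light in m/s
EPS0 = 8.8541878128e-12      # vacuum permittivity in F/m
M_E_KG = 9.1093837015e-31    # electron mass in kg
M_E_EV = 510998.95           # electron rest energy in eV

# Prefactor of Eq. (39) in 1 / (T^2 s)
COEFF = Q**4 / (3.0 * math.pi * C**3 * EPS0 * M_E_KG**3)


def cyclotron_loss(E, theta_deg, B, dt):
    """Magnitude of Eq. (39) in eV; E in eV, B in T, dt in s.

    ASSUMES E_perp = E * sin^2(theta) and a constant field during dt."""
    gamma = 1.0 + E / M_E_EV
    E_perp = E * math.sin(math.radians(theta_deg)) ** 2
    return COEFF * B**2 * E_perp * (gamma + 1.0) / 2.0 * dt


# Hypothetical dwell time of 2e-7 s in a uniform 3.6 T field
loss = cyclotron_loss(18600.0, 50.8, 3.6, 2.0e-7)
```

Under these assumptions the loss is on the order of 10 meV, and it vanishes for \(\theta = 0 ^{\circ }\) as stated above.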

For complex geometric and magnetic field configurations as in the KATRIN experiment, the overall cyclotron energy loss can be computed using a particle tracking simulation framework such as Kassiopeia [50]. By this means, the cyclotron energy loss from the source to the analyzing point in the main spectrometer can be obtained as a function of the electron’s starting position z and pitch angle \(\theta \). Particles starting in the rear of the source will lose more energy due to their longer path through the whole setup. The total cyclotron energy loss can be up to 85 meV for electrons with the maximum pitch angle \(\theta _\text {max} = 50.8 ^{\circ }\).

Because the resulting decrease in the angle \(\varDelta \theta \) due to the loss of transverse momentum is of order \(10^{-6}\) or less, it can be neglected. We thus consider the loss of cyclotron energy \(\varDelta E^\text {cycl}(\theta ,z)\) to be a decrease in the total electron kinetic energy E. Essentially, this effect causes a shift of the electron transmission condition (see Eq. (20))

$$\begin{aligned} {\mathcal {T}}^\text {cycl}_i(E,\theta ,U) = {\mathcal {T}}(E - \varDelta E^\text {cycl}(\theta ,z),\theta ,U) \end{aligned}$$
(40)

with the index i denoting the longitudinal slice where the electron starts from the source position z (see Fig. 10).

The influence of the cyclotron energy loss on the averaged response function is shown in Fig. 14.

Fig. 14

The impact of energy losses due to cyclotron radiation on the shape of the response function near the transmission edge

4.9 Expected integrated spectrum signal rate

Earlier in this section we have laid out the different contributions to the response function of the experiment, which describes the probability for \(\upbeta \)-electrons to arrive at the detector where they contribute to the measured integrated spectrum. The response function describes the energy analysis at the spectrometer (Sects. 4.1 and 4.3), energy loss caused by scattering in the tritium source (Sects. 4.2 and 4.5), and additional corrections (Sect. 4.6 and following).

Combining the response function with the description of the differential spectrum that was developed in Sect. 2, the integrated spectrum signal rate observed on a single detector pixel j for a retarding potential setting U can finally be expressed as

$$\begin{aligned} {\dot{N}}_j^\text {sig}(U)&= \frac{1}{2} \, \epsilon _{\text {det},j} \cdot \sum _{i=0}^{N_L-1} \; N_{\text {T,}i} \nonumber \\&\quad \cdot \, \int \limits _{qU}^{\infty } \left( \frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}\right) _\text {C,D}(m_\upnu ^2, E_0) \cdot R_{i,j}(E, U) \, \,\mathrm {d}E. \nonumber \\ \end{aligned}$$
(41)

This expression incorporates all theoretical corrections (see Eq. (12) with subscript C) and the Doppler broadening (see Eq. (33) with subscript D) of the differential spectrum \(\frac{\,\mathrm {d}\varGamma }{\,\mathrm {d}E}\) (see Eq. (10)), and the full response function which incorporates the energy loss as a result of source scattering and cyclotron radiation:

$$\begin{aligned} R_{i,j}(E, U)&= \int \limits _{\epsilon =0}^{{\mathcal {E}}} \, \int \limits _{\theta =0}^{\theta _\mathrm {max}} \, \sum _{s} \; {\mathcal {T}}_{s,i,j}^\text {cycl}(E - \epsilon , \theta , U) \nonumber \\&\quad \cdot \, P_{s,i}(\theta ) \; f_s(\epsilon ) \, \,\mathrm {d}\epsilon \, \sin \theta \,\mathrm {d}\theta . \end{aligned}$$
(42)

The response function depends on the path traversed by the \(\upbeta \)-electron between its origin in source segment (i, j) and the target detector pixel j (see Fig. 10 for the segmentation scheme). The detection efficiency \(\epsilon _{\text {det},j}\) is an energy-dependent quantity, which needs to be measured for each pixel j. Its value is between \(\approx 90 \%\) and \(95 \%\) [36].

To first order (due to nearly constant magnetic field and tritium concentration in the source), the integrated signal rate in Eq. (41) depends on \(\mathcal {N}\sigma \) – which can be accurately determined by calibration measurements with a photoelectron source – but is independent of the longitudinal gas density profile \(\rho (z)\) which cannot be measured directly (see [47, 48] for simulation results).

4.10 Scan of the integrated spectrum

A scan of the integrated \(\upbeta \)  spectrum comprises a set of detector pixel event counts \(N_j(U_k)\), observed at various retarding potential settings \(U_k\) for the duration of \(\varDelta t_k\) each, with \(k \in \{1 \ldots n_k\}\). In the following, the indices j and k are condensed by writing \(N_{jk} = N_j(U_k)\), with \(N_{jk}\) denoting the event count on a single detector pixel j for a specific retarding potential setting k.

The observed event count \(N^\text {obs}_{jk}\) is a Poisson-distributed quantity with the expectation value given by

$$\begin{aligned} \text {E}[N^\text {obs}_{jk}] = \varDelta t_k \cdot \left( \, \dot{N}_j^\text {sig}(U_k) + \dot{N}_j^\text {bg} \, \right) , \end{aligned}$$
(43)

where \(\dot{N}^\text {bg}_j\) is an energy-independent background rate component (possibly with a radial dependency indicated by the index j).

KATRIN will be operated for a duration of 5 calendar years in order to collect 3 live years of spectrum data over multiple runs.

4.11 Energy uncertainties

At the end of this section we will briefly discuss the influence of energy uncertainties on the neutrino mass measurement. In general, any fluctuation with variance \(\sigma ^2\) induces a spectrum shape deformation which – if not considered in the analysis – is indistinguishable to first order from a shift of the measured value of \( {m_{\upnu }^2} \) in the negative direction with \(\varDelta {m_{\upnu }^2} = -2 \sigma ^2\) [11]. This shift of \(\varDelta {m_{\upnu }^2} \) also holds if an accounted fluctuation or distribution of true variance \(\sigma ^2_\mathrm {true}\) is described wrongly in the analysis by the variance \(\sigma ^2_\mathrm {ana} = \sigma ^2_\mathrm {true} - \sigma ^2\).
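As a worked illustration of this relation, consider a hypothetical analysis that ignores a Gaussian broadening of width \(\sigma = 94\hbox { meV}\). This value is the thermal Doppler width quoted in Sect. 4.7 and is used here purely as an example:

$$\begin{aligned} \varDelta {m_{\upnu }^2} = -2 \sigma ^2 = -2 \cdot (0.094\hbox { eV})^2 \approx -0.018\hbox { eV}^2 . \end{aligned}$$

For comparison, the true value of \( {m_{\upnu }} = 200\hbox { meV}\) assumed in Table 1 corresponds to \( {m_{\upnu }^2} = 0.04\hbox { eV}^2\), so an unaccounted broadening of this size would shift the result by a sizeable fraction of the signal.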

Different sources of fluctuations and distributions with uncertainties can be distinguished. One group comprises \(\upbeta \)-decay and source physics, such as molecular final states, scattering processes and the Doppler effect (all discussed in this work). Others are experimental systematics originating in the energy measurement, which have to be studied during commissioning of the setup and then incorporated into the model. An example is the distortion of the spectrometer transmission function due to retarding-voltage fluctuations [51, 52].

4.12 Impact of theoretical and experimental corrections

In Table 1 we review and quantify the impact of theoretical corrections to the differential \(\upbeta \)-spectrum, discussed in Sect. 2, and of experimental corrections which have been introduced above. Many individual model components can be safely neglected, while others need to be considered more accurately, such as the radial dependence of retarding potentials (Sect. 4.3), energy loss due to cyclotron radiation (Sect. 4.8) or the Doppler effect (Sect. 4.7).

5 Measurement of the neutrino mass

Having compiled a complete description of the theoretical \(\upbeta \)-decay spectrum and the response function of KATRIN into a parameterizable model, we will now outline the statistical terms and methods required for actual neutrino mass measurements. In the next sections (Sects. 5.1 and 5.2) we review the process of parameter inference (model fitting) and the construction of confidence intervals in the case of a KATRIN neutrino mass analysis, and we explain the relation between observed data, fit parameters and their uncertainties. After introducing Frequentist methods of inferring \( {m_{\upnu }^2} \) we give an example of a Bayesian approach in Sect. 5.3. We briefly list statistical and systematic uncertainty contributors for KATRIN in Sect. 5.4 and in that context discuss the relevance of the choice of the energy analysis interval in Sect. 5.5 and the distribution of measuring time within that interval in Sect. 5.6. In Sect. 5.7 we give an explanation of negative \( {m_{\upnu }^2} \) estimates and provide a non-physical extension of the \(\upbeta \)-decay spectrum model.

5.1 Parameter inference

The statistical technique for analyzing \(\upbeta \)-decay spectrum data is well established. By comparing the observed number of counts \(N^\text {obs}_{jk}\) on each pixel j for each experimental setting k with the prediction from the spectrum and response model \(N_{jk}(U_k, {m_{\upnu }^2} , E_0, \dots )\) (see Eqs. (41) and (43)), \( {m_{\upnu }^2} \) and other unknown model parameters can be inferred. In the case of a KATRIN-like neutrino mass measurement, a continuous model that depends on \( {m_{\upnu }^2} \) is fit to unbinned spectral shape data. The method of least squares is most commonly applied.

The probability to have an observed outcome \(\varvec{N}^\text {obs} = \left( N^\text {obs}_{1,1} \ldots N^\text {obs}_{n_j,n_k}\right) \), given the predicted number of counts \(\varvec{N}^\text {pre}(\varvec{\theta })\) defined by a set of model parameters \(\varvec{\theta } = (m_\upnu ^2, E_0, \dots )\), is the likelihood function

$$\begin{aligned} L(\varvec{\theta } | \varvec{N}^\text {obs}) = \prod _{jk} \; \text {Poisson}\left( N^\text {obs}_{jk} | N^\text {pre}_{jk}(\varvec{\theta }) \right) . \end{aligned}$$
(44)

A set of parameter point estimates \({{\hat{\varvec{\theta }}}}\) is obtained by maximizing the likelihood L. Equivalently, a minimization of the negative log-likelihood \(- \ln L\) can be performed, which is often more practical numerically.

If the number of observed events \(N_{jk}^\text {obs}\) is large enough (\(\gtrsim 25\)), so that the Poisson distribution can be approximated by a Gaussian, twice the negative log-likelihood is, up to a constant offset, approximately a \(\chi ^2\) function:

$$\begin{aligned} - 2 \ln L \approx \chi ^2 = \sum _{jk} \left( \frac{ N^\text {obs}_{jk} - N^\text {pre}_{jk}(\varvec{\theta }) }{\sigma _{jk}}\right) ^2. \end{aligned}$$
(45)

For \(\sigma _{jk} = \sqrt{N_{jk}^\text {pre}}\), the above \(\chi ^2\) equals Pearson’s chi-square statistic [53].
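As a numerical illustration of Eqs. (44) and (45), the following minimal Python sketch compares the Poisson likelihood with Pearson's \(\chi ^2\) for a few bins. To drop the constant terms it uses the likelihood ratio with respect to a saturated model (\(N^\text {pre}_{jk} = N^\text {obs}_{jk}\)), which is a common convention but an assumption of this sketch and not part of the text above. The function names and the counts are made-up toy choices.

```python
import math

def poisson_neg2_log_lr(n_obs, n_pre):
    """-2 ln [L(theta) / L_saturated] for independent Poisson counts."""
    total = 0.0
    for obs, pre in zip(n_obs, n_pre):
        total += 2.0 * (pre - obs)
        if obs > 0:  # the obs * ln(obs / pre) term vanishes for obs = 0
            total += 2.0 * obs * math.log(obs / pre)
    return total

def pearson_chi2(n_obs, n_pre):
    """Eq. (45) with sigma_jk = sqrt(N_pre_jk)."""
    return sum((obs - pre) ** 2 / pre for obs, pre in zip(n_obs, n_pre))

# Toy counts, all well above the ~25 events needed for the Gaussian limit.
n_obs = [100, 120, 90]
n_pre = [110.0, 115.0, 95.0]
neg2_log_lr = poisson_neg2_log_lr(n_obs, n_pre)
chi2 = pearson_chi2(n_obs, n_pre)
print(neg2_log_lr, chi2)
```

For these counts the two statistics differ by only a few percent. The difference grows when individual bins contain few events, which is the regime where the Poisson likelihood should be used directly.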

Our parameter of interest is \( {m_{\upnu }^2} \), which distorts the spectrum shape close to the endpoint. Because the fitted \(\upbeta \)-spectrum shape essentially only depends on \( {m_{\upnu }^2} \), with \(\chi ^2\) being approximately parabolic in \( {m_{\upnu }^2} \), it is the preferred fit parameter over \( {m_{\upnu }} \) [54].

Other model parameters are nuisance parameters. In KATRIN-like experiments typically three such quantities are treated as free fit parameters:

  • The tritium endpoint energy \(E_0\), the maximum electron energy assuming a vanishing neutrino mass, has to be estimated from the data, due to uncertainties in the measured \(\mathrm {T^+}\)/\(\mathrm {^3He^+}\) mass difference [55] and in the experimental energy scale.

  • The signal amplitude \(A_\text {sig}\), a multiplicative factor close to 1, is applied to the predicted signal rate \(\dot{N}_j^\text {sig}\) to correct for any energy-independent model uncertainty. \(E_0\) and \(A_\text {sig}\) are estimated from the slope of the spectrum at lower energies of the analysis interval (\(\approx 30 - 40 \hbox { eV}\) below the endpoint), where the absolute signal rate is highest.

  • The background rate amplitude \(A_\text {bg}\) is another normalization factor, which is applied to the background model component \(\dot{N}_j^\text {bg}\). It is estimated using the data from retarding potentials above the tritium endpoint, where no signal is expected. Note that we assume a constant background rate without retarding potential dependence in the energy interval near the tritium endpoint. However, such an energy dependence could be incorporated into the model using additional data above the endpoint.

Considering only the aforementioned four model parameters, the predicted number of electrons on a detector pixel j for a retarding potential setting k in a counting period \(\varDelta t_k\) is given by

$$\begin{aligned}&N^\text {pre}_{jk}( {m_{\upnu }^2} , E_0, A_\text {sig}, A_\text {bg}) \nonumber \\&\qquad =\varDelta t_k \cdot \left( \, A_\text {sig} \cdot \dot{N}_j^\text {sig}(U_k, m_\upnu ^2, E_0) \, + \, A_\text {bg} \cdot \dot{N}_j^\text {bg} \, \right) . \end{aligned}$$
(46)

A point estimate for this set of parameters, obtained from maximizing the likelihood (or minimizing \(\chi ^2\)) is denoted in the following as \(( \widehat{m_{\upnu }^2} , \widehat{E_0}, \widehat{A_\text {sig}}, \widehat{A_\text {bg}})\).
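The structure of Eq. (46) can be sketched in a few lines of Python. The signal rate used here is a hypothetical stand-in, not the KATRIN model: it is the integral of the bare phase-space factor \(\epsilon \sqrt{\epsilon ^2 - m_\upnu ^2}\) up to \(\epsilon = E_0 - qU\) with an arbitrary normalization, and it ignores the Fermi function, the final states and the response function. The function names, the `norm` parameter and the use of the retarding energy qU (in eV) as argument are assumptions of this sketch, which is meant for \(m_\upnu ^2 \ge 0\) only.

```python
def toy_signal_rate(qU, m2, E0, norm=1.0e-3):
    """Hypothetical integrated signal rate in cps (NOT the KATRIN response model)."""
    eps = E0 - qU
    if eps <= 0.0 or eps * eps <= m2:
        return 0.0  # no signal above the (mass-shifted) endpoint
    return norm * (eps * eps - m2) ** 1.5 / 3.0

def predicted_counts(dt, qU, m2, E0, a_sig, a_bg, rate_bg=0.010):
    """Eq. (46) for one pixel and one retarding setting; rate_bg in cps (10 mcps)."""
    return dt * (a_sig * toy_signal_rate(qU, m2, E0) + a_bg * rate_bg)

E0 = 18575.0  # eV, fiducial endpoint value
for qU in (E0 - 30.0, E0 - 4.0, E0 + 5.0):
    print(qU - E0, predicted_counts(3600.0, qU, 0.0, E0, 1.0, 1.0))
```

Above the endpoint only the background term contributes, which is why data from that region constrain \(A_\text {bg}\).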

Depending on the method of treating systematic uncertainties, the number of free (or constrained) model parameters can be higher.

Fig. 15

Frequentist confidence belt (\(95 \%\,\hbox {C.L.}\)) constructed according to the unified approach by Feldman and Cousins [57]. In this example, the horizontal ranges (green dashed lines) are constructed by choosing \(95 \%\) of the \( {m_{\upnu }^2} \) estimates from an ensemble test with fixed true \( {m_{\upnu }^2} \), following the ordering principle. These horizontal ranges define the edges of the confidence belt (blue solid lines). The subsequent result of an actual neutrino mass measurement (x-axis, indicated by red dotted lines) is used to select the vertical intersections with the confidence belt to determine the reporting of an upper limit (e.g. in case of \( {m_{\upnu }^2} = 0\hbox { eV}^2\)) or a two-sided confidence interval (e.g. in case of \( {m_{\upnu }^2} = 0.07\hbox { eV}^2\))

5.2 Confidence intervals

Due to the stochastic nature of the observed data, a single parameter point estimate by itself says little about the unknown true value of a parameter. In parameter inference, a confidence interval is an interval of parameter values that contains the true value of the parameter in a certain proportion (the confidence level) of an infinite number of independent repetitions of the experiment. Various methods of constructing such intervals exist.

Using the Neyman construction [56] (a Frequentist method), ensembles of pseudo-experiments are sampled for a range of true values of \( {m_{\upnu }^2} \), leading to the construction of a confidence belt (see Fig. 15). Incorporating an ordering principle proposed by Feldman and Cousins [57], empty confidence intervals for non-physical estimates of \( {m_{\upnu }^2} \) can be avoided, while ensuring correct Frequentist coverage.
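The belt construction with the Feldman-Cousins ordering can be sketched for a simplified case. The Python example below assumes that the estimate is Gaussian distributed around the true value with a known width `sigma` and that the parameter is bounded by \(\mu \ge 0\). A real analysis would use ensembles of simulated spectra instead of this analytic toy distribution, and all grid settings are arbitrary choices of this sketch.

```python
import math

def fc_interval(x_obs, sigma=1.0, cl=0.95, mu_max=6.0, mu_step=0.02, x_step=0.01):
    """Feldman-Cousins interval for a Gaussian estimate x_obs of a parameter mu >= 0.

    Grid settings are given in units of sigma.
    """
    n_x = int(round((mu_max + 12.0) / x_step)) + 1
    xs = [(-6.0 + i * x_step) * sigma for i in range(n_x)]
    dx = x_step * sigma
    accepted = []
    for i in range(int(round(mu_max / mu_step)) + 1):
        mu = i * mu_step * sigma
        ranked = []
        for x in xs:
            mu_best = max(x, 0.0)  # best physically allowed value of mu
            log_ratio = ((x - mu_best) ** 2 - (x - mu) ** 2) / (2.0 * sigma ** 2)
            pdf = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
            ranked.append((log_ratio, x, pdf))
        ranked.sort(reverse=True)  # ordering principle: highest likelihood ratio first
        coverage, x_lo, x_hi = 0.0, math.inf, -math.inf
        for _, x, pdf in ranked:  # fill the horizontal acceptance range
            coverage += pdf * dx
            x_lo, x_hi = min(x_lo, x), max(x_hi, x)
            if coverage >= cl:
                break
        if x_lo <= x_obs <= x_hi:  # vertical intersection with the belt
            accepted.append(mu)
    return accepted[0], accepted[-1]

lo0, hi0 = fc_interval(0.0)  # estimate at the physical boundary
lo3, hi3 = fc_interval(3.0)  # estimate well inside the physical region
print((lo0, hi0), (lo3, hi3))
```

In this toy case an estimate at the boundary yields an upper limit close to \(1.96\,\sigma \), while an estimate three standard deviations above it yields a two-sided interval, so the transition between the two reporting modes follows from the construction itself.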

When parameter point estimates are constructed following the maximum likelihood ordering principle, the profile likelihood ratio [58] can be used to estimate their uncertainties. With this method the \(1 \sigma \) uncertainty of a parameter estimate is identified by those parameter values where the log-likelihood has decreased by 1/2 from its maximum value (i.e. the likelihood has dropped by a factor of \(\text {e}^{-1/2} \approx 0.61\)), while profiling (maximizing) with respect to any involved nuisance parameter. Equivalently, a chi-square curve can be scanned for parameter values with \(\varDelta \chi ^2 = 1\), again profiling over nuisance parameters.
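A \(\varDelta \chi ^2 = 1\) scan with profiling can be illustrated with a toy model. In the Python sketch below, the parameter of interest `a` scales a known shape `s` and the nuisance parameter `b` is a constant offset, so the profiling step has a closed form. This linear model, the data values and the function names are assumptions chosen for illustration and do not represent the KATRIN spectrum model.

```python
def profiled_chi2(a, s, y, sigma):
    """chi^2 of the toy model y = a * s + b, minimized analytically over b."""
    w = [1.0 / sg ** 2 for sg in sigma]
    b_hat = sum(wi * (yi - a * si) for wi, si, yi in zip(w, s, y)) / sum(w)
    return sum(wi * (yi - a * si - b_hat) ** 2 for wi, si, yi in zip(w, s, y))

def delta_chi2_interval(s, y, sigma, a_min, a_max, n=4001):
    """Scan a; return (best fit, lower edge, upper edge) of the delta chi^2 <= 1 region."""
    grid = [a_min + i * (a_max - a_min) / (n - 1) for i in range(n)]
    chi2 = [profiled_chi2(a, s, y, sigma) for a in grid]
    chi2_min = min(chi2)
    inside = [a for a, c in zip(grid, chi2) if c - chi2_min <= 1.0]
    return grid[chi2.index(chi2_min)], inside[0], inside[-1]

# Made-up data points with equal uncertainties.
s = [8.0, 4.0, 2.0, 1.0, 0.0, 0.0]
y = [8.3, 4.1, 2.2, 1.0, 0.1, -0.05]
sigma = [0.2] * 6
best, lo, hi = delta_chi2_interval(s, y, sigma, 0.8, 1.2)
print(best, lo, hi)
```

Because this toy model is linear, the profiled \(\chi ^2\) curve is an exact parabola and the scanned interval agrees with the uncertainty from linear error propagation. For a non-parabolic curve the scan yields asymmetric uncertainties.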

Fig. 16

Scatter plots for pair-wise parameter combinations (\( {m_{\upnu }^2} \), \(E_0\), \(A_\text {sig}\), \(A_\text {bg}\)) and their respective marginalized posterior distributions as the diagonal elements for 3 years of live measurement time. The solid contours indicate \(95 \%\,\hbox {C.L.}\) regions. Instead of randomized data, the likelihood sampled in this MCMC example was formulated based on a null hypothesis with fiducial input values \( {m_{\upnu }^2} = 0\hbox { eV}^2\), \(E_0 = 18{,}575\hbox { eV}\), \(A_\text {sig} = 1.0\), \(A_\text {bg}\cdot \dot{N}^\text {bg} = 10\hbox { mcps}\). Flat priors were used with \( {m_{\upnu }^2} \ge 0\hbox { eV}^2\)

5.3 Bayesian statistics

Bayesian inference is typically based on the posterior PDF (probability density function) of a parameter of interest. Using Bayes’ theorem, the posterior distribution \(p(\varvec{\theta })\) of a set of parameters \(\varvec{\theta }\) is given by the likelihood \(L(\varvec{\theta })\) and a prior probability \(\pi (\varvec{\theta })\):

$$\begin{aligned} p(\varvec{\theta }) \propto L(\varvec{\theta }) \cdot \pi (\varvec{\theta }). \end{aligned}$$
(47)

In contrast to Frequentist approaches, which make a statement about the repeatability of an experiment, Bayesian statistics inevitably introduce the concepts of probability, belief and credibility. The prior probability \(\pi (\varvec{\theta })\) has to be chosen by the analyst, based on prior belief. In the case of \( {m_{\upnu }^2} \), an objective option is the flat uniform prior (possibly zero for \( {m_{\upnu }^2} < 0\hbox { eV}^2\)), or a normalizable Gaussian distribution that reflects the results from previous measurements.

Fortunately, KATRIN’s \( {m_{\upnu }^2} \) posterior PDF is rather insensitive to the choice of prior on \( {m_{\upnu }^2} \). Assuming, for instance, a true value of \( {m_{\upnu }^2} = 0\hbox { eV}^2\), a Gaussian prior with mean \(\mu _\pi = 0\hbox { eV}^2\) and \(\sigma _\pi = 1\hbox { eV}^2\) (or a value on the order of the Mainz or Troitsk upper limits) will be outweighed by the KATRIN likelihood function. It will thus have no significant effect on the derived Bayesian upper limit compared to a prior that is flat in \( {m_{\upnu }^2} \). This underlines the improved sensitivity of the experiment.

The posterior distributions can be obtained practically with Markov-chain Monte Carlo (MCMC) methods [59]. With proper adjustments, this class of algorithms is capable of efficiently traversing high-dimensional parameter spaces and sampling from posterior probability distributions of an unknown quantity such as \( {m_{\upnu }^2} \). From these distributions, any choice of credibility interval \([\theta _1, \theta _2]\), with \(P = \int _{\theta _1}^{\theta _2} p(\varvec{\theta }) \,\mathrm {d}\theta \) being the credibility level, can be constructed.

When considering the distribution of only a subspace of all parameters, one speaks of a marginal posterior distribution. To determine the one-dimensional posterior distribution of \( {m_{\upnu }^2} \), the four-dimensional posterior distribution of \(( {m_{\upnu }^2} , E_0, A_\text {sig}, A_\text {bg})\) is marginalized over the three nuisance parameters.
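Sampling, marginalization and the construction of a credibility limit can be sketched with a basic Metropolis-Hastings sampler. The Python example below uses a two-dimensional toy posterior: a correlated Gaussian likelihood in a parameter of interest (standing in for \( {m_{\upnu }^2} \)) and one nuisance parameter, multiplied by a flat prior that vanishes for negative values of the parameter of interest. The likelihood, the correlation `rho`, the step size, the chain length and the seed are all assumptions of this sketch and not settings of the analysis described here.

```python
import math
import random

def log_posterior(m2, nuis, rho=0.7):
    """Toy log-posterior: correlated unit Gaussian likelihood, flat prior with m2 >= 0."""
    if m2 < 0.0:
        return -math.inf  # prior is zero in the non-physical region
    return -0.5 * (m2 * m2 - 2.0 * rho * m2 * nuis + nuis * nuis) / (1.0 - rho * rho)

def metropolis_hastings(n_steps, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    state = (0.5, 0.0)
    logp = log_posterior(*state)
    chain = []
    for _ in range(n_steps):
        proposal = (state[0] + rng.gauss(0.0, step), state[1] + rng.gauss(0.0, step))
        logp_new = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            state, logp = proposal, logp_new
        chain.append(state)
    return chain

chain = metropolis_hastings(60000)[5000:]        # discard burn-in steps
m2_samples = sorted(m2 for m2, _ in chain)       # marginalize: ignore the nuisance value
upper_95 = m2_samples[int(0.95 * len(m2_samples))]
print(upper_95)
```

Marginalizing over the nuisance parameter amounts to ignoring its coordinate in the chain. For this toy posterior the marginal distribution is a unit Gaussian truncated at zero, so the 95 % upper credibility limit should come out near 1.96.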

Figure 16 shows the result of an MCMC sampling of the posterior distribution that uses the basic Metropolis-Hastings [60] algorithm. The underlying model is based on Eq. (46) with its standard four model parameters \(( {m_{\upnu }^2} , E_0, A_\text {sig}, A_\text {bg})\), using flat priors and the constraint \( {m_{\upnu }^2} \ge 0\hbox { eV}^2\). In this representation, the correlations between these parameters can be assessed easily. The correlation matrix of this particular example evaluates to:

 

$$\begin{array}{l|rrrr} & {m_{\upnu }^2} & E_0 & A_\text {sig} & A_\text {bg} \\ \hline {m_{\upnu }^2} & 1 & & & \\ E_0 & 0.698 & 1 & & \\ A_\text {sig} & -0.581 & -0.953 & 1 & \\ A_\text {bg} & 0.396 & -0.022 & 0.077 & 1 \end{array}$$

A comparison of Bayesian and Frequentist confidence intervals for various estimates of \( {m_{\upnu }^2} \) is given in Fig. 17. For positive estimates, the different methods yield similar results.

Fig. 17

Marginalized likelihood functions for various estimates of \( {m_{\upnu }^2} \) from representative Asimov data sets. (In a representative Asimov data set [61] statistical fluctuations are suppressed, effectively replacing the number of (generated) observed events \(N^\text {obs}_{jk}\) by their expectation value \(N^\text {pre}_{jk}\).) Top panel: \( \widehat{m_{\upnu }^2} = -0.05\hbox { eV}^2\). Middle panel: \( \widehat{m_{\upnu }^2} = 0.0\hbox { eV}^2\). Bottom panel: \( \widehat{m_{\upnu }^2} = 0.05\hbox { eV}^2\) (\( \widehat{m_{\upnu }} \approx 224\hbox { meV}\)). The horizontal bars indicate \(95 \%\,\hbox {C.L.}\) Frequentist central confidence intervals (Classic), Feldman and Cousins intervals (Unified) respecting the physical boundary \( {m_{\upnu }^2} \ge 0\hbox { eV}^2\), and Bayesian credibility intervals (Bayesian) with a flat prior for \( {m_{\upnu }^2} \ge 0\hbox { eV}^2\). In the non-physical region the likelihood is calculated relying on Eq. (51)

5.4 Statistical and systematic uncertainties

Traditionally, the statistical uncertainty \(\sigma _\text {stat}( {m_{\upnu }^2} )\) is identified with the spread of an \( {m_{\upnu }^2} \) estimate caused by the randomness of the observed data (spectrum count rates \(N^\text {obs}_{k}\)), and usually decreases as more data are taken (as \(1/\sqrt{N_k}\) or \(1/\sqrt{\varDelta t_k}\)). A systematic uncertainty \(\sigma _\text {syst}( {m_{\upnu }^2} )\), by contrast, represents an uncertainty in the \( {m_{\upnu }^2} \) estimate due to an uncertainty in the spectrum or response model, which in general does not scale with the amount of data taken.

Providing a comprehensive review of all systematics of KATRIN – some of which are not adequately quantifiable until final commissioning and characterization of the experimental apparatus – is beyond the scope of this article. Among the major systematic contributors are the final state distribution (Sect. 2.4), the shape of the energy loss function and the inelastic scattering cross section (Sect. 4.2), the source-gas column density (Sect. 4.4), and high-voltage fluctuations (Sect. 4.11).

The total systematics budget of KATRIN is conservatively evaluated to a maximum value of \(\sigma _\text {syst}( {m_{\upnu }^2} ) \approx 0.017\hbox { eV}^2\) [4]. Accordingly, KATRIN’s setup and configuration are chosen in such a way that the statistical uncertainty, after an envisaged data-taking period of five calendar years, reaches \(\sigma _\text {stat}( {m_{\upnu }^2} ) \approx \sigma _\text {syst}( {m_{\upnu }^2} ) \approx 0.017\hbox { eV}^2\), as depicted in Fig. 18. These values are commonly translated into a \(90 \%\,\hbox {C.L.}\) sensitivity of

$$\begin{aligned} S( {m_{\upnu }} ) = \sqrt{ 1.645 \cdot \sigma _\text {tot}( {m_{\upnu }^2} ) } \approx 200\hbox { meV} \end{aligned}$$
(48)

with the total uncertainty on \( {m_{\upnu }^2} \)

$$\begin{aligned} \sigma _\text {tot}( {m_{\upnu }^2} ) = \sqrt{ \sigma _\text {stat}^2( {m_{\upnu }^2} ) \, + \, \sigma _\text {syst}^2( {m_{\upnu }^2} ) }. \end{aligned}$$
(49)
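The arithmetic behind Eqs. (48) and (49) can be checked with a few lines of Python, using the design values quoted above. The variable names are choices of this sketch.

```python
import math

sigma_stat = 0.017  # eV^2, statistical uncertainty quoted in the text
sigma_syst = 0.017  # eV^2, systematics budget quoted in the text

sigma_tot = math.hypot(sigma_stat, sigma_syst)  # Eq. (49): quadratic sum
sensitivity = math.sqrt(1.645 * sigma_tot)      # Eq. (48), in eV

print(f"sigma_tot = {sigma_tot:.4f} eV^2")
print(f"S(m_nu)   = {1000.0 * sensitivity:.0f} meV")
```

The quadratic sum gives \(\sigma _\text {tot}( {m_{\upnu }^2} ) \approx 0.024\hbox { eV}^2\), and the factor 1.645 converts this \(1 \sigma \) value into a \(90 \%\,\hbox {C.L.}\) statement before the square root is taken, which yields a value just below \(0.2\hbox { eV}\).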
Fig. 18

Statistical, systematic and total \(1 \sigma \) uncertainty of \( {m_{\upnu }^2} \) on the left vertical axis, and \(90 \%\,\hbox {C.L.}\) sensitivities of \( {m_{\upnu }} \) on the right vertical axis, plotted over the effective measuring time. Thirty-six live months (3 live years) correspond to 5 calendar years of KATRIN operation

5.5 Choice of the analysis energy interval

The optimal choice of the lower spectrum energy threshold for analysis is primarily determined by the ratio of the statistical and systematic uncertainties. Neither one should dominate. With the differential spectrum rising quadratically as the filter energy qU is lowered (for \(E_0-E \gg {m_{\upnu }} \)), the relative statistical uncertainty on the observed number of signal electrons \(\sigma _\text {stat}\left( N_{j}^\text {sig}(qU_k)\right) \) decreases. On the other hand, systematic uncertainties due to energy-loss processes or electronic excitations of the daughter molecule increase at lower energies. Assuming the design operational configuration of KATRIN (see Table 2), a lower threshold of \(E_0-30\hbox { eV}\) will lead to the desired alignment of statistical and total systematic uncertainties (\(\sigma _\text {stat}( {m_{\upnu }^2} ) \approx \sigma _\text {syst}( {m_{\upnu }^2} )\)). As shown in Fig. 19, the spectrum in this energy range is mainly populated with electrons that have scattered off the source gas at most once.

Fig. 19

The expected \(\upbeta \)-spectrum rate with different shaded areas depicting the fraction of scattered and unscattered electrons. The lower baseline comprises the 10 mcps energy-independent background component. Starting from the right, the shaded areas comprise signal \(\upbeta \)-electrons that are unscattered, scattered once, twice, and thrice

5.6 Measuring time distribution

Fig. 20

An illustration of a hypothetical neutrino mass signal, using toy data simulated for \( {m_{\upnu }} = 350\hbox { meV}\) (red points + stat. error bars), compared against the theoretical model expectations for \( {m_{\upnu }} = 0\hbox { meV}\) (blue solid line), \( {m_{\upnu }} = 350\hbox { meV}\) (green dashed line) at nominal background of \(R_\text {bg} = 10\hbox { mcps}\), and \( {m_{\upnu }} = 350\hbox { meV}\) at elevated background \(R_\text {bg} = 100\hbox { mcps}\) (orange dash-dotted line). Top panel: The absolute rate \(\sum _{j}\dot{N}_{jk}(U_k) = \dot{N}_{k}(U_k)\) is plotted against the retarding energy \(qU_k\) relative to the endpoint energy \(E_0\). Middle panel: The relative rate difference near the endpoint energy. Under the nominal background conditions, the largest deficit in rate due to a non-zero neutrino mass is expected to be about 4 eV below the endpoint, where the signal-to-background ratio is \(\approx 1\). For the scenario of a higher background rate, this point of maximal distortion is shifted to lower energies. The shaded bands indicate the statistical uncertainties. Bottom panel: The measuring time \(\varDelta t_k\) attributed to each retarding potential setting \(U_k\). The Poisson uncertainty of the generated toy rates \(\dot{N}_k\) is directly related to the measuring time through \(\sigma (\dot{N}_k) = \sqrt{\dot{N}_k \, / \, \varDelta t_k}\)

The distribution of measuring time \(\varDelta t_k\) over a range of retarding potentials is of particular importance. Because the statistical uncertainties of the observed Poissonian rates are given by

$$\begin{aligned} \sigma (\dot{N}) = \sqrt{N} / \, t = \sqrt{\dot{N} / \, t}, \end{aligned}$$
(50)

more measuring time should be allocated to those regions of the spectrum that are most effective for estimating the parameters of interest and the correlated nuisance parameters.
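Equation (50) can be made concrete with a short Python example, using the nominal 10 mcps background rate as input. The measuring times and the function name are arbitrary choices of this sketch.

```python
import math

def rate_uncertainty(rate, t):
    """Poisson uncertainty (cps) of a rate (cps) measured for a time t (s), Eq. (50)."""
    return math.sqrt(rate / t)

background = 0.010  # cps, i.e. 10 mcps
for t in (1.0e4, 4.0e4):
    sigma = rate_uncertainty(background, t)
    print(f"t = {t:.0f} s: sigma = {1000.0 * sigma:.2f} mcps ({100.0 * sigma / background:.0f} %)")
```

After \(10^4\) s the background rate is known to 1 mcps (10 %), and quadrupling the measuring time only halves this uncertainty. This square-root scaling is the reason why the allocation of measuring time has to be weighed carefully between the spectrum regions.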

Figure 20 illustrates the relative spectrum rates with a measuring time distribution in the energy interval of \([E_0-30\hbox { eV}, E_0+5\hbox { eV}]\). In the case of \( \widehat{m_{\upnu }^2} \), sufficient measuring time must be spent on the region slightly below the endpoint, where the spectral distortion due to a non-zero \( {m_{\upnu }} \) is most prominent. This is also the region with a signal-to-background ratio between 2:1 and 1:1. Accordingly, for scenarios of elevated background, this feature of the measuring time distribution must be adapted and shifted to slightly lower energies.

The measuring time distribution can be further optimized to provide even better statistical leverage on the model parameters fit to the spectrum shape (see Sect. 5.1), reducing the statistical uncertainty to \(\sigma _\text {stat}^\text {opti}\left( {m_{\upnu }^2} \right) < 0.015\hbox { eV}^2\) for nominal experimental conditions [62]. An example is shown in Fig. 21, which describes a rather sparse measuring time distribution with only four features, covering distinct retarding energy regions qU. The peak at the lower end of the analysis energy interval (\(\approx -30\hbox { eV}\)) is best suited to measure \(E_0\) and \(A_\text {sig}\) due to the higher absolute spectrum rates. At \(qU-E_0\approx -14.0\hbox { eV}\) the correlation between \(E_0\) and \(A_\text {sig}\) is broken. \( {m_{\upnu }^2} \) is measured through the \(\upbeta \)-spectrum shape distortion around \(qU-E_0\approx -4.5\hbox { eV}\), where about one third of the overall measuring time is invested. \(A_\text {bg}\) is measured using data beyond the endpoint energy \(E_0\), where no \(\upbeta \)-decay signal is expected. Note that all four of these parameters are correlated, so the measuring time cannot be shifted arbitrarily between these four regions of retarding energy.

This more focused distribution allows a lower statistical uncertainty of the measured \( {m_{\upnu }^2} \). However, it bears a higher risk of overlooking unexpected spectrum shape distortions in the neglected regions of the \(\upbeta \)-decay spectrum. To safeguard against such spectral deviations from the model and against unexpected systematics, a more uniform distribution, such as the one first shown in Fig. 20, seems more appropriate, at least for the initial data-taking period.

5.7 Negative \( {m_{\upnu }^2} \) estimates

The true value of \( {m_{\upnu }^2} \) is expected to be very close to \( {m_{\upnu }^2} = 0\hbox { eV}^2\) [63]. Assuming non-tachyonic neutrinos, the physical lower limit of the effective neutrino mass squared is given by the neutrino mass eigenstate splittings, measured by neutrino oscillation experiments [11].

Fig. 21

The measuring time \(\varDelta t_k\) attributed to various retarding potential settings \(U_k\) in a more sparse, statistically optimized distribution

Fig. 22

Extrapolation of the differential \(\upbeta \)-spectrum model for different values of the measured neutrino mass squared, including a non-physical value of \( \widehat{m_{\upnu }^2} = -1\hbox { eV}^2\) (dashed red line)

In order to allow the estimator \( \widehat{m_{\upnu }^2} \) to follow statistical fluctuations of the data beyond the physical boundary in a \(\chi ^2\) parameter fit, a non-physical continuous extension of the spectrum model can be introduced. It modifies the differential \(\upbeta \)-spectrum in Eq. (12) by

$$\begin{aligned} \epsilon _f \, \sqrt{\epsilon _f^2- {m_{\upnu }^2} } \; \longrightarrow \; \biggl ( \epsilon _f + \mu \, \text {e}^{ -\epsilon _f / \mu - 1 } \biggr ) \; \sqrt{\epsilon _f^2- {m_{\upnu }^2} } \end{aligned}$$
(51)

with \(\mu = k \, \sqrt{ - {m_{\upnu }^2} }\) for \( {m_{\upnu }^2} < 0\) and \(\mu = 0\) for \( {m_{\upnu }^2} \ge 0\) (see Fig. 22). The factor \(k \approx 0.72\) is adjusted based on numerical calculations to make the \(\chi ^2( {m_{\upnu }^2} )\) function (and the negative log-likelihood, respectively) symmetric around its minimum. A similar extrapolation scheme was used in the analysis of the Mainz and Troitsk neutrino mass experiments [5, 6].
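The replacement in Eq. (51) can be written as a small Python function. The sketch covers only the phase-space factor for a single final state, with `eps` playing the role of \(\epsilon _f\) in eV, and not the full spectrum of Eq. (12). Cutting the factor off at \(\epsilon _f = -\mu \), where the bracket in Eq. (51) reaches its zero, is an assumption of this sketch, since the text above does not state the domain. The function name is also hypothetical.

```python
import math

def extended_phase_space(eps, m2, k=0.72):
    """Phase-space factor eps * sqrt(eps^2 - m2) with the continuation of Eq. (51)."""
    if m2 >= 0.0:
        # Physical case (mu = 0): no decays beyond the mass-shifted endpoint.
        if eps <= 0.0 or eps * eps < m2:
            return 0.0
        return eps * math.sqrt(eps * eps - m2)
    mu = k * math.sqrt(-m2)
    if eps <= -mu:
        return 0.0  # assumed cutoff at the zero of the bracket (not stated in the text)
    bracket = eps + mu * math.exp(-eps / mu - 1.0)
    return bracket * math.sqrt(eps * eps - m2)

for m2 in (1.0, 0.0, -1.0):
    print(m2, [round(extended_phase_space(eps, m2), 4) for eps in (-0.5, 0.5, 1.0, 2.0)])
```

For \( {m_{\upnu }^2} \rightarrow 0\) from below the exponential term vanishes, so the extended factor joins the physical one continuously. For negative \( {m_{\upnu }^2} \) it produces an excess of rate near and slightly beyond the endpoint, mirroring the deficit caused by a positive \( {m_{\upnu }^2} \).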

For the construction of physical \( {m_{\upnu }} \) confidence intervals such a non-physical continuation of the model is not required. The unified approach ensures correct Frequentist coverage while allowing parameter boundaries to be respected in the fit [57]. In a Bayesian framework the physical constraint is typically realized through a prior \(\pi ( {m_{\upnu }^2} ) = 0\) for \( {m_{\upnu }^2} < 0\hbox { eV}^2\).

6 Conclusion

Using \(\upbeta \)-spectroscopy, the KATRIN experiment aims to probe the absolute neutrino mass scale with an unprecedented sub-eV sensitivity. Both the statistical and systematic uncertainties of the model parameter of interest, the squared electron neutrino mass \( {m_{\upnu }^2} \), are required to be \({\mathcal {O}}(0.01\hbox { eV}^2)\). This demands a solid understanding and consistent implementation of the theoretical \(\upbeta \)-decay spectrum model and the experimental response function.

With this work, an effort was made to summarize the \(\upbeta \)-spectrum calculation with all known theoretical corrections relevant for spectroscopy in the endpoint region. Furthermore, a response function model of the KATRIN experiment was outlined, including its dependencies on source-gas dynamics and the spectrometer electromagnetic configuration. Finally, the statistical methods applicable to the intended measurement were investigated and concrete examples of their application to the KATRIN neutrino mass measurement were given.

In Sect. 4.12, an overview of the impact of various model components on the measured squared neutrino mass was given. The purpose is to provide a quantitative measure of their relative importance, indicating components that are negligible in the neutrino mass analysis. Among the most important effects are the radial dependencies of analyzing magnetic field and retarding potential, energy loss of signal electrons due to cyclotron motion and the Doppler broadening of the electron \(\upbeta \)-spectrum due to the source gas thermal motion.

The calculations presented here are implemented as part of a common C++ simulation and analysis software framework called Kasper, which is used by the KATRIN collaboration to investigate the effect of model corrections and possible systematics, and to optimize the operational parameters of the setup for the neutrino mass measurement [39, 62, 64,65,66].

During the ongoing commissioning measurement campaign of the KATRIN experiment, many aspects of the current response model will be verified with experimental data. The results of recent investigations are described in [38, 49, 67]. This thorough characterization of the complex setup will allow a quantitative evaluation of the systematic effects in the neutrino mass analysis at KATRIN.