1 Introduction

The question of the Neutrino Mass Ordering (NMO) is one of the main drivers of the field of neutrino oscillation physics. The NMO describes the ordering of the three neutrino mass eigenstates \(m_1\), \(m_2\), and \(m_3\). The two possible scenarios depend on the sign of \(\Delta m_{31}^2 = m_3^2 - m_1^2\), often referred to as the atmospheric mass splitting, where negative values are known as Inverted Ordering (IO) and positive values as Normal Ordering (NO).

The three neutrino mass states do not correspond directly to the three neutrino flavor states \(\nu _{e}\), \(\nu _{\mu }\), and \(\nu _{\tau }\). Instead, each flavor state is a superposition of the mass states, with the mixing described by the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix U [1,2,3], such that

$$\begin{aligned} \nu _\alpha = \sum _{i=1}^3 U_{\alpha ,i} \nu _i, \end{aligned}$$
(1)

where \(\alpha \in \left\{ e,\mu ,\tau \right\} \) labels the flavor states and \(i \in \left\{ 1,2,3 \right\} \) labels the mass states. By convention, \(\nu _{1}\) is the state containing the most electron flavor, and \(\nu _{3}\) is the state containing the least.

The mixing matrix U can be parameterized by a CP-violating phase \(\delta _\mathrm {CP}\) and three mixing angles \(\theta _{12}\), \(\theta _{13}\), and \(\theta _{23}\). In the case of Majorana neutrinos, two additional phases are included, which are not relevant for this work. Since U is non-diagonal, flavor changes are observed depending on the energy and propagation distance of a neutrino, a phenomenon commonly known as neutrino oscillations. The oscillations are described by the mass splittings, mixing angles, and the CP-violating phase [4].
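In this standard parameterization, U is the product of three rotations, with \(\delta _\mathrm {CP}\) attached to the 1-3 sector. As a minimal numerical sketch (not analysis code; the angle values in the usage example are placeholders), the matrix can be constructed and checked for unitarity:

```python
import numpy as np

def pmns_matrix(theta12, theta13, theta23, delta_cp):
    """PMNS matrix in the standard parameterization U = R23 . U13(delta) . R12.

    Angles in radians. The two possible Majorana phases are omitted,
    since they do not affect oscillations.
    """
    s12, c12 = np.sin(theta12), np.cos(theta12)
    s13, c13 = np.sin(theta13), np.cos(theta13)
    s23, c23 = np.sin(theta23), np.cos(theta23)
    ed = np.exp(-1j * delta_cp)
    r23 = np.array([[1, 0, 0],
                    [0, c23, s23],
                    [0, -s23, c23]], dtype=complex)
    u13 = np.array([[c13, 0, s13 * ed],
                    [0, 1, 0],
                    [-s13 * np.conj(ed), 0, c13]], dtype=complex)
    r12 = np.array([[c12, s12, 0],
                    [-s12, c12, 0],
                    [0, 0, 1]], dtype=complex)
    return r23 @ u13 @ r12

# placeholder angle values, roughly of the size found in global fits
U = pmns_matrix(0.59, 0.15, 0.81, 3.4)
```

Unitarity, \(U U^\dagger = \mathbb {1}\), guarantees that the total flavor content summed over the mass states is conserved.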

For propagation through dense matter, the neutrino oscillations are modulated by interactions with electrons, which give rise to matter effects [5] such as the so-called MSW effect and parametric enhancement  [6,7,8,9,10]. Depending on the NMO, these modulations arise mainly in the neutrino (NO) or anti-neutrino channel (IO)  [11]. In measurements of solar neutrino oscillations, they were used to determine the ordering of the neutrino states \(\nu _1\) and \(\nu _2\) by finding \(m_2 > m_1\). Moreover, these modulations can be observed for atmospheric neutrinos that undergo matter effects during their propagation through the Earth. In contrast to long-baseline accelerator experiments, the signature observed in IceCube is largely independent of the value of \(\delta _\mathrm {CP}\), which allows for a complementary measurement of the NMO at higher energies, using atmospheric neutrinos  [12].

Atmospheric neutrinos are produced in the Earth’s atmosphere by interactions of cosmic rays with nuclei in the air, generating mesons. These mesons decay, producing electron and muon (anti-)neutrinos, which propagate through the Earth and can eventually be detected by an underground neutrino detector, such as IceCube [13]. The baseline of propagation through the Earth can be inferred by measuring the incoming zenith angle of the neutrino. The highest-energy oscillation maximum arises at \(E_\nu \sim 25\,\mathrm {GeV}\) for vertically up-going neutrinos, moving to lower energies at the shorter baselines towards the horizon. For energies above a few GeV, the oscillations are mostly driven by the parameters \(\theta _{23}\) and \(\Delta m_{31}^2\), which are therefore referred to as atmospheric oscillation parameters [4]; for oscillations in vacuum, the value of \(\theta _{13}\) is too small to produce any detectable effect. In the presence of matter effects, however, the effective value of \(\theta _{13}\) can become sizeable under the right conditions, resulting in oscillations involving the electron flavor, as shown in Fig. 2.
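The relation between zenith angle, baseline, and the energy of the highest-energy oscillation maximum can be sketched with the two-flavor vacuum condition \(1.27\,\Delta m^2[\mathrm {eV}^2]\,L[\mathrm {km}]/E[\mathrm {GeV}] = \pi /2\). The Earth radius, the assumed production altitude, and the \(\Delta m_{31}^2\) value below are illustrative assumptions, not analysis inputs:

```python
import numpy as np

R_EARTH = 6371.0  # km
H_PROD = 20.0     # km, assumed mean neutrino production altitude (illustrative)

def baseline_km(cos_zenith):
    """Baseline from the production point to a surface detector,
    from the law of cosines in Earth-centered geometry."""
    ct = cos_zenith
    return np.sqrt((R_EARTH + H_PROD)**2
                   - R_EARTH**2 * (1.0 - ct**2)) - R_EARTH * ct

def first_maximum_energy_gev(cos_zenith, dm31sq_ev2=2.5e-3):
    """Energy of the highest-energy oscillation maximum in the
    two-flavor vacuum approximation: 1.27 * dm2 * L / E = pi/2."""
    return 1.27 * dm31sq_ev2 * baseline_km(cos_zenith) / (np.pi / 2.0)
```

For vertically up-going neutrinos (\(\cos \theta _\nu = -1\), \(L \approx 2 R_\oplus \)) this reproduces a first maximum near the \(\sim \!25\,\mathrm {GeV}\) quoted above, moving to lower energies towards the horizon.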

In atmospheric oscillations, the impact of the presence of matter arises mainly below \(E_\nu \sim 15\,\mathrm {GeV}\). The strength of these matter effects depends on the Earth’s matter profile, which we take as given by the Preliminary Reference Earth Model (PREM), shown in Fig. 1 [14].
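A constant-density layer approximation such as PREM4 can be encoded as a simple radius-to-density lookup. The layer radii and densities below are illustrative round numbers chosen to resemble the core-mantle structure, not the official PREM4 parameters:

```python
# Coarse 4-layer Earth density profile (illustrative values only,
# not the official PREM4 parameterization).
LAYERS = [            # (outer radius in km, density in g/cm^3)
    (1221.5, 13.0),   # inner core
    (3480.0, 11.3),   # outer core
    (6346.0, 5.0),    # mantle
    (6371.0, 3.0),    # crust
]

def density(radius_km):
    """Return the density of the layer containing the given radius."""
    for r_out, rho in LAYERS:
        if radius_km <= r_out:
            return rho
    return 0.0  # above the surface
```

A finer approximation such as PREM12 simply adds more entries to the layer table; the oscillation code then integrates the neutrino path through these constant-density shells.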

Fig. 1
figure 1

Earth density profile, according to the Preliminary Reference Earth Model (PREM) and its approximation by 4- and 12-layers of constant density (commonly called PREM4 and PREM12, respectively) [14]

The oscillation probabilities for muon-flavored atmospheric neutrinos and anti-neutrinos to be found in the flavor state \(\alpha \in \left\{ e, \mu , \tau \right\} \) for a given zenith angle \(\theta _\nu \), and neutrino energy \(E_\nu \), are shown in Fig. 2. They are calculated with the PROB3++  [15] package and the PREM12 approximation (cf. Fig. 1), which are consistently used throughout this work. Due to the Earth’s geometry and its core-mantle structure, the visible modulations of atmospheric neutrino oscillations feature a clear zenith-dependence.

Fig. 2
figure 2

Oscillation probabilities for an atmospheric \(\nu _{\mu }\) or \({\overline{\nu }}_{\mu }\) upon reaching the IceCube detector, as a function of the cosine of the zenith angle, \(\theta _\nu \), and the energy, \(E_\nu \), of the neutrino, for the NO (a) and the IO (b) hypotheses. The probabilities are shown for the neutrino appearing as each of the three possible flavors, with the neutrino and anti-neutrino cases shown as the top and bottom rows in each panel. The dominant mixing of \(\nu _\mu \) and \(\nu _\tau \) is clearly visible, while the \(\nu _e\) flavor is mostly decoupled, except for a small contribution from matter effects below \(E_\nu \sim 15\,\mathrm {GeV}\)

Note that the oscillation patterns for neutrinos and anti-neutrinos flip between the two orderings. Thus, the NMO can be determined by finding the enhancement in transition probabilities from matter effects either in the neutrino channel (NO) or anti-neutrino channel (IO). For detectors insensitive to distinguishing neutrinos from anti-neutrinos on an event-by-event level, the NMO still leads to a visible net-effect in the amplitude of the observed matter effects, because the atmospheric fluxes and the cross sections for neutrinos and anti-neutrinos differ  [16, 17]. These differences mean that atmospheric neutrinos are measured at higher rates than the corresponding anti-neutrinos. Due to this rate difference, the strength of observed matter effects in a combined sample of neutrinos and anti-neutrinos is increased in case of NO and decreased in case of IO, which is the main signature targeted in this work.

Table 1 Overview of the main differences between the two NMO analyses in terms of the total number of observed events, the selection strategy, the reconstruction likelihood, the reconstructed energy range, the number of analysis bins (given as number of \(E_\nu ^\mathrm {reco}, \theta _\nu ^\mathrm {reco}, \mathrm {PID}\) bins), the background (atmospheric muon) description, the template generation, and the estimated fractions of the data sample from each contribution

The determination of the NMO has important implications for searches for neutrinoless double-\(\beta \) decay, where the entire mass region allowed in the case of IO is in reach of the next generation of experiments  [18, 19]. The NMO must also be determined as part of the search for CP-violation in the lepton sector, where the sensitivity to \(\delta _\mathrm {CP}\) depends strongly on the ordering  [20, 21]. Therefore, a measurement of the NMO is targeted by several future long-baseline, reactor, and atmospheric neutrino experiments, such as DUNE  [22], JUNO  [23], PINGU  [16, 24], ORCA [25], and Hyper-Kamiokande [26]. Moreover, current neutrino experiments such as T2K  [27], NOvA  [28], and Super-Kamiokande  [29] provide first indications of the NMO. Combining the results from several experiments, recent global fits prefer Normal over Inverted Ordering at \(\sim \!\! 2 - 3.5\,\sigma \) with a small preference for the upper octant (i.e. \(\sin ^2(\theta _{23})>0.5\))  [30,31,32,33].

2 The IceCube neutrino observatory

The IceCube Neutrino Observatory  [13] is a \(\sim \!\! 1\,\hbox {km}^3\) neutrino detector at the Geographic South Pole, optimized for detecting atmospheric and astrophysical neutrinos above \(E_\nu \sim 100 \,\mathrm {GeV}\). It consists of 86 strings running through the ice vertically from the surface almost to the bedrock, carrying a total of 5160 Digital Optical Modules (DOMs) at depths between 1450 and \(2450\,\mathrm {m}\) [34]. Each DOM houses a 10-inch photomultiplier tube and digitizing electronics, surrounded by a glass sphere  [13, 35, 36].

In the center of the detector, some of these strings form a more densely instrumented volume called DeepCore  [37]. It consists of 8 strings with an increased vertical density of DOMs with higher quantum-efficiency, surrounding one IceCube string. Due to the denser instrumentation and the higher quantum-efficiency DOMs, the DeepCore infill has a lower energy threshold than the surrounding IceCube array. The corresponding detection efficiency of DeepCore increases steeply between \(\sim 3\,\mathrm {GeV}\) and \(\sim 10\,\mathrm {GeV}\) and flattens for higher energies  [13, 37].

Neutrinos are detected by the Cherenkov emissions of their charged secondary particles, which are generated by Charged Current (CC) and Neutral Current (NC) interactions with the nucleons of the ice. In the case of CC muon-neutrino interactions, a hadronic cascade is initiated at the primary vertex, combined with an outgoing muon. The muon can propagate large distances through the detector, leading to an elongated shape of the energy deposition and thus of the Cherenkov light emission. Such events are called track-like signatures. In contrast, CC electron-neutrino, NC, and the majority of CC tau-neutrino interactions do not produce a muon that can travel large distances. Instead, they initiate an electromagnetic and/or hadronic cascade that develops over a distance of a few meters. The light emission of this cascade is considerably smeared around the Cherenkov angle of the shower direction. Such events are called cascade-like. At energies below a few tens of GeV, the separation of track- and cascade-like events becomes increasingly difficult, due to the short muon track and the coarse detector granularity. For oscillation measurements with DeepCore, this separation of track-like and cascade-like events is used to partially distinguish neutrino flavors [37].

For the analyses presented here, we use the Honda 3D atmospheric neutrino simulation [38], and the GENIE neutrino interaction generator [39] with KNO [40] and PYTHIA [41]. For quasielastic and resonance events, the axial masses are set to \(M_{A}^\mathrm {qe}=0.99\,\mathrm {GeV}\) and \(M_{A}^\mathrm {res}=1.12\,\mathrm {GeV}\), respectively. Simulation of the atmospheric muon background uses CORSIKA [42], with the Polygonato-Hörandel model of the muon energy spectrum [43]. Muons are propagated through the ice using PROPOSAL [44]; the propagation of all other particles is based on GEANT4 [45, 46]. Cherenkov photons are propagated through the ice using a GPU-based code [47]. More details of the simulation can be found in [48].

3 Data samples and reconstruction

In this work, two independent likelihood analyses are used to extract information about the NMO from DeepCore data. They are henceforth labelled Analysis \(\mathcal {A}\) and \(\mathcal {B}\), and the main differences between the two analyses are summarized in Table 1. Analysis \(\mathcal {A}\) is designed to optimize the sensitivity to the NMO with DeepCore and is considered the main result of this work, while Analysis \(\mathcal {B}\) is designed to resemble the proposed PINGU analysis from [24], using only events that are fully contained in the DeepCore detector, and is used as a confirmatory result here. Further details about Analyses \(\mathcal {A}\) and \(\mathcal {B}\) can be found in [49] and [50], respectively. The use of two independent analyses with partially complementary data sets gives great confidence in the quantitative conclusions presented here, as well as in the treatment and impact of the systematic uncertainties.

The analyses are based on DeepCore data taken between May 2012 and April 2014, comprising a total livetime of 1006 (1022) days for Analysis \(\mathcal {A}\) (\(\mathcal {B}\)). The difference in livetime arises from slightly different criteria on the stability of data acquisition. The data is run through two largely independent processing chains, where both samples are acquired by filtering the data in several successive selection steps. These steps include the application of selection criteria on well-understood variables, as well as machine-learning methods, namely Boosted Decision Trees [51]. The selections aim to reduce the background of atmospheric muons and triggered noise, while maintaining a large fraction of well-reconstructed, low-energy neutrino events below \(E_{\nu }\sim 100\,\mathrm {GeV}\). Both samples are described in more detail in [48]. Compared to [48], the samples used in this work differ by the following modifications:

First, events with a reconstructed vertex outside the detector that enter from below are not vetoed in Analysis \(\mathcal {A}\) using the lower part of the DeepCore detector, as it is done for downgoing and horizontal events using the surrounding IceCube detector. This increases the statistics at the expense of a reduced energy resolution for these uncontained events, especially at high energies. The loss in energy resolution is due to the unobserved fraction of deposited energy outside the detector volume. Second, the range of reconstructed energies considered is extended for both analyses compared to [48], from 56 to \(90\,\mathrm {GeV}\) (\(80\,\mathrm {GeV}\)) for Analysis \(\mathcal {A}\) (\(\mathcal {B}\)), allowing us to constrain nuisance parameters outside the strongest oscillation region. Third, both analyses use exclusively upgoing events (i.e. \(\cos (\theta _\nu ^\mathrm {reco}) < 0\)) to reduce the background from atmospheric muons.

The final samples are reconstructed with the same algorithm for Analyses \(\mathcal {A}\) and \(\mathcal {B}\) [48, 49]. It is based on a likelihood function that links the number and the arrival times of the observed Cherenkov photons in all DOMs to a physics hypothesis. The physics hypothesis is given by the position and time of the interaction vertex, the neutrino direction, and the neutrino energy, which are the parameters of the likelihood optimization. The reconstruction is run separately for a starting-track and a cascade-only hypothesis, where the starting-track hypothesis features a cascade at the primary vertex with an additional parameter L for the length of an outgoing muon track. Since the track hypothesis allows for fitting the track length to \(L=0\), the 7-dimensional cascade-only hypothesis is nested within the 8-dimensional track hypothesis. The log-likelihood difference between the track and cascade-only hypotheses is used as the flavor-separating variable, called Particle Identification (PID). Besides the reconstructed neutrino zenith angle \(\theta _\nu ^\mathrm {reco}\) and neutrino energy \(E_\nu ^\mathrm {reco}\), the PID is used as a third observable entering the likelihood analyses described in Sect. 4. The distribution of the PID variable for Analysis \(\mathcal {A}\) is shown in Fig. 3.

Fig. 3
figure 3

The distribution of the particle identification variable for Analysis \(\mathcal {A}\). The blue band on the data/MC ratio is the statistical uncertainty

In the reconstruction, the optimized likelihood function differs between the two analyses: for Analysis \(\mathcal {B}\), the reconstruction likelihood is defined using the observed charge binned in time for each DOM as a proxy for the observed number of Cherenkov photons. Since some deviations were found between data and Monte Carlo in charge-related quantities, the likelihood was reformulated in a charge-independent way for Analysis \(\mathcal {A}\): the charge amplitude information is removed, and the only information used is whether or not a DOM is hit in multiple bins of time. The impact of this likelihood reformulation on the resolutions in reconstructed zenith angle \(\theta _\nu ^\mathrm {reco}\) and neutrino energy \(E_\nu \) was found to be small. Moreover, for Analysis \(\mathcal {B}\), the impact of the charge mismatch is estimated to be small in comparison to the statistical uncertainty on the observed NMO.

After the data selection, the number of events in Sample \(\mathcal {A}\) exceeds the number of events in Sample \(\mathcal {B}\) by a factor of 1.87, while both samples provide similar resolutions in energy and zenith angle.

Note that for Analysis \(\mathcal {B}\), the atmospheric muon background is estimated from data in an off-signal region, while for Analysis \(\mathcal {A}\), it is obtained from Monte Carlo simulations (cf. Table 1). As a result, there is no a priori Monte Carlo prediction for the atmospheric muon contamination in Sample \(\mathcal {B}\). However, the fraction of atmospheric muons is fitted in the analysis as discussed in Sect. 6. The contamination of triggered noise was found to be only \(\lesssim 0.1\%\) for both samples. It is included into the likelihood fit for Analysis \(\mathcal {A}\), while it is neglected for Analysis \(\mathcal {B}\).

The final samples consist of CC muon neutrino, CC electron neutrino, CC tau neutrino, NC, and atmospheric muon events. These different components are called contributions in the following and are simulated separately in Monte Carlo except for the atmospheric muon contribution used in Analysis \(\mathcal {B}\) that is parametrized from an off-signal data region.

The estimated fraction of the data samples from each contribution is shown in Table 1. These fractions are calculated using the best-fit values for all systematic parameters, discussed in Sect. 4.

4 Analyses

Both Analyses \(\mathcal {A}\) and \(\mathcal {B}\) use a binned likelihood method to determine the NMO by observing the signature from Fig. 2 in reconstructed variables. Since a separation of all flavors cannot be done with DeepCore, the PID is used to distinguish track- and cascade-like events, while neutrino energy and zenith angle are obtained from the reconstruction described in Sect. 3.

Fig. 4
figure 4

Comparison of Monte Carlo template for atmospheric muons in Analysis \(\mathcal {A}\), generated as histogram (top) and KDE (bottom): the latter one is used in the analysis, due to the reduced impact of limited Monte Carlo statistics

For both analyses, the binning is summarized in Table 1. For Analysis \(\mathcal {B}\) only two PID bins are used to separate track- and cascade-like events, analogously to [16], while Analysis \(\mathcal {A}\) uses three PID bins. This is motivated by the weak separation power at low energies, where the confidence in the separation can be taken into account by including an additional, intermediate PID bin. The binning in neutrino energy and zenith angle is chosen to be uniform in \(\log _{10}(E_\nu ^\mathrm {reco})\) and \(\cos (\theta ^\mathrm {reco}_\nu )\) for Analysis \(\mathcal {B}\). For Analysis \(\mathcal {A}\), it is also uniform in \(\cos (\theta ^\mathrm {reco}_\nu )\), while it is optimized in \(\log _{10}(E_\nu ^\mathrm {reco})\) to roughly follow the available statistics and maintain a large number of bins in the most interesting region at \(E_\nu \sim 10\,\mathrm {GeV}\).
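A binning uniform in \(\log _{10}(E_\nu ^\mathrm {reco})\) and \(\cos (\theta _\nu ^\mathrm {reco})\) can be sketched with numpy histogramming. The toy event sample, the edge values, and the bin counts below are illustrative assumptions, not the analysis settings:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy reconstructed events; ranges follow the text (upgoing only,
# energies up to a few tens of GeV), but are otherwise arbitrary.
e_reco = rng.uniform(6.0, 79.0, 1000)
cz_reco = rng.uniform(-1.0, -0.01, 1000)

energy_edges = np.logspace(np.log10(5.0), np.log10(80.0), 11)  # uniform in log10(E)
coszen_edges = np.linspace(-1.0, 0.0, 9)                       # uniform in cos(zenith)

counts, _, _ = np.histogram2d(e_reco, cz_reco,
                              bins=[energy_edges, coszen_edges])
```

A non-uniform energy binning, as used by Analysis \(\mathcal {A}\), amounts to replacing `energy_edges` with hand-tuned edge values that track the available statistics.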

In Analysis \(\mathcal {B}\), the binning is used to generate Monte Carlo distributions, called templates, in \(E_{\nu }^\mathrm {reco}\), \(\theta _{\nu }^\mathrm {reco}\), and PID for each contribution to the data sample, using histograms. In contrast, Analysis \(\mathcal {A}\) applies an adaptive Kernel Density Estimation (KDE) method to produce these templates, which smooths the fluctuations from limited Monte Carlo statistics. These uncertainties arise mainly from the atmospheric muon template, where the available Monte Carlo statistics are similar to those from experimental data, due to the time-intensive simulation of atmospheric muons.

The KDE method is analogous to the one used in [52] and based on [53]. However, the method from [53] is extended by reflecting the KDE at the boundaries of the binned parameter space and integrating the resulting distribution to obtain a prediction for the bin content  [54]. For the atmospheric muons, this is illustrated in Fig. 4, where the Monte Carlo template for atmospheric muons is generated with histograms (top) and the above mentioned KDE method (bottom). In the case of histograms, the fluctuations in the bin content, arising from limited Monte Carlo statistics, are clearly visible.
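The boundary-reflection-and-integration step can be illustrated in one dimension with a fixed-bandwidth Gaussian KDE (the analysis uses an adaptive, multi-dimensional variant, so this is only a sketch of the principle):

```python
import numpy as np
from scipy.stats import gaussian_kde

def reflected_kde_template(samples, edges):
    """Bin contents from a boundary-reflected KDE, normalized to len(samples).

    The sample is mirrored at the outer edges, so probability mass that
    would leak outside the binned range is folded back in; the density
    is then integrated over each bin to give a smooth bin prediction.
    """
    lo, hi = edges[0], edges[-1]
    mirrored = np.concatenate([samples, 2 * lo - samples, 2 * hi - samples])
    kde = gaussian_kde(mirrored)
    contents = np.array([kde.integrate_box_1d(a, b)
                         for a, b in zip(edges[:-1], edges[1:])])
    # mirroring triples the number of kernels, so rescale by 3
    return 3.0 * len(samples) * contents
```

Unlike a histogram, the resulting bin contents vary smoothly even when the underlying Monte Carlo sample is sparse, at the cost of a small smoothing bias near sharp features.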

The uncertainties on the KDE prediction are estimated using bootstrapping for every contribution from Sect. 3 separately  [55]. For each contribution, which consists of N MC events, events are drawn randomly from this sample, replacing the event each time so that it can be drawn again, until N events have been drawn. This new sample of N events is called a bootstrapped sub-sample, and from this a new KDE template is generated. This process is repeated several times and the uncertainty on each bin content in the original KDE template is estimated from the resulting distribution of bin contents in the bootstrapped samples.
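The resampling step above can be sketched as follows; for brevity the bootstrapped templates here are histograms rather than KDE templates:

```python
import numpy as np

def bootstrap_bin_uncertainties(samples, edges, n_boot=200, seed=0):
    """Per-bin template uncertainty from bootstrapping.

    Each bootstrapped sub-sample draws N events with replacement from
    the N original events; the spread of bin contents across the
    sub-samples estimates the statistical uncertainty of the template.
    """
    rng = np.random.default_rng(seed)
    n = len(samples)
    boot = np.empty((n_boot, len(edges) - 1))
    for b in range(n_boot):
        resampled = rng.choice(samples, size=n, replace=True)
        boot[b], _ = np.histogram(resampled, bins=edges)
    return boot.std(axis=0)
```

For a histogram template the bootstrap spread approaches the usual multinomial counting error; its value lies in applying the same recipe unchanged to smoothed KDE templates, where no simple closed-form error exists.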

For Analysis \(\mathcal {A}\), the three-dimensional template obtained from the combination of all Monte Carlo contributions is shown in Fig. 5. Additionally, the expected pulls on each bin are shown in the case that the true ordering is inverted but the NO hypothesis is tested. This is used as a metric to visualize the signature of the NMO  [16]. As can be seen in Fig. 5, the expected pulls between NO and IO are small, which already indicates the low sensitivity due to the limited resolution and statistics of DeepCore at energies \(E_\nu \lesssim 15\,\mathrm {GeV}\).

Fig. 5
figure 5

Top: the distribution in PID, zenith angle, and neutrino energy for Analysis \(\mathcal {A}\) that enters the likelihood calculation; bottom: corresponding signature of the NMO, given as expected pull on the bin content in case IO is observed but NO is tested, using Poissonian statistics

Using these distributions, likelihoods are defined for both analyses. For Analysis \(\mathcal {A}\), the negative log-likelihood LLH is given by

$$\begin{aligned} \mathrm {LLH} = \left[ - \sum _{i \in \left\{ \mathrm {bins} \right\} } \ln \left( \frac{p_i^\mathrm {tot}(N^\mathcal {A} _i, \mu ^\mathcal {A} _i, \sigma ^\mathcal {A} _{\mu _i})}{p_i^\mathrm {tot}(N^\mathcal {A} _i, N^\mathcal {A} _i, \sigma ^\mathcal {A} _{\mu _i})} \right) \right] + \frac{1}{2} S, \end{aligned}$$
(2)

where the term S is common to the likelihood of both analyses and will be defined after discussing the other terms. The term \(p_i^\mathrm {tot}(N^\mathcal {A} _i, \mu ^\mathcal {A} _i, \sigma ^\mathcal {A} _{\mu _i})\) gives the probability of observing \(N^\mathcal {A} _i\) events in bin i, if \(\mu ^\mathcal {A} _i\) events are expected. It is obtained by a convolution of a Poissonian distribution and a narrow log-normal probability density function that describes the uncertainty \(\sigma ^\mathcal {A} _{\mu _i}\) on the Monte Carlo prediction \(\mu ^\mathcal {A} _i\). The uncertainty \(\sigma _{\mu _i}^\mathcal {A} \) is calculated as the quadratic combination of the individual template uncertainties of all contributions, estimated from bootstrapping.
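The convolved probability \(p_i^\mathrm {tot}\) can be sketched numerically. The log-normal parameterization below (median \(\mu \), shape \(\sigma _\mu /\mu \)) is an illustrative assumption; the exact convention used in the analysis may differ:

```python
import numpy as np
from scipy import integrate
from scipy.stats import poisson, lognorm

def p_tot(n_obs, mu, sigma_mu):
    """Poisson probability for n_obs events, convolved with a narrow
    log-normal describing the uncertainty sigma_mu on the prediction mu."""
    if sigma_mu == 0.0:
        return poisson.pmf(n_obs, mu)
    prior = lognorm(s=sigma_mu / mu, scale=mu)  # median mu
    lo = max(mu - 8.0 * sigma_mu, 1e-12)
    hi = mu + 8.0 * sigma_mu
    val, _ = integrate.quad(
        lambda lam: poisson.pmf(n_obs, lam) * prior.pdf(lam),
        lo, hi, points=[mu])
    return val
```

A per-bin contribution to Eq. (2) is then \(-\ln \left[ p_i^\mathrm {tot}(N_i, \mu _i, \sigma _{\mu _i}) / p_i^\mathrm {tot}(N_i, N_i, \sigma _{\mu _i}) \right] \); for \(\sigma _{\mu _i} \rightarrow 0\) the pure Poissonian likelihood is recovered.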

Due to the KDE method used in Analysis \(\mathcal {A}\), the dominant template uncertainties in the description of atmospheric muons are strongly reduced, such that the uncertainties on the total template are typically \(\sim 10\%\) of the Poissonian error expected from data fluctuations. Thus, for Analysis \(\mathcal {A}\) these template uncertainties contribute only marginally to the following results.

For Analysis \(\mathcal {B}\), the likelihood is adapted from [56], where a \(\chi ^2\)-value is calculated by quadratically combining the Poissonian error on the predicted bin content \( \mu ^{\mathcal {B}}_i \) with the uncertainty \(\sigma ^{\mathcal {B}}_{\mu _i}\) on the combined template of all contributions. It is given by

$$\begin{aligned} \chi ^2 = 2\mathrm {LLH} = \sum _{i \in \left\{ \mathrm {bins} \right\} } \frac{\left( N^\mathcal {B} _i - \mu ^\mathcal {B} _i\right) ^2}{ \mu ^\mathcal {B} _i +(\sigma ^\mathcal {B} _{\mu _i})^2 } + S, \end{aligned}$$
(3)

where the labels are analogous to Eq. (2). Here, the uncertainties \(\sigma ^{\mathcal {B}}_{\mu _i}\) on the templates are estimated from the statistical error due to limited Monte Carlo and an uncertainty on the atmospheric muon template, estimated from off-signal data.

Table 2 Systematics treated as nuisance parameters in the likelihood analysis, including normalization (N), detector response (D), oscillation (O), flux (F), and neutrino-nucleon interaction (I) uncertainties. These parameters are discussed in more detail in [48]. The table gives the baseline value and, if the parameter is used with a prior in the likelihood, the standard deviation of the Gaussian prior, as well as the experimental best-fit values for both analyses and ordering hypotheses

The dominant systematic uncertainties are included in both likelihood fits using nuisance parameters. These nuisance parameters comprise uncertainties in the atmospheric neutrino flux, the atmospheric oscillation parameters, the neutrino-nucleon cross sections, and the detector response. All systematic parameters are allowed to vary simultaneously and independently in the fit; we assume there are no correlations between the pulls on the various parameters. The parameters are listed in Table 2. To account for external constraints on these systematic parameters, Gaussian priors are included into the likelihood by the term S,

$$\begin{aligned} S =\sum _{s \in \left\{ \mathrm {sys} \right\} } \left( \frac{ s-s_0}{\sigma _s} \right) ^2, \end{aligned}$$
(4)

where the sum runs over all systematic parameters. For each parameter, the tested value s is compared to the expected baseline value \(s_0\) with respect to its estimated uncertainty \(\sigma _s\). The baseline value \(s_0\) and width \(\sigma _s\) of each prior are identical for both analyses and are stated in Table 2; the central value and the width are motivated by the provided references where possible. As indicated in Table 2, the prior for some parameters was removed in Analysis \(\mathcal {B}\): due to the small sensitivity to the NMO, such a prior was found to induce a preference for one ordering hypothesis in case the true parameter value differs from the baseline value, which is avoided by removing the corresponding priors from the likelihood. Thus, no external knowledge is included on these parameters, allowing for larger deviations from the baseline value.
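Combining Eqs. (3) and (4), the test statistic of Analysis \(\mathcal {B}\) can be sketched directly; the numerical inputs in the usage example are arbitrary:

```python
import numpy as np

def chi2(n_obs, mu, sigma_mu, sys_val, sys_base, sys_width):
    """Eq. (3): per-bin Poisson variance on the prediction plus the
    template uncertainty, with the Gaussian prior penalty S of Eq. (4)."""
    bin_terms = np.sum((n_obs - mu)**2 / (mu + sigma_mu**2))
    S = np.sum(((sys_val - sys_base) / sys_width)**2)
    return bin_terms + S
```

Note that a template uncertainty \(\sigma _{\mu _i} > 0\) simply inflates the denominator of the corresponding bin, deweighting bins with poorly known predictions.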

The parameters \(N_\nu \), \(N_{\nu _e}\), \(N_\mathrm {NC}\), and \(N_\mu \) are used to vary the normalizations of the different contributions from Table 1. Thus, they account for uncertainties in interaction cross sections, the total neutrino and muon fluxes, the \(\nu _e/\nu _\mu \) production ratio, and detection efficiencies.

Additional uncertainties on the neutrino fluxes predicted in [38] are modelled by the parameters \(\gamma _\nu \), \(\sigma _\nu ^\mathrm {zenith}\), and \(\Delta (\nu /{\bar{\nu }})\). Here, \(\gamma _\nu \) incorporates uncertainties in the neutrino energy spectrum, arising from flux, and cross section uncertainties, according to a reweighting of Monte Carlo events \(\propto \!(E_\nu /\mathrm {GeV})^{\gamma _\nu }\), while \(\sigma _\nu ^\mathrm {zenith}\) and \(\Delta (\nu /{\bar{\nu }})\) incorporate the dominant uncertainties from   [58] in an ad hoc parametrization. The uncertainties on the production of atmospheric muons arising from the spectrum and compositions of the cosmic ray primary flux are represented by the parameter \(\gamma _\mu \). Note that \(\gamma _\mu \) is only included as an uncertainty for Analysis \(\mathcal {A}\), since the atmospheric muon template in Analysis \(\mathcal {B}\) is estimated from data.
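The spectral-index reweighting \(\propto \!(E_\nu /\mathrm {GeV})^{\gamma _\nu }\) amounts to a one-line tilt of the Monte Carlo event weights (a minimal sketch; the analysis applies it within a larger weighting chain):

```python
import numpy as np

def tilt_spectrum(weights, e_nu_gev, gamma_nu):
    """Reweight MC events by (E_nu / GeV)^gamma_nu to tilt the assumed
    neutrino energy spectrum; gamma_nu = 0 reproduces the baseline flux."""
    return weights * np.power(e_nu_gev, gamma_nu)
```

Positive (negative) values of \(\gamma _\nu \) harden (soften) the spectrum; the overall normalization shift this induces is absorbed by the normalization parameters such as \(N_\nu \).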

Uncertainties in neutrino-nucleon interactions are represented by the parameters \(M_A^\mathrm {res}\) and \(M_A^\mathrm {qe}\), which model the axial mass of resonant and quasi-elastic interactions. Note that uncertainties on the cross section for deep inelastic scattering were also parametrized, but found to be negligible and therefore not included into the likelihood fit.

Detector uncertainties are modelled by the parameters \(\epsilon _\mathrm {opt}\), \(\epsilon _\mathrm {lateral}\), and \(\epsilon _\mathrm {head-on}\), which describe the optical detection efficiency of the DOMs. The value of \(\epsilon _\mathrm {opt}\) gives the total detection efficiency per photon, relative to the baseline scenario. In contrast, the parameters \(\epsilon _\mathrm {lateral}\) and \(\epsilon _\mathrm {head-on}\) describe the dependence of the photon detection efficiency on the inclination angle of the incoming photon. Here, \(\epsilon _\mathrm {lateral}\) changes the slope of the acceptance curve, while \(\epsilon _\mathrm {head-on}\) controls the acceptance of very vertically upgoing photons independently. Besides actual uncertainties in the DOMs’ detection efficiency, these parameters incorporate uncertainties with respect to the optical properties of the ice in the refrozen drill holes that surround the DOMs.

All of the systematic parameters mentioned above are described in more detail in [48]. Besides the parameters included in the fit, additional uncertainties have been investigated and tested for their possible effect on the analysis [49, 50]. These parameters are the normalizations of sub-dominant experimental backgrounds (detector noise and event pile-up from coincident atmospheric muons), additional uncertainties on the optical properties of the ice, the oscillation parameters (\(\theta _{12}\), \(\theta _{13}\), \(\Delta m_{21}^2\), and \(\delta _{\rm CP}\)), and Bjorken-x dependent uncertainties in the cross section for deep-inelastic neutrino-nucleon scattering. Two types of tests were performed to determine the impact of these parameters. In the first test, a parameter is injected into a MC fake dataset, shifted from its nominal value by \(\pm 1\sigma \) in the case of a detector systematic and by \(\pm 3\sigma \) in the case of an oscillation parameter. This MC fake data is fit using the same MC set, but with the parameter in question fixed to its unshifted nominal value, to assess whether the uncertainty in the systematic parameter can bias the measured ordering hypothesis. This test is repeated for MC fake datasets generated with both mass orderings. For none of these systematic or oscillation parameters is a bias observed in the measured preference for the mass ordering of more than \(0.05\sigma \). For the case of \(\delta _{\rm CP}\), the value of \(\delta _{\rm CP}=180^{\circ }\) is used as the ‘nominal’ value for the MC fake dataset, with the value of \(\delta _{\rm CP}=270^{\circ }\) injected into the MC used in the fit; a negligible (less than \(0.01\sigma \)) bias in the ordering preference is observed. In the second test, the same shifted values as above are injected into the MC fake dataset, but now the parameter under test is allowed to vary in the fit.
This allows us to determine whether the inclusion of these parameters into the fit causes any loss of sensitivity to the mass ordering. None of the parameters in question causes a loss of sensitivity of more than \(0.05\sigma \); the inclusion of \(\delta _{\rm CP}\) reduces the sensitivity by less than \(0.03\sigma \). Since all the parameters described in this paragraph are shown to have neither an impact on the mass ordering sensitivity nor the potential to cause a bias, they have been set to their nominal values (in the case of oscillation parameters, to the NuFit [32] best-fit values) and are not included in the final fit, in order to minimize the computing time required for the multi-parameter minimization.

For Analysis \(\mathcal {A}\) (\(\mathcal {B}\)), the negative log-likelihood from Eq. (2) (Eq. (3)) is optimized. To do this, \(\mathrm {LLH} \equiv 0.5 \chi ^2\) is used as the negative log-likelihood for Analysis \(\mathcal {B}\). During this optimization, the first and the second octant in \(\theta _{23}\) are fitted separately for both orderings, allowing all the parameters listed in Table 2 to vary, and the fit optimizing the \(\mathrm {LLH}\) is taken as the best fit for that ordering. The resulting difference, \(2\Delta \mathrm {LLH}_\mathrm {NO-IO} \equiv \Delta \chi ^{2}_{\mathrm {NO}-\mathrm {IO}}\), between the NO and IO hypotheses is then calculated for both analyses.
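The per-ordering, per-octant fit described above can be sketched numerically. The following is a toy illustration only: the double-well shape of `neg_llh` over \(\sin^2(\theta_{23})\), the constant offset penalizing IO, and the grid scan (standing in for the full multi-parameter minimizer over Table 2) are all assumptions for demonstration.

```python
import numpy as np

def neg_llh(s23, ordering):
    """Toy negative log-likelihood in sin^2(theta_23) (hypothetical shape):
    a double well with a minimum in each octant, plus an assumed constant
    offset disfavoring the IO hypothesis."""
    offset = 0.0 if ordering == "NO" else 0.4
    return 500.0 * (s23 - 0.45) ** 2 * (s23 - 0.57) ** 2 + offset

def best_fit(ordering):
    """Fit the first (s23 < 0.5) and second (s23 > 0.5) octants separately
    and keep the better minimum, mimicking the per-ordering procedure.
    A dense grid scan stands in for the real minimizer."""
    best = None
    for lo, hi in [(0.30, 0.50), (0.50, 0.70)]:
        grid = np.linspace(lo, hi, 2001)
        vals = neg_llh(grid, ordering)
        i = int(np.argmin(vals))
        if best is None or vals[i] < best[1]:
            best = (grid[i], vals[i])
    return best  # (best-fit s23, minimal negative log-likelihood)

s23_no, llh_no = best_fit("NO")
s23_io, llh_io = best_fit("IO")
ts = 2.0 * (llh_no - llh_io)  # test statistic 2*Delta LLH_(NO-IO)
```

In this toy, the negative TS reflects the injected preference for NO; in the real analyses the same quantity is built from the full likelihood fits.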

Finally, \(2\Delta \mathrm {LLH}_\mathrm {NO-IO}\) (\(\Delta \chi ^2_\mathrm {NO-IO}\)) is used as a test statistic (TS) in Sect. 6 for Analysis \(\mathcal {A}\) (\(\mathcal {B}\)) to derive the experimental result from the fit to the data.

5 Sensitivity to the neutrino mass ordering

The determination of the Neutrino Mass Ordering is a binary hypothesis test, which requires the test of two non-nested hypotheses. This is different from most other applications in particle physics, where a general hypothesis \({\mathcal {H}}_G\) is tested against a specific one, \({\mathcal {H}}_S\), in the sense that the specific hypothesis is obtained for a certain realization of the parameters of \({\mathcal {H}}_G\). For such nested hypotheses, Wilks’ Theorem is commonly used to derive sensitivities and to estimate limits on fitted parameters  [59]. In contrast, Wilks’ Theorem does not apply to the determination of the Neutrino Mass Ordering, since the discrete choice of Normal or Inverted Ordering is not related to the fixing of degrees of freedom  [60].

Due to the subtleties involved in the statistical treatment, and since a determination of the NMO is expected within the next decade, the correct method to quantify the preference is the subject of ongoing discussion [60,61,62]. Here, two methods are used to estimate the sensitivity, which are described in the following.

The first method is a statistically rigorous analysis of the resulting likelihood values, using the obtained value of \(2\Delta \mathrm {LLH}_\mathrm {NO-IO}\) as a TS. It derives the sensitivity, given by the expected confidence in the determination of the NMO, from a frequentist coverage test. To do this, the data is fit with both ordering hypotheses, giving a value for the TS and two sets of best-fit systematic parameters, \(\eta ^{\mathrm {NO}}\) and \(\eta ^{\mathrm {IO}}\). These fits are called fiducial fits (FD) in the following.

From these parameters, the best-fit templates are generated for NO and IO. These templates are then used to generate Pseudo-Experiments, or Pseudo-Trials (PTs), by adding Poissonian fluctuations to the bin contents, as expected in a real-world experiment; in this analysis, whose sensitivity is dominated by the statistical uncertainty, it is unnecessary to additionally fluctuate each PT according to the systematic uncertainties. Afterwards, each PT is fitted with both ordering hypotheses, resulting in a new value of the \(\mathrm {TS} = \Delta \chi ^2_\mathrm {NO-IO}= 2\Delta \mathrm {LLH}_\mathrm {NO-IO}\). From these PTs, two distributions of the TS are obtained for the two sets of injected parameters \(\eta ^{\mathrm {NO}}\) and \(\eta ^{\mathrm {IO}}\).

This process of creating PTs for \(\eta ^{\mathrm {NO}}\) and \(\eta ^{\mathrm {IO}}\) and fitting them with both hypotheses is repeated several times to estimate a TS distribution for each of the ordering hypotheses. The TS distributions for NO and IO are then used to estimate the analysis sensitivity, i.e. the expected p-values for the exclusion of each hypothesis. To do this, the fraction of PTs for NO (IO) that is to the right (left) of the median of the IO (NO) distribution is taken as the expected p-value for the exclusion of the NO (IO) hypothesis, if IO (NO) is the true ordering. This is sketched in Fig. 6 for two generic distributions.
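A minimal numerical sketch of this frequentist procedure follows, under simplifying assumptions: the three-bin templates are hypothetical, and simply re-evaluating the fixed templates on each PT stands in for the full per-trial refit of all systematic parameters.

```python
import numpy as np

def neg2llh(counts, mu):
    """Twice the Poisson negative log-likelihood, dropping counts-only terms."""
    return 2.0 * np.sum(mu - counts * np.log(mu))

def ts_value(counts, mu_no, mu_io):
    """TS = 2*Delta LLH_(NO-IO); negative values prefer NO."""
    return neg2llh(counts, mu_no) - neg2llh(counts, mu_io)

def pseudo_trials(template, mu_no, mu_io, n_trials, rng):
    """Poisson-fluctuate a best-fit template and evaluate both orderings.
    (In the real analysis, each PT is refit over all systematic parameters.)"""
    counts = rng.poisson(template, size=(n_trials, template.size))
    return np.array([ts_value(c, mu_no, mu_io) for c in counts])

rng = np.random.default_rng(42)
mu_no = np.array([100.0, 80.0, 60.0])  # hypothetical best-fit NO template
mu_io = np.array([97.0, 83.0, 61.0])   # hypothetical best-fit IO template
ts_no = pseudo_trials(mu_no, mu_no, mu_io, 2000, rng)
ts_io = pseudo_trials(mu_io, mu_no, mu_io, 2000, rng)

# Expected p-value for excluding NO if IO is true: the fraction of the
# NO distribution to the right of the IO median (and vice versa for IO).
p_excl_no = np.mean(ts_no >= np.median(ts_io))
p_excl_io = np.mean(ts_io <= np.median(ts_no))
```

With strongly overlapping TS distributions, as in the toy templates above, both expected p-values come out well above typical exclusion thresholds, mirroring the weak-separation regime of this measurement.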

Fig. 6
figure 6

Sketch of the frequentist method, using idealized distributions to illustrate the concepts. The red (blue) distribution is derived from PTs assuming the \({\mathcal {H}}_\mathrm {NO}\) (\({\mathcal {H}}_\mathrm {IO}\)) hypothesis. The black vertical line represents a hypothetically observed value of \(\Delta \mathrm {LLH}_\mathrm {NO-IO}\). The resulting p-values (right vertical axis) for the hypotheses are derived from the cumulative density distributions, marked as red (blue) solid lines for NO (IO)

The frequentist method is summarized as a flow chart in Fig. 7. Note that this procedure is similar to the treatment of data described in Sect. 6, where the experimental fit is used as the fiducial fit to produce PTs. Unfortunately, the frequentist method is computationally very expensive. Thus, for more detailed parameter studies, a second, faster method is used.

Fig. 7
figure 7

Flow-chart representing the procedure of the frequentist method used to derive p-values for the NO and IO hypotheses. Abbreviations, as defined in the text, are FD (fiducial fits), PD (pseudo-dataset), H (hypothesis), and TS (test statistic)

The second method for deriving sensitivities is an Asimov approach adapted from [60]. Instead of generating PTs, the total MC template, with no Poissonian fluctuations, is fitted directly for both hypotheses. In the following, we refer to this MC template as the generated-ordering (GO) hypothesis, \({\mathcal {H}}_\mathrm {GO}\), where the GO can be either NO or IO. This template is then fitted under both hypotheses, NO and IO, where the hypothesis used in the fit is called the fitted-ordering (FO) hypothesis, \({\mathcal {H}}_\mathrm {FO}\). The negative log-likelihood value obtained from the fit is \(\overline{\mathrm {LLH}}_\mathrm {FO}({\mathcal {H}}_\mathrm {GO}) = 0\) if \({\mathcal {H}}_\mathrm {FO} = {\mathcal {H}}_\mathrm {GO}\) and \(\overline{\mathrm {LLH}}_\mathrm {FO}({\mathcal {H}}_\mathrm {GO}) > 0\) otherwise, where the bars indicate that the values were obtained by injecting the template of the GO directly.

The resulting value of \(2\overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}\) is assumed to be representative of the behavior obtained using PTs. The sensitivity to the generated ordering, \(n_\sigma ^\mathrm {GO}\), in terms of one-sided Gaussian standard deviations is

$$\begin{aligned} n^\mathrm {GO}_{\sigma } = \frac{\overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}({\mathcal {H}}_\mathrm {GO}) - \overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}({\mathcal {H}}_{\widetilde{\mathrm {GO}}}) }{ \sqrt{ 2\overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}({\mathcal {H}}_{\widetilde{\mathrm {GO}}}) } }, \end{aligned}$$
(5)

where \(\widetilde{\mathrm {GO}} \in \{ \mathrm {IO},\mathrm {NO} \}\) is the opposite hypothesis to the GO, generated with the best-fit set of systematic parameters \( \eta ^{\widetilde{\mathrm {GO}}} \in \{ \eta ^{\mathrm {IO}}, \eta ^{\mathrm {NO}} \}\) corresponding to the set \( \eta ^{\mathrm {GO}}\in \{\eta ^{\mathrm {NO}}, \eta ^{\mathrm {IO}} \}\) used for \({\mathcal {H}}_\mathrm {GO}\). Note that the sensitivity \(n_\sigma ^\mathrm {GO}\) describes the expected p-value for the exclusion of the \(\widetilde{\mathrm {GO}}\) hypothesis in the case that the true ordering is the GO [60].

The choice of one- instead of two-sided Gaussian standard deviations is motivated by the fact that an experiment with no sensitivity to the NMO, i.e. one for which the two distributions for NO and IO in Fig. 6 were identical, would still obtain the correct ordering in \(50\%\) of cases purely by chance. This should not be misinterpreted as sensitivity and should thus give \(n_\sigma ^\mathrm {NO,\,IO} = 0\), which is the case for one-sided but not for two-sided Gaussian standard deviations.
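Equation (5) and the one-sided convention amount to a few lines of code. This is a sketch: the absolute values are an assumption added here to absorb the sign convention of \(\Delta \mathrm {LLH}_\mathrm {NO-IO}\), which flips between GO = NO and GO = IO.

```python
import math

def asimov_nsigma(dllh_go, dllh_go_tilde):
    """Eq. (5): Asimov sensitivity in one-sided Gaussian standard deviations.
    Arguments are Delta-LLH_(NO-IO) evaluated on the GO and the
    opposite-ordering (GO~) templates. The absolute values are an
    assumption, absorbing the sign flip between GO = NO and GO = IO."""
    return abs(dllh_go - dllh_go_tilde) / math.sqrt(2.0 * abs(dllh_go_tilde))

def one_sided_pvalue(n_sigma):
    """Expected p-value for excluding the opposite ordering,
    p = 1 - Phi(n_sigma) for a one-sided Gaussian."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
```

Consistent with the discussion above, `one_sided_pvalue(0.0)` returns 0.5: an experiment with no sensitivity picks the correct ordering half the time by chance, and the one-sided convention assigns it zero sigma.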

The resulting sensitivities for both methods are shown in Fig. 8, as a function of the true value of \(\sin ^2(\theta _{23})\). The blue and red lines indicate the result from the Asimov method for Analysis \(\mathcal {A}\) (solid lines) and Analysis \(\mathcal {B}\) (dashed lines). The sensitivities are validated at certain values of \(\sin ^2(\theta _{23})\) using the frequentist method, as indicated by the circle (\(\mathcal {A}\)) and star (\(\mathcal {B}\)) markers, where the uncertainties arise from the finite number of PTs.

Fig. 8
figure 8

Sensitivities of Analyses \(\mathcal {A}\) and \(\mathcal {B}\) to the NMO in terms of one-sided Gaussian sigmas (left vertical axis) and p-values (right vertical axis) derived by the Asimov-method (lines), and validated at certain values of \(\sin ^2(\theta _{23})\) using the frequentist method (markers). The statistical errors on the frequentist points arise from the finite number of PTs used due to the computationally intensive nature of the frequentist method

As visible in Fig. 8, the resulting sensitivity is \(< 1\sigma \) for both orderings and both analyses. Moreover, Analysis \(\mathcal {A}\) is more sensitive to the NMO than Analysis \(\mathcal {B}\), owing to its increased statistics, the additional bins in PID, energy, and zenith, and the reduced impact of limited Monte Carlo statistics from the use of KDEs in the generation of the Monte Carlo templates.

Note that a characteristic shape is found for the \(\sin ^{2}(\theta _{23})\)-dependence of \(n_\sigma ^\mathrm {NO,\,IO}\), which is different for the NO and IO hypotheses. The observed features are similar to those found for the PINGU sensitivity in [16]. They arise from the interplay of the two independent octant fits for \(\overline{\mathrm {LLH}}_\mathrm {GO}\) and \(\overline{\mathrm {LLH}}_{\widetilde{\mathrm {GO}}}\), used to calculate the values of \(\overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}({\mathcal {H}}_\mathrm {GO})\) and \(\overline{\Delta \mathrm {LLH}}_\mathrm {NO-IO}({\mathcal {H}}_{\widetilde{\mathrm {GO}}})\) in Eq. (5), where the preferred octant is not necessarily the true one in the case that \(\widetilde{\mathrm {GO}}\) is fitted. As a result, the behavior of \(n_\sigma ^\mathrm {NO,\,IO}\) changes each time the octant is flipped for one of the two negative log-likelihood differences (\(\Delta \mathrm {LLH}\)) in Eq. (5).

The observed sensitivities for the Asimov method agree roughly with the PTs; perfect agreement is not expected due to several approximations used in the Asimov method [60]. Nevertheless, the Asimov method is used as an estimator for the true sensitivity.

Note that for some observed values in Fig. 6 the p-values for both hypotheses can be small, in case the observed data agree with neither the NO nor the IO hypothesis. For example, this could be the case for \(\Delta \mathrm {LLH}_\mathrm {NO-IO}> 2\) or \(\Delta \mathrm {LLH}_\mathrm {NO-IO}< -2\), which lies in the tail of both distributions in Fig. 6. In this case, the small p-value might give the wrong impression that the data clearly favors the alternative over the null hypothesis. To properly account for this, the \(p\)-values are combined into a \(\mathrm {CL}_S\)-value,

$$\begin{aligned} \mathrm {CL}^{\mathcal {A}/\mathcal {B}}_S\left( {\mathcal {H}}_\mathrm {TO}\right) =\frac{p_{\mathcal {A}/\mathcal {B}}({\mathcal {H}}_\mathrm {TO})}{1-p_{\mathcal {A}/\mathcal {B}} ({\mathcal {H}}_{\widetilde{\mathrm {TO}}})}\,, \end{aligned}$$
(6)

where TO is the tested ordering and \(\widetilde{\mathrm {TO}}\) is the opposite ordering hypothesis. This equation is taken from [63], where a more detailed discussion of its derivation can be found. Its value is limited to \(\mathrm {CL}_S \in [0,1]\), where \(\mathrm {CL}_S \approx 1 \) indicates no preference for one hypothesis over the other and \(\mathrm {CL}_S \approx 0 \) indicates a strong disfavoring of the given hypothesis. The \(\mathrm {CL}_S\) value can be interpreted as confidence in the result, with a confidence level of \(1-\mathrm {CL}_S\). More illustratively, the \(\mathrm {CL}_S\) value describes how much less likely the observed value is to occur under the disfavored hypothesis than under the favored one.
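Equation (6) is a one-line computation. The sketch below applies it to the Analysis \(\mathcal {A}\) p-values quoted in Sect. 6; small deviations from the published \(\mathrm {CL}_S\) values are expected, since the quoted p-values are rounded.

```python
def cls_value(p_tested, p_opposite):
    """Eq. (6): CL_S for the tested ordering (TO), given the p-values of
    the tested and the opposite (TO~) ordering hypotheses."""
    return p_tested / (1.0 - p_opposite)

# Analysis A p-values from Sect. 6: p(H_NO) = 71.1%, p(H_IO) = 15.7%
cls_no = cls_value(0.711, 0.157)  # close to the published 83.0%
cls_io = cls_value(0.157, 0.711)  # close to the published 53.3%
```

Note how the denominator moderates the result: a small \(p_\mathrm{TO}\) only yields a small \(\mathrm {CL}_S\) if the opposite ordering is not also disfavored.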

Finally, potential improvements of the sensitivity are tested for Analysis \(\mathcal {A}\). By fixing individual systematic parameters, and combinations thereof, in the Asimov fit, the absolute gain in sensitivity from an improved understanding of systematic uncertainties is found to be small, except for the oscillation parameters. This is due to the weak NMO signature, which barely pulls the systematic parameters and is thus only weakly affected by fixing them. Instead, it is found that the sensitivity could be improved in the future by additional statistics and by improvements to the event reconstruction, which would reduce the smearing-out of the NMO signature caused by the low resolution in energy, zenith, and PID at the lowest energies [49].

6 Results

For both analyses, the experimental data is fitted with the likelihood method described in Sect. 4. The data, along with the best-fit predictions, are shown for Analysis \(\mathcal {B}\) in Fig. 9. The resulting best-fit values for all systematic parameters are shown in Table 2. The observed pulls are within the expected ranges for all parameters, taking statistical fluctuations and the uncertainties on the true value of each parameter into account. The corresponding values of the metric for the NO (IO) hypothesis are \(2\mathrm {LLH} = 293.38\) (294.12) for Analysis \(\mathcal {A}\) and \(\chi ^2=107.82\) (107.50) for Analysis \(\mathcal {B}\). The metric is used as a goodness-of-fit estimator for the agreement of data and Monte Carlo by comparing these values to the expectation from PTs. The resulting p-values for Analyses \(\mathcal {A}\) and \(\mathcal {B}\) are \(p_\mathrm {gof}^\mathcal {A} =43.5\%\) and \(p_\mathrm {gof}^\mathcal {B} =11.0\%\), indicating that the data are well described by the MC templates.

Fig. 9
figure 9

The energy and zenith-angle distributions of the data from Analysis \(\mathcal {B}\). Also shown are the best-fit simulations for both orderings, where the red and blue lines fall almost on top of each other

For Analyses \(\mathcal {A}\) and \(\mathcal {B}\), the observed values of the test statistic are \(2\Delta \mathrm {LLH}_\mathrm {NO-IO}=-0.738\) and \(\Delta \chi ^2_\mathrm {NO-IO}=0.3196\), respectively. Thus, the fits for the main result (\(\mathcal {A}\)) and the confirmatory result (\(\mathcal {B}\)) prefer NO and IO, respectively, while both results are compatible within their statistical uncertainties, i.e. both test statistics lie within one unit of zero.

Fig. 10
figure 10

Distribution of the TS from PTs, generated with the best-fit systematic parameters \(\eta ^{\mathrm {NO}}\) and \(\eta ^{\mathrm {IO}}\) from Table 2 for Analysis \(\mathcal {A}\) (top) and Analysis \(\mathcal {B}\) (bottom). The red and blue distributions are obtained for the NO and IO hypotheses, respectively, while the black, solid vertical line shows the observed value in data, giving the p-values for the NO and IO hypotheses stated in the legends

To estimate the corresponding p-values, PTs are generated with the best-fit parameters \(\eta ^\mathrm {NO}\) and \(\eta ^\mathrm {IO}\) from Table 2; for each PT, both ordering hypotheses are fitted. The resulting distributions of \(\mathrm {TS} = 2\Delta \mathrm {LLH}_\mathrm {NO-IO}\) and \(\mathrm {TS} = \Delta \chi ^2_\mathrm {NO-IO}\) are shown in Fig. 10. The experimentally observed value, marked by the solid vertical black line, shows the preference for Normal over Inverted Ordering in Analysis \(\mathcal {A}\) and for Inverted over Normal Ordering in Analysis \(\mathcal {B}\).

The resulting p- and \(\mathrm {CL}_S\)-values for the main result are

$$\begin{aligned} p_\mathcal {A} ({\mathcal {H}}_\mathrm {NO})&= 71.1\% ~(\mathrm {CL}^\mathcal {A} _{S}({\mathcal {H}}_\mathrm {NO})=83.0\%), \end{aligned}$$
(7)
$$\begin{aligned} p_\mathcal {A} ({\mathcal {H}}_\mathrm {IO})\,&= 15.7\% ~(\mathrm {CL}^\mathcal {A} _S({\mathcal {H}}_\mathrm {IO})=53.3\%), \end{aligned}$$
(8)

while for the confirmatory result we find

$$\begin{aligned} p_\mathcal {B} ({\mathcal {H}}_\mathrm {NO})&= 11.4\% ~(\mathrm {CL}^\mathcal {B} _S({\mathcal {H}}_\mathrm {NO})=73.5\%), \end{aligned}$$
(9)
$$\begin{aligned} p_\mathcal {B} ({\mathcal {H}}_\mathrm {IO})\,&= 84.5\% ~(\mathrm {CL}^\mathcal {B} _S({\mathcal {H}}_\mathrm {IO})=95.4\%). \end{aligned}$$
(10)

In addition to testing the NMO with PTs, the likelihood is scanned across \(\sin ^2(\theta _{23})\) for the more sensitive Analysis \(\mathcal {A}\) and both ordering hypotheses. The resulting scan is shown in Fig. 11, where the LLH is shown with respect to its global minimum. The vertical offset between the NO and IO curves indicates the preference for NO over IO, which is visible at all values of \(\sin ^2(\theta _{23})\). The observed minimum is in the lower octant, near \(\sin ^2(\theta _{23})= 0.455\), for both orderings, while maximal mixing is separated from the best-fit point by only \(2\Delta \mathrm {LLH} = 0.128\) (0.681) for NO (IO). As a result, the preference for the lower octant is small, such that a substantial range of \(\sin ^2(\theta _{23})>0.5\) is still compatible with the observed data for both NO and IO.

Fig. 11
figure 11

The negative log-likelihood (LLH) as a function of \(\sin ^2(\theta _{23})\) for Analysis \(\mathcal {A}\), relative to the global minimum \(\mathrm {LLH}_\mathrm {min}\). The preference for NO over IO is visible over all the range of \(\sin ^2(\theta _{23})\) with the best-fit for both orderings being in the lower octant (\(\sin ^2(\theta _{23}) < 0.5\))

Note that the preference for NO over IO in Analysis \(\mathcal {A}\) already indicates an observed preference for matter effects in the data (cf. Sect. 1), i.e. a preference for matter effects over vacuum oscillations. To quantify this preference, the fit is repeated assuming vacuum oscillations. The resulting log-likelihood difference between matter effects (Mat) and vacuum oscillations (Vac) is \(\Delta \mathrm {LLH}_\mathrm {Mat-Vac} = -0.869\) (\(-0.500\)) when NO (IO) is assumed. Thus, matter effects are preferred over vacuum oscillations, independent of the assumption on the NMO. The p-values and \(\mathrm {CL}_{S}\)-values that quantify the preference for matter effects and vacuum oscillations are

$$\begin{aligned} p({\mathcal {H}}_{\mathrm {Mat}}|{\mathcal {H}}_{\mathrm {NO}})&= 62.3\%, \quad \mathrm {CL}_{S}({\mathcal {H}}_{\mathrm {Mat}}|{\mathcal {H}}_{\mathrm {NO}}) = 71.0\%, \end{aligned}$$
(11)
$$\begin{aligned} p({\mathcal {H}}_{\mathrm {Vac}}|{\mathcal {H}}_{\mathrm {NO}})&= 12.3\%, \quad \mathrm {CL}_{S}({\mathcal {H}}_{\mathrm {Vac}}|{\mathcal {H}}_{\mathrm {NO}}) = 32.6\%, \end{aligned}$$
(12)
$$\begin{aligned} p({\mathcal {H}}_{\mathrm {Mat}}|{\mathcal {H}}_{\mathrm {IO}})&= 53.2\%, \quad \mathrm {CL}_{S}({\mathcal {H}}_{\mathrm {Mat}}|{\mathcal {H}}_{\mathrm {IO}}) = 68.4\%, \end{aligned}$$
(13)
$$\begin{aligned} p({\mathcal {H}}_{\mathrm {Vac}}|{\mathcal {H}}_{\mathrm {IO}})&= 22.2\%, \quad \mathrm {CL}_{S}({\mathcal {H}}_{\mathrm {Vac}}|{\mathcal {H}}_{\mathrm {IO}}) = 47.4\%. \end{aligned}$$
(14)

7 Conclusion

We have developed two independent likelihood analyses to demonstrate the extraction of the neutrino mass ordering from atmospheric neutrino data, and have applied them to three years of IceCube DeepCore data. The first analysis aims for an optimized sensitivity with DeepCore, the second for an analysis chain as similar as possible to the proposed NMO analysis with PINGU [16]. The sensitivities were estimated with two independent methods. For the more sensitive, main analysis, the sensitivity was found to be \(\sim 0.45{-}0.65\,\sigma \) (one-sided Gaussian) within the most interesting region close to maximal mixing (\(\sin ^2(\theta _{23}) \in \left[ 0.45, 0.55\right] \)) for both orderings, while for the confirmatory analysis the sensitivity was found to be \(\sim 50\%\) smaller.

Due to the weak signature of the NMO in DeepCore, the sensitivity is found to be mostly unaffected by improvements in the understanding of systematic uncertainties. Instead, a future gain in sensitivity might come from additional statistics or potential improvements in the resolution of the event reconstruction.

The analyses presented here find the data to be fully compatible with both mass orderings. The main analysis observes a preference for NO over IO at \(2\Delta \mathrm {LLH}_\mathrm {NO-IO} =-0.738\), which corresponds to a p-value of \(15.3\%\) (\(\mathrm {CL}_S=53.3\%\)) for the IO hypothesis, based on the presented frequentist method. This result is in line with recently reported preferences for the NO from Super-Kamiokande [29], T2K [27], NO\(\nu \)A [28], MINOS [64], and recent global best fits [31, 32]. However, it complements these results due to the higher energy range used for determining the NMO (\(E_\nu \gtrsim 5\,\mathrm {GeV}\)) and the fact that it is independent of the value of \(\delta _\mathrm {CP}\). Finally, the data indicate a preference for matter effects over vacuum oscillations, independent of the assumption on the NMO.

The study presented here allows us to consider what future steps will allow a determination of the NMO with atmospheric neutrino data. Given the statistically-limited nature of this result, it is clear that a reduction of systematic uncertainties is not a priority, and we have performed studies to show that even the most optimistic reduction of systematic uncertainties can achieve at most a 10% improvement in the NMO sensitivity of this dataset [49]. The same study also showed that a removal of backgrounds (atmospheric muons and triggered noise) delivers at most a 5% improvement in sensitivity. In the coming years, a factor of four more statistics is expected from DeepCore (including both additional data and expected data-selection improvements), and this can result in a factor of two improvement in sensitivity. A more significant improvement that can be made is in the measurement resolutions: our studies [49] show that a 50% improvement in resolution on both neutrino direction and \(\log _{10}(E_{\nu })\) would produce a factor of two improvement in the sensitivity of this dataset. To achieve an NMO determination in a reasonable timescale, a final necessary improvement is a lowering of the neutrino energy threshold; this, along with the improved resolutions, can be achieved by the PINGU concept [16, 24] that reduces the energy threshold to below 10 GeV to enable a \(3\sigma \) determination of the NMO for even the least optimistic values of the oscillation parameters.

Besides the experimental result, the presented analyses provide a proof-of-concept for determining the NMO from matter effects in atmospheric neutrino oscillations with the IceCube Upgrade  [65] or PINGU  [16]. They test the full analysis chain by means of real DeepCore data and validate the understanding and treatment of systematic uncertainties, which are largely consistent with those that will be encountered by future IceCube extensions.