Development of an analysis to probe the neutrino mass ordering with atmospheric neutrinos using three years of IceCube DeepCore data

The Neutrino Mass Ordering (NMO) remains one of the outstanding questions in the ﬁeld of neutrino physics. One strategy to measure the NMO is to observe matter effects in the oscillation pattern of atmospheric neutrinos above ∼ 1 GeV, as proposed for several next-generation neutrino experiments. Moreover, the existing IceCube DeepCore detector can already explore this type of measurement. We present the development and application of two independent analyses to search for the signature of the NMO with three years of DeepCore data. These analyses include a full treatment of systematic uncertainties and a statistically-rigorous method to determine the signiﬁcance for the NMO from a ﬁt to the data. Both analyses show that the dataset is fully compatible with both mass orderings. For the more sensitive analysis, we observe a preference for normal ordering with a p -value of p IO = 15 . 3% and CL s = 53 . 3% for the inverted ordering hypothesis, while the experimental results from both analyses are consistent within their uncertainties. Since the result is independent of the value of δ CP and obtained from energies E ν (cid:2) 5 GeV, it is complementary to recent results from long-baseline experiments. These analyses set the groundwork for the future of this measurement with more capable detectors, such as the IceCube Upgrade and the proposed PINGU detector.

Abstract The Neutrino Mass Ordering (NMO) remains one of the outstanding questions in the field of neutrino physics. One strategy to measure the NMO is to observe matter effects in the oscillation pattern of atmospheric neutrinos above ∼ 1 GeV, as proposed for several next-generation neutrino experiments. Moreover, the existing IceCube DeepCore detector can already explore this type of measurement. We present the development and application of two independent analyses to search for the signature of the NMO with three years of DeepCore data. These analyses include a full treatment of systematic uncertainties and a statistically-rigorous method to determine the significance for the NMO from a fit to the data. Both analyses show that the dataset is fully compatible with both mass orderings. For the more sensitive analysis, we observe a preference for normal ordering with a p-value of p IO = 15.3% and CL s = 53.3% for the inverted ordering hypothesis, while the experimental results from both analyses are consistent within their uncertainties. Since the result is independent of the value of δ CP and obtained from energies E ν 5 GeV, it is complementary to recent results from long-baseline experiments. These analyses set the groundwork for the future of this measurement with more capable detectors, such as the IceCube Upgrade and the proposed PINGU detector.

Introduction
The question of the Neutrino Mass Ordering (NMO) is one of the main drivers of the field of neutrino oscillation physics. The NMO describes the ordering of the three neutrino mass eigenstates m 1 , m 2 , and m 3  The three neutrino mass states do not correspond directly to the three neutrino flavor states ν e , ν μ , and ν τ . Instead, each mass state is a superposition of the flavour states, with the mixing described by the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix U [1][2][3], such that where α ∈ {e, μ, τ } labels the flavor states and i ∈ {1, 2, 3} labels the mass states. By convention, ν 1 is the state contain- * e-mail: analysis@icecube.wisc.edu a e-mail: justin.evans@manchester.ac.uk b Earthquake Research Institute, University of Tokyo, Bunkyo, Tokyo 113-0032, Japan ing the most electron flavor, and ν 3 is the state containing the least.
The mixing matrix U can be parameterized by a CPviolating phase δ CP and three mixing angles θ 12 , θ 13 , and θ 23 . In the case of Majorana neutrinos, two additional phases are included, which are of no relevance for this work. Since U is non-diagonal, flavor changes are observed depending on the energy and propagation distance of a neutrino, which are commonly known as neutrino oscillations. The oscillations are described by the mass splittings, mixing angles, and the CP-violating phase [4].
For propagation through dense matter, the neutrino oscillations are modulated by interactions with electrons, which give rise to matter effects [5] such as the so-called MSW effect and parametric enhancement [6][7][8][9][10]. Depending on the NMO, these modulations arise mainly in the neutrino (NO) or anti-neutrino channel (IO) [11]. In measurements of solar neutrino oscillations, they were used to determine the ordering of the neutrino states ν 1 and ν 2 by finding m 2 > m 1 . Moreover, these modulations can be observed for atmospheric neutrinos that undergo matter effects during their propagation through the Earth. In contrast to longbaseline accelerator experiments, the signature observed in IceCube is largely independent of the value of δ CP , which allows for a complementary measurement of the NMO at higher energies, using atmospheric neutrinos [12].
Atmospheric neutrinos are produced in the Earth's atmosphere by interactions of cosmic rays with the nucleons of the air, generating mesons. These mesons decay generating electron and muon (anti-)neutrinos, which propagate through the Earth and can eventually be detected by an underground neutrino detector, such as IceCube [13]. The baseline of propagation through Earth can be inferred by measuring the incoming zenith angle of the neutrino. The highest-energy oscillation maximum arises at E ν ∼ 25 GeV for vertically up-going neutrinos, moving to lower energy at shorter baselines towards the horizon. For energies above a few GeV, the oscillations are mostly driven by the parameters θ 23 and m 2 31 , which are therefore referred to as atmospheric oscillation parameters [4], while for vacuum only oscillations the value of θ 13 is too small for any detectable effect. Considering matter effects, however, the effective value of θ 13 under the right conditions can become sizeable, resulting in oscillation with electron flavors as shown in Fig. 2.
In atmospheric oscillations, the impact of the presence of matter arises mainly below E ν ∼ 15 GeV. The strength of these matter effects depends on the Earth's matter profile, which we take as given by the Preliminary Reference Earth Model (PREM), shown in Fig. 1 [14].
The oscillation probabilities for muon-flavored atmospheric neutrinos and anti-neutrinos to be found in the flavor state α ∈ {e, μ, τ } for a given zenith angle θ ν , and neutrino energy E ν , are shown in Fig. 2. They are calculated Fig. 1 Earth density profile, according to the Preliminary Reference Earth Model (PREM) and its approximation by 4-and 12-layers of constant density (commonly called PREM4 and PREM12, respectively) [14] with the PROB3++ [15] package and the PREM12 approximation (cf. Fig. 1), which are consistently used throughout this work. Due to the Earth's geometry and its core-mantle structure, the visible modulations of atmospheric neutrino oscillations feature a clear zenith-dependence.
Note that the oscillation patterns for neutrinos and antineutrinos flip between the two orderings. Thus, the NMO can be determined by finding the enhancement in transition probabilities from matter effects either in the neutrino channel (NO) or anti-neutrino channel (IO). For detectors insensitive to distinguishing neutrinos from anti-neutrinos on an eventby-event level, the NMO still leads to a visible net-effect in the amplitude of the observed matter effects, because the atmospheric fluxes and the cross sections for neutrinos and anti-neutrinos differ [16,17]. These differences mean that atmospheric neutrinos are measured at higher rates than the corresponding anti-neutrinos. Due to this rate difference, the strength of observed matter effects in a combined sample of neutrinos and anti-neutrinos is increased in case of NO and decreased in case of IO, which is the main signature targeted in this work.
The determination of the NMO has important implications for searches for neutrinoless double-β decay, where the entire mass region allowed in the case of IO is in reach of the next generation of experiments [18,19]. The NMO must also be determined as part of the search for CP-violation in the lepton sector, where the sensitivity to δ CP depends strongly on the ordering [20,21]. Therefore, a measurement of the NMO is targeted by several future long-baseline, reactor, and atmospheric neutrino experiments, such as DUNE [22], JUNO [23], PINGU [16,24], ORCA [25], and Hyper-Kamiokande [26]. Moreover, current neutrino experiments such as T2K [27], NOvA [28], and Super-Kamiokande [29] provide first indications of the NMO. Combining the results from several Oscillation probabilities for an atmospheric ν μ or ν μ upon reaching the IceCube detector, as a function of the cosine of the zenith angle, θ ν , and the energy, E ν , of the neutrino, for the NO (a) and the IO (b) hypotheses. The probabilities are shown for the neutrino appearing as each of the three possible flavors, with the neutrino and anti-neutrino cases shown as the top and bottom rows in each panel. The dominant mixing of ν μ and ν τ is clearly visible, while the ν e flavor is mostly decoupled, except for a small contribution from matter effects below E ν ∼ 15 GeV experiments, recent global fits prefer Normal over Inverted Ordering at ∼2 − 3.5 σ with a small preference for the upper octant (i.e. sin 2 (θ 23 ) > 0.5) [30][31][32][33].

The IceCube neutrino observatory
The IceCube Neutrino Observatory [13] is a ∼1 km 3 neutrino detector at the Geographic South Pole, optimized for detecting atmospheric and astrophysical neutrinos above E ν ∼ 100 GeV. It consists of 86 strings running through the ice vertically from the surface almost to the bedrock, carrying a total of 5160 Digital Optical Modules (DOMs) at depths between 1450 and 2450 m [34]. Each DOM houses a 10" photomultiplier tube and digitizing electronics, surrounded by a glass sphere [13,35,36].
In the center of the detector, some of these strings form a more densely instrumented volume called DeepCore [37]. It consists of 8 strings with an increased vertical density of DOMs with higher quantum-efficiency, surrounding one Ice- Table 1 Overview of the main differences between the two NMO analyses in terms of the total number of observed events, the selection strategy, the reconstruction likelihood, the reconstructed energy range, the number of analysis bins (given as number of E reco ν , ϑ reco ν , PID bins), the background (atmospheric muon) description, the template generation, and the estimated fractions of the data sample from each contribution Cube string. Due to the denser instrumentation and the higher quantum-efficiency DOMs, the DeepCore infill has a lower energy threshold than the surrounding IceCube array. The corresponding detection efficiency of DeepCore increases steeply between ∼ 3 GeV and ∼ 10 GeV and flattens for higher energies [13,37]. Neutrinos are detected by the Cherenkov emissions of their charged secondary particles, which are generated by Charged Current (CC) and Neutral Current (NC) interactions with the nucleons of the ice. In the case of CC muonneutrino interactions, a hadronic cascade is initiated at the primary vertex, combined with an outgoing muon. The muon can propagate large distances through the detector, leading to an elongated shape of the energy deposition and thus of the Cherenkov light emission. Such events are called track-like signatures. In contrast, CC electron-neutrino, NC, and the majority of CC tau-neutrino interactions, do not produce a muon that can travel large distances. Instead, they initiate an electromagnetic and/or hadronic cascade that develops over a distance of a few meters. The light emission of this cascade is considerably smeared around the Cherenkov angle of the shower direction. Such events are called cascade-like. At low energies below a few tens of GeV, the separation of trackand cascade-like events becomes increasingly difficult, due to the short muon track and the coarse detector granularity. For oscillation measurements with DeepCore, this separation of track-like and cascade-like events is used to partially distinguish neutrino flavors [37].
For the analyses presented here, we use the Honda 3D atmospheric neutrino simulation [38], and the GENIE neutrino interaction generator [39] with KNO [40] and PYTHIA [41]. For quasielastic and resonance events, the axial masses are set to M qe A = 0.99 GeV and M res A = 1.12 GeV, respectively. Simulation of the atmospheric muon background uses CORSIKA [42], with the Polygonato-Hörandel model of the muon energy spectrum [43]. Muons are propagated through the ice using PROPOSAL [44]; the propagation of all other particles is based on GEANT4 [45,46]. Cherenkov photons are propagated throught the ice using a GPU-based code [47]. More details of the simulation can be found in [48].

Data samples and reconstruction
In this work, two independent likelihood analyses are used to extract information about the NMO from DeepCore data. They are henceforth labelled Analysis A and B, and the main differences between the two analyses are summarized in Table 1. Analysis A is designed to optimize the sensitivity to the NMO with DeepCore and considered the main result of this work, while Analysis B is designed to resemble the proposed PINGU analysis from [24], using only events that are fully-contained in the DeepCore detector, and is used as a confirmatory result here. Further details about Analyses A and B can be found in [49] and [50], respectively. The use of two independent analyses with partially complementary data sets gives great confidence in the quantitative conclusions of the analysis presented here and the treatment and impact of the systematic uncertainties.
The analyses are based on DeepCore data taken between May 2012 and April 2014, comprising a total livetime of 1006 (1022) days for Analysis A (B). The difference in livetime arises from slightly different criteria on the stability of data acquisition. The data is run through two largely independent processing chains, where both samples are acquired by filtering the data in several successive steps of selection. These steps include the application of selection criteria on well-understood variables, as well as machine-learning methods, namely Boosted Decision Trees [51]. The selections are aiming for a reduction of the background of atmospheric muons and triggered noise, while maintaining a large fraction of well-reconstructed, low-energy neutrino events below E ν ∼ 100 GeV. Both samples are described in more detail in [48]. Compared to [48], the samples used in this work differ by the following modifications: First, events with a reconstructed vertex outside the detector that enter from below are not vetoed in Analysis A using the lower part of the DeepCore detector, as it is done for downgoing and horizontal events using the surrounding Ice-Cube detector. This increases the statistics at the expense of a reduced energy resolution for these uncontained events, especially at high energies. The loss in energy resolution is due to the unobserved fraction of deposited energy outside the Fig. 3 The distribution of the particle identification variable for Analysis A. The blue band on the data/MC ratio is the statistical uncertainty detector volume. Second, the range of reconstructed energies considered is extended for both analyses compared to [48], from 56 to 90 GeV (80 GeV) for Analysis A (B), allowing us to constrain nuisance parameters outside the strongest oscillation region. Third, both analyses use exclusively upgoing events (i.e. cos(θ reco ν ) < 0) to reduce the background from atmospheric muons.
The final samples are reconstructed with the same algorithm for Analyses A and B [48,49]. It is based on a likelihood function that links the number and the arrival times of the observed Cherenkov photons in all DOMs to a physics hypothesis. The physics hypothesis is given by the position and time of the interaction vertex, the neutrino direction, and the neutrino energy, which are the parameters of the likelihood optimization. The reconstruction is run separately for a starting track and a cascade-only hypothesis, where the starting track hypothesis features a cascade at the primary vertex with an additional parameter L for the length of an outgoing muon track. Since the track hypothesis allows for fitting the track length to L = 0, the 7dimensional cascade-only-hypothesis is nested within the 8-dimensional track-hypothesis. The log-likelihood difference between track and cascade-only hypothesis is used as the flavor-separating variable, called Particle Identification (PID). Besides the reconstructed neutrino zenith angle θ reco ν and neutrino energy E reco ν , the PID is used as a third observable entering the likelihood analyses described in Sect. 4. The distribution of the PID variable for Analysis A is shown in Fig. 3.
In the reconstruction, the optimized likelihood function differs between the two analyses: For Analysis B, the reconstruction likelihood is defined using the observed charge binned in time for each DOM as a proxy for the observed number of Cherenkov photons. Since some deviations were found between data and Monte Carlo in charge-related quantities, the likelihood was reformulated in a chargeindependent way for Analysis A, such that the charge amplitude information was removed and the only information used is whether a DOM is hit or not hit in multiple bins of time. In terms of the resolutions in reconstructed zenith angle θ reco ν and neutrino energy E ν , the impact of the likelihood reformulation was found to be small. Moreover for Analysis B, the impact of the charge mismatch is estimated to be small in comparison to the statistical uncertainty on the observed NMO.
After the data selection, the number of events in Sample A exceed the number of events in Sample B by a factor of 1.87, while providing similar resolutions in energy and zenith angle.
Note that for Analysis B, the atmospheric muon background is estimated from data in an off-signal region, while for Analysis A, it is obtained from Monte Carlo simulations (cf. Table 1). As a result, there is no a priori Monte Carlo prediction for the atmospheric muon contamination in Sample B. However, the fraction of atmospheric muons is fitted in the analysis as discussed in Sect. 6. The contamination of triggered noise was found to be only 0.1% for both samples. It is included into the likelihood fit for Analysis A, while it is neglected for Analysis B.
The final samples consist of CC muon neutrino, CC electron neutrino, CC tau neutrino, NC, and atmospheric muon events. These different components are called contributions in the following and are simulated separately in Monte Carlo except for the atmospheric muon contribution used in Analysis B that is parametrized from an off-signal data region.
The estimated fraction of the data samples from each contribution is shown in Table 1. These fractions are calculated using the best-fit values for all systematic parameters, discussed in Sect. 4.

Analyses
Both Analyses A and B use a binned likelihood method to determine the NMO by observing the signature from Fig. 2 in reconstructed variables. Since a separation of all flavors cannot be done with DeepCore, the PID is used to distinguish track-and cascade-like events, while neutrino energy and zenith angle are obtained from the reconstruction described in Sect. 3.
For both analyses, the binning is summarized in Table 1. For Analysis B only two PID bins are used to separate trackand cascade-like events, analogously to [16], while Analysis A uses three PID bins. This is motivated by the weak separation power at low energies, where the confidence in the separation can be taken into account by including an additional, intermediate PID bin. The binning in neutrino energy and zenith angle is chosen to be uniform in log 10 (E reco ν ) and cos(θ reco ν ) for Analysis B. For Analysis A, it is also uniform in cos(θ reco ν ), while it is optimized in log 10 (E reco ν ) to roughly follow the available statistics and maintain a large number of bins in the most interesting region at E ν ∼ 10 GeV.
In Analysis B, the binning is used to generate Monte Carlo distributions, called templates, in E reco ν , θ reco ν , and PID for each contribution to the data sample, using histograms. In contrast, Analysis A applies an adaptive Kernel Density Estimation (KDE) method to produce these templates, which smooths the fluctuations from limited Monte Carlo statistics. These uncertainties arise mainly from the atmospheric muon template, where the available Monte Carlo statistics are similar to those from experimental data, due to the time-intensive simulation of atmospheric muons.
The KDE method is analogous to the one used in [52] and based on [53]. However, the method from [53] is extended by reflecting the KDE at the boundaries of the binned parameter space and integrating the resulting distribution to obtain a prediction for the bin content [54]. For the atmospheric muons, this is illustrated in Fig. 4, where the Monte Carlo template for atmospheric muons is generated with histograms (top) and the above mentioned KDE method (bottom). In the case of histograms, the fluctuations in the bin content, arising from limited Monte Carlo statistics, are clearly visible.
The uncertainties on the KDE prediction are estimated using bootstrapping for every contribution from Sect. 3 separately [55]. For each contribution, which consists of N MC Top: the distribution in PID, zenith angle, and neutrino energy for Analysis A that enters the likelihood calculation; bottom: corresponding signature of the NMO, given as expected pull on the bin content in case IO is observed but NO is tested, using Poissonian statistics events, events are drawn randomly from this sample, replacing the event each time so that it can be drawn again, until N events have been drawn. This new sample of N events is called a bootstrapped sub-sample, and from this a new KDE template is generated. This process is repeated several times and the uncertainty on each bin content in the original KDE template is estimated from the resulting distribution of bin contents in the bootstrapped samples.
For Analysis A, the three-dimensional template obtained from the combination of all Monte Carlo contributions is shown in Fig. 5. Additionally, the expected pulls on each bin are shown in the case that the true ordering is inverted but the NO hypothesis is tested. This is used as a metric to visualize the signature of the NMO [16]. As can be seen in Fig. 5, the expected pulls between NO and IO are small, which already indicates the low sensitivity due to the limited resolution and statistics of DeepCore at energies E ν 15 GeV.
Using these distributions, likelihoods are defined for both analyses. For Analysis A, the negative log-likelihood LLH is given by where the term S is common to the likelihood of both analyses and will be defined after discussing the other terms. The term gives the probability of observing N A i events in bin i, if μ A i events are expected. It is obtained by a convolution of a Poissonian distribution and a narrow lognormal probability density function that describes the uncertainty σ A μ i on the Monte Carlo prediction μ i . The uncertainty σ A μ i is obtained from a quadratic combination of the individual template uncertainties for every contribution, obtained from bootstraping.
Due to the KDE method used in Analysis A, the dominant template uncertainties in the description of atmospheric muons are strongly reduced, such that the uncertainties on the total template are typically ∼ 10% of the Poissonian error expected from data fluctuations. Thus, for Analysis A these template uncertainties contribute only marginally to the following results.
For Analysis B, the likelihood is adapted from [56], where a χ 2 -value is calculated by quadratically combining the Poissonian error on the predicted bin content μ B i with the uncertainty σ B μ i on the combined template of all contributions. It is given by where the labels are analogous to Eq. (2). Here, the uncertainties σ B μ i on the templates are estimated from the statistical error due to limited Monte Carlo and an uncertainty on the atmospheric muon template, estimated from off-signal data.
The dominant systematic uncertainties are included in both likelihood fits using nuisance parameters. These nuisance parameters comprise uncertainties in the atmospheric neutrino flux, the atmospheric oscillation parameters, the neutrino-nucleon cross sections, and the detector response. All systematic parameters are allowed to vary simultaneously and independently in the fit; we assume there are no correlations between the pulls on the various parameters. The parameters are listed in Table 2. To account for external constraints on these systematic parameters, Gaussian priors are included into the likelihood by the term S, where the sum runs over all systematic parameters. For each parameter, the tested value s is compared to the expected baseline value s 0 with respect to its estimated uncertainty σ s . The baseline value s 0 and width σ s of each prior are identical for both analyses, and are stated in Table 2; the central value and the width are motivated by the provided references where possible. As indicated in Table 2, the prior for some parameters was removed in Analysis B. Due to the small sensitivity to the NMO, the prior assumption was found to imply a preference on the NMO in case the true parameter value differs from the baseline value, which is avoided by removing the corresponding priors from the likelihood. Thus, no exter-nal knowledge is included on these parameters, allowing for larger deviations from the baseline value. The parameters N ν , N ν e , N NC , and N μ are used to vary the normalizations of the different contributions from Table 1. Thus, they account for uncertainties in interaction cross sections, the total neutrino and muon fluxes, the ν e /ν μ production ratio, and detection efficiencies.
Additional uncertainties on the neutrino fluxes predicted in [38] are modelled by the parameters γ ν , σ zenith ν , and (ν/ν). Here, γ ν incorporates uncertainties in the neutrino energy spectrum, arising from flux, and cross section uncertainties, according to a reweighting of Monte Carlo events ∝ (E ν /GeV) γ ν , while σ zenith ν and (ν/ν) incorporate the dominant uncertainties from [58] in an ad hoc parametrization. The uncertainties on the production of atmospheric muons arising from the spectrum and compositions of the cosmic ray primary flux are represented by the parameter γ μ . Note that γ μ is only included as an uncertainty for Analysis A, since the atmospheric muon template in Analysis B is estimated from data.
Uncertainties in neutrino-nucleon interactions are represented by the parameters M res A and M qe A , which model the axial mass of resonant and quasi-elastic interactions. Note that uncertainties on the cross section for deep inelastic scattering were also parametrized, but found to be negligible and therefore not included into the likelihood fit.
Detector uncertainties are modelled by the parameters opt , lateral , and head−on , which describe the optical detection efficiency of the DOMs. The value of opt gives the total detection efficiency per photon, relative to the baseline scenario. In contrast, the parameters lateral and head−on describe the dependence of the photon detection efficiency on the inclination angle of the incoming photon. Here, lateral changes the slope of the acceptance curve, while head−on controls the acceptance of very vertically upgoing photons independently. Besides actual uncertainties in the DOMs' detection efficiency, these parameters incorporate uncertainties with respect to the optical properties of the ice in the refrozen drill holes that surround the DOMs.
All of the systematic parameters mentioned above are described in more detail in [48]. Besides the parameters included in the fit, additional uncertainties have been investigated and tested for their possible effect on the analysis [49,50]. These parameters are the normalizations of sub-dominant experimental backgrounds (detector noise and event pile-up from coincident atmospheric muons), additional uncertainties on the optical properties of the ice, the oscillation parameters (θ 12 , θ 13 , m 2 21 , and δ CP ), and Bjorken-x dependent uncertainties in the cross section for deep-inelastic neutrino-nucleon scattering. Two types of tests were performed to determine the impact of these parameters. In the first test, a parameter is injected into a MC fake dataset, shifted from its nominal value by ± 1σ in the case of Table 2 Systematics treated as nuisance parameters in the likelihood analysis, including normalization (N), detector response (D), oscillation (O), flux (F), and neutrino-nucleon interaction (I) uncertainties. These parameters are discussed in more detail in [48]. The table gives the baseline value and, if the parameter is used with a prior in the likelihood, the standard deviation of the Gaussian prior, as well as the experimental best-fit values for both analyses and ordering hypotheses Label Parameter allowed to vary freely (no prior) in both analyses a detector systematic and by ± 3σ in the case of an oscillation parameter. This MC fake data is fit using the same MC set, but with the parameter in question fixed to its unshifted nominal value to assess whether the uncertainty in the systematic parameter can bias the measured ordering hypothesis. This test is repeated for MC fake datasets generated with both mass orderings. For none of these systematic or oscillation parameters is a bias observed in the measured preference for the mass ordering of more than 0.05σ . For the case of δ CP , the value of δ CP = 180 • is being used as the 'nominal' value for the MC fake dataset, with the value of δ CP = 270 • injected into the MC used in the fit; a negligible (less than 0.01σ ) bias in the ordering preference is observed. In the second test, the same shifted values as above are injected into the MC fake dataset, but now the parameter under test is allowed to vary in the fit. This allows us to determine if the inclusion of these parameters into the fit causes any loss of sensitivity to the mass ordering. None of the parameters in question cause a loss of sensitivity of more than 0.05σ ; the inclusion of δ CP reduces the sensitivity by less than 0.03σ . Since all the parameters described in this paragraph are shown to have no impact on the mass ordering sensitivity, or the potential to cause a bias, they have been set to their nominal values (in the case of oscillation parameters, to the NuFit [32] best-fit values), and are not included in the final fit in order to minimise the computing time required for the multi-parameter minimisation.
For Analysis A (B), the negative log-likelihood from Eqs. (2) (3) is optimized. To do this, LLH ≡ −0.5χ 2 is used as the negative log-likelihood for Analysis B. During this optimization, the first and the second octant in θ 23 are fitted separately for both orderings, allowing all the parameters listed in Table 2 to vary, and the fit optimizing the LLH is taken as the best-fit for this ordering. The resulting difference, 2 LLH NO−IO ≡ χ 2 NO−IO , between the NO and IO hypotheses is then calculated for both analyses.
Finally, 2 LLH NO−IO (χ 2 NO−IO ) is used as a test-statistic (TS) in Sect. 6 for Analyses A (B) to derive the experimental result from the fit to the data.

Sensitivity to the neutrino mass ordering
The determination of the Neutrino Mass Ordering is a binary hypothesis test, which requires the test of two non-nested hypotheses. This is different from most other applications in particle physics, where a general hypothesis H G is tested against a specific one, H S , in the sense that the specific hypothesis is obtained for a certain realization of the parameters of H G . For such nested hypotheses, Wilks' Theorem is commonly used to derive sensitivities and to estimate limits on fitted parameters [59]. In contrast, Wilks' Theorem does not apply to the determination of the Neutrino Mass Order-ing, since the discrete choice of Normal or Inverted Ordering is not related to the fixing of degrees of freedom [60].
Due to the subtleties involved in the statistical treatment and since a determination of the NMO is expected within the next decade, the correct method to quantify the preference is object of many discussions [60][61][62]. Here, two methods are used to estimate the sensitivity, which are described in the following.
The first method is a statistically rigorous analysis of the resulting likelihood values, using the obtained value of 2 LLH NO−IO as a TS. It derives the resulting sensitivity, given by the expected confidence in the determination of the NMO, from a frequentist coverage test. To do this, the data is fit with both ordering hypotheses giving a value for the TS and two sets of best-fit systematic parameters, η NO and η IO . These fits are called fiducial fits (FD) in the following.
From these parameters, the resulting best-fit templates are generated for NO and IO. Then, these templates are used to generate Pseudo-Experiments or Pseudo-Trials (PT)s by adding Poissonian fluctuations on the bin-contents, as expected in a real-world experiment; in this analysis, which has a sensitivity dominated by the statistical uncertainty, it is unnecessary to fluctuate each PT according to the systematic uncertainties. Afterwards, each PT is fitted with both ordering hypotheses, resulting in a new value for the TS = χ 2 NO−IO = 2 LLH NO−IO . From these PTs, two distributions of the TS are obtained for the two sets of injected parameters η NO and η IO .
This process of creating PTs for η NO and η IO and fitting them with both hypotheses is repeated several times to estimate a TS distribution for each of the ordering hypotheses. The TS distributions for NO and IO are then used to estimate the analysis sensitivity, i.e. the expected p-values for the exclusion of each hypothesis. To do this, the fraction of PTs for NO (IO) that is to the right (left) of the median of the IO (NO) distribution is taken as the expected p-value for the exclusion of the NO (IO) hypothesis, if IO (NO) is the true ordering. This is sketched in Fig. 6 for two generic distributions.
The frequentist method is summarized as a flow-chart in Fig. 7. Note that this procedure is similar to the treatment of data, described in Sect. 6, where the experimental fit is used as fiducial fit to produce PTs. Unfortunately, the frequentist method is computationally very expensive. Thus, for performing more detailed parameter studies, a second, faster method is used.
The second method for deriving sensitivities is an Asimov approach adapted from [60]. Instead of generating PTs, the total MC template, with no Poissonian fluctuations, is fitted directly for both hypotheses. In the following, we refer to this MC template as the generated-ordering (GO) hypothesis, H GO , where the GO can be either NO or IO. This is then fitted under assumptions of both hypotheses, NO and IO, where the The resulting value of 2 LLH NO−IO is assumed to be representative for the behavior obtained using PTs. The sensitivity to the generated ordering, n GO σ , in terms of one-sided Gaussian standard deviations is where G O ∈ {IO, NO} is the opposite hypothesis to GO, generated with the best-fit set of systematic parameters η GO ∈ {η IO , η NO } corresponding to the set η GO ∈ {η NO , η IO } used for H GO . Note that the sensitivity n GO σ describes the expected p-value for the exclusion of the GO hypothesis in the case that the true ordering is the GO [60]. The choice of one-instead of two-sided Gaussian standard deviations is motivated by the fact that an experiment with no sensitivity to the NMO, i.e. if the two distributions for NO and IO in Fig. 6 were identical, would lead to a 50% chance of obtaining the correct ordering by random chance. This should not be misinterpreted as sensitivity and thus should give n NO, IO σ = 0, which is the case for one-sided but not two-sided Gaussians.
The resulting sensitivities for both methods are shown in Fig. 8, as a function of the true value of sin 2 (θ 23 ). The blue and red lines indicate the result from the Asimov method for Analysis A (solid lines) and Analysis B (dashed lines). The sensitivities are validated at certain values of sin 2 (θ 23 ) using the frequentist method, as indicated by the circle (A) and star (B) markers, where the uncertainties arise from the finite number of PTs.
As visible in Fig. 8, the resulting sensitivity is < 1σ for both orderings and analyses. Moreover, Analysis A is more sensitive to the NMO than Analysis B, which is due to the increased statistics, the additional bins in PID, energy and zenith, and the reduced impact from limited Monte Carlo statistics, due to the usage of KDEs in the generation of Monte Carlo templates.
Note that a characteristic shape is found for the sin 2 (θ 23 )dependence of n NO, IO σ , which is different for the NO and IO hypotheses. The observed features are similar to those found for the PINGU sensitivity in [16]. They arise from the interplay of the two independent octant fits for LLH GO and LLH GO , used to calculate the values of LLH NO−IO (H GO ) and LLH NO−IO (H GO ) in Eq. (5), where the preferred octant is not necessarily the true one in the case that GO is fitted. As a result, the behavior of n NO, IO σ changes each time the octant is flipped for one of the two negative log-likelihood differences ( LLH) in Eq. (5).
The observed sensitivities for the Asimov method agree roughly with the PTs, while perfect agreement is not expected due to several approximations used in the Asimov-method [60]. However, the Asimov method is used as an estimator for the true sensitivity.
Note that for some observed values in Fig. 6 the pvalues for both hypotheses can be small, in case the observed data agrees with neither the NO nor IO hypotheses. For example, this could be the case for LLH NO−IO > 2 or LLH NO−IO < −2, which is in the tail of both distributions in Fig. 6. In this case, the small p-value might lead to the wrong impression that the data clearly favors the alternative over the null hypothesis. To properly account for this, the p-values are combined into a CL S -value, where TO is the tested ordering and TO is the opposite ordering hypotheses. This equation is taken from [63] where a more detailed discussion of its derivation can be found. Its value is limited to CL S ∈ [0, 1], where CL S ≈ 1 indicates no preference for one hypothesis over the other and CL S ≈ 0 indicates a strong disfavoring of the given hypothesis. The CL S value can be interpreted as confidence in the result with a confidence level of 1 − CL S . More illustratively, the CL S value describes how much less likely the observed value would occur under the disfavored hypothesis, compared to the favored one. Finally, potential improvements of the sensitivity are tested for Analysis A. By fixing individual and combinations of systematic parameters in the Asimov fit, the absolute gain in sensitivity from an improved understanding of systematic uncertainties is found to be small, except for the oscillation parameters. This is due to the weak NMO signature, which barely pulls the systematic parameters and thus is only weakly affected by fixing them. Instead, it is found that the sensitivity could be improved in the future by additional data statistics and improvements on the event reconstruction, which reduce the smearing-out of the NMO signature due to the low resolution in energy, zenith, and PID at the lowest energies [49].

Results
For both analyses, the experimental data is fitted with the likelihood method, described in Sect. 4. The data, along with the best-fit predictions, are shown for Analysis B in Fig. 9. The resulting best-fit values for all systematic parameters are shown in Table 2. The observed pulls are within the expected ranges for all parameters, taking statistical fluctuations and the uncertainties of the true value of each parameter into account. The corresponding values of the metric for the NO (IO) hypothesis are 2LLH = 293. 38 (294.12) for Analysis A and χ 2 = 107.82 (107.50) for Analysis B. The metric is used as a goodness-of-fit estimator for the agreement of data and Monte Carlo by comparing these values to the expectation from PTs. The resulting p-values for Analyses A and B are p A gof = 43.5% and p B gof = 11.0%, indicating the data to be well-described by the MC templates.
For Analyses A and B, the observed values of the teststatistic are 2 LLH NO−IO = −0.738 and χ 2 NO−IO = 0.3196. Thus, the fits for the main result (A) and the confirmatory result (B) prefer NO and IO, respectively, while both results are compatible within their statistical uncertainties, i.e. both results have a test statistic within one unit of zero.
To estimate the corresponding p-values, PTs are generated with the best-fit parameters η NO and η IO from Table 2 while for the confirmatory result we find In addition to testing the NMO with PTs, the likelihood is scanned across sin 2 (θ 23 ) for the more sensitive Analysis A and both ordering hypotheses. The resulting scan is shown in Fig. 11, where the LLH is shown with respect to its global minimum. The vertical offset between the NO and IO curves indicates the preference for NO over IO, which is visible at all values of sin 2 (θ 23 ). The observed minimum is in the lower octant, near sin 2 (θ 23 ) = 0.455, for both orderings, while maximal mixing is separated from the best-fit point by only 2 LLH NO−IO = 0.128 (0.681) for NO (IO). As a result, the preference for the lower octant is small, such that a substantial range of sin 2 (θ 23 ) > 0.5 is still compatible with the observed data for NO and IO. Note that the preference for NO over IO in Analysis A already indicates an observed preference for matter effects in data (cf. Sect. 1), i.e. a preference for matter effects over vacuum oscillations. To quantify this preference, the fit is repeated assuming vacuum oscillations. The resulting loglikelihood difference between matter effects (MA) and vacuum oscillations (VA) is LLH MA−VA = −0.869 (−0.500) in case NO (IO) is assumed. Thus, matter effects are preferred over vacuum oscillations, independent of the assumption on the NMO. The p-values and CL S -values that quantify the Fig. 10 Distribution of the TS from PTs, generated with the best-fit systematic parameters η NO and η IO from Table 2 for Analysis A (top) and Analysis B (bottom). The red and blue distributions are obtained for the NO and IO hypotheses, respectively, while the black, solid vertical line shows the observed value in data, giving the p-values for the NO and IO hypotheses stated in the legends Fig. 11 The negative log-likelihood (LLH) as a function of sin 2 (θ 23 ) for Analysis A, relative to the global minimum LLH min . The preference for NO over IO is visible over all the range of sin 2 (θ 23 ) with the best-fit for both orderings being in the lower octant (sin 2 (θ 23 ) < 0.5) preference for matter effects (Mat) and vacuum oscillations (Vac) are p(H Mat |H NO ) = 62.3%, CL S (H Mat |H NO ) = 71.0%, (11) p(H Vac |H NO ) = 12.3%, CL S (H Vac |H NO ) = 32.6%, (12)

Conclusion
We have developed two independent likelihood analyses to demonstrate the extraction of the neutrino mass ordering from atmospheric neutrino data. We have applied these analyses to three years of IceCube DeepCore data. The first analysis aims for an optimized sensitivity with DeepCore, the second for an analysis chain as similar as possible to the proposed NMO analysis with PINGU [16]. The sensitivities were estimated with two independent methods. For the more sensitive, main analysis, the sensitivity was found to be ∼ 0.45 − 0.65 σ (one-sided Gaussian), within the most interesting region close to maximum mixing (sin 2 (θ 23 ) ∈ [0.45, 0.55]) for both orderings, while for the confirmatory analysis, the sensitivity was found to be ∼ 50% smaller.
Due to the weak signature of the NMO in DeepCore, the sensitivity is found to be mostly unaffected by improvements in the understanding of systematic uncertainties. Instead, a future gain in sensitivity might come from additional statistics or potential improvements in the resolution of the event reconstruction.
The analyses presented here find the data to be fully compatible with both mass orderings. The main analysis observes a preference for NO over IO at 2 LLH NO−IO = −0.738, which corresponds to a p-value of 15.3% (CL s = 53.3%) for the IO hypothesis, based on the presented frequentist method. This result is in line with recently reported preferences for the NO by Super-Kamiokande [29], T2K [27], NOνA [28], MINOS [64], and recent global best fits [31,32]. However, it complements these results due to the higher energy range used for determining the NMO (E ν 5 GeV) and the fact that it is independent of the value of δ CP . Finally, the data indicates a preference for matter effects over vacuum oscillations, independent of the assumption on the NMO.
The study presented here allows us to consider what future steps will allow a determination of the NMO with atmospheric neutrino data. Given the statistically-limited nature of this result, it is clear that a reduction of systematic uncertain-ties is not a priority, and we have performed studies to show that even the most optimistic reduction of systematic uncertainties can achieve at most a 10% improvement in the NMO sensitivity of this dataset [49]. The same study also showed that a removal of backgrounds (atmospheric muons and triggered noise) delivers at most a 5% improvement in sensitivity. In the coming years, a factor of four more statistics is expected from DeepCore (including both additional data and expected data-selection improvements), and this can result in a factor of two improvement in sensitivity. A more significant improvement that can be made is in the measurement resolutions: our studies [49] show that a 50% improvement in resolution on both neutrino direction and log 10 (E ν ) would produce a factor of two improvement in the sensitivity of this dataset. To achieve an NMO determination in a reasonable timescale, a final necessary improvement is a lowering of the neutrino energy threshold; this, along with the improved resolutions, can be achieved by the PINGU concept [16,24] that reduces the energy threshold to below 10 GeV to enable a 3σ determination of the NMO for even the least optimistic values of the oscillation parameters.
Besides the experimental result, the presented analyses provide a proof-of-concept for determining the NMO from matter effects in atmospheric neutrino oscillations with the IceCube Upgrade [65] or PINGU [16]. They test the full analysis chain by means of real DeepCore data and validate the understanding and treatment of systematic uncertainties, which are largely consistent with those that will be encountered by future IceCube extensions. Foundation (SNSF); UK -Science and Technology Facilities Council (STFC), part of UK Research and Innovation. The IceCube collaboration acknowledges the significant contributions to this manuscript from Martin Leuermann and Steven Wren.

Data Availability Statement
This manuscript has associated data in a data repository. [Authors' comment: Data are publicly released on a regular basis by IceCube at https://icecube.wisc.edu/science/data/access/. The data used in this publication will be made available at this URL.] Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/. Funded by SCOAP 3 .