Mass hierarchy discrimination with atmospheric neutrinos in large volume ice/water Cherenkov detectors

Large mass ice/water Cherenkov experiments, optimized to detect low energy (1-20 GeV) atmospheric neutrinos, have the potential to discriminate between normal and inverted neutrino mass hierarchies. The sensitivity depends on several model and detector parameters, such as the neutrino flux profile and normalization, the Earth density profile, the oscillation parameter uncertainties, and the detector effective mass and resolution. A proper evaluation of the mass hierarchy discrimination power requires a robust statistical approach. In this work, the Toy Monte Carlo, based on an extended unbinned likelihood ratio test statistic, was used. The effect of each model and detector parameter, as well as the required detector exposure, was then studied. While uncertainties on the Earth density and atmospheric neutrino flux profiles were found to have a minor impact on the mass hierarchy discrimination, the flux normalization, as well as some of the oscillation parameter (\Delta m^2_{31}, \theta_{13}, \theta_{23}, and \delta_{CP}) uncertainties and correlations resulted critical. Finally, the minimum required detector exposure, the optimization of the low energy threshold, and the detector resolutions were also investigated.


Introduction
The field of neutrino physics has witnessed important theoretical and experimental progress in the past decade. A variety of experiments with solar, atmospheric, reactor and accelerator neutrinos spanning energies from the MeV up to tens of GeV have provided compelling evidence that the known flavour eigenstates (ν e , ν µ , ν τ ) mix, implying the existence of nonzero neutrino masses; see e.g. the review by Nakamura and Petcov in ref. [1].
In the standard 3ν scheme, the mixing matrix which relates the neutrino flavour eigenstates to the mass eigenstates (ν 1 , ν 2 , ν 3 ) can be parameterized in terms of 3 mixing angles θ 12 , θ 13 and θ 23 , and a Dirac CP-violation phase δ CP . Oscillation experiments are not sensitive to the absolute value of neutrino masses but they provide measurements of the squared-mass splittings ∆m 2 ij = m 2 i − m 2 j (i, j = 1, 2, 3). In this scheme, there are only two independent squared-mass differences: one (δm 2 ≃ 7.5 × 10 −5 eV 2 ) is associated to the mass splitting arising from solar (and reactor) observations, while the other (∆m 2 ≃ 2.5 × 10 −3 eV 2 ) is extracted from the atmospheric neutrino sector.
The values of these mass splittings, as well as those of the three mixing angles, are now extracted from global fits of available data with a reasonable precision [2][3][4]. Last but not least, the recent observation of ν e disappearance in several short-baseline reactor experiments [5][6][7] has provided the first high significance measurement of the mixing angle θ 13 which drives the ν µ − ν e transition amplitude. The relatively large value of this parameter, sin 2 (2θ 13 ) ≃ 0.1, is an asset for the subsequent searches for the remaining major unknowns of the model, in particular the possible presence of a CP-violating phase in the neutrino sector, and the neutrino mass hierarchy (NMH). The ordering of neutrino mass eigenstates is indeed not univocally determined so far; two solutions remain possible, usually dubbed as normal hierarchy (NH: m 1 < m 2 < m 3 , with ∆m 2 21 ≡ δm 2 and ∆m 2 32 ≃ ∆m 2 31 ≡ ∆m 2 ) and inverted hierarchy (IH: m 3 < m 1 < m 2 , with ∆m 2 21 ≡ δm 2 and ∆m 2 23 ≃ ∆m 2 13 ≡ ∆m 2 ). The question of NMH discrimination is on the agenda of most current and nextgeneration neutrino experiments in the GeV energy domain 1 . Its determination at high confidence level (5 σ or more), however, remains challenging and could take as long as 15 to 20 years even within optimistic estimates [12][13][14][15][16][17][18][19][20]. The reach of accelerator experiments is indeed strongly dependent on the still unknown value of δ CP [21], while atmospheric neutrino experiments suffer from the large uncertainties in the fluxes and from the relatively low statistics. The latter experiments have nonetheless the advantage of probing ν µ − ν e oscillations over a wide range of propagation distances (or baselines) and neutrino energies, accessing thereby the richness of matter effects that arise in the propagation of neutrinos through the Earth [22] (see also collective references in ref. [17] for example).
In this realm, it has been pointed out recently that large water/ice Cherenkov detectors, such as the ones currently used for neutrino astronomy, could also be competitive in providing a measurement of the NMH based on the study of GeV atmospheric neutrinos [23][24][25]. These instruments were primarily designed for the detection of extra-terrestrial neutrinos in the TeV to PeV energy range, but they also measure atmospheric neutrinos. They consist in 3-dimensional arrays of photomultipliers (PMTs) that detect the Cherenkov light originating from the products of the neutrino interaction in and around the instrumented volume. The arrival direction of the parent neutrino and its energy can be inferred from the timing, position and amplitude of the hits recorded by the PMTs. The directional accuracy is best for ν µ -induced muon tracks, which can be reconstructed even if they are not fully contained inside the detector volume. The detection of contained showers is also possible, with a relatively accurate estimation of the neutrino energy based on the measurement of the total amount of light deposited in the detector. The optimization of such instruments for a given neutrino energy range is therefore a trade-off between the total target volume and the density of photosensors. Although they do not have charge identification capabilities, these detectors could observe the imprint of the NMH in the pattern of events of a given flavor detected at different energies and baselines, provided their angular and energy accuracies are sufficient.
The largest Cherenkov neutrino telescopes currently operating are IceCube [26], with an instrumented volume of 1 km 3 of South Polar ice, and ANTARES [27], a smaller array located in the Mediterranean Sea off the coast of France, which is also a prototype for the future multi-km 3 neutrino Cherenkov detector to be built in the Mediterranean, KM3NeT [28,29]. While both detectors reach their full capabilities for neutrinos of energy of the TeV and beyond, significant effort has been made to improve their performances at lower energies as well. ANTARES has recently proven its capability to detect atmospheric muon neutrinos with energies down to about 20 GeV, providing the first measurement of neutrino oscillation parameters obtained with a large-volume neutrino telescope [30]. Ice-Cube has been complemented by a denser infill dubbed as DeepCore, allowing to lower its energy threshold down to about 10 GeV [31].
A study of neutrino hierarchy in the atmospheric sector, at energies in the range 1 -50 GeV, would however require a new generation of detectors with a different layout, and in particular a smaller but denser array of photosensors. Muons with energies of a few GeV indeed travel distances of the order of ten meters. For this reason the distance between the optical modules has to be of the order of a few meters, much less than in currently operating detectors. A denser array provides better energy and angular resolution; however, for the same total number of PMTs, the fiducial volume gets reduced. Therefore a compromise for an optimal spacing of the optical modules has to be found. Such optimization studies are underway for both detector sites.
At the South Pole, the PINGU project [32,33] studies the possibility to deploy an even denser array of about 20 lines of photosensors within DeepCore, to benefit from the vetoing capabilities of the surrounding detectors. For the Mediterranean site, the ORCA project [34] proposes to build a similar, dense array of about 50 lines as a first phase of construction of the KM3NeT detector.At this stage it is not clear yet which performances can be achieved in terms of effective volume, angular and energy reconstruction, and flavor identification, which are the main ingredients for the NMH discrimination. As no official numbers exist so far for PINGU or ORCA effective volumes, in this study the assumption made in ref. [24] for the effective mass M eff has been followed, namely: (1.1) Note that the dependence of M eff on the neutrino energy E ν could vary according to the actual performances of the detector, in particular at low energies. Throughout this paper, the effective mass normalization has been anchored at 40 GeV, meaning that all results will be presented in terms of effective exposure defined in units of megaton year (Mt × year) at 40 GeV. With this convention, the effective volume quoted for PINGU in ref. [24] corresponds to 34.1 Mt. The intrinsic detector capabilities are not the only factor that could spoil the measurement of NMH. Uncertainties in the incident atmospheric neutrino flux will affect the initial ν µ /ν e ratio, while those related to the neutrino oscillation parameters and the Earth density profile can further modify the flavor content of oscillated neutrinos that are observed at the detector level. Last but not least, at such low energies the kinematics of neutrino interaction also induces an intrinsic uncertainty on the incident neutrino direction, even for µ-like events. All these effects have to be carefully studied in order to evaluate their potential impact on the NMH measurement. In addition, an appropriate statistical method has to be used to properly estimate the confidence level with which one can hope to discriminate between NH and IH.
In this paper a method to evaluate the discrimination power of large Cherenkov neutrino detectors for NMH is presented, based on a Toy Monte Carlo (MC) approach and, as test statistic, an extended unbinned log-likelihood ratio. The details of the statistical method are presented in section 2, while section 3 describes the MC chain and its main ingredients. Details on the choice of the reference oscillation parameters can be found in section 4. Results in terms of NMH discrimination power as a function of the effective exposure of the detectors are given in section 5. To further illustrate the method, studies are conducted to quantify the impact of the uncertainties listed hereabove on the discrimination power of the experiments. Sections 6, 7 and 8 discuss the systematics related to the atmospheric neutrino flux, Earth density profile and oscillation parameters respectively. Preliminary hints on the potential impact of the detector energy and angular resolution are also provided in section 9. Conclusions are drawn in section 10.

Statistical method
The Toy Monte Carlo approach is a flexible tool to test the NMH discrimination power of future ice/water detectors. It allows to investigate the discrimination power dependence on oscillation parameter uncertainties, neutrino flux and Earth profile models, as well as detector exposure and systematic effects. The work scheme requires first to fix the true hypothesis under investigation, NH or IH, for a given set of parameters and models. On the basis of the true hypothesis, 1000 test experiments are generated, event-by-event, with the corresponding event statistics. Each test experiment is then compared with the model hypothesis, by evaluating the following extended unbinned likelihood: where µ j is the expected number of events with j = {NH, IH}, n the number of observed events, and pdf j (E i , θ i ) the probability of observing the i th event with energy E i and zenith angle θ i . The probability density function pdf j (E, θ) represents the model hypothesis, and it is produced with high-statistics Monte Carlo simulation (1000 times the expected statistics). The test statistic η used to evaluate the mass hierarchy discrimination power is the logarithm of the likelihood ratio between IH and NH: The η distribution is produced for each true hypothesis, NH and IH, each entry corresponding to a test experiment. In order to attenuate the statistical fluctuations, each distribution is then fitted with a Gaussian function. The Gaussianity of the so-produced distributions was demonstrated with dedicated high-statistics tests. Finally, the probability (p-value) to achieve the confidence level α (in this work equivalent  to 3 or 5 σ) in the hierarchy discrimination, was defined as the fraction of test experiments yielding a value of η satisfying: for either t = NH or t = IH, where N t (η) is the number of experiments corresponding to the true hypothesis t. For illustrative purposes, the η distributions obtained for a standard case are shown in figure 1. The blue (red) shaded region corresponds to the fraction of areas where NH (IH) is identified at more than 3 σ C.L.. On the contrary, there is no sensitivity to the NMH discrimination in the grey shaded region of overlap of the two distributions. Furthermore, it was assumed that the testing model is strongly affected by systematics (and hence the result unphysical), if the measurement is more than 5 σ away from any of the Gaussian mean values, which corresponds to the hatched blue vertical bands in figure 1.
In order to investigate the impact of the model parameters and their uncertainties, on the hierarchy discrimination, biases are introduced in the true hypothesis, maintaining unvaried the model hypothesis. The bias can lead to false positives, when one of the two hierarchies is wrongly recognized. An example is shown in figure 1, where the value of ∆m 2 31 is reduced by 1 σ in the true hypothesis. In order to quantify the fraction of false positives, η distributions are evaluated with respect to both the unbiased and biased hypotheses. As net effect, the biased hypothesis distributions are shifted with respect to the unbiased ones. Small variations in the Gaussian sigmas are also observed. The fraction of false positives is evaluated after subtracting the unphysical region. A general description of the adopted statistical method can be found in ref. [35]. Another statistical approach can be found in ref. [36].
In the next sections, p-value is quoted arbitrarily at the confidence level (3 or 5 σ) that best highlights its variations.

The Toy Monte Carlo approach
In order to evaluate the mass hierarchy sensitivity, a specific Monte Carlo simulation was set up. The full MC chain is made up of different steps that rely on the following ingredients: the neutrino fluxes, the oscillation probabilities, the Earth density profile and the neutrino cross sections. In addition, detector-specific information on the event reconstruction has to be implemented for a realistic sensitivity evaluation.
As far as the neutrino flux is concerned, several models are available. As a base option the Honda model [37] was used. A detailed discussion on the fluxes and a comparison between the different models can be found in section 6.
To compute the neutrino oscillation probability when propagating through the Earth, the GLoBES software tool [38,39] was used. Given a set of oscillation parameters, the code provides the oscillation probability either in vacuum or when traveling through matter. An option to calculate oscillation probabilities when neutrinos traverse the Earth is already available. The distance travelled through the Earth (baseline) is split into a number of given steps, each with a mean constant density computed using the Preliminary Reference Earth Model (PREM) [40,41], according to the distance from the Earth's core. In the simulation 1000 steps were fixed for each baseline. The baseline depends on the zenith angle θ of the incoming neutrinos, therefore oscillation probabilities were generated for fixed values of cos θ ranging from -1 to 0 with a step of 0.02.
To compute the number of expected events, the charged current cross sections for ν µ andν µ already provided in the GLoBES simulation toolkit have been used [42,43].  Two-dimensional matrices (energy versus cos θ) containing the number of expected ν µ +ν µ events were computed, both for NH and IH, using the selected atmospheric neutrino flux as a function of cos θ, the oscillation probability for several sets of oscillation parameters and assuming a fixed effective mass of 1 Mt. Note that no discrimination was assumed between neutrinos and antineutrinos and that the following components were considered: ν µ → ν µ , ν e → ν µ ,ν µ →ν µ andν e →ν µ . In addition, downgoing neutrinos were not taken into account in the MC.
To include the correct neutrino-muon kinematics, ν µ interactions were generated in an energy range from 1 GeV to 40 GeV (40000 events for each chosen energy in steps of 0.5 GeV) on a water target using the GENIE [44] simulation code. The zenith angle between the incoming neutrino and the outgoing muon in charged current interactions as well as its energy were computed. An example of the results for 10 GeV neutrinos is shown in figure 2. According to the obtained distributions the muon energy and angle were randomly extracted for each neutrino event.
Although several studies are ongoing aiming at the exploitation of some coarse hadronic energy reconstruction in the neutrino energy determination, the conservative assumption that only the muon energy can be reconstructed is made in this work. Hence, the analysis was done using muon energy (E µ ) instead of the neutrino one (E ν ). A threshold at 5 GeV was set in order to guarantee a reasonable energy and direction reconstruction. In addition, at lower energy the uncertainty on the cross sections increases up to 20% and this additional systematics, neglected in the present work, should be considered. The energy range used in this work is therefore 5 to 40 GeV. The matrices used in the MC Toy, to evaluate the NMH sensitivity, were obtained from those generated by GLoBES, applying the kinematical smearing and using the selected effective exposure. An example obtained using the central values for the oscillation parameters is shown in figure 3. Note that although those matrices are displayed in terms of cos θ for a direct comparison with previous works, the MC works directly in θ in order to treat correctly the angular smearing.
Based on the difference between the matrix generated in case of NH and IH, it is possible to identify the region where the effect is larger and therefore the discrimination more powerful. The asymmetry defined as: shown in figure 4, was chosen as figure of merit, where M NH and M IH are the number of expected events at a given angle and energy for NH and IH respectively. The region where the effect is more evident is between 5 and 10 GeV. Hence, the development of a detector with high energy resolution at low energies is mandatory.  Table 1: Central values and 1 σ uncertainty of the oscillation parameters used in this work.

Reference oscillation parameters
The oscillation parameter central values, quoted in table 1, and to δ CP =0, are assumed in this work as reference values, except when specified. The mixing angle θ 23 has been extensively explored in the first octant, which is preferred at 1.5 σ by the global fit of ref. [4] in normal hierarchy. The results obtained in this work were cross checked by testing θ 23 values in the second octant, without observing notable differences.
In the formalism adopted in this work, the best fit in normal hierarchy of the largest ∆m 2 is assigned to ∆m 2 31 (NH). The value of ∆m 2 31 (IH) then differs from ∆m 2 31 (NH) by: = 2∆m 2 21 (cos 2 θ 12 − cos δ CP sin θ 13 sin 2θ 12 tan θ 23 ) as pointed out in ref. [45,46]. The dependence of δm 2 31 on δ CP makes its value nonunivocally assigned. However, contrarily to the standard studies on the mass hierarchy discrimination with fit procedures, where ∆m 2 31 (IH) is a free parameter, in the Monte Carlo Toy approach ∆m 2 31 (IH) must be fixed. To overcome this problem, a dedicated study has been performed, varying δm 2 31 from 0.05 to 0.18 × 10 −3 eV 2 , and evaluating the p-value at 3 σ with 34 Mt × year at 40 GeV of effective exposure (larger exposure would not allow to observe any effect). As it can be seen in figure 5, a minimum was found in 0.13×10 −3 eV 2 (p-value ∼ 0.76), slightly shifted with respect to the best value of ∼0.11×10 −3 eV 2 from Ref. [47].
The value of δm 2 31 was conservatively fixed in this work to the so-obtained minimum. The effect induced by the ∆m 2 31 shift between NH and IH becomes negligible (p-value ∼1) from an effective exposure of ∼100 Mt × year at 3 σ.

Exposure
In the following, the impact of the model and parameter assumptions on the NMH discrimination power is evaluated relatively to a starting ideal condition. For this ideal case, perfect energy and angle resolutions, the reference conditions, mentioned in the previous section, and the energy-dependent mass profile of eq. 1.1, shown in figure 6, are assumed. At this stage, no biases in the true models parameters are introduced. Setting the p-value threshold at 0.5 at 5 σ C.L., the minimal required effective exposure is 60 Mt × year, as shown in figure 7. As already mentioned, there is no currently available detailed studies on ORCA/PINGU mass profiles. Other references, e.g. [33], quote different profiles with respect to eq. 1.1 [24]. To understand the impact of the mass profile, the optimal case with neutrino detection efficiency equal to 1, independently on the energy, was tested. The corresponding effective exposure profile and p-value are shown in figure 6 and 7, respectively. In this case, the required effective exposure, for a p-value equal to 0.5 at 5 σ C.L., is reduced by a factor ∼3. This result demonstrates the significant impact of the detection efficiency in the lower energy region, namely at 5-10 GeV, where the NMH asymmetry is large, as shown in figure 4, and the expected flux is high. Future Monte Carlo studies should focus on identifying the detector configuration able to optimize the efficiency at lower energies.
In order to better appreciate the impact of the model uncertainties and of the detector resolutions, discussed in the next sections, an effective exposure of 170 Mt × year, corresponding to a p-value ∼1 at 5 σ C.L. is hereafter assumed.  Figure 7: p-value for 3 σ (dashed lines) and 5 σ (solid lines) C.L. as a function of the effective exposure for an ideal detector, assuming energy dependent (red) and optimal (blue) mass profiles. The horizontal dashed line is a guideline to read the required exposures for a 50% chance to discriminate between the two hierarchies at the corresponding C.L..

Flux models
Available predictions on the atmospheric neutrino fluxes show differences both in the absolute normalization and in the energy and angular profiles. In this section, both the maximum acceptable uncertainty on the flux amplitude and the impact of the flux profile are evaluated. The reference model used in this work, Honda 1995 [37], was compared with other predictions: FLUKA 2002 [49] and Bartol 1995 [50]. A detailed description of the three models and of their differences can be found in ref. [51]. Recent model updates [52] did not significantly modify the respective flux profile and am-   9) dependences of the three fluxes agree in shape, above 5 GeV, at the ∼5% level. In terms of amplitude, whereas the Honda and Bartol models are in agreement at the level of a few percents, the FLUKA model differs from the others by more than 20%.
To quantify the impact of the flux uncertainties on the p-value, a bias in the true hypothesis was applied, as described in section 2: the Honda flux was assumed as the model hypothesis, whereas the FLUKA and the Bartol ones were used as biased true hypotheses.  The slight discrepancy between Honda and Bartol fluxes lead to a relatively small loss in the discrimination power, at the level of p-value of 0.851, at 5 σ C.L.. On the contrary, the disagreement between Honda and FLUKA was found more significant, lowering the correspondent p-value down to 0. The two cases are shown in figure 10 (dashed lines), where the biased test statistics are compared with the unbiased ones.
To demonstrate that the impact is mostly due to the differences in normalization, the extended unbinned likelihood of eq. 2.1 was substituted with the following non-extended one: where j = {NH, IH}, and i = 1...n is the event index.
Removing the extended component of the unbinned likelihood from eq. 2.1, the test statistic becomes independent on the expected number of events. In general, this approach has the consequence to weaken the discrimination power, by limiting the constraints to the energy and angular shapes only. In the unbiased case, where the Honda flux is assumed in both the model and true hypotheses, p-value reduces to 0.985. When testing Bartol versus Honda in the non-extended case, the discrimination power lowers down to a p-value equals to 0.502, while when testing FLUKA versus Honda, pvalue rises up from 0 in the extended case to 0.655. These results, shown in figure 10, demonstrate that the FLUKA profile is relatively more similar to the Honda profile, than to the Bartol one. The natural question arising from this study regards the convenience of using the extended with respect to the non-extended unbinned likelihood. The answer depends on the uncertainty that can be reached in the normalization factor. The impact of the flux amplitude uncertainty on the discrimination power can be attenuated by anchoring the fit to a region in the parameter space where the effect of mass hierarchy is minimized. An example, as shown in figure 4, is provided by the region corresponding to almost horizontal (cos θ ∼ 0) high-energy (E ν > 20 GeV) neutrinos. Assuming the Honda flux in both the model and true hypotheses, the p-value of the extended unbinned likelihood was computed by biasing the normalization factor in the true case from 0 to 10%, as shown in figure 11. The p-value is not sensitive to variations in the normalization up to 2%. Above this value, the p-value rapidly decreases to 50% for a bias of 7.5%, and reaches almost 0 with a normalization error of 10%. Assuming a minimum threshold of the p-value at 50%, the extended unbinned likelihood estimator is considered convenient, if the uncertainty on the flux normalization is below 7.5%. Above this value, the non-extended estimator offers a 10% density model ± Figure 12: Earth matter density profile as a function of the distance from the Earth center for the PREM model (blue), the modified flat density model (green), the model with shifted zone boundaries (red) and the model with densities modulated by ±10% (dashed black). more robust discrimination power.

Earth density profile
Uncertainties on the Earth density profile can play a significant role in neutrino matter oscillations. Recent studies (see ref. [25] and references therein) indicate that density values predicted by the PREM, averaged over 100 km, can be assumed to be known at the level of a few per cent at all depths in the core and mantle (the crust can be neglected, as it represents a very small fraction of matter traversed by neutrinos), while global variations of the density are constrained at the 10 −4 level. The Earth ellipticity also has an impact on the matter density profile along the neutrino propagation.
In order to understand the effects of the PREM-related parameters on the p-value, the following modifications were independently introduced: the density was varied by an overall factor of ±10%, the boundaries of each zone shifted by 50 km, and a flat density profile between the main discontinuities was assumed, as shown in figure 12. The introduced biases are larger than the known uncertainties or even unphysical, but needed to appreciate the PREM parameters impact, otherwise below the sensitivity of the statistical method adopted here.
Both the 50 km limit shift and the assumption of the flat density profile have almost no impact on the mass hierarchy discrimination power, with a p-value at 3 σ equals to 0.999. The p-value is almost unchanged (0.996) when reducing the overall density by 10%, and it is slightly affected (0.872) when density is increased by same factor. No impact was observed by varying the overall density factor by 1%, in agreement with the error found in literature [25,53].  In conclusion, uncertainties on the PREM parameters have a negligible impact on the mass hierarchy determination.

Neutrino oscillation parameters
A careful study aimed to identify critical oscillation parameters, and their correlation, is a necessary condition for future analyses.
The sensitivity study follows the scheme adopted in the previous sections: each oscillation parameter value is biased in the true hypothesis, by ±1 σ from the central value, as quoted in table 1, while keeping unaltered the model hypothesis.
As already explained in section 4, the shift between the absolute values of ∆m 2 31 (NH) and ∆m 2 31 (IH), labelled as δm 2 31 has a negligible impact with an effective exposure of 170 Mt× year. However, the p-value can decrease, biasing simultaneously δm 2 31 and other oscillation parameters. The expected main correlation between the shift and ∆m 2 31 (NH) was therefore tested: with an effective exposure of 34 Mt × year, the p-value has a maximum variation of ±8%, while with an exposure of 170 Mt × year, the p-value spread is at the per-mill level. As a consequence, in the following analysis δm 2 31 will be fixed. From now on, ∆m 2 31 (NH) will be referred as ∆m 2 31 .
Other parameters, with an expected low impact, are the solar ones. The correlation study between θ 12 and ∆m 2 12 shows an almost constant p-value, with a maximum spread of 0.1%, around 1, at 3 σ C.L.. Moreover, no degradation in the hierarchy discrimination was observed by studying each of the solar parameters in combination with ∆m 2 31 . As for the δm 2 31 case, solar parameters can be fixed to their central values, assuming a negligible contribution of their uncertainties to the discrimination power.
The p-value dependence on parameters, in the atmospheric and reactor sectors, has been studied by varying θ 13 , θ 23 and ∆m 2 31 in all their possible combinations. The p-values at 3 σ, as well as the fraction of false positives, are shown in figure 13. The strongest impact, as foreseen, is due to ∆m 2 31 . Clear correlations are observed between θ 13 and θ 23 , for instance by shifting both the central values by -1 σ. The effects induced biasing ∆m 2 31 are, in some cases, compensated by shifts in θ 13 and θ 23 . This is explained by the relative shifts of the biased η distributions with respect to the unbiased one, as shown in figure 1. In particular, biases on ∆m 2 31 shift the distribution in the opposite direction with respect to θ 13 and θ 23 . When ∆m 2 31 is fixed to its central value, the impact of θ 13 and θ 23 biases are relatively modest (the minimum obtained p-value is equal to 0.774). However, the fraction of false positives for ∆m 2 31 ± 1 σ strongly depends on θ 13 and θ 23 values, varying from a few permill to almost 50%. This clearly demonstrates the non-negligible dependence of the mass hierarchy discrimination power on ∆m 2 31 , θ 13 and θ 23 .
Finally, the dependence on δ CP was studied by varying its value in steps of 45 degrees, in combination with θ 13 , θ 23 and ∆m 2 31 . The results for the p-value at 3 σ C.L., are shown in figure 14. The dependence on δ CP is relatively weak and can be appreciated only for ∆m 2 31 ± 1 σ and for θ 13 − 1 σ, where the p-value varied by ∼16% at the most.
In conclusion, a strong dependence of the NMH discrimination power on θ 13 , θ 23 and ∆m 2 31 was observed, whereas a weak dependence was seen on δ CP . The effects induced by θ 12 , ∆m 2 12 , and by δm 2 31 are negligible. [deg]  Figure 16: Asymmetry (as defined by eq. 3.1) between the number of ν µ andν µ induced muon events expected in case of NH and IH, expressed as a function of the energy and the cosine of the zenith angle. A detector resolution of 10 degrees on the reconstructed zenith and an energy resolution of 30% are assumed.

Detector resolution and energy threshold
Finite detector resolutions introduce a further weakening of the discrimination power. To study their effects, both the true and the model hypotheses were modified the same way, according to the angular and energy resolutions. A smearing was introduced on the energy and the zenith angle for each generated event in the Monte Carlo simulations, using Gaussian distributions. Events falling outside the analysis range, with cos θ > 0 or muon energy < 5 or > 40 GeV, were discarded.
Since no Monte Carlo based prediction has been published so far with dedicated simulations on ORCA or PINGU detectors in the 5 to 40 GeV energy range, the following resolutions were arbitrarily tested: σ θ = 5, 10 and 15 degrees, and σ E /E = 15%, 30% and 50%. The so-obtained p-values are shown in figure 15 for a C.L. of 3 and 5 σ. These values must be compared with the ones already evaluated in ideal conditions (see section 5): 1 and 0.992 for 3 and 5 σ C.L., respectively.
In figure 16 the NH-IH asymmetry is shown for σ θ = 10 degrees and σ E /E = 30%. Even if the asymmetry has a pattern similar to the ideal case (see Fig. 4), the p-value lowers down to 0.698 at 3 σ C.L., and to 0.056 at 5 σ C.L.. The discrimination power can be partially recovered increasing the effective exposure.
Another way to increase the p-value, when reduced by detector resolutions, would be the reconstruction of low energy muons below the 5 GeV threshold. The p-values obtained when the analysis range is 1 to 40 GeV are shown in figure 15. A maximum increase of ∼28% and ∼34% on the p-value can be reached at 3 and 5 σ C.L. respectively. Nonetheless, in the low energy region between 1 and 5 GeV, degradations of the resolution are expected. In addition, as mentioned in section 3, systematics coming from the cross sections uncertainty should be considered below 5 GeV.
For the sake of completeness, the exposure needed to reach a p-value of 0.5 at 3 and 5 σ C.L. is shown in figure 17 as a function of the energy threshold, for an ideal detector. As expected, thresholds larger than 5 GeV lead to a rapid reduction of the sensitivity.

Conclusions
The oscillation pattern of atmospheric neutrinos in the 1-20 GeV energy range can be exploited by future large volume ice/water Cherenkov detectors to discriminate between NH and IH. In this work, a Toy Monte Carlo, based on an unbinned likelihood ratio test statistic, was developed to quantitatively investigate their discrimination potential. This approach allows to compute the probability of NMH discrimination, at a given C.L., while keeping track of possible misidentifications between hierarchies. For this purpose, a complete simulation chain, from neutrino generation to propagation and interaction, was implemented based on GLoBES and GENIE software tools.
The discrimination power was evaluated as a function of the exposure in the ideal case of perfect detector resolutions and no systematics. Assuming the same energy dependence of the detector effective mass as in ref. [24], the minimum required exposure for a 50% discrimination probability at 5 σ was found to be 60 Mt × year at 40 GeV. This number C.L., 0.5 p-value σ 3 Optimal: C.L., 0.5 p-value σ 5 C.L., 0.5 p-value σ 3 Figure 17: Exposure needed to reach a p-value of 0.5 at 3 σ (dashed lines) and 5 σ (solid lines) C.L., as a function of the muon energy threshold for an ideal detector, assuming energy dependent (red) and optimal (blue) mass profiles.
can be significantly reduced by improving the detection efficiency in the 5-10 GeV muon energy region. Another way to increase the discrimination power, although with a weaker impact, relies on the lowering of the detection energy threshold down to 1 GeV. However, achieving the needed resolutions might be challenging below 5 GeV. A critical improvement could come from the reconstruction of the associated hadronic shower, not considered in this work.
The discrimination power can also be affected by the systematics on the neutrino production, oscillation, and interaction models. The effects of the uncertainties of the cross sections were not investigated in this study, since they are well constrained above 5 GeV. Uncertainties on the Earth density profile were shown to have a negligible impact on the NMH determination. The effect of differences in the predictions for the energy and angular dependence of the neutrino flux, from the Honda, Bartol, and FLUKA models, was also demonstrated to be small. However, these models significantly differ in the overall flux normalization, which can result in a critical reduction of the discrimination power. This loss can be attenuated by anchoring the flux at high energies (> 20 GeV), where the dependence on the NMH is small, or by removing the dependence of the likelihood ratio test statistic on the flux normalization.
Finally, the impact of the oscillation parameter uncertainties and correlations on the discrimination power was carefully investigated. An important dependence of the NMH determination on the values of θ 13 , θ 23 and ∆m 2 31 , and a weak one on δ CP and on the shift between ∆m 2 31 (NH) and ∆m 2 31 (IH), were observed. The effects induced by θ 12 and ∆m 2 12 were shown to be negligible.
An extension of this work is foreseen in order to include background events in the simulations, and more realistic detector resolution models, as soon as they will be available.