Reassessing the sensitivity to leptonic CP violation

We address the validity of the usual procedure to determine the sensitivity of neutrino oscillation experiments to CP violation. An explicit calibration of the test statistic is performed through Monte Carlo simulations for several experimental setups. We find that significant deviations from a $\chi^2$ distribution with one degree of freedom occur for experimental setups with low sensitivity to $\delta$. In particular, when the allowed region to which $\delta$ is constrained at a given confidence level is comparable to its whole allowed range, the cyclic nature of the variable manifests itself and the premises of Wilks' theorem are violated. This leads to values of the test statistic significantly lower than those expected from a $\chi^2$ distribution at that confidence level. Conversely, for facilities which can place better constraints on $\delta$ the cyclic nature of the variable is hidden and, as the potential of the facility improves, the values of the test statistic first become slightly higher than those of a $\chi^2$ distribution and then approach it asymptotically. The role of sign degeneracies is also discussed.


I. INTRODUCTION
Mixing in the lepton sector of the Standard Model is described by the unitary Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix [1][2][3][4][5]. In the standard three-family scenario, it can be parametrized by three mixing angles (θ12, θ23, and θ13), a Dirac CP-violating phase (δ) and, if neutrinos are Majorana particles, two additional Majorana CP phases. Neutrino oscillations are sensitive to the values of all mixing angles and the CP-violating phase δ, together with the two independent mass squared differences (∆m²21 and ∆m²31, defined as ∆m²ij ≡ m²i − m²j). Unlike for the CKM matrix, the mixing angles of the PMNS matrix have been experimentally found to be large, with θ13 ≃ 9° being the smallest mixing angle [6][7][8][9][10] and the other two angles being much larger (θ12 ∼ 33° and θ23 ∼ 45° [11]). Thus, the Jarlskog invariant J = (1/8) cos θ13 sin 2θ13 sin 2θ12 sin 2θ23 sin δ can potentially be as large as ∼ 0.035 for maximally CP-violating values of δ, three orders of magnitude larger than the currently measured value of its counterpart in the quark sector: J_CKM = (2.96 +0.20/−0.16) · 10⁻⁵ [12]. Since it has been shown that, within the context of Standard Model electroweak baryogenesis, J_CKM is not large enough to account for the observed baryon asymmetry of the Universe [13,14], the discovery of an additional source of CP violation (such as δ in the PMNS matrix) could open new possibilities for alternative generation mechanisms.
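As a quick numerical cross-check of the size quoted above, the Jarlskog invariant can be evaluated at the rounded angle values from the text (a sketch only; these are not the exact best-fit values):

```python
import math

# Rounded PMNS angles from the text; delta set to a maximally CP-violating value
th12, th23, th13 = math.radians(33.0), math.radians(45.0), math.radians(9.0)
delta = math.pi / 2.0

# Jarlskog invariant J = (1/8) cos(th13) sin(2 th13) sin(2 th12) sin(2 th23) sin(delta)
J = 0.125 * math.cos(th13) * math.sin(2 * th13) * math.sin(2 * th12) \
    * math.sin(2 * th23) * math.sin(delta)
print(f"J ~ {J:.3f}")                    # ~ 0.035

# Ratio to the quark-sector central value J_CKM ~ 2.96e-5
ratio = J / 2.96e-5
print(f"J / J_CKM ~ {ratio:.0f}")        # ~ 1.2e3, i.e. about three orders of magnitude
```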
Given the current knowledge of the neutrino oscillation parameters, the focus for the next generation of neutrino oscillation experiments will be to determine the neutrino mass ordering, the CP-violating phase δ, and the octant of the mixing angle θ23. For the determination of the mass ordering, there has been a lively discussion in the literature on whether or not the common way of assessing the sensitivity of a future experiment is applicable [15][16][17][18]. As shown in Ref. [17], the common approach does give a reasonable approximation of the median sensitivity, although not for the reasons typically used in its argumentation. The small correction was mainly due to the one-sided nature of the hypothesis test and only to a minor degree to the non-Gaussianity of the statistical distributions. Similarly, deviations of the test statistic from a χ² distribution should be tested for in the search for δ. Indeed, one of the requirements for the validity of the common approach in sensitivity analyses is that the change in the number of events upon variations of δ forms a linear space. However, this requirement will necessarily be violated at some level, since δ is periodic and a change by 2π leaves the number of events unchanged. In addition, there is no guarantee that the predicted data without statistical fluctuations, as used in the common approach, will be representative. In the present work, these assumptions are tested by explicit Monte Carlo simulations in order to find out how much sensitivity analyses are affected by them. To do so, we start from the basic frequentist definitions and apply the Feldman-Cousins approach [19] to determine the sensitivity of several experimental setups.

II. STATISTICAL APPROACH
The most common way of quantifying the experimental sensitivity to leptonic CP violation is to quote the confidence level at which the CP-conserving values of δ (0 and π) can be rejected. In the literature, this is typically computed by constructing the test statistic

S = min[χ²(δ = 0), χ²(δ = π)] − min_δ χ²(δ),   (1)

where χ² = −2 log L, and L is the likelihood of observing the data given a particular set of oscillation parameters. It is then assumed that S is χ²-distributed with one degree of freedom, based on the implications of Wilks' theorem [20]. In addition, the Asimov data set¹ [21], i.e., the event rates without statistical fluctuations, is assumed to be representative of the experimental outcome and is thus used to determine the expected confidence level (CL) at which the CP conservation hypothesis would be rejected. However, as for the case of the neutrino mass ordering [15,17], it is not clear to what degree the assumptions underlying Wilks' theorem are violated when testing CP conservation in this fashion, resulting in a need to explicitly test this hypothesis. The procedure to test the CP conservation hypothesis at a given CL can be summarized as follows: First, the distribution of the test statistic S is found by simulating a large number of realizations of the experiments based on the predicted event rates under the assumption of CP conservation. The value of S is then computed for each realization, which provides the distribution of S. CP conservation will be rejected at CL x if the measured value of S is among the 1 − x fraction of largest values in the distribution. This automatically defines a critical value, S_c(x), such that CP conservation is rejected at CL x if S > S_c(x). By construction, S_c(x) is the inverse of the cumulative distribution function (CDF) of S under the CP-conserving hypothesis.
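This calibration can be illustrated with a deliberately simplified toy: a single counting bin whose expected rate depends on δ through an invented oscillation-like term. The normalization N0 and amplitude A below are hypothetical and not taken from any real facility; the point is only to show how S_c is extracted from the simulated distribution of S.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy one-bin experiment (illustrative only, not any real facility):
N0, A = 500.0, 0.2            # hypothetical normalization and delta-amplitude

def mu(delta):
    # expected number of events as a function of delta
    return N0 * (1.0 + A * np.sin(delta))

def chi2(n, delta):
    # Poisson log-likelihood ratio to the saturated model (n > 0 in practice here)
    m = mu(delta)
    return 2.0 * (m - n + n * np.log(n / m))

delta_grid = np.linspace(0.0, 2.0 * np.pi, 361)   # includes both 0 and pi

def S(n):
    # test statistic: CP-conserving fit minus global best fit over delta
    cp_cons = min(chi2(n, 0.0), chi2(n, np.pi))
    return cp_cons - chi2(n, delta_grid).min()

# Calibrate the critical value under the CP-conserving hypothesis delta = 0
samples = np.array([S(rng.poisson(mu(0.0))) for _ in range(20000)])
Sc_90 = np.quantile(samples, 0.90)
print(f"S_c at 90% CL: {Sc_90:.2f}")   # close to the chi2_1 value 2.71 for this sensitive toy
```

Shrinking A so that the region allowed for δ at a given CL covers most of the cyclic range pushes S_c below the χ²₁ value, mimicking the low-sensitivity behavior discussed in the text.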
The above construction is only concerned with the test of CP conservation for a given data set, i.e., once the experiment has already taken data. The expected performance of future facilities will depend on the true value of δ. Thus, in a frequentist approach the performance of the facility must be quantified for each value of δ separately. In addition, due to statistical fluctuations, different realizations of a given facility will lead to a different significance at which CP conservation can be rejected. Therefore, the convention is to define the expected sensitivity of a given experiment as the CL obtained for the median of the distribution, and is typically shown as a function of the value of δ itself. This is usually referred to as the median sensitivity and it will not necessarily coincide with the significance computed with the Asimov data set.
A calibration of the χ², following the procedure described above, was performed in Ref. [22]. The T2HK experiment [23] was used as an example, and the critical values were found to be significantly smaller than those of the χ² distribution with one degree of freedom in the region sin²2θ13 ≲ 10⁻². However, the study in Ref. [22] was restricted to values of sin²2θ13 ≤ 5 × 10⁻², and was done using a different test statistic from the one considered in the present work.

A. External constraints
The common approach when dealing with external constraints on the parameters (such as previous determinations of oscillation parameters or systematic errors) is to include an additional term in the χ² of the form

χ²(ξ) = χ²_0(ξ) + (ξ − ξ̄)² / σ_ξ²,   (2)

where ξ is the constrained parameter, χ²_0(ξ) is the χ² provided by the experiment itself for a fixed value of ξ, and σ_ξ is the error in the determination of ξ, which has a measured central value of ξ̄. In order to calibrate the χ² for a proper hypothesis test, statistical fluctuations must also be considered for the experiment determining ξ. This can be approximated by considering that experiment to be measuring ξ with a Gaussian error of σ_ξ. It follows that, when calibrating the critical value S_c for a fixed ξ, the values of ξ̄ should be chosen according to a normal distribution with mean ξ and standard deviation σ_ξ.² The test statistic is then again defined through Eq. (1) and the critical value is computed accordingly.
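A minimal sketch of this prescription (all numbers hypothetical): each pseudo-experiment used in the calibration draws its own external central value ξ̄ from a normal distribution around the fixed true value, and the pull term added to the experiment's own χ² then fluctuates accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)

xi_true  = 1.0    # fixed value of xi defining the simple hypothesis being calibrated
sigma_xi = 0.1    # hypothetical 1-sigma external error on xi

# One xi_bar per pseudo-experiment, drawn from N(xi_true, sigma_xi)
xi_bar = rng.normal(xi_true, sigma_xi, size=100000)

def pull_term(xi, xi_bar):
    # the term added to the experiment's own chi^2_0(xi)
    return (xi - xi_bar) ** 2 / sigma_xi ** 2

# Sanity check: evaluated at the true value, the pull term is chi^2_1-distributed,
# so its mean should be ~1 and its 90th percentile ~2.71
pulls = pull_term(xi_true, xi_bar)
print(f"mean pull: {pulls.mean():.3f}")
q90 = np.quantile(pulls, 0.90)
```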
While the above holds for the calibration of the critical value S_c of the test statistic, the known outcome of the external experiments should be taken into account when computing the median expected outcome, by fixing ξ̄ to the measured value. This procedure then reflects the probability of reaching a given confidence level based on the already known outcome of the external constraints. Note that, if external constraints from a hypothetical future experiment are implemented, the outcome of such an experiment is unknown and ξ̄ should therefore still be chosen according to the outcome distribution.
In our simulations, we have followed this prescription for the existing constraints on the neutrino oscillation parameters as well as for the constraints on the systematic errors. The calibration of the test statistic has been performed for the best-fit values of the parameters, but it is not expected to depend crucially on this choice.

III. SIMULATION DETAILS
We simulate the long-baseline experiments LBNE [24], T2HK [23], ESSνSB [25], and NOνA [26] using the GLoBES software [27,28] to find their respective Asimov data sets as functions of the parameter values. The simulation details for the LBNE, T2HK, ESSνSB, and NOνA experiments were implemented as in Refs. [17], [29], [25], and [17], respectively.³ For the ESSνSB, we use the configuration with a 540 km baseline and 2.5 GeV protons. Once the Asimov data sets were computed, realizations of the experiments were constructed by applying Poisson statistics individually to the number of events in each bin based on the expected rates. This was implemented using the MonteCUBES software [30]. The value of S was then computed for each realization in order to find the distribution of S under the CP-conserving and CP-violating hypotheses. In all cases, 5 % (10 %) systematic errors were used for the signal (background) event rates. These are bin-to-bin correlated, but uncorrelated between signal and backgrounds and between different oscillation channels. The true values of the oscillation parameters have been set according to the best fits in Ref. [11], with the exception of θ23, which has been set to 45°. Marginalization is performed over θ12, sin²2θ13, sin²2θ23, ∆m²21, and ∆m²31, using Gaussian priors in agreement with present experimental uncertainties from Ref. [11]. Sign degeneracies are fully taken into account in the minimizations.

² Note that this is not equivalent to picking ξ from a random distribution: the test statistic must still be separately calibrated for each simple hypothesis.
³ The only modification is that in the present work the beam power of the LBNE experiment has been increased to 1.2 MW and the detector mass has been fixed to 34 kt.
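The per-bin fluctuation step can be sketched as follows. The bin contents below are made-up placeholders, not the output of an actual GLoBES simulation:

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical Asimov (fluctuation-free) event rates per energy bin
asimov_rates = np.array([12.0, 45.3, 80.1, 60.7, 22.4, 5.9])

# One pseudo-experiment: an independent Poisson draw in each bin
realization = rng.poisson(asimov_rates)

# Many realizations at once, e.g. for building the distribution of S
realizations = rng.poisson(asimov_rates, size=(100000, asimov_rates.size))

# The Asimov rates are recovered on average over realizations
print(realizations.mean(axis=0))
```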
In order to compute the critical values of S, we simulate 10⁵ realizations for δ = 0 and π and for both neutrino mass orderings. This is sufficient to reliably find the value of S_c up to the ∼3.5σ level, corresponding to a CL of ∼99.95 %. On the other hand, for the computation of the expected outcomes we are only interested in obtaining the value of S for the median of the distributions. Since this is much less sensitive to statistical fluctuations than the sampling of the tails, only 10³ realizations of the experiments are simulated in order to obtain the median of S for these values.
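The correspondence between the ∼3.5σ level and the ∼99.95 % CL quoted above, and the tail statistics implied by 10⁵ realizations, can be checked directly (using the two-sided Gaussian convention):

```python
from math import erf, sqrt

n_sigma = 3.5
# Two-sided Gaussian probability contained within n_sigma standard deviations
cl = erf(n_sigma / sqrt(2.0))
print(f"CL at {n_sigma} sigma: {cl:.4%}")   # ~99.95 %

# With 1e5 realizations, the expected number of samples beyond S_c(cl),
# which sets how reliably the critical value can be located
n_tail = 1e5 * (1.0 - cl)
print(f"expected tail samples: {n_tail:.0f}")
```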
IV. RESULTS
Figure 1 shows our results for the distribution of the test statistic S for the experimental setups considered in this work. As can be seen from the figure, the CDFs are generally close to a χ² distribution with one degree of freedom for almost all experiments under consideration. The only exception is NOνA, for which large deviations are observed and the critical values corresponding to a given CL are considerably reduced with respect to the values obtained under the assumption of a χ²-distributed test statistic. For the other experiments, on the other hand, the CDFs approach unity slightly more slowly than for a χ² distribution, and therefore somewhat larger critical values are obtained. This outcome agrees with the naive expectation that deviations from Wilks' theorem should be most relevant for the experiments with the poorest sensitivity to the value of δ. Indeed, one of the requirements for the applicability of Wilks' theorem is that, when varying the parameter being tested for, in this case δ, the corresponding change in the observables used to determine it should constitute a linear space. This requirement is obviously violated at some level for δ, since a change by 2π renders no change in the number of events. However, next-generation facilities will be able to constrain the value of δ very precisely, and within a small range of values of δ the linearity condition for the number of events is expected to be approximately satisfied. Since the precision in the determination of δ by NOνA will be poorer than for any of the next-generation facilities considered here, the larger deviation from the χ² distribution observed for NOνA can be attributed to a stronger violation of this requirement.
The other relatively significant deviation from a χ² distribution can be seen for T2HK. We have checked that this extra deviation, with respect to the better agreement shown by LBNE and ESSνSB, can be attributed to the sign degeneracies, which play a very important role in the determination of δ at T2HK. A parametric degeneracy will again lead to situations in which the change in the number of events does not span a linear space, implying the non-applicability of Wilks' theorem. When the mass ordering is assumed to be known, the distribution obtained for T2HK lies on top of those obtained for LBNE and ESSνSB, as expected.
A priori, a deviation of the critical values S_c from the ones obtained for a χ² distribution does not necessarily imply a change in the sensitivity of an experiment. This will depend on whether the Asimov data set can be taken as a good approximation of the median result for values of δ ∉ {0, π}. In Fig. 2, the median value of the distribution of S is shown as a function of δ for all experiments under consideration, and the results are compared to the values obtained from the Asimov data set. We find that the Asimov data set is very close to the median value of S for all future experiments, while significant deviations are found for NOνA. It thus follows that the Asimov data set may be taken as a good approximation of the median test statistic when considering the more sensitive experiments, while for NOνA a Monte Carlo simulation would be advisable.
Finally, in Fig. 3 we compare the sensitivity to CP violation computed using a full analysis based on Monte Carlo simulations with the results obtained in the common approach (i.e., assuming the cut values of a χ² distribution with one degree of freedom and the Asimov data set). From this figure, a reasonable agreement can be observed with respect to the results obtained by simply taking √S for the Asimov data set. The only exception is NOνA in the region around δ ∼ 90°, where, even though the critical value S_c from Fig. 1 is significantly lower than for a χ² distribution, the Asimov data set considerably overestimates the median result from the Monte Carlo (see Fig. 2). We also note that the Monte Carlo results do not go to zero around the CP-conserving values of δ. Instead, the median sensitivity is around 0.67σ, corresponding to a confidence level of 50 %. This is to be expected, as the median confidence level obtained if CP is conserved should be 50 %. The Asimov data set cannot reflect this, since any fluctuations around it will increase the value of the test statistic; it is therefore not a good approximation of the median in a region close to the CP-conserving values of δ.
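The ∼0.67σ floor quoted above follows directly from the median of the χ²₁ distribution; it can be verified with only the Gaussian error function (a standalone check, independent of any experiment simulation):

```python
from math import erf, sqrt

def chi2_1_cdf(s):
    # CDF of the chi^2 distribution with one degree of freedom
    return erf(sqrt(s / 2.0))

# Bisect for the median of chi^2_1, i.e. the S value whose CDF equals 0.5
lo, hi = 0.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if chi2_1_cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid

median_S = 0.5 * (lo + hi)
print(f"median of chi^2_1: {median_S:.3f}")                       # ~0.455
print(f"corresponding significance: {sqrt(median_S):.2f} sigma")  # ~0.67
```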

V. SUMMARY AND OUTLOOK
In this letter, we have studied the validity of the common approach used to compute the sensitivity of future long-baseline experiments to leptonic CP violation. By explicit Monte Carlo simulations we have found that the test statistic (defined in Eq. (1)) is close to being χ²-distributed for almost all the experiments under consideration. We have also found that the median value of the distribution is well approximated by the Asimov data set in most cases. This results in a sensitivity which is very similar to, although slightly worse than, what is typically quoted in the literature.
An exception to the above rule is the NOνA experiment. For this setup we find that the test statistic is not χ²-distributed, being peaked towards significantly smaller values, and also that the Asimov data set is not representative of the distribution of the test statistic. These two effects partially compensate each other: when the sensitivity was computed using the median values from the Monte Carlo simulations instead of the values coming from the Asimov data set, the results were found to be similar to the ones obtained in the common approach.