On the determination of the leptonic CP phase

The combination of data from long-baseline and reactor oscillation experiments leads to a preference of the leptonic CP phase $\delta_{\rm CP}$ in the range between $\pi$ and $2\pi$. We study the statistical significance of this hint by performing a Monte Carlo simulation of the relevant data. We find that the distribution of the standard test statistic used to derive confidence intervals for $\delta_{\rm CP}$ is highly non-Gaussian and depends on the unknown true values of $\theta_{23}$ and the neutrino mass ordering. Values of $\delta_{\rm CP}$ around $\pi/2$ are disfavored at between $2\sigma$ and $3\sigma$, depending on the unknown true values of $\theta_{23}$ and the mass ordering. Typically the standard $\chi^2$ approximation leads to over-coverage of the confidence intervals for $\delta_{\rm CP}$. For the 2-dimensional confidence region in the ($\delta_{\rm CP},\theta_{23}$) plane the usual $\chi^2$ approximation is better justified. The 2-dimensional region does not include the value $\delta_{\rm CP} = \pi/2$ up to the 86.3\% (89.2\%)~CL assuming a true normal (inverted) mass ordering. Furthermore, we study the sensitivity to $\delta_{\rm CP}$ and $\theta_{23}$ of an increased exposure of the T2K experiment, roughly a factor 12 larger than the current exposure and including also anti-neutrino data. Also in this case deviations from Gaussianity may be significant, especially if the mass ordering is unknown.

One of the ultimate goals of neutrino oscillation physics is to determine the complex phase δ CP . Values of δ CP different from zero and π imply CP violation in the lepton sector [20][21][22], see refs. [23,24] for reviews. Determining δ CP and possibly establishing CP violation with reasonable precision is a formidable task which most likely will require high-intensity neutrino beams beyond the current generation of experiments. Nevertheless, already with current experiments, some first hints on a preferred range of δ CP may be obtained at a modest confidence level, see for instance [25][26][27][28][29] for estimates. Indeed, currently available global data seem to indicate a slight preference for the range π < δ CP < 2π compared to 0 < δ CP < π [17][18][19]. This hint emerges mostly from the combination of the ν µ → ν e observation from the long-baseline experiments T2K [13] and MINOS [6] with the determination of the mixing angle θ 13 by reactor experiments DayaBay [12], RENO [10], and DoubleChooz [9].
Assessing the significance of this hint is a non-trivial task. Standard statistical tools usually employed in global fits of neutrino oscillation data are likely to fail for the determination of δ CP . The reason is that several conditions for Wilks theorem [30] are violated in this case: statistics is low, the sensitivity of the data to the parameter is rather poor, and predictions depend non-linearly on the parameter (via trigonometric functions). In the context of present data those issues have been commented on in ref. [17], based on preliminary simulations of the statistical properties of the used ∆χ 2 statistics. Similar considerations for future experiments can be found in refs. [31,32].
In this work we extend the results of ref. [17] and study in detail the distribution of the relevant test statistics by generating large samples of pseudo data and constructing confidence intervals or regions with the correct coverage following the Feldman-Cousins prescription [33]. We study the behavior as a function of the unknown true values of δ CP , θ 23 , and the neutrino mass ordering. In refs. [31,32] the sensitivity to CP violation has been studied, whereas in this work we concentrate on the related but different problem of constructing confidence intervals for δ CP . In addition to analyzing present data, we also investigate the behavior of the test statistics assuming an increased exposure of the T2K experiment, to be expected in the timescale of several years. We attempt to provide an explanation of our numerical results by considering the non-linear structure of the relevant oscillation probabilities including parameter degeneracies.
The outline of the paper is as follows. In section 2 we briefly describe the data from the T2K and MINOS experiments, introduce the relevant test statistics, and discuss the statistical analysis based on the Monte Carlo simulations. In section 3 we consider the relevant oscillation probabilities and provide a discussion about why deviations from Gaussianity can be expected. Our results analyzing present data are presented in section 4. We discuss the distributions of the 1-dimensional ∆χ 2 statistics for δ CP and θ 23 , finding large non-Gaussian behavior, especially for δ CP . Then we construct the 2-dimensional regions in the (δ CP , θ 23 ) plane and find them to be much closer to the Gaussian approximation. In section 5 we investigate how this situation will change, once more data become available. We study the sensitivity of T2K by increasing the exposure by roughly a factor 12 compared to the present one including also anti-neutrino data. We find even in that situation deviations from the Gaussian approximation remain significant in certain regions of the parameter space. We summarize and conclude in section 6. In the appendix we show the impact of first data on anti-neutrinos from T2K [34], which appeared after the completion of this work.

Description of data and statistical analysis
In this work we use the data from the long-baseline experiments T2K and MINOS, both from the appearance and disappearance channels, including also a small anti-neutrino data sample from MINOS, see Tab. 1 for details and references. 1 Our code departs from the re-analysis of the data developed in the context of the NuFit collaboration and used in ref. [17]. For each of the six data samples shown in Tab. 1 we perform a spectral fit, where the numbers of spectral bins are given in the table. Our predictions of the event spectra T r i (Θ) have been calibrated in order to reproduce the expected spectra provided by the collaborations. Here r runs over the six data samples, i labels the energy bins, and Θ collectively denotes the oscillation parameters. Each data set is described by a χ 2 statistics appropriate for Poisson  Table 1: Summary of used data. The last two columns give the number of bins used to fit the energy spectrum, and the total number of observed events, respectively. Recent T2K data on the ν µ → ν e channel [34] are not used in the main text, but we show some results including them in the appendix.
distributed data: where O r i is the observed number of events, and σ r sys is the systematic over-all normalization error included via the pull parameters a r . Since those data are largely statistics dominated, this simple treatment of systematic errors suffice. For each experiment our analysis is validated by checking that when analyzing the data in the same way we can reproduce the confidence regions in parameter space obtained by the experimental collaborations with good accuracy. When combining the data samples given in Tab. 1 we simply add the χ 2 functions, ignoring possible correlated systematic errors between the data sets. The results from the reactor experiments [9,10,12] are taken into account implicitly by fixing θ 13 = 8.5 • . Below we are going to focus on the parameters δ CP , θ 23 , and ∆m 2 32 , which currently have the largest uncertainties, including the sign of ∆m 2 32 . The other oscillation parameters are fixed to θ 12 = 33.5 • , θ 13 = 8.5 • , ∆m 2 21 = 7.5 × 10 −5 eV 2 . Those parameters are known within better than 15% at 3σ 2 and we expect that fixing those parameters has only a small impact on our results. Hence, in the notation used above we have Θ = {δ CP , θ 23 , ∆m 2 32 }. If we are interested in confidence regions of one of the parameters Θ = {δ CP , θ 23 , ∆m 2 32 }, irrespective of the others, we consider the following test statistic. Taking for example δ CP , we define ∆χ 2 (δ CP ) = min where χ 2 min is the global minimum of the χ 2 with respect to all parameters Θ. Note that when minimizing over ∆m 2 32 , we always take into account both signs, i.e. we minimize also over the two mass orderings. Similar definitions apply for the other 1-dimensional cases, ∆χ 2 (θ 23 ) and ∆χ 2 (∆m 2 32 ). The 2-dimensional confidence regions are based on an analogous definition, e.g., ∆χ 2 (δ CP , θ 23 ) = min This procedure is equivalent to the profile-likelihood method to treat nuisance parameters. Wilks theorem [30] implies that under certain conditions, the test statistics from eqs. (4) and (5) are distributed according to the χ 2 -distribution with 1 and 2 degrees of freedom (dof), respectively. This is the basis of the standard method to derive confidence regions for the parameters, using the condition ∆χ 2 ≤ t χ 2 (CL, dof). We refer to t χ 2 as "cut levels", and their values can be obtained by integrating the corresponding χ 2 distribution. For instance, the 1-dimensional intervals at 1σ, 2σ, 3σ are derived by ∆χ 2 ≤ 1, 4, 9, respectively. 3 We will refer to this situation as the "Gaussian limit" or the "χ 2 limit" in the following.
Wilks theorem applies if the theoretical predictions T i (Θ) span a linear space when Θ is varied. For instance, this is the case if T i (Θ) can be expanded to linear order: T i (Θ) ≈ A i + B i Θ. This is trivially fulfilled for a linear model, where this relation is exact. For non-linear models T i (Θ), the linear approximation will hold in the vicinity of the best fit point and will be reliable up to a certain CL, beyond which the non-linear character of the parameter dependence can lead to deviations from the Gaussian limit. For "powerful" data, which constrain the parameter efficiently, the linear approximation will hold up to a high CL, whereas for "weak" data with poor sensitivity to the parameter it will break down already at low CL. Deviations from Gaussianity are expected for example close to a physical boundary of a parameter, or when certain values of the predictions T i (Θ) cannot be reached due to the parameter dependence of the model, for instance via trigonometric functions, as we are going to see below.
In order to study deviations from the Gaussian limit we have performed Monte Carlo (MC) simulations and calculated the distributions of the test statistics numerically. For assumed true values of the parameters Θ true we generate artificial pseudo data by assuming a Poisson distribution for the observables with mean T r i (Θ true ). Those data are multiplied by a random Gaussian number with mean 1 and standard deviation σ r sys in order to take into account the systematic normalization uncertainty. In this approach the origin of the systematic uncertainty is related to some auxiliary measurements, determining for instance the fiducial volume of the detector or the beam normalization. The measured values of those experiments are used to determine the theoretical predictions. Hence, the normalization of the predictions is subject to statistical fluctuations of the auxiliary measurements. We set the unknown true value of the normalization constant to 1 and generate random realisations of this number with standard deviation σ r sys , which then enters the "observables" for the MC generated data. This implies that we consider the auxiliary measurements as part of the experiment, also hypothetically to be repeated many times. However, we have checked that our results do not depend on whether we randomize the systematic error or not.
Then the test statistics eqs. (4) or (5) are calculated at the point Θ true , e.g., ∆χ 2 MC[Θ] (δ CP ), where the subscript MC [Θ] indicates that the pseudo data from the Monte Carlo generated at the point Θ are used in the χ 2 . We perform this calculation 10 4 times for each point in the Θ true space, which provides us the true distribution of the test statistic for each parameter value. From those histograms we can obtain the true cut levels t MC (CL, Θ) for a given CL α, by demanding that a fraction α of all pseudo experiments fulfills Then the correct confidence intervals (or regions) for the parameters are obtained by those values of the parameters for which the test statistics of the real data fulfills ∆χ 2 (δ CP ) ≤ t MC (CL, Θ). Those intervals have the correct coverage by construction and follow the prescription of Feldman and Cousins [33].
Here we used the test statistic for δ CP as an example, but analogous expressions hold for the other 1-dimensional as well as 2-dimensional test statistics. Note that in the left side of eq. (6), ∆χ 2 is evaluated at δ CP corresponding to the same value as used to generate the pseudo data. However, although the test statistic for given data depends only on δ CP , the MC results do depend also on the other parameters θ 23 and ∆m 2 32 . Hence, the confidence intervals for δ CP may depend on the unknown true values of the other parameters, an effect we will indeed observe in our numerical studies presented in the next section.

Discussion of oscillation probabilities
In the case of interest, we are facing a complicated parameter dependence of the predictions. We review here the relevant oscillation probabilities through which the parameters enter the event rate predictions. Note that we do not use the approximate expression for the numerical work (which is based on numerical calculations of the full three-flavor probabilities including the matter effect) but they serve well for a qualitative understanding.
Let us define where L is the baseline, E is the neutrino energy, and V is the effective matter potential. For the ν µ disappearance channel we have where we show only the leading term and neglect corrections due to ∆ 21 as well as θ 13 . An approximate expression for the ν µ → ν e oscillation probability, valid for a constant matter density is given by [35,36]: The signs a and s describe the effects of CP-conjugation and the neutrino mass ordering, respectively, with a = +1 for neutrinos and a = −1 for anti-neutrinos, and s = sgn(∆m 2 31 ). The matter effect enters via the parameter A. Numerically one finds for a matter density of 3 g/cm 3 Hence, for the T2K experiment, with E 0.7 GeV, the matter effect is of order 6% and we can expand eq. (9) also in A, keeping only terms up to first order in A. To simplify the expression further we assume the first oscillation maximum, ∆ ≈ π/2, which is a good approximation for T2K. Introducing the definitions eq. (9) becomes for neutrinos and anti-neutrinos Note that the magnitude of d is constrained by the ν µ → ν µ disappearance channel, but we are left with the octant degeneracy for θ 23 , described by a sign ambiguity of d, with d < 0 (d > 0) corresponding to the first (second) octant for θ 23 . We are going to consider values in the range −0.1 ≤ d ≤ 0.1, within the currently 3σ confidence interval. Hence, the free parameters in the problem are the continuous parameter δ CP and the two signs of d and s, i.e. four discrete sign combinations. Note, however, that especially for current data this is an over-simplification, since the uncertainty on |d| is large. Numerically we have sin 2 θ 13 ≈ 0.022, C ≈ 0.013 (with a very weak dependence on d for |d| 0.1), and A ≈ 0.06. Hence, all terms in eqs. (12) and (13) are of similar order, and both the octant [37] and the mass ordering [38] degeneracies will lead to changes in the predictions of similar size as δ CP . Recent discussions of the δ CP and θ 23 interplay can be found in refs. [29,39,40].
Given the parameter dependencies from eqs. (8) and (12,13) we can expect deviations from the Gaussian limit for the following reasons.
1. Present data show only weak sensitivity to δ CP , i.e., the full range 0 ≤ δ CP < 2π is allowed at relatively low CL. This implies that the strong non-linearity of the trigonometric dependence in eq. (12) comes into play. In particular, because of the sine dependence, only a finite change of T i (δ CP ) can be achieved by varying δ CP , in contrast to the unbounded variation of a linear model. This means that δ CP effectively provides less than 1 dof, which implies that the true cut levels for a given CL will be lower than the ones corresponding to the Gaussian approximation. For future data with decreased statistical errors the fluctuations of the data may become of similar size as the compact region spanned in T i upon varying δ CP , which means that the coverage of T i is actually more efficient than for the linear model and δ CP thus provides more than 1 dof. If the exposure is further increased the fluctuations will become even smaller than the compact region in T i such that the linear expansion becomes valid and we approach 1 dof.
2. Disappearance data depend on sin 2 2θ 23 , which leads to the octant degeneracy for θ 23 . However, due to the sin 2 θ 23 factor in the appearance probability the degeneracy is not complete when ν µ → ν e data are included. The two possible solutions for sin 2 θ 23 corresponding to the two signs of d imply more freedom than a single linear parameter.
Hence the presence of the degeneracy leads to an increase of the effective dofs, which implies increased cut levels.
3. For values of θ 23 close to π/4 we are facing a physical boundary in the disappearance channel, since sin 2 2θ 23 ≤ 1, which implies that predictions corresponding to sin 2 2θ 23 > 1 cannot be reached by varying θ 23 . This leads to a decrease of the effective dof, and reduces the cut levels.
4. Similarly, for δ CP π/2 or 3π/2 the sin δ CP dependence in the appearance probabilities (12) and (13) imply a physical boundary due to | sin δ CP | ≤ 1. For those values of δ CP the derivatives of the probabilities with respect to δ CP vanish. Hence, we expect decreased cut levels for those values, while for δ CP 0 or π the dependence of the appearance channel resembles approximately a linear model. The behavior around δ CP π/2 or 3π/2 is further complicated by the octant and mass ordering degeneracies, see the discussion related to eq. (14) below.
In our numerical results presented below we will observe all of those effects, where some of them may occur simultaneously, leading to a complicated interplay of effects. Nevertheless, some general features can be understood qualitatively. For instance, considering the δ CP dependence of eqs. (12) and (13) plus the 4-fold degeneracy related to the signs of d and s, we find that there are minimal and maximal values for the oscillation probabilities (and hence for the event rates) given by the following combinations of the parameters: If the true values of d, s, δ CP correspond to one of the combinations in eqs. (14) then we are located at a physical boundary for the event rates: there is no point in the parameter space which can provide a larger (or smaller) value of the probability. Statistical fluctuations leading to even larger (or smaller) event rates than predicted for those extreme parameter values cannot be accommodated by adjusting the model. This implies that the effective number of dof of the ∆χ 2 is reduced, i.e. lower cut levels. A related discussion can also be found in ref. [32].
4 Results for present data 4.1 One-dimensional intervals for δ CP We start presenting the results of our simulations for the 1-dimensional ∆χ 2 distributions defined in eq. (4). In figs. 1 and 2 we consider ∆χ 2 (δ CP ) for the CP phase δ CP and show the cut levels t MC for 1σ, 2σ, 3σ. Fig. 1 uses T2K data only whereas fig. 2 uses all data given in Tab. 1, showing qualitatively similar results to the T2K-only case. We have checked that our T2K results are consistent with the ones shown by the T2K collaboration [15], in cases where comparison is possible. By comparing the curves for t MC to the χ 2 approximation t χ 2 indicated by the dashed curves, we observe significant deviations for the Gaussian limit, with t MC (2σ) and t MC (3σ) being much lower than the corresponding t χ 2 [17]. Furthermore we find large variations of the ∆χ 2 (δ CP ) distribution depending on δ CP itself, and on the assumed true values for Dashed lines indicate the Gaussian approximation t χ 2 . Left, middle, right panels correspond to sin 2 θ true 23 = 0.4, 0.5, 0.6, respectively. We take |∆m 2 32 true | = 2.4 × 10 −3 eV 2 and for the upper (lower) row we have assumed a true normal (inverted) mass ordering. The black solid curve shows ∆χ 2 (δ CP ) using the observed data (same curve in all panels). sin 2 θ 23 [17] and to a lesser extent also depending on the mass ordering. The various panels in figs. 1 and 2 correspond to different assumptions about θ true 23 and the true mass ordering. Note that those are the "true" values assumed for generating the pseudo data, while when fitting to the data we leave θ 23 and ∆m 2 32 free. Comparing figs. 1 and 2, we find that the addition of MINOS data makes some of the "dips" in the cut levels less sever, e.g. the ones for sin 2 θ 23 0.4 and δ π/2 for both orderings.
The behavior of the t MC curves can be understood from the discussion given in section 3. The reduction of t MC compared to the Gaussian limit follows from the poor sensitivity of the data to δ CP , which implies that the full range 0 ≤ δ CP < 2π becomes accessible. Hence the trigonometric dependence becomes relevant, changing δ CP provides less freedom than a linear parameter, and the effective number of dof becomes reduced.
The appearance of the bumps, for instance for normal ordering and (sin 2 θ true 23 = 0.4, δ CP 3π/2) and (sin 2 θ true 23 = 0.6, δ CP π/2) can be understood by considering the θ 23 octant degeneracy. We show in fig. 3 the sensitivity of T2K in the (sin 2 θ 23 , δ CP ) plane. We calculate so-called Asimov data, using the theoretical prediction for certain assumed true values without statistical fluctuations as "data". This indicates the expected sensitivity for that particular set of true values for an average experiment. Regions in fig. 3 are derived based on the Gaussian approximation for ∆χ 2 (δ CP , θ 23 ). By comparing those results with the upper row of panels in fig. 1 we find a correlation with the ability to resolve the degeneracy: in cases with improved sensitivity to the octant (panels (b) and (h) in fig. 3) the cut levels are low (upper left panel for δ CP π/2 and upper right panel for δ CP 3π/2 in fig. 1), whereas for cases where the degeneracy is strong for all values of δ CP (panels (d) and (f) in fig. 3) the cut levels are high (upper left panel for δ CP 3π/2 and upper right panel for δ CP π/2 in fig. 1). This shows that the presence of the degeneracy increases the effective number of dof for the ∆χ 2 distribution. We have confirmed a similar correspondence also for the inverted ordering. Note that the overall sensitivity to the octant is very poor even in the cases shown in panels (d) and (f) in fig. 3, corresponding to ∆χ 2 ≈ 0.5 for the wrong octant solution. However, for the distribution of ∆χ 2 (δ CP ) it is relevant that the degeneracy can be resolved for a significant range of δ CP values, while for the the cases corresponding to panels (d) and (f) in fig. 3 the degeneracy is present at below 1σ for all values of δ CP . Those results can also be understood from the discussion related to the maximal and minimal values of the oscillation probability given in the first 2 lines of eq. (14). Fig. 1 shows low cut levels for those combinations of parameters, following from the physical boundary of the event rates. 4 For the cut levels in figs. 1 and 2, we have always minimized with respect to both mass orderings. However, we have checked that assuming that the mass ordering was known (i.e., restricting to the true ordering in the fit) does not change the results qualitatively. This is different from the cut levels for increased exposure discussed in section 5, where we will find a significant effect of restricting the mass ordering.
The exact confidence intervals for δ CP can be obtained by comparing the ∆χ 2 (δ CP ) from the observed data (shown as black curves in figs. 1 and 2) to the curves for the cut levels. The confidence interval at a given CL is obtained by those values of δ CP for which ∆χ 2 (δ CP ) ≤ t MC (CL). We show the confidence intervals for δ CP at 1σ, 2σ, 3σ in table 2 and fig. 4 for different assumptions about the true values of θ 23 and the mass ordering, and we compare them to the Gaussian approximation. Because of the dependence of t MC on the values of the other parameters the confidence intervals for δ CP depend on those unknown true values. From fig. 4 we see that the 1σ intervals are relatively stable and agree well with the Gaussian approximation. The variations are relatively large for the 2σ interval. There is a large parameter dependence on the CL of rejection of δ CP π/2. Using combined T2K+MINOS data this rejection ranges from around 3σ for sin 2 θ 23 = 0.4 (both mass orderings) to only 2σ for sin 2 θ 23 = 0.6 (NO), see fig. 2. We obtain ∆χ 2 = 3.37 at δ CP = π/2. Hence, the Gaussian approximation gives a rejection of δ CP = π/2 at 1.8σ. The dependence of the MC results on probability, apparently in that case changing the mass ordering from NO to IH does not provide enough freedom to overcome the physical boundary.   the true values of other parameters is even more pronounced for T2K data only, see fig. 1. Note that the rejection of δ CP π/2 is stronger for T2K data-only and the significance decreases somewhat when MINOS data are included. The reason for the slight decrease of ∆χ 2 (δ CP ) when MINOS data are added to T2K is that MINOS appearance data prefer a somewhat smaller value of θ 13 than T2K and hence the combination with the reactor result for θ 13 becomes somewhat less effective in constraining δ CP when MINOS is included.
Within the frequentist framework there is no way to marginalize or average over the true values of nuisance parameters, since those are considered to be fixed constants of Nature. Hence the dependence of the results for δ CP on the unknown true values of other parameters (especially θ 23 ) introduces an unpleasant ambiguity. One possibility to deal with this situation could be to present for each CL the largest confidence interval for δ CP . Then the CL would be a lower bound on the true coverage of the interval, i.e., such an interval at the α CL will cover the true value with a probability of at least α. A problem with this approach is that one has to maximize the confidence interval of δ CP with respect to the true value of the Gaussian approximation t χ 2 . Top row: T2K disappearance only, middle row: T2K disappearance and appearance, bottom row: combined T2K and MINOS data. The columns correspond to δ true CP = 0, π/2, π, 3π/2 from left to right, and we always assume a true normal mass ordering with ∆m 2 32 = 2.4 × 10 −3 eV 2 . The black solid curve shows ∆χ 2 (θ 23 ) using the observed data (same curve in the 4 panels in each row). θ 23 , and within a frequentist framework it is not possible to decide which range for θ 23 has to be considered. 5 This problem is solved by considering 2-dimensional confidence regions in both parameters. Before presenting those in section 4.3 below, we proceed now by discussing the 1-dimensional intervals for θ 23 .

One-dimensional intervals for θ 23
The cut levels for ∆χ 2 (θ 23 ) obtained from our MC simulation are shown in fig. 5 for T2K disappearance data only (top row), for T2K disappearance and appearance data (middle 5 A similar discussion in the context of the mass ordering determination can be found in ref. [41]. row), and T2K and MINOS combined (bottom row). The columns of panels correspond to different values of the CP phase δ CP used to generate the pseudo data. We make the following observations: 1. For non-maximal values | sin 2 θ 23 − 0.5| 0.05, the test statistic ∆χ 2 (θ 23 ) is distributed approximately as a χ 2 with 1 dof, according to the Gaussian approximation.
2. For maximal values sin 2 θ 23 0.5 the cut levels are somewhat reduced compared to the Gaussian case. This is the manifestation of the physical boundary sin 2 2θ 23 ≤ 1, which reduces the freedom provided by the parameter θ 23 .
3. Disappearance data only (top row) is insensitive to the true value of the phase δ CP .
4. When appearance data are included we observe a slight increase of the cut levels compared to the Gaussian limit for certain combinations of θ 23 and δ true CP , namely for δ true CP = π/2, sin 2 θ 23 0.6 and for δ true CP = 3π/2, sin 2 θ 23 0.4. These are the same regions where we have noted also an increase in the cut levels for ∆χ 2 (δ CP ) in the previous section, corresponding to the cases of strong octant degeneracy, see panels (d) and (f) in fig. 3.

The behavior of the cut levels is basically unchanged by adding MINOS data to T2K.
However, the ∆χ 2 (sin 2 θ 23 ) from the observed data (black curves in the plots) is slightly disfavoring maximal mixing. While the confidence intervals for sin 2 θ 23 based on the MC from T2K happen to be very similar to the χ 2 approximation, some deviations from the Gaussian limit occur when MINOS data are added, since the observed ∆χ 2 is pushed somewhat into regions where t MC differs from t χ 2 . Fig. 5 corresponds to assuming a normal ordering for generating the MC data. Very similar results are obtained also for inverted ordering. We have also investigated the distribution of the 1-dimensional ∆χ 2 for ∆m 2 23 and we have found very good agreement with the Gaussian limit, independent of any other parameters for any combination of T2K and/or MINOS data. This reflects the very robust determination of ∆m 2 32 by the ν µ disappearance spectral data.

Two-dimensional confidence regions
As we have seen above, the distribution of the test statistic for δ CP depends significantly on the unknown true value of θ 23 . Hence, treating θ 23 as nuisance parameter leads to the unpleasant result that the confidence intervals for δ CP cannot be stated independently of the true value of θ 23 . One way to deal with such a situation in a frequentist framework is to consider two-dimensional confidence regions of both parameters, keeping in mind that the interpretation of the results is different.
We consider the test statistic ∆χ 2 (δ CP , θ 23 ) defined in eq. (5) and simulate the distribution of this statistic for a grid of true values in the (δ CP , θ 23 ) plane. Then in each point in this plane the observed value of ∆χ 2 (δ CP , θ 23 ) can be compared to the distribution from the MC to decide whether this point is included in the confidence region at a given CL. approximation, i.e. using the cut levels t χ 2 obtained from the χ 2 distribution for 2 dof. We observe that regions obtained from the MC are relatively close to the Gaussian limit. We conclude that the 2-dimensional test statistic has "better" statistical properties than the 1-dimensional one, where the θ 23 dependence is profiled out.
This result is further illustrated in fig. 7, which shows sections through the 2-dimensional distribution. Those results should not be confused with the ones shown in figs. 1 and 2, where the χ 2 is minimised with respect to θ 23 , whereas here we keep it fixed at the assumed true value (eq. (4) versus (5)). The MC curves in fig. 7 should be compared to the corresponding cut values t χ 2 for a χ 2 distribution with 2 dof, indicated by the dashed lines in the figure. We observe that the MC cut levels are close to the Gaussian limit. For sin 2 θ 23 0.5 we observe somewhat smaller t MC values compared to the Gaussian ones, due to the physical boundary sin 2 2θ 23 ≤ 1 (visible in the right panel in fig. 7). In all cases the variation of the MC cut levels with δ CP as well as with θ 23 is significantly reduced compared to the 1-dimensional case. We conclude that for present data, the Gaussian approximation to derive confidence regions is more reliable in the (δ CP , θ 23 ) plane, whereas 1-dimensional confidence intervals for the CP phase δ CP suffer from large deviations from Gaussianity.
For figs. 6 and 7 we have assumed a true normal mass ordering to generate the MC data. The corresponding plots for a true inverted ordering are very similar. We can use the 2dimensional confidence regions also to quantify the rejection of δ CP = π/2 by looking for the largest CL for which the (δ CP , sin 2 θ 23 ) confidence regions do not contain δ CP = π/2. In this way we find from the MC calculation that combined T2K and MINOS data allow to reject δ CP = π/2 at the 81.8% (83.9%) CL assuming a true normal (inverted) mass ordering. We δ CP 0 π/2 π 3π/2 2π have ∆χ 2 (δ CP = π/2) = 3.37, which in the Gaussian approximation for 2 dof corresponds to the 81.5% CL. We note that the MC results for NO and IO are very similar and close to the one obtained in the Gaussian limit. In contrast, the corresponding rejection confidence levels based on 1-dimensional confidence intervals for δ CP vary strongly with the true mass ordering and true θ 23 and differ significantly from the Gaussian limit (see discussion in section 4.1). Note that the interpretation of the rejection confidence levels based on 1-dimensional or 2-dimensional confidence regions is different.
In Fig. 8 we show the 2-dimensional Feldman-Cousins confidence regions in the plane of sin 2 θ 23 and ∆m 2 32 for the combined T2K and MINOS data. We observe that they agree quite well with the standard Gaussian approximation. For generating the MC data we have assumed here δ CP = 3π/2, but the results are very similar for other true values of δ CP .

Increased exposure and T2K anti-neutrino data
Next we are going to discuss how this situation will change in the near future, for increased exposure in T2K or when data on anti-neutrinos become available. Following the T2K collaboration [42] we consider an exposure of 7.8 × 10 21 protons-on-target (p.o.t.), approximately a factor 12 larger than the current exposure. We consider two cases, either using all of this exposure for neutrino data or equally sharing the exposure between neutrino and anti-neutrino running. We depart from the code used for our present-data T2K analysis (see section 2), and scale the spectrum normalization such that we can reproduce the expected number of events given in tables 4 and 5 in [42]. This is a rough approximation, especially for the anti-neutrino case, where we ignore the (rather substantial) contribution from the neutrino component in the beam. Despite those simplifications we can reproduce accurately the event spectra from fig. 2 in [42], as well as the sensitivity plots based on Asimov data given in ref. [42]. Note that total event numbers are still relatively small, 210 (260) events for neutrinos and 49 (35) events for antineutrinos for δ CP = 0 (3π/2), implying large statistical errors. Our approximate implementation suffices to study the expected statistical properties of the test statistics. A detailed and accurate sensitivity calculation is beyond the scope of this work.
In fig. 9 we show the cut levels for ∆χ 2 (δ CP ) for different assumptions about the true θ 23 and mass ordering, both for neutrino-only data and for combining neutrino and anti-neutrino data. We find that in both cases a significant dependence on the unknown true values of θ 23 and the mass ordering remains. In particular, the significance of excluding values of δ CP π/2 or 3π/2 will vary quite strongly. The locations of the dips in the cut levels follow the pattern discussed in section 3. The regions of low cut levels for neutrino data-only visible for NO, sin 2 θ 23 = 0.6, δ CP 3π/2 (upper right panel) and IO, sin 2 θ 23 = 0.4, δ CP π/2 (lower left panel) correspond to the regions of maximal and minimal oscillation probability indicated in the first two lines of eq. (14). Also the dips in the middle panels of fig. 9 (for sin 2 θ 23 = 0.5) follow this argument.
We also observe from the figure that in many cases using anti-neutrino data has only a small impact on the distribution of the test statistics. The only exceptions are sin 2 θ 23 = 0.4, NO (upper left) and sin 2 θ 23 = 0.6, IO (lower right). In those cases cut levels are raised close to the Gaussian limit for 0 ≤ δ CP ≤ π or π ≤ δ CP ≤ 2π, respectively. In the opposite regions of δ CP for those cases cut levels remain low also when anti-neutrino data are added. For those regions, there is an octant degenerate solution which basically destroys any sensitivity to δ CP , see panel (d) in fig. 3, while adding information from anti-neutrinos significantly helps in resolving this degeneracy which increases the sensitivity to δ CP . Those regions correspond to the minimal and maximal values for the anti-neutrino oscillation probability indicated in the last two lines of eq. (14), explaining the good sensitivity of anti-neutrino data for those cases.
In fig. 10 we show the same analysis as in fig. 9, except that we assume that the mass ordering is known (determined independently by some other experiment). Hence, we do not minimize with respect to the mass ordering, but restrict the fit to the assumed true ordering. We observe that the sign(∆m 2 32 ) degeneracy has a large impact on the distribution of the test statistics and cut levels become much closer to the Gaussian approximation. 6 This is different to the situation we find for current data discussed in section 4, where the mass ordering degeneracy has only a small impact on the distribution of the test statistics. For neutrino-only data, there are still a few cases of large deviations from Gaussianity, whereas the combined neutrino and anti-neutrino data lead to cut levels close to the χ 2 limit (see dashed curves in fig. 10). We still observe small dips for δ CP π/2 and 3π/2 (now pretty symmetric around δ CP = π), which can be traced back to the sin δ CP dependence of the probabilities, leading to a reduction of the dof for those values of δ CP . Note also that in some cases we find now cut levels which are even higher than the Gaussian ones. A possible explanation for this behavior can be found in the discussion in section 3, see item 1.

Discussion and conclusions
We have studied in detail the information we can obtain on the leptonic CP phase δ CP from current data, focusing on the robustness of frequentist confidence regions, by performing a Monte Carlo simulation of the data from the T2K and MINOS experiments. We attempt to quantify the current preference for δ CP 3π/2 over δ CP π/2. We have focused on the interplay of the main unknown parameters, namely δ CP , θ 23 , and the neutrino mass ordering. Our findings can be summarized as follows.
The distribution of the ∆χ 2 test statistic used for 1-dimensional confidence intervals for δ CP shows large deviations from the Gaussian limit. In particular, it strongly depends on the unknown true value of θ 23 . This introduces an ambiguity in the confidence intervals for δ CP , see fig. 4. While the 1σ interval for δ CP is relatively stable and close to the Gaussian approximation, at higher confidence level large variations occur. In particular, the CL with which values of δ CP π/2 are disfavored ranges from 2σ to 3σ, depending on θ 23 and the mass ordering. We can trace back the origin of those results to the complicated non-linear parameter dependence of the relevant oscillation probabilities (trigonometric dependence of δ CP and θ 23 -octant and mass ordering degeneracies), combined with the rather poor sensitivity of current data to δ CP .
We conclude that one should not use the Gaussian approximation when making statements about δ CP based on the 1-dimensional ∆χ 2 test statistic. Typically the "true" confidence levels obtained from the Monte Carlo simulation lead to more restrictive confidence intervals and to stronger rejections of values of δ CP around π/2. In this sense the use of the Gaussian approximation is conservative. We have shown that the Gaussian approximation is better justified for 2-dimensional confidence regions in the plane of δ CP and sin 2 θ 23 , see fig. 6. In particular, the dependence of the ∆χ 2 distribution on the true values of δ CP and θ 23 is much less severe in the 2-dimensional case. The 2-dimensional confidence region in the (δ CP , θ 23 ) plane for combined T2K and MINOS data does not include δ CP = π/2 up to the 81.8% (83.9%) CL assuming a true normal (inverted) mass ordering. 7 Those values are close to the CL of 81.5% obtained under the Gaussian approximation.
We have considered also the 1-dimensional confidence intervals for θ 23 and ∆m 2 32 . For θ 23 we find approximately Gaussian behavior, with some deviations around maximal mixing, see fig. 5. This is a manifestation of the boundary sin 2 2θ 23 ≤ 1, which implies that the derivative of the event rates predicted for the disappearance channel with respect to θ 23 is zero for θ 23 = 45 • . The test statistic for ∆m 2 32 has a distribution very close to the Gaussian limit, as well as the 2-dimensional confidence regions in the (sin 2 θ 23 , ∆m 2 32 ) plane. In section 5 we have studied the distribution of the 1-dimensional ∆χ 2 test statistic for δ CP assuming an increased exposure for T2K of 7.8×10 21 protons-on-target, roughly a factor 12 larger than current exposure, where we consider also the possibility of using half of this exposure for anti-neutrino running. We find that even in this case large deviations from the Gaussian behavior can be expected. Typically reduced cut levels for the ∆χ 2 are obtained around either δ CP π/2 or 3π/2, depending on the unknown true value of θ 23 and the mass ordering. Close to Gaussian behavior is only obtained for neutrino plus anti-neutrino running and assuming that the neutrino mass ordering is known.
Let us mention that in the global fit of all oscillation data also SuperKamiokande atmospheric neutrino data contribute to the determination of θ 23 and to a small extent also of δ CP , see ref. [17] for a discussion. Ideally a combined MC simulation of long-baseline and atmospheric neutrino data should be performed, which however, is not feasible due to the numerical complexity of the atmospheric neutrino fit. Since atmospheric neutrino data play only a subleading role for δ CP we expect that the results presented here would not be modified substantially by including atmospheric neutrinos.
For the investigation of near term future data in section 5 we have not considered the NOvA experiment [43,44], from which data will become available during the next years. In general we expect improved behaviour of the relevant test statistics, since complementary data from NOvA may help to resolve some of the degeneracies (see, e.g. refs. [28,29] for recent studies), which-as we have shown-play a crucial role for the deviations from Gaussianity. The same is true also for experiments aiming to determine the neutrino mass ordering, see ref. [41] for a discussion and references. An exhaustive investigation of the expected statistical properties of future data is beyond the scope of this work. Some discussion along those lines in the context of CP violation can be found in ref. [32]. It will be an interesting topic for future work to study sensitivities to δ CP of combined data from NOvA, T2K, and other upcoming experiments based on the true distributions of the relevant test statistics.
A Impact of first anti-neutrino data from T2K After completion and submission of this paper, the first anti-neutrino results from T2K were presented at the EPS HEP 2015 conference [34]. In this appendix we show the impact of those data on the determination of δ CP .
The results presented in [34] corresponds to about 4 × 10 20 p.o.t. in the anti-neutrino mode with 3 observed events in the appearance channel. 8 The expected background (NC and other) is 1.17 events, while the predicted signal from ν µ → ν e (ν µ → ν e ) induced events ranges from 2 to 4 (0.3 to 0.6) events, depending on δ CP and the mass ordering. We could reproduce the predicted number of events given in [34] with good accuracy, which allows us to include those data and combine it with the other data used in this work. We use a χ 2 as given in eq. (2) with just one bin (only the total number of events is fitted), taking into account the background expectation, as well as oscillated anti-neutrino and neutrino event predictions.
Clearly the statistical significance of those results is poor, since 3 observed events are even consistent with the background only hypothesis of about 1.17 events (no oscillation induced events at all) at slightly more than 1σ. Therefore we expect that those data will change ∆χ 2 by about 1 unit. Nevertheless, anti-neutrinos carry complementary information to the neutrino data and therefore it may be interesting to investigate the impact of those results. Fig. 11 shows the impact of the anti-neutrino data on the 1-dimensional confidence interval for δ CP . The solid curves, both for the cut levels as well as for ∆χ 2 (δ CP ) are without T2K anti-neutrinos and are identical to fig. 2, while the dashed curves include the information from the T2K anti-neutrino events. Comparing the black solid and dashed curve we find that ∆χ 2 for δ CP = π/2 is increased by about 0.75. However, also the cut levels are increased by a similar amount and hence the significance of rejecting δ CP = π/2 is hardly affected. For sin 2 θ 23 = 0.4 the effect is most pronounced and in those cases the significance actually decreases, despite the increased ∆χ 2 . ordering. The black curves show ∆χ 2 (δ CP ) using the observed data (same curves in all panels). Solid curves correspond to the data given in tab. 1 (no T2K anti-neutrinos, same as in fig. 2), dashed curves include T2K anti-neutrino data.
In fig. 12 we show the 2-dimensional confidence region in the (δ CP , θ 23 ) plane for T2K as well as T2K+MINOS data, including the T2K anti-neutrino events in both cases. This figure should be compared with fig. 6, which shows the corresponding regions without T2K antineutrinos. As expected the difference is small, with the size of the confidence region being slightly decreased due to the new data. Including T2K anti-neutrinos we find that combined T2K and MINOS data allow to reject δ CP = π/2 at the 86.3% (89.2%) CL assuming a true normal (inverted) mass ordering, to be compared with 81.8% (83.9%) CL without T2K anti-neutrinos. For ∆χ 2 (δ CP = π/2) we find now a value of 4.27, which in the Gaussian approximation for 2 dof corresponds to the 88.2% CL. for the fit we minimize with respect to ∆m 2 32 including its sign.