Bayesian global analysis of neutrino oscillation data

We perform a Bayesian analysis of current neutrino oscillation data. When estimating the oscillation parameters we find that the results generally agree with those of the $\chi^2$ method, with some differences involving $s_{23}^2$ and CP-violating effects. We discuss the additional subtleties caused by the circular nature of the CP-violating phase, and how it is possible to obtain correlation coefficients between $s_{23}^2$ and $\delta_{CP}$. When performing model comparison, we find that there is no significant evidence for either mass ordering, for either octant of $s_{23}^2$ or a deviation from maximal mixing, or for the presence of CP violation.


Introduction
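Neutrino oscillation experiments have now established beyond doubt that neutrinos are massive and that there is leptonic flavour violation in their propagation [1,2]; see Ref. [3] for an overview. It has also been clear for more than a decade that a consistent description of the global data on neutrino oscillations is possible by assuming that the three known neutrinos $(\nu_e, \nu_\mu, \nu_\tau)$ are linear quantum superpositions of three massive states $\nu_i$ ($i = 1, 2, 3$) with masses $m_i$. Consequently, a leptonic mixing matrix is present in the weak charged-current interactions [4,5] of the mass eigenstates, which can be parametrized as [6]:
$$U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c_{23} & s_{23} \\ 0 & -s_{23} & c_{23} \end{pmatrix} \begin{pmatrix} c_{13} & 0 & s_{13} e^{-i\delta_{CP}} \\ 0 & 1 & 0 \\ -s_{13} e^{i\delta_{CP}} & 0 & c_{13} \end{pmatrix} \begin{pmatrix} c_{12} & s_{12} & 0 \\ -s_{12} & c_{12} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} e^{i\eta_1} & 0 & 0 \\ 0 & e^{i\eta_2} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
where $c_{ij} \equiv \cos\theta_{ij}$ and $s_{ij} \equiv \sin\theta_{ij}$; the phases $\eta_1$ and $\eta_2$ are physical only if neutrinos are Majorana particles, and they do not affect oscillations. If one chooses the convention where the angles $\theta_{ij}$ lie in the first quadrant, $\theta_{ij} \in [0, \pi/2]$, and the CP phase $\delta_{CP} \in [0, 2\pi]$, then $\Delta m^2_{21} = m^2_2 - m^2_1 > 0$ by convention, and $\Delta m^2_{31}$ can be positive or negative. It is customary to refer to the first option as Normal Ordering (NO), and to the second one as Inverted Ordering (IO). In the following we adopt the (arbitrary) convention of reporting results for $\Delta m^2_{31}$ for NO and $\Delta m^2_{32}$ for IO, i.e., we always use the one which has the larger absolute value. Sometimes we will generically denote this quantity as $\Delta m^2_{3\ell}$, with $\ell = 1$ for NO and $\ell = 2$ for IO.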
Several global analyses exist in the literature [7][8][9] which, by fitting the results from the bulk of oscillation experiments, obtain best estimates and allowed ranges for these six oscillation parameters. Generically they obtain their results within a frequentist framework, using a $\chi^2$ statistic.
Alternatively, Bayesian inference provides a consistent approach to obtaining the probability that a parameter within a given model takes certain values. Furthermore, Bayesian analysis is particularly suited for quantifying how much better one model describes the data than another. One may therefore ask to what degree the current determination of the oscillation parameters depends on the assumed statistical approach, and whether Bayesian statistics can shed some light on the presently open issues related to the mass ordering, the octant of $\theta_{23}$, and the presence of CP violation.
In this article we address these questions by performing a Bayesian analysis of the current neutrino oscillation data. In Sec. 2 we briefly describe the elements of Bayesian statistics required for this analysis. In Sec. 3 we present the global results of the analysis and compare them with those of the $\chi^2$ analysis of the same data samples in NuFIT 2.0 [10]. We discuss in detail the main results related to the determination of $\sin^2\theta_{23}$ and $\delta_{CP}$ in Secs. 4 and 5, where we also discuss the additional subtleties caused by the circular nature of the CP-violating phase, and in Sec. 6 we study how correlation coefficients between $s^2_{23}$ and $\delta_{CP}$ can be defined. Finally, in Sec. 7 we summarize our conclusions.

Statistical framework
In this work we use Bayesian probability theory, in which each proposition is associated with a probability, or plausibility, that lies between 0 and 1. The probabilities of different assumptions, hypotheses, or models are calculated with the laws of probability, conditioned on some known (or assumed) information. Of particular interest is Bayes' theorem, which can be used to compare a set of hypotheses $M_i$ using some set of collected data, $D$, through calculation of the posterior odds,
$$\frac{\Pr(M_i|D)}{\Pr(M_j|D)} = \frac{Z_i}{Z_j}\,\frac{\Pr(M_i)}{\Pr(M_j)}. \quad (2.1)$$
The prior odds $\Pr(M_i)/\Pr(M_j)$ quantify how much more plausible one model is than the other a priori. The evidence, $Z_i = \Pr(D|M_i)$, is the likelihood of the model, quantifying how well the model describes the data. The Bayes factor, $B_{ij} = Z_i/Z_j$, i.e., the ratio of the evidences, quantifies how much better model $M_i$ describes the data than $M_j$. Given that the model $M$ contains the free parameters $\Theta$, the evidence is given by
$$Z = \int L(\Theta)\,\pi(\Theta)\,d\Theta,$$
where $L(\Theta) \equiv \Pr(D|\Theta, M)$ is the likelihood function. The prior probability density of the parameters is $\pi(\Theta) \equiv \Pr(\Theta|M)$, and it should always be normalized, i.e., it should integrate to unity. The assignment of priors is probably the most discussed and controversial part of Bayesian inference. It is often far from trivial, but it is nevertheless an important, even essential, part of any Bayesian analysis. The Bayes factors, or rather the posterior odds, are interpreted or "translated" into ordinary language using the so-called Jeffreys scale, given in Tab. 1 as used in, e.g., Refs. [11,12] ("log" denotes the natural logarithm). Even though the Bayes factor will in general favour the correct model once "enough" data have been obtained, the evidence is often highly dependent on the choice of prior on the parameters.
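As a sketch of how Eq. (2.1) and the Jeffreys scale are used together, the Python snippet below turns a log Bayes factor and prior odds into posterior odds and a verbal label. Tab. 1 is not reproduced in this text, so the thresholds $|\log(\text{odds})| = 1, 2.5, 5$ for weak, moderate and strong evidence are an assumption, based on the form of the scale commonly used in the literature; the function names are hypothetical.

```python
import math

# Assumed Jeffreys-scale thresholds on |log(odds)| (natural log); not copied from Tab. 1.
JEFFREYS_THRESHOLDS = [(5.0, "strong"), (2.5, "moderate"), (1.0, "weak")]


def posterior_odds(log_bayes_factor, prior_odds=1.0):
    """Eq. (2.1): posterior odds = Bayes factor Z_i/Z_j times prior odds."""
    return math.exp(log_bayes_factor) * prior_odds


def jeffreys_label(log_odds):
    """Translate |log(odds)| into words; below the lowest threshold it is inconclusive."""
    size = abs(log_odds)
    for threshold, label in JEFFREYS_THRESHOLDS:
        if size >= threshold:
            return label
    return "inconclusive"


if __name__ == "__main__":
    # With equal prior odds the posterior odds equal the Bayes factor.
    log_b = -0.2
    print(posterior_odds(log_b), jeffreys_label(log_b))
```

With equal prior odds, $\log B = -0.2$ gives posterior odds $e^{-0.2} \approx 0.82$, which this scale labels inconclusive.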
In principle, the evidence defined above is the only consistent quantity by which to judge the (relative) merit of a model. However, there are also so-called information criteria which have been used to compare different models, see, e.g., [13,14]. These do not explicitly depend on any prior, but they are typically derived under quite restrictive assumptions. This makes their use less reliable, since conclusions based on them can differ considerably from those of a full Bayesian analysis. We will also consider the Akaike Information Criterion (AIC), which is neither a Bayesian nor a frequentist measure, and which is motivated by minimizing the expected "distance" between the true data distribution and the data distribution given by the fitted model. It assigns a fixed penalty to each model as
$$\mathrm{AIC} = -2\log L_{\max} + 2N_{\rm par} = \chi^2_{\min} + 2N_{\rm par},$$
dropping an irrelevant constant, with $N_{\rm par}$ the number of free parameters. Hence each additional parameter needs to improve the $\chi^2$ by 2 units to make up for the additional complexity. Although great caution should be exercised, $Z \propto e^{-\mathrm{AIC}/2} = L_{\max} e^{-N_{\rm par}}$ is typically used as a proxy for the model likelihood, and hence $-\Delta\mathrm{AIC}/2$ between two models as the log of the Bayes factor, interpreted using Tab. 1. However, unlike the Bayesian evidence, the AIC punishes complex models with additional parameters regardless of whether these are constrained by the data, and for parameters which are constrained the punishment is typically smaller than in the full Bayesian analysis.
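A minimal Python sketch of the AIC proxy for $\log(Z_a/Z_b)$; the helper names are hypothetical, and apart from the $\Delta\chi^2 \simeq 0.97$ between the orderings quoted in Sec. 3, the input numbers are placeholders.

```python
def aic(chi2_min, n_par):
    """AIC = chi2_min + 2 * N_par, up to an irrelevant constant."""
    return chi2_min + 2.0 * n_par


def aic_log_bayes_proxy(chi2_min_a, n_par_a, chi2_min_b, n_par_b):
    """-Delta AIC / 2 for model a relative to model b: a rough proxy for log(Z_a / Z_b)."""
    return -(aic(chi2_min_a, n_par_a) - aic(chi2_min_b, n_par_b)) / 2.0


if __name__ == "__main__":
    # Same number of parameters: the proxy reduces to -Delta chi2 / 2.
    # Only the difference chi2_min(NO) - chi2_min(IO) = 0.97 matters, so set chi2_min(IO) = 0.
    print(aic_log_bayes_proxy(0.0, 6, 0.97, 6))  # IO relative to NO
    # One extra parameter that does not improve the fit costs exactly one unit.
    print(aic_log_bayes_proxy(10.0, 7, 10.0, 6))
```

The second call shows why, for nested models differing by one parameter, $-\Delta\mathrm{AIC}/2$ can never fall below $-1$: the penalty is fixed, whatever the data say about the extra parameter.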
Under the assumption that a model $M$ is true, complete inference of its parameters is given by the posterior distribution,
$$P(\Theta|D, M) = \frac{L(\Theta)\,\pi(\Theta)}{Z}.$$
In this case the evidence is only a normalization factor, since it is independent of the values of the parameters $\Theta$, and it is therefore often disregarded in parameter estimation. Thus the main result of Bayesian parameter inference is the posterior and its marginalized versions (usually in one or two dimensions). In this respect, one must distinguish between the marginal posterior distribution and the marginal likelihood, which is the likelihood integrated over all other parameters (after multiplication by the prior of these parameters).
The former is a probability distribution, while the latter is not [15]. However, if the parameters of interest have a uniform prior, the marginal posterior distribution and the marginal likelihood are proportional to each other. In the present analysis, the derived parameter $J_{CP}$ is the only one whose prior is sufficiently non-uniform to have a noticeable impact on the posterior, as we will show in Sec. 5. Generically in parameter inference, point estimates such as the posterior mean or median are given together with credible intervals (regions) for the parameters. A common way to define Bayesian credible intervals for a given parameter is to include all values with a posterior above a certain value, which however makes them non-invariant under nonlinear reparametrizations. Invariance can be restored by defining them as iso-marginal-likelihood intervals instead. The "credible level" of a value $\eta = \eta_0$ of a subset of parameters is then simply the posterior volume within the marginal-likelihood contour through that value,
$$\mathrm{CL}(\eta_0) = \int_{L(\eta) \geq L(\eta_0)} P(\eta|D)\,d\eta,$$
where $L(\eta)$ denotes the marginal likelihood. This function is converted to the "number of $\sigma$'s" in the usual manner as
$$S(\eta_0) = \sqrt{2}\,\mathrm{erf}^{-1}\big(\mathrm{CL}(\eta_0)\big).$$
In this work we use MultiNest [16][17][18], a Bayesian inference tool which, given the prior and the likelihood, calculates the evidence with an uncertainty estimate, and generates posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions.
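The definitions of CL and $S$ can be evaluated directly on a one-dimensional grid. The Python sketch below assumes that the marginal likelihood and the prior of $\eta$ have already been tabulated on a grid (in the analysis itself they come from MultiNest samples); the function names and the toy Gaussian are hypothetical.

```python
import math
from statistics import NormalDist


def credible_level(marg_like, prior, i0):
    """CL(eta_0) on a grid: posterior mass of all points whose marginal likelihood
    is at least that of grid point i0 (an iso-marginal-likelihood region)."""
    post = [l * p for l, p in zip(marg_like, prior)]  # unnormalized marginal posterior
    l0 = marg_like[i0]
    return sum(w for l, w in zip(marg_like, post) if l >= l0) / sum(post)


def n_sigma(cl):
    """S = sqrt(2) * erfinv(CL), written via the inverse standard-normal CDF."""
    return NormalDist().inv_cdf(0.5 * (1.0 + cl))


if __name__ == "__main__":
    # Toy check (hypothetical): Gaussian marginal likelihood, uniform prior, fine grid.
    etas = [(k - 6000) / 1000 for k in range(12001)]
    like = [math.exp(-0.5 * e * e) for e in etas]
    prior = [1.0] * len(etas)
    cl = credible_level(like, prior, 7000)  # eta_0 = 1.0
    print(cl, n_sigma(cl))  # close to 0.683 and 1.0
```

For the least likely grid point CL reaches 1 and $S$ diverges (here `inv_cdf` raises an error at exactly 1), which is the behaviour of $S$ near $\delta_{CP} \simeq 90^\circ$ discussed in Sec. 5.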

Priors on oscillation parameters
In a Bayesian analysis one has to choose a prior on the model parameters, in our case the mixing parameters and mass-squared differences. Before considering any data, this prior should preferably not favour any basis or direction in flavour space, i.e., it should be invariant under rotations, or more generally under group transformations [19]. The resulting Haar measure of neutrino mixing matrices is, after integrating out the nonphysical and potential Majorana phases, the separable measure [20]
$$\pi(s^2_{12}, c^4_{13}, s^2_{23}, \delta_{CP}) = \frac{1}{360^\circ}, \quad (2.8)$$
in the standard parameterization, i.e., the prior is uniform in $s^2_{12}$, $c^4_{13}$, $s^2_{23}$, and $\delta_{CP}$. Although the prior is uniform in $c^4_{13}$ and not, for example, in $s^2_{13}$, this is of no practical consequence since $s^2_{13}$ is well-measured and significantly non-zero (see Ref. [7]). Furthermore, using other, non-invariant priors, such as priors uniform in the angles, will in general not affect the results significantly. For the mass-squared differences we use logarithmic priors. Since these parameters are also well-measured, their prior is likewise of no practical significance.
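For illustration, a Python sketch of drawing mixing parameters from the prior of Eq. (2.8): since the measure is uniform in $c^4_{13}$, one draws $u$ uniformly and sets $s^2_{13} = 1 - \sqrt{u}$. The log-uniform range for the mass-squared differences is a placeholder, since the text does not specify it, and the function names are hypothetical.

```python
import math
import random


def sample_haar_prior(rng):
    """One draw from the separable measure of Eq. (2.8):
    uniform in s12^2, c13^4, s23^2 and delta_CP (in degrees)."""
    s12sq = rng.random()
    c13_4 = rng.random()
    s23sq = rng.random()
    delta_deg = 360.0 * rng.random()
    s13sq = 1.0 - math.sqrt(c13_4)  # c13^2 = sqrt(c13^4)
    return s12sq, s13sq, s23sq, delta_deg


def sample_log_uniform(rng, lo, hi):
    """Logarithmic prior on [lo, hi]; the range is a hypothetical placeholder."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_haar_prior(rng) for _ in range(100000)]
    # Uniform c13^4 implies E[s13^2] = 1 - E[sqrt(u)] = 1/3.
    print(sum(d[1] for d in draws) / len(draws))
```

The prior mean of $s^2_{13}$ is far from the measured value, which illustrates why the choice between uniform-in-$c^4_{13}$ and other priors is immaterial only because the data constrain $s^2_{13}$ so strongly.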
In addition, the neutrino mass ordering can be considered as just another free parameter. In this way the two orderings can be compared, and inference on other quantities can be performed without assuming that either mass ordering is correct, by averaging over the two orderings. In this last case we take $\pi(\mathrm{NO}) = \pi(\mathrm{IO}) = 0.5$, and we denote this case by mixed ordering (MO).
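Averaging over the orderings amounts to weighting each ordering by its posterior probability, $\Pr(\mathrm{IO}|D) = Z_{\rm IO}\,\pi(\mathrm{IO}) / [Z_{\rm NO}\,\pi(\mathrm{NO}) + Z_{\rm IO}\,\pi(\mathrm{IO})]$. The Python sketch below (hypothetical helper names) computes this from $\log(Z_{\rm NO}/Z_{\rm IO})$ and forms the MO posterior mean of a quantity from its per-ordering means; the numbers in the checks are placeholders.

```python
import math


def prob_io(log_b_no_io, prior_no=0.5):
    """Posterior probability of IO given log(Z_NO / Z_IO) and the prior on NO."""
    prior_io = 1.0 - prior_no
    odds_no_io = math.exp(log_b_no_io) * prior_no / prior_io  # posterior odds NO : IO
    return 1.0 / (1.0 + odds_no_io)


def mixed_ordering_mean(mean_no, mean_io, log_b_no_io, prior_no=0.5):
    """MO posterior mean: mixture of the two orderings weighted by posterior probability."""
    p_io = prob_io(log_b_no_io, prior_no)
    return (1.0 - p_io) * mean_no + p_io * mean_io
```

With equal priors and $\log(Z_{\rm NO}/Z_{\rm IO}) \simeq -0.2$ this gives $\Pr(\mathrm{IO}|D) \simeq 0.55$, the value quoted in the Summary.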
The experimental nuisance parameters are all minimized over, as in a $\chi^2$ analysis. Since their uncertainties are rather small and Gaussian, including them in the Monte Carlo and integrating over them instead of minimizing over them (as would be the correct procedure in a fully Bayesian analysis) would make a negligible difference.
The results are shown in Fig. 1 for NO, Fig. 2 for IO, and Fig. 3 for MO. From these figures we conclude that the absolute values of the two mass-squared differences, as well as the mixing parameters $s^2_{12}$ and $s^2_{13}$, are well-measured, and that the posteriors of these parameters are Gaussian to a very good approximation.
We list in Tab. 2 different point estimates for each of these parameters: the global maximum likelihood (which is the best-fit point, bfp, of the $\chi^2$ analysis), the point at which the marginal likelihood is maximal, and the posterior mean and median. The table also contains measures of the uncertainty of each parameter in the form of the $1\sigma$ and $3\sigma$ Bayesian credible intervals, as well as the corresponding $\chi^2$ allowed regions at the same CL (which we also call $\chi^2$ intervals for simplicity), which are identical to those given in Ref. [7]. As seen in the table, the Bayesian credible intervals are very similar to the $\chi^2$ intervals at the same CL, and we conclude that the present determination of these four parameters is very robust under variations of the statistical analysis and prior assumptions. Turning to the comparison between mass orderings, we find that, assuming the same prior probability for both, their posterior probabilities are also very similar, with the posterior probability of IO given by $\Pr(\mathrm{IO}|D) \simeq 0.55$. The Bayes factor (which is independent of the prior on the ordering) is $\log(Z_{\rm NO}/Z_{\rm IO}) \simeq -0.2$, i.e., there is a statistically meaningless preference for the inverted ordering. For comparison, the $\chi^2$ analysis finds $\Delta\chi^2 = \chi^2_{\min}(\mathrm{NO}) - \chi^2_{\min}(\mathrm{IO}) \simeq 0.97$. Trivially, this gives $\Delta\mathrm{AIC}/2 \simeq 0.5$ in favour of IO, which is also what the magnitude of $\log B$ would be if the likelihoods had identical shapes. In summary, both $\Delta\chi^2$ and the Bayesian model comparison agree that there is no evidence for either mass ordering in the present data. However, one must not forget that since the mass ordering is not a continuous parameter, $\Delta\chi^2$ should not follow a $\chi^2$ distribution, and hence quantifying the degree to which a given ordering is favoured or disfavoured based on the corresponding $\Delta\chi^2$ is not fully justified (see Ref. [51] for further discussion).
Finally, we notice that Figs. 1-3 show some differences between the results of the $\chi^2$ and Bayesian analyses where $\delta_{CP}$ or $s^2_{23}$ are involved. For example, the marginalization over $\delta_{CP}$ pulls the bulk of the posterior of $s^2_{23}$ further into the second octant. Motivated by these differences, we present a more detailed study of the results on $s^2_{23}$, $\delta_{CP}$, and CP violation in the following sections. We note that the Bayesian analysis generally prefers the second octant, and more strongly than the $\chi^2$ analysis, in particular for NO. Although the credible and confidence levels differ in the vicinity of the two peaks, both peaks are within the $2\sigma$ region, and outside of that region the difference between the two analyses is rather small. Typically, the low-credibility Bayesian regions are larger than the small-$\chi^2$ regions, while the high-credibility Bayesian regions are smaller than the large-$\chi^2$ ones. This is just what is expected if the likelihood contains a relatively sharp peak on top of a broader plateau containing significant posterior probability.
For completeness, in addition to being displayed in Fig. 4, we also give the point estimates of $s^2_{23}$ in Tab

Octants of $\theta_{23}$ and maximal mixing
A related question is which octant $\theta_{23}$ belongs to, i.e., whether $s^2_{23}$ is larger or smaller than 0.5. As in the comparison of mass orderings, this is a comparison of two non-nested models with the same number of parameters (although here they are "adjacent"), and so one cannot expect the difference between the $\chi^2$ minima in the two octants to follow a $\chi^2$ distribution. In a Bayesian analysis, however, the comparison is straightforward: one simply integrates the likelihood over each of the octants. In addition, one can also consider maximal mixing, $s^2_{23} = 0.5$, as a realistic model, either exactly or approximately. From a statistical viewpoint, a model with a fixed value of a parameter can also be interpreted as a model where there is some non-zero, but negligible (compared to any experimental sensitivity), deviation from the fixed value [52]. From either viewpoint, i.e., considering exact maximal mixing as a possible scenario or simply as a very good approximation, one can compare it with the two octants.
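Because the prior of Eq. (2.8) is uniform in $s^2_{23}$, each octant carries prior mass 1/2, and integrating the likelihood over each octant reduces to counting posterior mass above and below 0.5. A minimal Python sketch, assuming equally weighted posterior samples of $s^2_{23}$ (the function name is hypothetical):

```python
import math


def log_octant_bayes_factor(s23sq_samples):
    """log(Z_2nd / Z_1st) from equally weighted posterior samples of s23^2 obtained
    with a prior uniform on [0, 1]. Both octants then have equal prior mass,
    so the Bayes factor is simply the ratio of posterior masses."""
    n_second = sum(1 for s in s23sq_samples if s > 0.5)
    n_first = len(s23sq_samples) - n_second
    if n_first == 0 or n_second == 0:
        raise ValueError("one octant has no samples; the ratio cannot be estimated")
    return math.log(n_second / n_first)
```

The prior normalization within each octant cancels in the ratio, which is why no explicit factor of 2 appears.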
As always, a model with additional parameters is punished for its extra complexity. In the present case this punishment is uniquely fixed by the compactness of the space of allowed values of $s^2_{23}$. The Bayes factors between the second and first octants, as well as between non-maximal and maximal mixing, are given in Tab. 5. The second octant is weakly preferred over the first for the inverted ordering, but not for the normal and the mixed orderings. Using the AIC, with the values also given in Tab. 5, yields the same conclusions, although we remind the reader that interpreting the AIC as a model likelihood should be done with great care. Due to the relatively poor predictivity of the assumption of non-maximal mixing, maximal mixing is weakly preferred over non-maximal mixing in all orderings. Note that $\Delta\mathrm{AIC}/2$ can never be smaller than $-1$ in this case, and values close to that limit simply indicate that for no ordering is there any preference for non-maximal mixing.
Table 5. Model comparison for different assumptions on $s^2_{23}$: logarithms of Bayes factors, the comparable differences in the AIC, and differences in $\chi^2$ minima. The sign is chosen such that positive values correspond to a preference for the first-mentioned assumption in each case, i.e., the 2nd octant and non-maximal mixing, respectively.
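One way to estimate the maximal versus non-maximal Bayes factor, not necessarily the procedure used for Tab. 5, is the Savage-Dickey density ratio: maximal mixing is nested in the general model, and with the prior on $s^2_{23}$ uniform on [0, 1] (density 1), $Z_{\rm max}/Z_{\rm non\text{-}max}$ equals the marginal posterior density of $s^2_{23}$ at 0.5. The Python sketch below estimates that density with a single window around 0.5; the window half-width is an arbitrary illustrative choice and the function name is hypothetical.

```python
import math


def log_bf_max_vs_nonmax(s23sq_samples, half_width=0.01):
    """Savage-Dickey estimate of log(Z_max / Z_nonmax) at s23^2 = 0.5.
    With prior density 1 on [0, 1], the ratio is the posterior density at 0.5,
    estimated here from equally weighted samples within +/- half_width."""
    inside = sum(1 for s in s23sq_samples if abs(s - 0.5) <= half_width)
    if inside == 0:
        raise ValueError("no samples near 0.5; widen the window or use more samples")
    density = inside / (len(s23sq_samples) * 2.0 * half_width)
    prior_density = 1.0  # uniform prior on s23^2, Eq. (2.8)
    return math.log(density / prior_density)
```

A posterior that is flat like the prior gives $\log B = 0$, while a posterior concentrated around 0.5 gives a positive value, i.e., growing support for maximal mixing as the measurement tightens.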
If in the future the uncertainty on $s^2_{23}$ keeps on being reduced while maximal mixing continues to be allowed, at some point reducing the uncertainty further becomes pointless for the purpose of determining whether maximal mixing is the correct model. Bayesian model comparison quantifies when this point is reached, namely when the evidence in favour of maximal mixing becomes strong.
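Under simplifying assumptions that are ours rather than the text's (a Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ for $s^2_{23}$, the Savage-Dickey ratio with the unit prior density of Eq. (2.8), edge effects neglected, and the commonly used threshold $\log B = 5$ for strong evidence), this point can be estimated in closed form from $\log B = -\log(\sqrt{2\pi}\,\sigma) - (\mu - 0.5)^2/(2\sigma^2)$. The helpers below are hypothetical.

```python
import math


def log_bf_max_gaussian(mean, sigma):
    """log(Z_max / Z_nonmax) for a Gaussian posterior N(mean, sigma^2) on s23^2,
    prior density 1 and the Savage-Dickey ratio (edge effects neglected)."""
    return -math.log(math.sqrt(2.0 * math.pi) * sigma) - (mean - 0.5) ** 2 / (2.0 * sigma**2)


def sigma_for_strong_evidence(threshold=5.0):
    """Posterior width at which a posterior centred on 0.5 reaches log B = threshold
    (threshold 5 is an assumed 'strong evidence' value)."""
    return math.exp(-threshold) / math.sqrt(2.0 * math.pi)
```

In this toy setting, a posterior centred exactly at 0.5 would need $\sigma \approx 0.0027$ in $s^2_{23}$ for strong evidence in favour of maximal mixing; an off-centre posterior reaches it later, or not at all.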

Exploring $\delta_{CP}$ and CP-violation
In this section we study the determination of $\delta_{CP}$ in more detail. In the left panels of Fig. 5 we plot the Bayesian marginal posterior distribution of $\delta_{CP}$ for all orderings, together with the significance $S$ of the credible intervals, as well as the profile likelihood and $\Delta\chi^2$. For NO, the marginal and profile likelihoods have their maximum at about the same value of $\delta_{CP}$, but for IO and MO the Bayesian analysis prefers larger values of $\delta_{CP}$. Comparing $S$ with $\Delta\chi^2$, the difference is not that large, apart from the shift just mentioned and the fact that $S$ diverges near $\delta_{CP} \simeq 90^\circ$, while $\Delta\chi^2$ is bounded by about 2.5.
In the right panels of Fig. 5 the marginal and profile likelihoods are plotted again, but in polar coordinates, which better reflect the circular nature of $\delta_{CP}$. We note that in a frequentist analysis the fact that $\delta_{CP}$ is a phase, and thus a circular, periodic variable, affects the distributions of test statistics [54,55]. For the present data $\Delta\chi^2$ is expected to be a poor approximation of the frequentist significance, and typically the true significance will be higher than the naive expectation. Hence, Fig. 5 does not give a direct comparison of frequentist and Bayesian results.
In the Bayesian analysis, however, the circular nature of $\delta_{CP}$ does not affect the posterior distribution or its interpretation. Nevertheless, it still needs to be taken into account when summarizing the posterior in terms of point estimates such as the mean or median, or measures of dispersion such as the standard deviation. This is because the usual linear definitions of these quantities depend on the arbitrary choice of origin for $\delta_{CP}$ [56][57][58].
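A toy Python illustration of this origin dependence (the sample angles are invented and the helper names are hypothetical): four angles clustered around $0^\circ$ have a linear mean of $180^\circ$ or $0^\circ$ depending on whether $\delta_{CP}$ is represented in $[0^\circ, 360^\circ)$ or $[-180^\circ, 180^\circ)$, whereas the circular mean, the argument of $\langle e^{i\delta_{CP}}\rangle$, does not depend on that choice.

```python
import cmath
import math


def linear_mean_deg(angles_deg, origin_deg):
    """Ordinary mean after representing the angles in [origin, origin + 360)."""
    shifted = [(a - origin_deg) % 360.0 + origin_deg for a in angles_deg]
    return sum(shifted) / len(shifted)


def circular_mean_deg(angles_deg):
    """Argument of the first moment <exp(i delta)>, returned in [0, 360)."""
    m1 = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg) / len(angles_deg)
    return math.degrees(cmath.phase(m1)) % 360.0


if __name__ == "__main__":
    sample = [350.0, 10.0, 20.0, 340.0]  # invented angles clustered around 0 degrees
    for origin in (0.0, -180.0):
        print(origin, linear_mean_deg(sample, origin), circular_mean_deg(sample))
```

The first linear mean points to the opposite side of the circle from where all the probability is, which is the failure mode the circular summaries avoid.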
In this respect a useful summary of the distribution of $\delta_{CP}$ is given by its first moment,
$$m_1 = \langle e^{i\delta_{CP}} \rangle = \langle \cos\delta_{CP} \rangle + i\,\langle \sin\delta_{CP} \rangle,$$
with $\langle\cdot\rangle$ denoting the posterior mean (indeed, it is $e^{i\delta_{CP}}$ which enters the mixing matrix). The appropriate analogues of the mean and median of $\delta_{CP}$ are the circular mean and circular median. The former is given by the argument of the first moment, $\bar\delta_{CP} = \arg m_1$, while the latter is defined as the endpoint, closer to the mean, of the diameter of the circle that has probability 0.5 on each of its sides. These point estimates are summarized in Tab. 6 together with the likelihood maxima, and their values are plotted in Fig. 5. As for the dispersion, besides the credible intervals, if one wants a characterization similar to that provided by the linear standard deviation, one can use the fact that the mean resultant length $R = |m_1|$ gives a reasonable measure of dispersion, with $R = 0$ for a uniform distribution and $R = 1$ for a degenerate one. However, it could be preferable, and more easily interpretable, to have a measure which is an expected deviation in radians. Noting that the standard linear variance is the expectation of the squared Euclidean distance from the mean, in general one can use
$$V = \langle d^2(\delta_{CP}, \bar\delta_{CP}) \rangle$$
to obtain a dispersion, where $d$ is some metric on the circle. The usual linear metric $d(\alpha, \beta) = |\alpha - \beta|$ is not invariant with respect to the choice of origin, but one can instead take $d$ as the minimum arc length between $\alpha$ and $\beta$, also called the great-circle distance. Hence one can simply take $\langle d^2(\delta_{CP}, \bar\delta_{CP}) \rangle$ with this metric as the variance. Another metric one can use is the one inherited from the Euclidean embedding of the circle in the plane,
$$d(\alpha, \beta) = |e^{i\alpha} - e^{i\beta}| = \sqrt{2\,[1 - \cos(\alpha - \beta)]}.$$
The variance then becomes
$$V = 2\,\langle 1 - \cos(\delta_{CP} - \bar\delta_{CP}) \rangle = 2(1 - R).$$
To get the equivalent deviation as an angle away from the mean, we solve $V = 2(1 - \cos\sigma)$, giving simply
$$\sigma = \arccos R, \quad (5.6)$$
which is the deviation from the mean that has the same squared distance as the expectation over the distribution.
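The estimators above can be sketched in Python for equally weighted posterior samples in radians; the helper names are hypothetical. The circular median is implemented as the sample point minimizing the mean great-circle distance, a standard sample-based characterization of the median direction; treating it as equivalent to the diameter definition in the text is our assumption.

```python
import cmath
import math


def first_moment(angles):
    """m1 = <exp(i delta)> for equally weighted samples (radians)."""
    return sum(cmath.exp(1j * a) for a in angles) / len(angles)


def arc_distance(a, b):
    """Great-circle (minimum arc length) distance between two angles, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def circular_summary(angles):
    """Return (circular mean, R, RMS arc distance, sigma = arccos R of Eq. (5.6))."""
    m1 = first_moment(angles)
    mean = cmath.phase(m1)
    r = abs(m1)
    # RMS great-circle distance from the circular mean (independent of the origin)
    sigma_arc = math.sqrt(sum(arc_distance(a, mean) ** 2 for a in angles) / len(angles))
    # Chord metric: V = 2(1 - R) = 2(1 - cos sigma); guard against R slightly above 1
    sigma_chord = math.acos(min(r, 1.0))
    return mean, r, sigma_arc, sigma_chord


def circular_median(angles):
    """Sample point with the smallest mean arc distance to all samples (O(n^2))."""
    return min(angles, key=lambda c: sum(arc_distance(a, c) for a in angles))
```

Both dispersion measures are unaffected by where the origin of $\delta_{CP}$ is placed, which is the property the linear standard deviation lacks.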
The presence of CP violation can also be studied in terms of the Jarlskog invariant, $J_{CP}$, which in the standard parameterization is given by
$$J_{CP} = J^{\max}_{CP} \sin\delta_{CP} = c_{12}\, s_{12}\, c_{23}\, s_{23}\, c^2_{13}\, s_{13} \sin\delta_{CP}.$$
We plot in Fig. 6 the Bayesian marginal posterior distributions of $J_{CP}$ and $J^{\max}_{CP}$ for all orderings, together with the significance $S$ of the credible intervals, as well as the profile likelihood and $\Delta\chi^2$. We note that these are derived parameters, so their priors and posteriors are determined by those of the free oscillation parameters. In particular, their priors are not exactly uniform. For $J^{\max}_{CP}$ (left panels) the prior is very close to uniform, and from the figure we see that this parameter is so well constrained that its posterior is perfectly Gaussian and agrees with the profile likelihood.
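A small Python helper evaluating the formula above (the function name is hypothetical; the inputs in the checks are round illustrative numbers, not the fit results of Tab. 2):

```python
import math


def jarlskog(s12sq, s13sq, s23sq, delta_rad):
    """Return (J_CP, J_CP^max) with J_CP^max = c12 s12 c23 s23 c13^2 s13."""
    s12, c12 = math.sqrt(s12sq), math.sqrt(1.0 - s12sq)
    s13, c13sq = math.sqrt(s13sq), 1.0 - s13sq
    s23, c23 = math.sqrt(s23sq), math.sqrt(1.0 - s23sq)
    j_max = c12 * s12 * c23 * s23 * c13sq * s13
    return j_max * math.sin(delta_rad), j_max
```

For $s^2_{12} = 0.3$, $s^2_{13} = 0.022$ and $s^2_{23} = 0.5$ this gives $J^{\max}_{CP} \approx 0.033$, and $J_{CP} = -J^{\max}_{CP}$ at $\delta_{CP} = 270^\circ$.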
For $J_{CP}$ (right panels of Fig. 6), we plot both the posterior and the marginal likelihood, and observe a difference between them, although it is not very large. A much larger difference is observed between these and the profile likelihood, which translates into a difference in the corresponding CLs ($S$ and $\Delta\chi^2$). However, this difference is much smaller than one could naively expect from the differences between the posterior and the profile likelihood. The reason is that the Bayesian results depend on the total probability contained in a region, and the sharp peak in the posterior still contains relatively little probability.
That the posterior of $J_{CP}$ shows peaks towards the edges of the distribution is simply because the prior density of $|\sin\delta_{CP}|$ is larger for those values. This is not cancelled out in the marginal likelihood because $J^{\max}_{CP}$ has a broad prior, which means that so does $J_{CP}$. Of course, the symmetry around $J_{CP} = 0$ is broken by the information on $\delta_{CP}$ supplied by the data, so that negative values of $J_{CP}$ are preferred, and more strongly so than in the $\chi^2$ analysis. Note that since we have no freedom left in choosing our priors on the oscillation angles and phase, this is in some sense a robust consequence of consistent Bayesian inference.
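The edge enhancement follows from the uniform prior on $\delta_{CP}$: $x = \sin\delta_{CP}$ then has density $1/(\pi\sqrt{1 - x^2})$, which diverges at $x = \pm 1$. A quick Python Monte Carlo check (the sample size, seed and bin edges are arbitrary choices, and only the $\sin\delta_{CP}$ factor of $J_{CP}$ is illustrated):

```python
import math
import random


def sin_delta_fractions(n, seed=0):
    """Fractions of uniform-delta samples with |sin delta| > 0.9 and with |sin delta| < 0.1."""
    rng = random.Random(seed)
    vals = [abs(math.sin(rng.uniform(0.0, 2.0 * math.pi))) for _ in range(n)]
    high = sum(1 for v in vals if v > 0.9) / n
    low = sum(1 for v in vals if v < 0.1) / n
    return high, low


if __name__ == "__main__":
    high, low = sin_delta_fractions(200000)
    # Exact values: 1 - (2/pi) asin(0.9) ~ 0.287 and (2/pi) asin(0.1) ~ 0.064
    print(high, low)
```

Two intervals of equal width in $|\sin\delta_{CP}|$ thus carry prior mass differing by a factor of about 4.5, with the excess at the edges.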

CP-violation vs CP-conservation
In the same way as for maximal mixing, one can consider either exact CP conservation as a possible scenario, or simply CP conservation as a very good approximation, and compare the models
$$M_{\rm CPC}: \ \delta_{CP} = 0 \ \text{or} \ \delta_{CP} = \pi, \qquad M_{\rm CPV}: \ \delta_{CP} \ \text{free, with the uniform prior of Eq. (2.8)}.$$
Note that these assumptions on CPC and CPV are unambiguously defined in the sense that they do not depend on a parameterization, and that the prior on $\delta_{CP}$ in $M_{\rm CPV}$ is uniquely given by the Haar measure. Hence there is essentially no flexibility remaining in the choice of prior. Owing to this fact and to the compact nature of the parameter space, the usual pitfalls of model comparison, i.e., the potentially large and prior-dependent penalty acquired for additional parametric complexity, are avoided, or at least heavily mitigated.
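With the uniform prior, $Z_{\rm CPV}$ is the average of the marginal likelihood $L(\delta_{CP})$ over the circle, while a fixed CP-conserving value $\delta_0$ has evidence $L(\delta_0)$, the other parameters being integrated over their (separable) priors in both cases. A Python sketch, assuming $L(\delta_{CP})$ is available as a function; the toy likelihood is purely illustrative, and treating $M_{\rm CPC}$ as an equal-weight mixture of $\delta_{CP} = 0$ and $\pi$ is our assumption (a single value can be passed instead).

```python
import math


def log_bf_cpc_vs_cpv(marg_like, cpc_values=(0.0, math.pi), n_grid=3600):
    """log(Z_CPC / Z_CPV) for a marginal likelihood L(delta), other parameters integrated out.
    CPV: delta uniform on [0, 2 pi), so Z_CPV is the circular average of L (uniform grid).
    CPC: equal prior weight on each value in cpc_values (an assumption of this sketch)."""
    z_cpv = sum(marg_like(2.0 * math.pi * k / n_grid) for k in range(n_grid)) / n_grid
    z_cpc = sum(marg_like(d) for d in cpc_values) / len(cpc_values)
    return math.log(z_cpc / z_cpv)


if __name__ == "__main__":
    # Hypothetical von Mises-shaped likelihood peaked at 270 degrees, not the fit result.
    def toy(d):
        return math.exp(2.0 * math.cos(d - math.radians(270.0)))

    print(log_bf_cpc_vs_cpv(toy), log_bf_cpc_vs_cpv(toy, cpc_values=(0.0,)))
```

A flat likelihood gives $\log B = 0$ exactly: since the prior is fixed and the space compact, the extra parameter of $M_{\rm CPV}$ costs nothing unless the data actually discriminate between values of $\delta_{CP}$.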
This unusually robust (fixed in size) and small penalty for the additional parameter means that the Bayesian analysis is expected to be more powerful at detecting CPV than it normally is at detecting a new physical effect. Hence, compared with a $\chi^2$ analysis, a smaller significance, or value of $\Delta\chi^2$, than usual should suffice for a robust Bayesian detection of CPV. Equivalently, a given value of $\Delta\chi^2$ would lead to stronger Bayesian evidence for CPV than the same $\Delta\chi^2$ would yield in a different setting.
Interestingly, the true frequentist significance of CP violation is also expected to be stronger than the naive expectation [55], although the details depend significantly on the (unknown) value of $s^2_{23}$ assumed to be true. This does not happen in a Bayesian analysis, which does not depend on any distributions of test statistics under repeated experiments, but only on the likelihood of the data which were actually observed.
The likelihoods of the different assumptions on $\delta_{CP}$, in the usual form of logarithms of Bayes factors relative to $M_{\rm CPV}$, $\log(Z/Z_{\rm CPV})$, are shown in Tab. 8, together with the AIC and the difference in $\chi^2$. Although technically CP violation is preferred in all cases, in none of the cases is the evidence even weak. Notice also that since $\delta_{CP}$ is relatively unconstrained, the preference for CPV is even smaller using the AIC than in the Bayesian analysis.

Correlation between $s^2_{23}$ and $\delta_{CP}$
In this section we discuss the possible quantification of the correlation between $\sin^2\theta_{23}$ and $\delta_{CP}$. The posterior in the $s^2_{23}$-$\delta_{CP}$ plane for all the orderings is plotted in Fig. 7, together with the credible regions and $\chi^2$ contours. Although the difference between the Bayesian and $\chi^2$ analyses does not appear to be very large, there are some things which a Bayesian analysis makes possible and which cannot be done in a $\chi^2$ analysis. In particular, as seen in the figure, $s^2_{23}$ and $\delta_{CP}$ are clearly not independent, and it will be interesting to quantify whether the degeneracy between them persists in future experiments. In a $\chi^2$ analysis, quantifying the "correlation" between two parameters is typically limited to fitting a two-dimensional Gaussian at the best-fit point. In a Bayesian analysis, global measures of association such as the standard Pearson product-moment correlation coefficient are available. However, the latter only measures linear association, and is hence less useful when non-linear trends, including multi-modality, are involved. In particular, two highly dependent variables can have a very small Pearson correlation. Furthermore, in the present case it fails in an even worse manner, since the Pearson correlation is not circular invariant, i.e., its value depends on the arbitrary choice of origin for $\delta_{CP}$.
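The origin dependence is easy to exhibit. In the Python sketch below, six invented points with $s^2_{23}$ increasing smoothly as $\delta_{CP}$ crosses $0^\circ$ give a Pearson coefficient close to $+1$ when $\delta_{CP}$ is represented in $[-180^\circ, 180^\circ)$, but a negative one when it is represented in $[0^\circ, 360^\circ)$; the data and helper names are illustrative only.

```python
import math


def pearson(xs, ys):
    """Standard Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def wrap_deg(angles, origin):
    """Represent angles (degrees) in [origin, origin + 360)."""
    return [(a - origin) % 360.0 + origin for a in angles]


if __name__ == "__main__":
    # Invented points: s23^2 grows as delta_CP crosses 0 degrees.
    s23sq = [0.42, 0.45, 0.48, 0.52, 0.55, 0.58]
    delta = [330.0, 345.0, 355.0, 5.0, 15.0, 30.0]
    print(pearson(s23sq, wrap_deg(delta, 0.0)), pearson(s23sq, wrap_deg(delta, -180.0)))
```

The same physical dependence thus yields coefficients of opposite sign, purely from where the cut in $\delta_{CP}$ is placed.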
As for $\theta_{23}$, one can either treat it as a circular variable or use the linear variable $s^2_{23}$ instead. Let us therefore focus on how to define a correlation coefficient which overcomes these limitations. Typically a correlation coefficient aims to quantify how much of the variation in one variable can be explained by the variation in another one: for example, to what extent the linear relation $E[Y|X = x] = ax + b$ is responsible for the variation in $Y$ (which leads to the standard Pearson correlation coefficient). Similarly, one can consider circular-circular associations between two circular variables $\Theta$ and $\Phi$ (in this case $\delta_{CP}$ and $\theta_{23}$), circular-linear association, predicting the expectation of $\Theta$ given $X = x$, or linear-circular association, predicting the expectation of $X$ given $\Theta = \theta$ (in these cases $X = \sin^2\theta_{23}$).
Many measures of correlation involving circular variables already exist in the literature (see [56][57][58][59]). For two circular variables a simple one is
$$\rho_{cc} = \frac{E[\sin(\Theta - \bar\Theta)\sin(\Phi - \bar\Phi)]}{\sqrt{E[\sin^2(\Theta - \bar\Theta)]\,E[\sin^2(\Phi - \bar\Phi)]}},$$
where the bar denotes the circular mean. This has many properties in common with the linear version: it is confined to the interval $[-1, 1]$, it is zero if the variables are independent, and it numerically agrees with the linear version for concentrated distributions. An alternative, slightly more complex, correlation coefficient for two circular variables is the T-linear one of Ref. [60],
$$\rho_T = \frac{E[\sin(\Theta_1 - \Theta_2)\sin(\Phi_1 - \Phi_2)]}{\sqrt{E[\sin^2(\Theta_1 - \Theta_2)]\,E[\sin^2(\Phi_1 - \Phi_2)]}},$$
where $\Theta_1$ and $\Theta_2$ are treated as two independent copies of $\Theta$, and similarly for $\Phi$. For linear-circular association, one can split the circular variable $Y$ into its sine and cosine and consider the multiple correlation coefficient between $X$ and $(\sin Y, \cos Y)$, giving
$$\rho^2_{lc} = \frac{\rho^2_{xc} + \rho^2_{xs} - 2\rho_{xc}\rho_{xs}\rho_{cs}}{1 - \rho^2_{cs}},$$
with $\rho_{xc} = \rho(x, \cos y)$, $\rho_{xs} = \rho(x, \sin y)$, and $\rho_{cs} = \rho(\cos y, \sin y)$ being standard linear coefficients. We notice that, being defined through its square, only $|\rho_{lc}|$ is determined, and hence it gives no information on the "sign" or "direction" of the association. While the above measures of association overcome the problem of circular invariance, they are still only sensitive to a limited kind of association, and they can be zero even when the variables are highly dependent on each other. It could hence be of interest to have a measure which can quantify any type of dependence, and which is zero only when the variables are independent. Such a measure, based on information theory, is the mutual information [61][62][63][64]. This is the information gained by knowing the full distribution $P(x, y)$ rather than only the marginal distributions $P(x)$ and $P(y)$, or equivalently, the average information gained on $X$ by knowing the value of $Y$ (and vice versa).
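Sample-based versions of these coefficients can be sketched in Python as follows, assuming equally weighted posterior samples with angles in radians (helper names are hypothetical). The estimator of $\rho_T$ uses all pairs of distinct samples as the independent copies $(\Theta_1, \Phi_1)$, $(\Theta_2, \Phi_2)$, which costs $O(n^2)$ and is meant only for modest subsamples.

```python
import math


def pearson(xs, ys):
    """Standard linear (Pearson) correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def circ_mean(angles):
    return math.atan2(sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles))


def rho_cc(thetas, phis):
    """Circular-circular coefficient built from sines of deviations from the circular means."""
    mt, mp = circ_mean(thetas), circ_mean(phis)
    st = [math.sin(t - mt) for t in thetas]
    sp = [math.sin(p - mp) for p in phis]
    num = sum(a * b for a, b in zip(st, sp))
    return num / math.sqrt(sum(a * a for a in st) * sum(b * b for b in sp))


def rho_t(thetas, phis):
    """T-linear coefficient estimated over all pairs i < j (O(n^2))."""
    num = den_t = den_p = 0.0
    n = len(thetas)
    for i in range(n):
        for j in range(i + 1, n):
            a = math.sin(thetas[i] - thetas[j])
            b = math.sin(phis[i] - phis[j])
            num += a * b
            den_t += a * a
            den_p += b * b
    return num / math.sqrt(den_t * den_p)


def rho_lc_abs(xs, ys):
    """|rho_lc|: multiple correlation of the linear xs with (cos y, sin y)."""
    cos_y = [math.cos(y) for y in ys]
    sin_y = [math.sin(y) for y in ys]
    r_xc, r_xs, r_cs = pearson(xs, cos_y), pearson(xs, sin_y), pearson(cos_y, sin_y)
    r2 = (r_xc**2 + r_xs**2 - 2.0 * r_xc * r_xs * r_cs) / (1.0 - r_cs**2)
    return math.sqrt(max(r2, 0.0))
```

Shifting either set of angles by a constant leaves all three coefficients unchanged, unlike the Pearson coefficient applied to raw angles.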
This can be expressed as the so-called Kullback-Leibler divergence between $P_{X,Y}$ and the product $P_X P_Y$,
$$I(X, Y) = \int P(x, y) \log\frac{P(x, y)}{P(x)\,P(y)}\,dx\,dy.$$
Using the natural logarithm gives the result in nats, while using base 2 gives it in bits. It holds that $I(X, Y) \geq 0$, with equality if and only if $X$ and $Y$ are independent. Next, in order to make the connection with the standard correlation coefficient, we note that for a two-dimensional Gaussian distribution (for which no correlation is equivalent to independence) $I = \log(1/\sqrt{1 - \rho^2})$, and so we define
$$\rho_I = \sqrt{1 - e^{-2I}}.$$
We have now constructed a correlation coefficient which is independent of any boundary conditions on the variables and is invariant under arbitrary univariate redefinitions of $x$ and $y$ (which the others are not). Like the previous coefficients, it reduces to the standard Pearson coefficient in the limit of a concentrated Gaussian distribution. However, like $|\rho_{lc}|$, it only measures the degree of dependence, not the "direction" of the association. Our estimates of the different correlation coefficients are given in Tab. 9 (see footnote 5). For all measures we find stronger correlation in NO than in IO, typically significantly so (with the exception of $\rho_T$). Furthermore, the two signed circular-circular measures have significantly smaller absolute values than the others, and for these we also find that in MO the correlation is actually larger than in both NO and IO, which is not the case for the others. We note that all are smaller than or equal in magnitude to $|\rho_I|$. This is to be expected, as $|\rho_I|$ in some sense measures "all" of the dependence between $\delta_{CP}$ and $s^2_{23}$.
Table 9. Different correlation coefficients between $s^2_{23}$ and $\delta_{CP}$.
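The Python sketch below estimates $I$ with a plug-in two-dimensional histogram and converts it to $\rho_I$. The analysis itself uses kernel density estimates (footnote 5); the histogram estimator, bin count and range here are substitutions chosen for illustration, and such plug-in estimates are biased for small samples. For a bivariate Gaussian with $\rho = 0.6$ the recovered $\rho_I$ should be close to 0.6.

```python
import math
import random


def mutual_information_hist(xs, ys, bins=25, lo=-4.0, hi=4.0):
    """Plug-in estimate of I(X, Y) in nats from a 2D histogram on [lo, hi]^2
    (values outside the range are clamped into the edge bins)."""
    n = len(xs)
    width = (hi - lo) / bins

    def idx(v):
        return min(bins - 1, max(0, int((v - lo) / width)))

    joint, px, py = {}, [0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        i, j = idx(x), idx(y)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum over occupied cells of p(x,y) log[p(x,y) / (p(x) p(y))]
    return sum((c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def rho_from_mi(mi):
    """rho_I = sqrt(1 - exp(-2 I)); equals |rho| for a bivariate Gaussian."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))


def gaussian_mi(rho):
    """Exact I = -0.5 log(1 - rho^2) for a bivariate Gaussian."""
    return -0.5 * math.log(1.0 - rho**2)


if __name__ == "__main__":
    rng = random.Random(42)
    rho = 0.6
    xs, ys = [], []
    for _ in range(100000):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho**2) * v)
    print(rho_from_mi(mutual_information_hist(xs, ys)), rho_from_mi(gaussian_mi(rho)))
```

Because $I$ is unchanged by invertible transformations of either variable, the same estimate would apply to $\delta_{CP}$ and $s^2_{23}$ after any monotonic reparametrization, up to estimator error.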

Summary
We have presented the results of a Bayesian global analysis of solar, atmospheric, reactor and accelerator neutrino data in the framework of three-neutrino oscillations, and compared them with those of the standard $\chi^2$ analysis in NuFIT 2.0 [10]. The results are summarized in Fig. 1 for NO, Fig. 2 for IO, and Fig. 3 for MO, where we compare the relevant Bayesian quantities (the posterior distributions and two-dimensional Bayesian credible regions) with the profile likelihood and the two-dimensional $\chi^2$ allowed regions. We found that the four parameters $\Delta m^2_{3\ell}$, $\Delta m^2_{21}$, $s^2_{12}$, and $s^2_{13}$ are well-measured and that their posterior distributions are Gaussian to a very good approximation. The corresponding Bayesian credible intervals at a given CL are also very similar to the $\chi^2$ allowed regions at the same CL, as seen in Tab. 2.
We found some differences between the results of the $\chi^2$ and Bayesian analyses where $\delta_{CP}$ or $s^2_{23}$ are involved. In particular, the marginalization over $\delta_{CP}$ pulls the bulk of the posterior of $s^2_{23}$ further into the second octant, which has some effect on the ranges of the parameter estimates and on the relative quality of the description in the two octants. We studied the determination of $\theta_{23}$ in more detail in Sec. 4 and concluded that the Bayesian analysis generally prefers the second octant, and more strongly than the $\chi^2$ analysis, in particular for NO. The credible and confidence levels differ in the vicinity of the two peaks, but both peaks are within the $2\sigma$ regions. Altogether, the low-credibility Bayesian regions are larger than the small-$\chi^2$ regions, while the high-credibility Bayesian regions are smaller than the large-$\chi^2$ ones.
Footnote 5: We note that large biases in the estimation of the mutual information may occur [62,63]. As before, we use kernel density estimates of the densities, similar to Ref. [64], and our very large sample sizes ensure an accurate estimate.
Regarding the present determination of $\delta_{CP}$, presented in Sec. 5, we found that for NO the marginal and profile likelihoods have their maximum at about the same value of $\delta_{CP}$, but for IO and MO the Bayesian analysis prefers slightly larger values of $\delta_{CP}$. Also, unlike the $\chi^2$ interval, the $3\sigma$ Bayesian credible interval does not contain the full range of $\delta_{CP}$: some values near $\pi/2$ are excluded. We have also introduced and quantified two measures of the dispersion of $\delta_{CP}$ analogous to the linear standard deviation but valid for a circular variable.
In addition, we have studied the Jarlskog invariant, $J_{CP}$, as well as its maximal value over $\delta_{CP}$, and found that the posterior distribution of $J^{\max}_{CP}$ is perfectly Gaussian and agrees with the profile likelihood. For $J_{CP}$, large differences appear between the posterior distribution and the profile likelihood, and they lead to some difference in the corresponding CL intervals. In particular, we find that negative values of $J_{CP}$ are preferred in both analyses, but more strongly in the Bayesian than in the $\chi^2$ analysis.
The possible quantification of the correlation between $\theta_{23}$ and $\delta_{CP}$, taking into account their circular nature, has been discussed in Sec. 6. In particular, we have introduced a new correlation coefficient, $\rho_I$, defined in terms of the mutual information, which is independent of any boundary conditions on the variables and is invariant under arbitrary univariate redefinitions of them. Quantitatively, we always find a stronger correlation between $\delta_{CP}$ and $\theta_{23}$ in NO than in IO.
Finally, we note that a Bayesian analysis is particularly suited for quantifying how much better one model describes the data than another, a comparison which is quantified by the Bayes factor of the two models (assuming both models to be equally probable a priori). We have applied this to the comparison between the mass orderings, the octant of $\theta_{23}$, and the presence of CP violation, with the following conclusions:
• Regarding the comparison between the two orderings, we find that, assuming the same prior probability for both, their posterior probabilities are also very similar: 0.55 for IO and 0.45 for NO, with a logarithm of the Bayes factor (NO relative to IO) of $-0.2$, which implies that the slight preference for the inverted ordering is not statistically meaningful.
• Applied to the octant of $\theta_{23}$, we find that the second octant is weakly preferred over the first for the inverted ordering, but not for the normal ordering nor when no assumption is made on the ordering (MO). Also, due to the relatively poor predictivity of the assumption of non-maximal mixing, maximal mixing is weakly preferred over non-maximal mixing in all orderings.
• As for CP violation, we find that although technically CP violation is preferred over CP conservation (either for $\delta_{CP} = 0$ or $\delta_{CP} = \pi$), the logarithm of the corresponding Bayes factor is always smaller than 1 in absolute value, i.e., the evidence is not even weak.