Constraining power of open likelihoods, made prior-independent

One of the most criticized features of Bayesian statistics is the fact that credible intervals, especially when open likelihoods are involved, may strongly depend on the prior shape and range. Many analyses involving open likelihoods are affected by the eternal dilemma of choosing between linear and logarithmic prior, and in particular in the latter case the situation is worsened by the dependence on the prior range under consideration. In this letter, using the tools of Bayesian model comparison, we propose a simple method to obtain constraints that depend neither on the prior shape nor range. An application to the case of cosmological bounds on the sum of the neutrino masses is discussed as an example.


Introduction
In several cases, physics experiments try to measure unknown quantities: the mass of some particle, a new coupling constant, the scale of new physics. Most of the times, the absolute scale of such new quantities is completely unknown, and the analyses of experimental data require to scan a very wide range of values for the parameter under consideration, to finally end up with a lower or upper bound when data are compatible with the null hypothesis.
In the context of Bayesian analysis, performing this kind of analysis implies a profound discussion on the choice of the considered priors, which may be logarithmic when many orders of magnitude are involved. A robust analysis usually shows what happens when more than one type of prior is considered, but the calculation of credible intervals always require also a precise definition of the prior range. Especially in the case of logarithmic priors, a choice of the range can be difficult even when physical boundaries (e.g. a mass or coupling must be positive) exist, with the consequence that the selected allowed range for the parameter can influence the available prior volume and as a consequence the bound itself.
Let us consider for example the case of neutrino masses and their cosmological constraints. Current data are sensitive basically only on the sum of the neutrino masses and not on the single mass eigenstates (see e.g. [1,2]). There are therefore good reasons to describe the physics by means of Σm ν and to consider a linear prior on it, as the parameter range is limited from below by oscillation experiments [3,4,5] and from above by KATRIN [6]. Even given these considerations, however, one can decide to perform the analysis considering a lower limit Σm ν > 0 [7], instead of enforcing the oscillation-driven one, Email address: gariazzo@ific.uv.es (S. Gariazzo) Σm ν 60 (100) meV (respectively for normal and inverted ordering of the neutrino masses, see e.g. [8,9]): the obtained upper bounds will differ in the various cases.
In order to overcome these problems, in this letter we revisit a simple way to use Bayesian model comparison techniques to obtain prior-independent constraints, which can be useful for an easier comparison of the constraining power of various experimental results, not only in the context of cosmology, but in all Bayesian analyses in general.

Prior-free Bayesian constraints
The foundation of Bayesian statistics is represented by the Bayes theorem: where π(θ|M i ) and p(θ|d, M i ) are the prior and posterior probabilities for the parameters θ given a model M i , L M i (θ) is the likelihood function, depending on the parameters θ, given the data d and the model M i , and is the Bayesian evidence of M i [10], the integral of prior times likelihood over the entire parameter space Ω θ . While the Bayes theorem indicates how to obtain the posterior probability as a function of all the model parameters θ, when presenting results we are typically interested in the marginalized posterior probability as a function of one parameter (or two), which we can generally indicate with x. The marginalization is performed over the remaining parameters, which we can indicate with ψ: Let us now assume that the prior is separable and we can write π(θ|M i ) = π(x|M i ) · π(ψ|M i ). Under such hypothesis, Eq. (3) can be written as: Let us consider the marginalized posterior as written in Eq. (4). The prior dependence is only present explicitly outside the integral, and therefore we can obtain a prior-independent quantity 1 just dividing the posterior by the prior. The righthand side of Eq. (4), however, has an explicit dependence on the value of x through the likelihood that appears in the integral. We can note that such integral resembles the definition of the Bayesian evidence in Eq. (2), not anymore for model M i , but for a sub-case of M i which contains x as a fixed parameter. Let us label this model with M x i and, for sake of brevity, let us define its Bayesian evidence which is independent of the prior π(x), but still depends on the parameter value x, now fixed. Now, let us consider two models M x 1 i and M x 2 i . We can use the Bayes theorem applied to such models and compare their posteriors: where we indicate with π(M x i ), p(M x i ) the prior and posterior of model M x i , and as usual the Bayes factor is: Few considerations are needed. Since the underlying model M i is the same and only the value of x changes, the prior π(M x i ) is proportional to the prior of x within model M i , π(x|M i ). In the same way, the model posterior p(M x i ) can be written as the model posterior of M i , p(M i ), multiplied by the parameter posterior p(x|d, M i ). When taking the ratio, therefore, we get: Notice that the same equation can be obtained by rewriting Eq. (4) using the definition in Eq. (5), once for x 1 and once for x 2 , and dividing p( For reasons that will be clear in a moment, let us now rewrite this last equation using x = x 1 and x 0 = x 2 , and move some of the terms between the two sides. The quantity B x,x 0 , indicated with R(x, x 0 |d), has been named "relative belief updating ratio" or "shape distortion function" in the past [11,12,13]: As we can see from the last equality, which is useful to understand the meaning of R but unpractical to compute it, and since π(x|M i ) does not enter Eq. (5), R(x, x 0 |d) is completely independent of the prior on the parameter x. This quantity therefore represents a prior-independent way to compare some results concerning two values of some parameter x. At the practical level, R is particularly useful when dealing with open likelihoods, i.e. when data only constrain the value of some parameter from above or from below. In such case, the likelihood becomes insensitive to the parameter variations below (or above) a certain threshold. Let us consider for example the absolute scale of neutrino masses, on which data (either cosmological or at laboratory experiments) only put an upper limit: the data are insensitive to the value of x when x goes towards 0, so we can consider x 0 = 0 as a reference value. Regardless of the prior, when x is sufficiently close to x 0 the likelihoods in x and x 0 are essentially the same in all the points of the parameter space Ω ψ , so Z x i Z x 0 i and R(x, x 0 ) → 1. In the same way, when x is sufficiently far from x 0 , the data penalize its value (Z x i Z x 0 i ) and we have R(x, x 0 ) → 0. In the middle, the function R indicates how much x is favored/disfavored with respect to x 0 in each point, in the same way a Bayes factor indicates how much a model is preferred with respect to another one.
While R can define the general behavior of the posterior as a function of x, any probabilistic limit one can compute will always depend on the prior shape and range, which is an unavoidable ingredient of Bayesian statistic. The description of the results through the R function, however, allows to use the data to define a region above which the parameter values are disfavored, regardless of the prior assumptions, and also to guarantee an easier comparison of two experimental results. A good standard could be to provide a (non-probabilistic) "sensitivity bound", defined as the value of x at which R drops below some level, for example | ln R| = 1, 3 or 5 in accordance to the Jeffreys' scale (see e.g. [2,10]). Let us consider x 0 = 0 as above: we could say, for example, that we consider as "moderately (strongly) disfavored" the region x > x s for which ln R < s, with s = −3 (or −5), and then use the different values x s to compare the strengths of different data combinations d in constraining the parameter x. This will not represent an upper bound at some given confidence level, since it is not a probabilistic bound, but rather a hedge "which separates the region in which we are, and where we see nothing, from the the region we cannot see" [12].
Notice also that, once R is known, anyone can obtain credible intervals with any prior of choice: the posterior p(x|d, M i ) can easily be computed using Eq. (8) and normalizing to a total probability of 1 within the prior.

A simple example with Planck 2018 chains
To demonstrate a simple example with recent cosmological data, we provide in Fig. 1 the function R(Σm ν , 0) computed in few cases, obtained from the publicly available Planck 2018  (P18) chains 2 with four different data sets and considering the ΛCDM+Σm ν model. The datasets include the full CMB temperature and polarization data [14] plus the lensing measurements [15] by Planck 2018, and BAO information from the SDSS BOSS DR12 [16,17,18,19] the 6DF [20] and the SDSS DR7 MGS [21] surveys.
The calculation of R is easy in this case. The Planck collaboration considered a flat prior on Σm ν 3 , so we simply have to obtain the posterior p(Σm ν |d, ΛCDM + Σm ν ) by standard means and use it to compute R according to Eq. (9) 4 . Since the lower limit adopted by Planck is Σm ν = 0, we can compute R for any positive value of Σm ν , as far as we do not exceed the upper bound of the prior. To better show the results, we consider a logarithmic scale and an appropriate parameter range for the plot in Fig. 1.
From the figure, we can notice that the data are completely insensitive to the value of Σm ν when it falls below 0.01 eV: in this region, there will be no change between prior and posterior distributions, and R → 1 as expected. On the other hand, Σm ν 0.4 eV will be disfavored by data, for all the data combinations shown here, as R → 0. As is also expected, the exact shape of R between 0.01 and 0.4 eV depends on the inclusion of the BAO constraints and only slightly on the lensing dataset. 2 The chains are available through the Planck Legacy Archive, http:// pla.esac.esa.int/pla/. Note that a simple post-processing of the available chains is sufficient to produce Fig. 1. 3 Note that, although the considered prior is linear, the calculations through the CAMB code enforce a non-trivial distortion of the prior, which comes from the numerical requirements of the code. In the region below 1 eV, however, the prior on Σm ν is substantially unchanged by this fact. 4 Note that this is practically what is already shown by most authors in cosmology, since the results for 1-dimensional marginalized posteriors are often presented in plots where p π (x|d)/p max π is shown, being π(x) a linear prior on the quantity x, therefore not affecting the conversion between posterior and R according to Eq. (9). Apart for the normalization constant, p π (x|d)/p max π may be intended as an unnormalized posterior probability, which can be employed for bounds calculations as if a linear prior on x was considered, or as a shape distortion function, therefore not suitable to compute limits unless some prior is assumed first.
Regardless of considering a cut at R = e −3 or R = e −5 , indeed, the value of the sensitivity bound only depends on the inclusion of the BAO data. A comparison of the CMB dataset without (P18) or with (P18+BAO) the BAO constraints, therefore, can be summarized by two numbers, considering for example s = −5:

The case of multiple models
In the previous sections we discussed the case when dealing with only one model. The situation is slightly different when more models are considered, for example if one wants to study and take into account several extensions of the same minimal scenario, as in Ref. [22]. It is not difficult to rewrite the definition of R to deal with multiple models, if we assume that the prior for the parameter x under consideration is the same in all of them, i.e. that π(x) ≡ π(x|M i ) does not depend on M i .
Let us now recall the method proposed in [22]. The modelmarginalized posterior distribution of the parameter x is obtained as where p(M i |d) is the posterior probability of the model M i , which can be computed using [23] p(M i |d) = Z i π(M i ) In both cases the sum runs over all the studied models. Coming back to Eq. (12) and using Eqs. (4) 5 and (13), we obtain the fully (prior-and model-) marginalized posterior probability of x: Remembering that we assumed π(x) ≡ π(x|M i ) to be independent of M i , the ratio between prior and marginalized posterior probabilities for the parameter x is: If we use this result to define R again as in equation (9), we have:  which has exactly the same meaning as before, apart for the fact that in this case R has been marginalized over several models.
From the computational point of view, in the modelmarginalized case obtaining R is as simple as when only one model is considered. One just has to select a prior π(x) and a sufficiently broad range, obtain the marginalized posterior probability as in [22], then divide it by the considered prior and normalize appropriately.
As an example, we provide in Fig. 2 the R function obtained from the vary same posteriors studied in Ref. [22]. Such cases are computed considering the full Planck 2015 (P15) CMB data [24,25], together with the lensing likelihood [26] and the BAO observations by from the SDSS BOSS DR11 [27], the 6DF [20] and the SDSS DR7 MGS [21] surveys. The considered models are the same extensions of the ΛCDM+Σm ν case adopted by the Planck collaboration for the 2015 public release, but with a prior Σm ν > 60 meV. Also in this case we can see how the R function is very close to one below 0.1 eV and always goes to zero above ∼ 0.7 eV. In the middle, the various models (dashed lines) have different constraining powers, whose weighted average is represented by the solid line. The model-marginalized, prior-independent result corresponds to Σm ν,−5 = 0.6 eV (P15+lensing+BAO).

Discussion and conclusions
In this letter we discussed a possible way to show priorindependent results in the context of Bayesian analysis. The method uses Bayesian model comparison techniques to compare the constraining power of the data at different values of the considered parameter, and is particularly useful when open likelihoods are involved in the analysis. We applied the simple formulas to the case of neutrino mass constraints from cosmology, discussing the case of several datasets analyzed with one single cosmological model, and the case where we have only one dataset but multiple models. In the latter case, Bayesian model comparison also allows to take into account the constraints from the different models to obtain a prior-independent and modelmarginalized bound. An extended application of this method is left for a separate work.