Abstract
In Bayesian model selection for the sake of objectivity very often default estimation priors are used. However, these priors are usually improper yielding indeterminate Bayes factors that preclude the comparison of the models. To solve this difficulty integral priors have been proposed as prior distributions for Bayesian model selection in Cano et al. (Test 17(3):493–504, 2008). These priors are the solution to a system of two integral equations, and the \(\sigma \)-finite invariant measures associated with a Markov chain. They have been further developed in Cano and Salmerón (Bayesian Anal 8(2):361–380, 2013) and applied to binomial regression models in Salmerón et al. (Stat Sin 25(3):1009–1023, 2015). One of the main advantages of this methodology is that it can be applied to compare both nested and non-nested models. Here, we present some applications of this methodology along with some new technical developments, from the simplest case to more advanced ones to illustrate how it works. We begin with the toy example of a normal mean with known variance to easily point out how this methodology operates. Then, we consider the comparison of the normal location model with the double exponential one. Finally, we consider the case of integral priors for the one-way heteroscedastic ANOVA, where the simulation of the Markov chains involves a Gibbs sampling algorithm, and we present some relevant conclusions and outline oncoming research.
Similar content being viewed by others
References
Bayarri MJ, Berger JO, Forte A, García-Donato G (2012) Criteria for Bayesian model choice with application to variable selection. Ann Stat 40:1550–1577
Berger JO, Pericchi LR (1996) The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc 91(433):109–122
Berger JO, Pericchi LR, Varshavsky J (1998) Bayes factors and marginal distributions in invariant situations. Sankhya A 60:307–321
Berger JO, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):112–122
Cano JA, Kessler M, Moreno E (2004) On intrinsic priors for nonnested models. Test 13(2):445–463
Cano JA, Kessler M, Salmerón D (2007a) Integral priors for the one way random effects model. Bayesian Anal 2(1):59–68
Cano JA, Kessler M, Salmerón D (2007b) A synopsis of integral priors for the one way random effects model. In: Bernardo JM et al (eds) Bayesian statistics 8. Oxford University Press, Oxford, pp 577–582
Cano JA, Salmerón D (2013) Integral priors and constrained imaginary training samples for nested and non-nested Bayesian model comparison. Bayesian Anal 8(2):361–380
Cano JA, Salmerón D (2016) A review of the developments on integral priors for Bayesian model selection. Beio 32(2):96–111
Cano JA, Salmerón D, Robert CP (2008) Integral equation solutions as prior distributions for Bayesian model selection. Test 17(3):493–504
Diebolt J, Robert CP (1994) Estimation of finite mixture distributions by Bayesian sampling. J R Stat Soc Ser B 56:363–375
Eaton ML (1992) A statistical dyptich: admissible inferences-recurrence of symmetric Markov chains. Ann Stat 20:1147–1179
Hobert JP, Robert CP (1999) Eaton’s Markov chain, its conjugate partner and P-admissibility. Ann Stat 27:361–373
León-Novelo L, Moreno E, Casella G (2012) Objective Bayes model selection in probit models. Stat Med 31(4):353–365
Lindley DV (1957) A statistical paradox. Biometrika 44:187–192
Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and sampling schemes. Biometrika 81:27–40
Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, New York
Moreno E, Bertolino F, Racugno W (1998) An intrinsic limiting procedure for model selection and hypotheses testing. J Am Statist Assoc 93:1451–1460
Moreno E, Girón FJ, Casella G (2010) Consistency of objective Bayes factors as the model dimension grows. Ann Stat 38:1937–1952
Pérez JM, Berger JO (2002) Expected posterior priors for model selection. Biometrika 89(3):491–512
Salmerón D, Cano JA, Robert CP (2015) Objective Bayesian hypothesis testing in binomial regression models with integral prior distributions. Stat Sin 25(3):1009–1023
Womack AJ, León-Novelo L, Casella G (2014) Inference from intrinsic Bayes procedures under model selection and uncertainty. J Am Stat Assoc 109(507):1040–1053
Acknowledgements
This research was supported by the Séneca Foundation Programme for the Generation of Excellence Scientific Knowledge under Project 15220/PI/10.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix 1: Integral and intrinsic priors for testing a point null hypothesis in a normal model with unknown mean and variance
1.1 Intrinsic priors for nested models with the default estimation prior for model \(M_1\) as its intrinsic prior
In this case there are two ways to obtain the intrinsic prior \(\pi _2(\theta _2)\) for model \(M_2\), see Moreno et al. (1998). The first one is to obtain it from the posterior distribution as
and the second one from the Fubini’s theorem as
This second way is more comfortable to work with and it is the commonly used in the literature on intrinsic priors. Now, expressing \(\pi _2(\theta _2)\) as
it is clear that integral priors generalize intrinsic priors.
In the case where the simpler model is a point null hypothesis, \(H_0:\theta _1=\theta _{10}\), the two ways yield \(\pi _2(\theta _2)\) as
and
respectively, that of course, are the same.
1.2 Intrinsic and integral priors for the comparison of \(M_1:N(0,1)\) versus \(M_2:N(\theta ,\sigma ^2)\)
To compare \(M_1:N(0,1)\) versus \(M_2:N(\theta ,\sigma ^2)\), with default estimation prior \(\pi _2^N(\theta ,\sigma )\propto 1/\sigma \), using the first expression and taking into account that \({m_2^N(x)}\) is equal to \(\frac{1}{2\vert x_1-x_2\vert }\), see Berger and Pericchi (1996), the intrinsic prior is
that using the change of variables \(u=x_1-x_2\) y \(v=x_1+x_2\) yields
Of course, as we are dealing with a point null hypothesis integral priors and intrinsic priors coincide and they are unique. Nevertheless, as it is shown next, we can also obtain the integral priors explicitly using the steps of the associated Markov chain, which again highlight the idea that integral and intrinsic priors go in parallel ways while dealing with nested situations but integral priors can go further.
1.3 Obtention of the integral prior for \(M_2:N(\theta ,\sigma ^2)\) using the associated Markov chain
The minimal training sample is \(x=(x_1,x_2)\), and the posterior distribution with the default estimation prior \(\pi ^N(\theta ,\sigma )\propto 1/\sigma \) is therefore
with \(s^2=(x_1-x_2)^2/2\) and \(\bar{x}=(x_1+x_2)/2\). Then, \(\pi _2^N(\theta \mid \sigma ,x)=N(\theta \mid \bar{x},\sigma ^2/2)\) and
Now, the associated Markov chain transition consists of just two steps as in the toy example since we are dealing with a point null hypothesis again. First, it is simulated x from model \(M_1\) and secondly it is simulated from \(\pi _2^N(\theta ,\sigma \mid x)\). Then, \( \theta =\bar{x}+\sigma \varepsilon _1/\sqrt{2}, \) where \(\varepsilon _1\sim N(0,1)\) and \(\bar{x}\sim N(0,1/2)\), and therefore \(\pi (\theta \mid \sigma )=N(\theta \mid 0,(\sigma ^2+1)/2)\) and \( \pi (\sigma )= \int \pi _2^N(\sigma \mid x)p(s^2)ds^2, \) where \(p(s^2)\) is \(\chi ^2_1\) density. Then, normalizing \(\pi _2^N(\sigma \mid x)\) we obtain \( \pi (\sigma )=2/\pi (\sigma ^2+1), \) and finally we obtain again \(\pi (\theta ,\sigma )= 2N(\theta \mid 0,(\sigma ^2+1)/2)/\pi (\sigma ^2+1)\).
Appendix 2. Computation of the marginal \(m_{2}(\mathbf {x})\) in the comparison of the normal model versus the double exponential one
The ordered statistics of \(\mathbf {x},\) \((x_{(1)},\ldots ,x_{(n)}),\) are needed to compute the marginal \(m_{2}(\mathbf {x})\). Denoting \(R_{1}=(-\infty ,x_{(1)})\), \(R_{j}=(x_{(j-1)}, x_{(j)})\) for \(j=2,\ldots ,n\) and \(R_{n+1}=(x_{(n)},\infty )\), we have that
Let \(H_{j}(\mathbf x ,\lambda )\), \(j=1,\ldots ,n+1,\) be the value of the function \(-\sum _{i=1}^{n}|x_{i}-\lambda |\) in the region \(R_{j}\), that is, \(H_{1}(\mathbf x ,\lambda )=n\lambda -\sum _{i=1}^{n}x_{(i)},\) \(H_{n+1}(\mathbf x ,\lambda )=\sum _{i=1}^{n}x_{(i)}-n\lambda ,\) and \(H_{j}(\mathbf x ,\lambda )=\sum _{i=1}^{j-1}x_{(i)}+(n-2(j-1))\lambda -\sum _{i=j}^{n}x_{(i)}\) for \(j=2,\ldots ,n\). Then,
Straightforward computations yield \(I_{1}=\frac{1}{n}\exp \left( (n-1)x_{(1)}-\sum _{i=2}^{n}x_{(i)}\right) .\) For the cases \(j=2,\ldots ,n\) and \(j\ne n/2+1\) we have:
For even n we obtain \(I_{j}\) for \(j=n/2+1\) as:
therefore for the sake of simplicity and without loss of generality we have restricted ourselves to the case of odd n. Finally,
Rights and permissions
About this article
Cite this article
Cano, J.A., Iniesta, M. & Salmerón, D. Integral priors for Bayesian model selection: how they operate from simple to complex cases. TEST 27, 968–987 (2018). https://doi.org/10.1007/s11749-018-0579-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11749-018-0579-1