Integral priors for Bayesian model selection: how they operate from simple to complex cases

Cano, J. A.; Iniesta, M.; Salmerón, D.

doi:10.1007/s11749-018-0579-1

Original Paper
Published: 10 February 2018

TEST, volume 27, pages 968–987 (2018)

J. A. Cano¹, M. Iniesta¹ & D. Salmerón²,³,⁴


Abstract

In Bayesian model selection, default estimation priors are very often used for the sake of objectivity. However, these priors are usually improper, yielding indeterminate Bayes factors that preclude the comparison of the models. To solve this difficulty, integral priors were proposed as prior distributions for Bayesian model selection in Cano et al. (Test 17(3):493–504, 2008). These priors are the solution to a system of two integral equations, and they are also the $\sigma $-finite invariant measures associated with a Markov chain. They were further developed in Cano and Salmerón (Bayesian Anal 8(2):361–380, 2013) and applied to binomial regression models in Salmerón et al. (Stat Sin 25(3):1009–1023, 2015). One of the main advantages of this methodology is that it can be applied to compare both nested and non-nested models. Here, we present some applications of this methodology, along with some new technical developments, from the simplest case to more advanced ones, to illustrate how it works. We begin with the toy example of a normal mean with known variance to show plainly how the methodology operates. Then, we consider the comparison of the normal location model with the double exponential one. Finally, we consider integral priors for the one-way heteroscedastic ANOVA, where the simulation of the Markov chains involves a Gibbs sampling algorithm, and we present some relevant conclusions and outline future research.


References

Bayarri MJ, Berger JO, Forte A, García-Donato G (2012) Criteria for Bayesian model choice with application to variable selection. Ann Stat 40:1550–1577
Berger JO, Pericchi LR (1996) The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc 91(433):109–122
Berger JO, Pericchi LR, Varshavsky J (1998) Bayes factors and marginal distributions in invariant situations. Sankhya A 60:307–321
Berger JO, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):112–122
Cano JA, Kessler M, Moreno E (2004) On intrinsic priors for nonnested models. Test 13(2):445–463
Cano JA, Kessler M, Salmerón D (2007a) Integral priors for the one way random effects model. Bayesian Anal 2(1):59–68
Cano JA, Kessler M, Salmerón D (2007b) A synopsis of integral priors for the one way random effects model. In: Bernardo JM et al (eds) Bayesian statistics 8. Oxford University Press, Oxford, pp 577–582
Cano JA, Salmerón D (2013) Integral priors and constrained imaginary training samples for nested and non-nested Bayesian model comparison. Bayesian Anal 8(2):361–380
Cano JA, Salmerón D (2016) A review of the developments on integral priors for Bayesian model selection. Beio 32(2):96–111
Cano JA, Salmerón D, Robert CP (2008) Integral equation solutions as prior distributions for Bayesian model selection. Test 17(3):493–504
Diebolt J, Robert CP (1994) Estimation of finite mixture distributions by Bayesian sampling. J R Stat Soc Ser B 56:363–375
Eaton ML (1992) A statistical diptych: admissible inferences-recurrence of symmetric Markov chains. Ann Stat 20:1147–1179
Hobert JP, Robert CP (1999) Eaton’s Markov chain, its conjugate partner and P-admissibility. Ann Stat 27:361–373
León-Novelo L, Moreno E, Casella G (2012) Objective Bayes model selection in probit models. Stat Med 31(4):353–365
Lindley DV (1957) A statistical paradox. Biometrika 44:187–192
Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and sampling schemes. Biometrika 81:27–40
Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, New York
Moreno E, Bertolino F, Racugno W (1998) An intrinsic limiting procedure for model selection and hypotheses testing. J Am Stat Assoc 93:1451–1460
Moreno E, Girón FJ, Casella G (2010) Consistency of objective Bayes factors as the model dimension grows. Ann Stat 38:1937–1952
Pérez JM, Berger JO (2002) Expected posterior priors for model selection. Biometrika 89(3):491–512
Salmerón D, Cano JA, Robert CP (2015) Objective Bayesian hypothesis testing in binomial regression models with integral prior distributions. Stat Sin 25(3):1009–1023
Womack AJ, León-Novelo L, Casella G (2014) Inference from intrinsic Bayes procedures under model selection and uncertainty. J Am Stat Assoc 109(507):1040–1053

Acknowledgements

This research was supported by the Séneca Foundation Programme for the Generation of Excellence Scientific Knowledge under Project 15220/PI/10.

Author information

Authors and Affiliations

Departamento de Estadística e Investigación Operativa, Universidad de Murcia, 30100, Espinardo, Spain
J. A. Cano & M. Iniesta
Servicio de Epidemiología, Consejería de Sanidad, IMIB-Arrixaca, Murcia, Spain
D. Salmerón
CIBER Epidemiología y Salud Pública (CIBERESP), Murcia, Spain
D. Salmerón
Departamento de Ciencias Sociosanitarias, Universidad de Murcia, Murcia, Spain
D. Salmerón


Corresponding author

Correspondence to J. A. Cano.

Appendices

Appendix 1: Integral and intrinsic priors for testing a point null hypothesis in a normal model with unknown mean and variance

1.1 Intrinsic priors for nested models with the default estimation prior for model $M_1$ as its intrinsic prior

In this case there are two ways to obtain the intrinsic prior $\pi _2(\theta _2)$ for model $M_2$, see Moreno et al. (1998). The first one is to obtain it from the posterior distribution as

$$\begin{aligned} \pi _2(\theta _2)= & {} \int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x=\int \frac{f_2(x\mid \theta _2)\pi _2^N(\theta _2)}{m_2^N(x)}m_1^N(x)\mathrm{d}x \\= & {} \pi _2^N(\theta _2)\int \frac{f_2(x\mid \theta _2)m_1^N(x)}{m_2^N(x)}\mathrm{d}x=\pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{m_1^N(x)}{m_2^N(x)}\right) , \end{aligned}$$

and the second one is to apply Fubini’s theorem, which gives

$$\begin{aligned} \pi _2(\theta _2)= & {} \int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x=\int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _1)\pi _1^N(\theta _1)\mathrm{d}\theta _1\mathrm{d}x \\= & {} \int \left( \int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _1)\mathrm{d}x\right) \pi _1^N(\theta _1)\mathrm{d}\theta _1=\int \pi (\theta _2\mid \theta _1)\pi _1^N(\theta _1)\mathrm{d}\theta _1. \end{aligned}$$

This second way is more convenient to work with, and it is the one commonly used in the literature on intrinsic priors. Now, expressing $\pi _2(\theta _2)$ as

$$\begin{aligned} \pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{m_1^N(x)}{m_2^N(x)}\right) =\int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x, \end{aligned}$$

it is clear that integral priors generalize intrinsic priors.

In the case where the simpler model is a point null hypothesis, $H_0:\theta _1=\theta _{10}$, the two ways yield $\pi _2(\theta _2)$ as

$$\begin{aligned} \pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{f_1(x\mid \theta _{10})}{m_2^N(x)}\right) , \end{aligned}$$

and

$$\begin{aligned} \int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _{10})\mathrm{d}x, \end{aligned}$$

respectively, which of course coincide.

1.2 Intrinsic and integral priors for the comparison of $M_1:N(0,1)$ versus $M_2:N(\theta ,\sigma ^2)$

To compare $M_1:N(0,1)$ versus $M_2:N(\theta ,\sigma ^2)$, with default estimation prior $\pi _2^N(\theta ,\sigma )\propto 1/\sigma $, we use the first expression. Since $m_2^N(x)=\frac{1}{2\vert x_1-x_2\vert }$ for a training sample $x=(x_1,x_2)$, see Berger and Pericchi (1996), the intrinsic prior is

$$\begin{aligned} \frac{1}{\sigma }\int 2\vert x_1-x_2\vert N(x_1\mid \theta ,\sigma ^2)N(x_2\mid \theta ,\sigma ^2)N(x_1\mid 0,1)N(x_2\mid 0,1)\mathrm{d}x_1\mathrm{d}x_2, \end{aligned}$$

which, using the change of variables $u=x_1-x_2$ and $v=x_1+x_2$, yields

$$\begin{aligned}&\frac{1}{\sigma }\int \frac{1}{2}2\vert u\vert N((u+v)/2\mid \theta ,\sigma ^2) \\&\quad \times N((v-u)/2\mid \theta ,\sigma ^2)N((u+v)/2\mid 0,1)N((v-u)/2\mid 0,1)\mathrm{d}u\mathrm{d}v \end{aligned}$$

$$\begin{aligned}= & {} \frac{1}{\sigma }\int \vert u\vert \frac{\exp \left( -\frac{\theta ^2}{\sigma ^2+1}-\frac{1}{4} \left( \frac{1}{\sigma ^2}+1\right) u^2\right) }{2 \pi ^{3/2} \sigma ^2\sqrt{\frac{1}{\sigma ^2}+1} }\mathrm{d}u=\frac{1}{\sigma }\frac{2 e^{-\frac{\theta ^2}{\sigma ^2+1}}}{\pi ^{3/2} \sqrt{\frac{1}{\sigma ^2}+1} \left( \sigma ^2+1\right) } \\= & {} N(\theta \mid 0,(\sigma ^2+1)/2)\frac{2}{\pi (1+\sigma ^2)}. \end{aligned}$$

Of course, since we are dealing with a point null hypothesis, integral priors and intrinsic priors coincide and are unique. Nevertheless, as shown next, the integral priors can also be obtained explicitly from the steps of the associated Markov chain. This again highlights that integral and intrinsic priors run in parallel in nested situations, while integral priors can go further.
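As a sanity check on the closed form obtained in this subsection, the double integral that defines the intrinsic prior can be evaluated numerically and compared with $N(\theta \mid 0,(\sigma ^2+1)/2)\,2/(\pi (1+\sigma ^2))$. The following is a minimal sketch in plain Python, not code from the paper: the function names, the midpoint rule, the truncation of the integral to $[-8,8]^2$ and the grid size are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def intrinsic_prior_closed_form(theta, sigma):
    """Closed form: N(theta | 0, (sigma^2 + 1)/2) * 2 / (pi * (1 + sigma^2))."""
    var = (sigma ** 2 + 1.0) / 2.0
    return normal_pdf(theta, 0.0, var) * 2.0 / (math.pi * (1.0 + sigma ** 2))


def intrinsic_prior_quadrature(theta, sigma, half_width=8.0, n=400):
    """Midpoint-rule approximation of
    (1/sigma) * int 2|x1 - x2| N(x1|theta,s2) N(x2|theta,s2) N(x1|0,1) N(x2|0,1) dx1 dx2.
    half_width and n are illustrative choices, not values from the paper."""
    h = 2.0 * half_width / n
    grid = [-half_width + (i + 0.5) * h for i in range(n)]
    # Per-coordinate factor g(x) = N(x | theta, sigma^2) * N(x | 0, 1).
    g = [normal_pdf(x, theta, sigma ** 2) * normal_pdf(x, 0.0, 1.0) for x in grid]
    total = 0.0
    for i, x1 in enumerate(grid):
        for j, x2 in enumerate(grid):
            total += 2.0 * abs(x1 - x2) * g[i] * g[j]
    return total * h * h / sigma
```

Calling both functions at the same $(\theta ,\sigma )$ should give values that agree up to the discretization error of the grid; the kink of $\vert x_1-x_2\vert $ along the diagonal limits the accuracy of this simple rule, so only a few significant digits should be expected.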

1.3 Derivation of the integral prior for $M_2:N(\theta ,\sigma ^2)$ using the associated Markov chain

The minimal training sample is $x=(x_1,x_2)$, and the posterior distribution under the default estimation prior $\pi _2^N(\theta ,\sigma )\propto 1/\sigma $ is therefore

$$\begin{aligned} \pi _2^N(\theta ,\sigma \mid x)\propto \sigma ^{-3}\exp \left( -\frac{1}{2\sigma ^2} \left( s^2+2(\theta -\bar{x})^2\right) \right) , \end{aligned}$$

with $s^2=(x_1-x_2)^2/2$ and $\bar{x}=(x_1+x_2)/2$. Then, $\pi _2^N(\theta \mid \sigma ,x)=N(\theta \mid \bar{x},\sigma ^2/2)$ and

$$\begin{aligned} \pi _2^N(\sigma \mid x)\propto \sigma ^{-2}\exp \left( -\frac{s^2}{2\sigma ^2}\right) . \end{aligned}$$

Now, the transition of the associated Markov chain consists of just two steps, as in the toy example, since we are again dealing with a point null hypothesis. First, $x$ is simulated from model $M_1$; second, $(\theta ,\sigma )$ is simulated from $\pi _2^N(\theta ,\sigma \mid x)$. Then $\theta =\bar{x}+\sigma \varepsilon _1/\sqrt{2}$, where $\varepsilon _1\sim N(0,1)$ and $\bar{x}\sim N(0,1/2)$, and therefore $\pi (\theta \mid \sigma )=N(\theta \mid 0,(\sigma ^2+1)/2)$. Moreover, $\pi (\sigma )=\int \pi _2^N(\sigma \mid x)p(s^2)\mathrm{d}s^2$, where $p(s^2)$ is the $\chi ^2_1$ density, because $x_1-x_2\sim N(0,2)$ under $M_1$. Normalizing gives $\pi _2^N(\sigma \mid x)=\sqrt{2/\pi }\,s\,\sigma ^{-2}\exp \left( -s^2/(2\sigma ^2)\right) $, and integrating against $p(s^2)$ we obtain $\pi (\sigma )=2/(\pi (\sigma ^2+1))$. Finally, we again obtain $\pi (\theta ,\sigma )=2N(\theta \mid 0,(\sigma ^2+1)/2)/(\pi (\sigma ^2+1))$.
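The two-step transition described above can be sketched as a short simulation. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and the draw of $\sigma $ uses the observation that, under $\pi _2^N(\sigma \mid x)\propto \sigma ^{-2}\exp (-s^2/(2\sigma ^2))$, the variable $1/\sigma $ is half-normal with scale $1/s$, so that $\sigma =s/\vert z\vert $ with $z\sim N(0,1)$.

```python
import math
import random


def integral_prior_draw(rng):
    """One transition: x ~ M1, then (theta, sigma) ~ pi_2^N(theta, sigma | x)."""
    # Step 1: minimal training sample from M1: N(0, 1).
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xbar = (x1 + x2) / 2.0
    s = abs(x1 - x2) / math.sqrt(2.0)
    # Step 2a: sigma | x has density proportional to sigma^-2 exp(-s^2 / (2 sigma^2)),
    # so 1/sigma is half-normal with scale 1/s (change of variables t = 1/sigma).
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:  # guard against division by zero (probability-zero event)
        z = rng.gauss(0.0, 1.0)
    sigma = s / abs(z)
    # Step 2b: theta | sigma, x ~ N(xbar, sigma^2 / 2).
    theta = rng.gauss(xbar, sigma / math.sqrt(2.0))
    return theta, sigma


def sigma_cdf(sigma):
    """CDF implied by pi(sigma) = 2 / (pi (sigma^2 + 1)) on sigma > 0."""
    return 2.0 * math.atan(sigma) / math.pi
```

Because the simpler model is a point null hypothesis, successive draws do not depend on the previous state, so repeated calls with `rng = random.Random(seed)` yield independent samples from $\pi (\theta ,\sigma )$. The empirical distribution of the simulated $\sigma $ values can then be compared with `sigma_cdf`, and $\theta /\sqrt{(\sigma ^2+1)/2}$ should behave like a standard normal variable.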

Appendix 2: Computation of the marginal $m_{2}(\mathbf {x})$ in the comparison of the normal model versus the double exponential one

The order statistics of $\mathbf {x}$, $(x_{(1)},\ldots ,x_{(n)})$, are needed to compute the marginal $m_{2}(\mathbf {x})$. Denoting $R_{1}=(-\infty ,x_{(1)})$, $R_{j}=(x_{(j-1)}, x_{(j)})$ for $j=2,\ldots ,n$ and $R_{n+1}=(x_{(n)},\infty )$, we have that

$$\begin{aligned} m_{2}(\mathbf {x})=\sum _{j=1}^{n+1}\int _{R_{j}}2^{-n} \exp \left( -\sum _{i=1}^{n}|x_{i}-\lambda |\right) \mathrm{d}\lambda . \end{aligned}$$

Let $H_{j}(\mathbf x ,\lambda )$, $j=1,\ldots ,n+1,$ be the value of the function $-\sum _{i=1}^{n}|x_{i}-\lambda |$ in the region $R_{j}$, that is, $H_{1}(\mathbf x ,\lambda )=n\lambda -\sum _{i=1}^{n}x_{(i)},$ $H_{n+1}(\mathbf x ,\lambda )=\sum _{i=1}^{n}x_{(i)}-n\lambda ,$ and $H_{j}(\mathbf x ,\lambda )=\sum _{i=1}^{j-1}x_{(i)}+(n-2(j-1))\lambda -\sum _{i=j}^{n}x_{(i)}$ for $j=2,\ldots ,n$. Then,

$$\begin{aligned} m_{2}(\mathbf x )=2^{-n}\sum _{j=1}^{n+1}\int _{R_{j}}\exp (H_{j}(\mathbf x ,\lambda ))\mathrm{d}\lambda =2^{-n}\sum _{j=1}^{n+1}I_{j}. \end{aligned}$$

Straightforward computations yield $I_{1}=\frac{1}{n}\exp \left( (n-1)x_{(1)}-\sum _{i=2}^{n}x_{(i)}\right) .$ For the cases $j=2,\ldots ,n$ and $j\ne n/2+1$ we have:

$$\begin{aligned}&I_{j}=\frac{1}{n-2(j-1)}\exp \left( \sum _{i=1}^{j-1}x_{(i)}-\sum _{i=j}^{n}x_{(i)}\right) \\&\quad \times \left[ \exp \left( (n-2(j-1))x_{(j)}\right) -\exp \left( (n-2(j-1))x_{(j-1)}\right) \right] . \end{aligned}$$

For even $n$ we obtain $I_{j}$ for $j=n/2+1$ as:

$$\begin{aligned} I_{j}=\exp \left( \sum _{i=1}^{j-1}x_{(i)}-\sum _{i=j}^{n}x_{(i)}\right) \left( x_{(j)}-x_{(j-1)}\right) , \end{aligned}$$

therefore, for the sake of simplicity and without loss of generality, we have restricted ourselves to the case of odd $n$. Finally,

$$\begin{aligned} I_{n+1}=\frac{1}{n}\exp \left( \sum _{i=1}^{n-1}x_{(i)}-(n-1)x_{(n)}\right) . \end{aligned}$$
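The piecewise formulas above translate directly into a short routine. The sketch below, in plain Python, sums $I_1,\ldots ,I_{n+1}$ and, as a check, compares the result with a brute-force midpoint integration of $2^{-n}\exp (-\sum _i\vert x_i-\lambda \vert )$. The function names, the padding and the number of steps are assumptions for illustration, and the even-$n$ middle term is included so the routine is not limited to odd $n$.

```python
import math


def m2_closed_form(x):
    """Marginal m_2(x) = 2^{-n} * sum_j I_j from the piecewise closed forms."""
    xs = sorted(x)  # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    total = sum(xs)
    # Outer regions: I_1 on (-inf, x_(1)) and I_{n+1} on (x_(n), inf).
    pieces = [math.exp(n * xs[0] - total) / n,
              math.exp(total - n * xs[-1]) / n]
    # Inner regions R_j = (x_(j-1), x_(j)), j = 2, ..., n.
    for j in range(2, n + 1):
        lo, hi = xs[j - 2], xs[j - 1]
        c = sum(xs[:j - 1]) - sum(xs[j - 1:])
        k = n - 2 * (j - 1)  # coefficient of lambda in H_j
        if k == 0:  # only possible for even n, at j = n/2 + 1
            pieces.append(math.exp(c) * (hi - lo))
        else:
            pieces.append(math.exp(c) * (math.exp(k * hi) - math.exp(k * lo)) / k)
    return 2.0 ** (-n) * sum(pieces)


def m2_midpoint(x, pad=30.0, steps=100_000):
    """Brute-force check: midpoint rule on [min(x) - pad, max(x) + pad]."""
    n = len(x)
    lo, hi = min(x) - pad, max(x) + pad
    h = (hi - lo) / steps
    acc = 0.0
    for k in range(steps):
        lam = lo + (k + 0.5) * h
        acc += math.exp(-sum(abs(xi - lam) for xi in x))
    return 2.0 ** (-n) * acc * h
```

For data with large magnitude or large $n$, the exponentials in the closed form may overflow or underflow; a version working on the log scale would be more robust, but that refinement is outside the scope of this sketch.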

About this article

Cite this article

Cano, J.A., Iniesta, M. & Salmerón, D. Integral priors for Bayesian model selection: how they operate from simple to complex cases. TEST 27, 968–987 (2018). https://doi.org/10.1007/s11749-018-0579-1

Received: 26 May 2017
Accepted: 04 February 2018
Published: 10 February 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11749-018-0579-1
