Abstract
In this paper we consider the Bayesian approach to the problem of variable selection in normal linear regression models with related predictors. We adopt a generalized singular \(g\)-prior distribution for the unknown model parameters and a beta-prime prior for the scaling factor \(g\), which yields a closed-form expression for the marginal posterior distribution that requires no integral representation. A special prior on the model space is then advocated to reflect and maintain the hierarchical or structural relationships among the predictors. It is shown that, under some nominal assumptions, the proposed approach is consistent in terms of both model selection and prediction. Simulation studies show that the proposed approach performs well for structured variable selection in linear regression models. Finally, a real-data example is analyzed for illustrative purposes.
References
Baragatti M, Pommeret D (2012) A study of variable selection using g-prior distribution with ridge parameter. Comput Stat Data Anal 56:1920–1934
Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897
Bartlett M (1957) A comment on D.V. Lindley’s statistical paradox. Biometrika 44:533–534
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–619
Brown PJ, Vannucci M, Fearn T (1998) Multivariate Bayesian variable selection and prediction. J R Stat Soc Ser B 60:627–641
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101:157–167
Chib S (1995) Marginal likelihood from the Gibbs output. J Am Stat Assoc 90:1313–1321
Chipman H (1996) Bayesian variable selection with related predictors. Can J Stat 24:17–36
Chipman H, Hamada M, Wu C (1997) A Bayesian variable-selection approach for analyzing designed experiments with complex aliasing. Technometrics 39:372–381
Cui W, George EI (2008) Empirical Bayes vs. fully Bayes variable selection. J Stat Plan Inference 138:888–900
Farcomeni A (2010) Bayesian constrained variable selection. Stat Sin 20:1043–1062
Fernández C, Ley E, Steel MFJ (2001) Benchmark priors for Bayesian model averaging. J Econom 100:381–427
Foster DP, George EI (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
George E, McCulloch R (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889
George E, McCulloch R (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373
George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Bayesian Statistics, 4 (Peñíscola, 1991). Oxford University Press, New York, pp 169–193
Guo R, Speckman PL (2009) Bayes factor consistency in linear models. In: 2009 International Workshop on Objective Bayes Methodology, Philadelphia, June 5–9, 2009
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
Lamnisos D, Griffin JE, Steel MFJ (2009) Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations. J Comput Graph Stat 18:592–612
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423
Maruyama Y (2009) A Bayes factor with reasonable model selection consistency for ANOVA model. arXiv:0906.4329v1 [stat.ME]
Maruyama Y, George EI (2011) Fully Bayes factors with a generalized g-prior. Ann. Stat. 39:2740–2765
Maruyama Y, Strawderman WE (2010) Robust Bayesian variable selection with sub-harmonic priors. arXiv:1009.1926v3 [stat.ME]
Nelder J (1994) The statistics of linear models: back to basics. Stat Comput 4:221–234 (with discussion in, vol. 5 (1995) 84–111)
Panagiotelis A, Smith M (2008) Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. J Econom 143:291–316
Raftery A, Madigan D, Hoeting J (1997) Bayesian model averaging for linear regression models. J Am Stat Assoc 92:179–191
Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci 7:493–497
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343
Song X, Lu Z (2011) Response to “Comments on ‘Bayesian variable selection for disease classification using gene expression data’ ”. Bioinformatics 27:2169–2170
Wang M, Sun X (2013) Bayes factor consistency for unbalanced ANOVA models. Statistics 47:1104–1115
West M (2003) Bayesian factor regression models in the “large p, small n” paradigm. Bayesian Stat 7:723–732
Yang A, Song X (2010) Bayesian variable selection for disease classification using gene expression data. Bioinformatics 26:215–222
Yuan M, Joseph V, Zou H (2009) Structured variable selection and estimation. Ann Appl Stat 3:1738–1757
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with \(g\)-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques, Studies in Bayesian Econometrics and Statistics. North-Holland, Amsterdam, vol. 6, pp 233–243
Acknowledgments
We would like to thank the editor, the associate editor, and referees for their constructive comments that led to a marked improvement of the article. The first author was partially supported by the New Faculty Start-Up Fund at Michigan Technological University.
Appendix
Proof of Theorem 2
The well-known Stirling's formula is the asymptotic relation given by
$$\begin{aligned} \Gamma (x) \approx \sqrt{2\pi }\, x^{x - 1/2} e^{-x} \end{aligned}$$
when \(x\) is sufficiently large. Here, '\(f \approx g\)' means that the ratio of the two sides approaches 1 as \(x\) goes to infinity. For our problem, when \(n\) approaches infinity, it may be verified that
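A standard consequence of Stirling's formula, which drives the Gamma-function comparisons in the argument below, is the following ratio asymptotic (a sketch, with \(a\) and \(b\) denoting fixed constants not depending on \(n\)):

```latex
\begin{aligned}
\frac{\Gamma \bigl ((n + a)/2\bigr )}{\Gamma \bigl ((n + b)/2\bigr )}
\approx \Bigl (\frac{n}{2}\Bigr )^{(a - b)/2}
\quad \text{as } n \rightarrow \infty .
\end{aligned}
```

Applied with \(a = a_0 - m_{\gamma '}\) and \(b = a_0 - m_\gamma \), this is how the ratio of Gamma functions in case (a) reduces to the polynomial factor \(n^{(m_\gamma - m_{\gamma '})/2}\).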
For comparing an arbitrary model \(M_{\gamma '}\) and the true model \(M_\gamma \), we assume the design matrix of the linear models satisfies the assumption in (22). Then the posterior consistency for model selection means that when sampling from \(M_\gamma \), it follows that
where the model \(M_{\gamma '} \ne M_{\gamma }\) and \(f(\gamma \mid \mathbf{Y})\) is given by Eq. (16).
In what follows, for simplicity of notation, let \(c_i\), \(i = 1, \ldots , 4\), denote constants independent of the sample size \(n\). It can be seen from Lemma A.1 of Fernández et al. (2001) that when sampling from the model \(M_\gamma \), which is nested within or equal to model \(M_{\gamma '}\), we have
and that when sampling from the model \(M_{\gamma '}\) which does not nest \(M_{\gamma }\), we have
where \(c_{\gamma '}\) is given by Eq. (22). To show the consistency of model selection, we now consider the following two situations.
(a)
If \(M_\gamma \not \subseteq M_{\gamma '}\), then by using the relationship in (29), as \(n\) approaches infinity, Eq. (30) can be written as
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= \frac{\Gamma {\Bigl (\frac{m_{\gamma '} + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_{\gamma '}}{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_{\gamma '}\bigr )^{-(n + a_0 - m_{\gamma '})/2 + a + 1}\pi ({\gamma '})}{\Gamma {\Bigl (\frac{m_\gamma + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_\gamma }{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_\gamma \bigr )^{-(n + a_0 - m_\gamma )/2 + a + 1}\pi (\gamma )}\\&= c_1 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{1 - \tilde{R}^2_{\gamma '}}{1 - \tilde{R}^2_{\gamma }}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}}{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_3 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2}\biggl (\frac{\sigma ^2}{\sigma ^2 + c_{\gamma '}}\biggr )^{n/2} \\&= 0, \end{aligned}$$because \(c_{\gamma '} > 0\), so the factor \(\bigl (\sigma ^2/(\sigma ^2 + c_{\gamma '})\bigr )^{n/2}\) converges to 0 in probability exponentially fast as \(n\) approaches infinity, regardless of the value of \(m_{\gamma }-m_{\gamma '}\). Thus, Eq. (30) converges to 0 in probability.
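The exponential-versus-polynomial comparison at the heart of case (a) is easy to check numerically. The sketch below (the values chosen for \(\sigma^2\), \(c_{\gamma '}\), and the dimension gap are illustrative assumptions, not quantities from the paper) shows that \(n^{k/2}\bigl (\sigma ^2/(\sigma ^2 + c)\bigr )^{n/2}\) collapses to zero as \(n\) grows, no matter how large the polynomial exponent \(k\) is:

```python
def ratio_bound(n, k, sigma2=1.0, c=0.5):
    """Polynomial factor n^(k/2) times the exponentially decaying
    factor (sigma^2 / (sigma^2 + c))^(n/2), with c > 0."""
    return n ** (k / 2) * (sigma2 / (sigma2 + c)) ** (n / 2)

# Even a large positive dimension gap k cannot rescue the ratio:
# the geometric factor eventually dominates any power of n.
for n in (10, 100, 1000):
    print(n, ratio_bound(n, k=5))
```

The printed values rise briefly for small \(n\) (where the polynomial factor still dominates) and then decay to zero, mirroring the exponentially fast convergence claimed in the proof.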
(b)
If \(M_{\gamma } \subseteq M_{\gamma '}\), it is immediate from the result of Fernández et al. (2001) that we have
$$\begin{aligned} {\biggl (\frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma })\mathbf{Y}/n}{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}\biggr )^{n/2}} \mathop {\longrightarrow }\limits ^{D} \exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_{\gamma }}}{2}\biggr )}, \end{aligned}$$where \(\mathop {\longrightarrow }\limits ^{D}\) denotes convergence in distribution. As \(n\) approaches infinity, the limit of Eq. (30) becomes
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_4\mathop {\hbox {plim}}\limits _{n\rightarrow \infty }{n^{(m_\gamma -m_{\gamma '})/2}}\exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_\gamma }}{2}\biggr )} \\&= 0, \end{aligned}$$because \(m_\gamma - m_{\gamma '} < 0\) for \(M_\gamma \subseteq M_{\gamma '}\). This completes the proof.
\(\square \)
Proof of Theorem 3
We consider the following two situations.
(a)
When \(M_\gamma = M_N\), it follows directly from the consistency of the least squares estimators that \(\Vert \hat{\varvec{\beta }}_\gamma \Vert \rightarrow 0\), and therefore the consistency of the BMA estimates follows.
(b)
When \(M_\gamma \ne M_N\), it follows from Theorem 2 that \(\mathop {\hbox {plim}}\limits _{n\rightarrow \infty } P(M_\gamma \mid \mathbf{Y}) = 1\) when \(M_\gamma \) is the true model. In addition, following the result of Liang et al. (2008), we have
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\,\pi (g \mid M_\gamma , \mathbf{Y})\,\mathrm{d}g = \frac{\hat{g}}{1 + \hat{g}}\biggl (1 + O\Bigl (\frac{1}{n}\Bigr )\biggr ), \end{aligned}$$where \(\hat{g}\) can be obtained by maximizing the function of the form
$$\begin{aligned} L(g) = (1+g)^{(n-m_\gamma +a_0)/2} \bigl (1 + g(1-\tilde{R}^2_\gamma )\bigr )^{-(n+a_0)/2}. \end{aligned}$$Setting the first derivative of \(\log L(g)\) with respect to \(g\) equal to zero yields
$$\begin{aligned} \hat{g} = \max \biggl \{\frac{\tilde{R}^2_\gamma /m_\gamma }{(1-\tilde{R}^2_\gamma )/(n-m_\gamma +a_0)}-1, 0 \biggr \}. \end{aligned}$$Note that \(\hat{g}\) approaches infinity under the true model \(M_\gamma \) as \(n\) tends to infinity, and therefore, we obtain
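The closed-form maximizer \(\hat{g}\) above is straightforward to compute. The sketch below implements it directly; the specific values of \(\tilde{R}^2_\gamma \), \(m_\gamma \), \(n\), and \(a_0\) used in the example are hypothetical, chosen only to illustrate that \(\hat{g}\) grows with \(n\) under a model whose fit \(\tilde{R}^2_\gamma \) stays bounded away from zero:

```python
def g_hat(R2, m_gamma, n, a0):
    """Maximizer of L(g):
    g_hat = max{ (R2/m_gamma) / ((1 - R2)/(n - m_gamma + a0)) - 1, 0 }."""
    signal = R2 / m_gamma
    noise = (1.0 - R2) / (n - m_gamma + a0)
    return max(signal / noise - 1.0, 0.0)

# Illustrative values: R2 fixed at 0.6 as n grows, so g_hat
# increases roughly linearly in n, as used in the proof.
print(g_hat(0.6, 3, 100, 1))
print(g_hat(0.6, 3, 10000, 1))
```

The truncation at zero matters: when \(\tilde{R}^2_\gamma \) is small enough that the ratio falls below one, \(\hat{g} = 0\), which is consistent with the \(\max \{\cdot , 0\}\) in the displayed formula.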
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\,\pi (g \mid M_\gamma , \mathbf{Y})\,\mathrm{d}g = 1. \end{aligned}$$Using the consistency property of the least squares estimators, it is easy to show that
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\hat{Y}_f = \mathbb {E}[Y_f] = \alpha {\mathbf{1}_m} + \mathbf{X}_f \varvec{\beta }_\gamma . \end{aligned}$$This completes the proof.\(\square \)
Wang, M., Sun, X. & Lu, T. Bayesian structured variable selection in linear regression models. Comput Stat 30, 205–229 (2015). https://doi.org/10.1007/s00180-014-0529-7