Abstract
In this paper we consider the Bayesian approach to the problem of variable selection in normal linear regression models with related predictors. We adopt a generalized singular \(g\)-prior distribution for the unknown model parameters and a beta-prime prior for the scaling factor \(g\), which yields a closed-form expression for the marginal posterior distribution that requires no integral representation. A special prior on the model space is then advocated to reflect and maintain the hierarchical or structural relationships among the predictors. It is shown that, under some nominal assumptions, the proposed approach is consistent in terms of both model selection and prediction. Simulation studies show that the proposed approach performs well for structured variable selection in linear regression models. Finally, a real-data example is analyzed for illustrative purposes.
References
Baragatti M, Pommeret D (2012) A study of variable selection using g-prior distribution with ridge parameter. Comput Stat Data Anal 56:1920–1934
Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897
Bartlett M (1957) A comment on D.V. Lindley’s statistical paradox. Biometrika 44:533–534
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–619
Brown PJ, Vannucci M, Fearn T (1998) Multivariate Bayesian variable selection and prediction. J R Stat Soc Ser B 60:627–641
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101:157–167
Chib S (1995) Marginal likelihood from the Gibbs output. J Am Stat Assoc 90:1313–1321
Chipman H (1996) Bayesian variable selection with related predictors. Can J Stat 24:17–36
Chipman H, Hamada M, Wu C (1997) A Bayesian variable-selection approach for analyzing designed experiments with complex aliasing. Technometrics 39:372–381
Cui W, George EI (2008) Empirical Bayes vs. fully Bayes variable selection. J Stat Plan Inference 138:888–900
Farcomeni A (2010) Bayesian constrained variable selection. Stat Sin 20:1043–1062
Fernández C, Ley E, Steel MFJ (2001) Benchmark priors for Bayesian model averaging. J Econom 100:381–427
Foster DP, George EI (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
George E, McCulloch R (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889
George E, McCulloch R (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373
George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Bayesian Statistics, 4 (Peñíscola, 1991). Oxford University Press, New York, pp 169–193
Guo R, Speckman PL (2009) Bayes factor consistency in linear models. In: 2009 International Workshop on Objective Bayes Methodology, Philadelphia, June 5–9, 2009
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
Lamnisos D, Griffin JE, Steel MFJ (2009) Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations. J Comput Graph Stat 18:592–612
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423
Maruyama Y (2009) A Bayes factor with reasonable model selection consistency for ANOVA model. arXiv:0906.4329v1 [stat.ME]
Maruyama Y, George EI (2011) Fully Bayes factors with a generalized g-prior. Ann. Stat. 39:2740–2765
Maruyama Y, Strawderman WE (2010) Robust Bayesian variable selection with sub-harmonic priors. arXiv:1009.1926v3 [stat.ME]
Nelder J (1994) The statistics of linear models: back to basics. Stat Comput 4:221–234 (with discussion in, vol. 5 (1995) 84–111)
Panagiotelis A, Smith M (2008) Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. J Econom 143:291–316
Raftery A, Madigan D, Hoeting J (1997) Bayesian model averaging for linear regression models. J Am Stat Assoc 92:179–191
Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci 7:493–497
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343
Song X, Lu Z (2011) Response to “Comments on ‘Bayesian variable selection for disease classification using gene expression data’ ”. Bioinformatics 27:2169–2170
Wang M, Sun X (2013) Bayes factor consistency for unbalanced ANOVA models. Statistics 47:1104–1115
West M (2003) Bayesian factor regression models in the “large p, small n” paradigm. Bayesian Stat 7:723–732
Yang A, Song X (2010) Bayesian variable selection for disease classification using gene expression data. Bioinformatics 26:215–222
Yuan M, Joseph V, Zou H (2009) Structured variable selection and estimation. Ann Appl Stat 3:1738–1757
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with \(g\)-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques, Studies in Bayesian Econometrics and Statistics. North-Holland, Amsterdam, vol. 6, pp 233–243
Acknowledgments
We would like to thank the editor, the associate editor, and referees for their constructive comments that led to a marked improvement of the article. The first author was partially supported by the New Faculty Start-Up Fund at Michigan Technological University.
Appendix
Proof of Theorem 2
The well-known Stirling's formula is the asymptotic relation given by
$$\begin{aligned} \Gamma (x) \approx \sqrt{2\pi }\, x^{x - 1/2} e^{-x} \end{aligned}$$
when \(x\) is sufficiently large. Here, '\(f \approx g\)' means that the ratio of the two sides approaches 1 as \(x\) goes to infinity. For our problem, when \(n\) approaches infinity, it may be verified that
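A standard consequence of Stirling's formula, which drives the Gamma-function comparisons in the argument below, is the following ratio asymptotic (a sketch, with \(a\) and \(b\) denoting fixed constants not depending on \(n\)):

```latex
\begin{aligned}
\frac{\Gamma \bigl ((n + a)/2\bigr )}{\Gamma \bigl ((n + b)/2\bigr )}
\approx \Bigl (\frac{n}{2}\Bigr )^{(a - b)/2}
\quad \text{as } n \rightarrow \infty .
\end{aligned}
```

Applied with \(a = a_0 - m_{\gamma '}\) and \(b = a_0 - m_\gamma \), this is how the ratio of Gamma functions in case (a) reduces to the polynomial factor \(n^{(m_\gamma - m_{\gamma '})/2}\).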
For comparing an arbitrary model \(M_{\gamma '}\) and the true model \(M_\gamma \), we assume the design matrix of the linear models satisfies the assumption in (22). Then the posterior consistency for model selection means that when sampling from \(M_\gamma \), it follows that
where the model \(M_{\gamma '} \ne M_{\gamma }\) and \(f(\gamma \mid \mathbf{Y})\) is given by Eq. (16).
In what follows, for simplicity of notation, let \(c_i\), \(i = 1, \ldots , 4\), denote constants independent of the sample size \(n\). It can be seen from Lemma A.1 of Fernández et al. (2001) that when sampling from the model \(M_\gamma \), which is nested within or equal to model \(M_{\gamma '}\), we have
and that when sampling from the model \(M_{\gamma '}\) which does not nest \(M_{\gamma }\), we have
where \(c_{\gamma '}\) is given by Eq. (22). To show the consistency of model selection, we now consider the following two situations.
(a)
If \(M_\gamma \not \subseteq M_{\gamma '}\), then by using the relationship in (29), as \(n\) approaches infinity, Eq. (30) can be written as
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= \frac{\Gamma {\Bigl (\frac{m_{\gamma '} + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_{\gamma '}}{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_{\gamma '}\bigr )^{-(n + a_0 - m_{\gamma '})/2 + a + 1}\pi ({\gamma '})}{\Gamma {\Bigl (\frac{m_\gamma + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_\gamma }{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_\gamma \bigr )^{-(n + a_0 - m_\gamma )/2 + a + 1}\pi (\gamma )}\\&= c_1 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{1 - \tilde{R}^2_{\gamma '}}{1 - \tilde{R}^2_{\gamma }}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}}{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_3 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2}\biggl (\frac{\sigma ^2}{\sigma ^2 + c_{\gamma '}}\biggr )^{n/2} \\&= 0, \end{aligned}$$because \(c_{\gamma '} > 0\), so the factor \(\bigl (\sigma ^2/(\sigma ^2 + c_{\gamma '})\bigr )^{n/2}\) converges to 0 in probability exponentially fast as \(n\) approaches infinity, regardless of the value of \(m_{\gamma }-m_{\gamma '}\). Thus, Eq. (30) converges to 0 in probability.
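The exponential-versus-polynomial comparison at the heart of case (a) is easy to check numerically. The sketch below (the values chosen for \(\sigma^2\), \(c_{\gamma '}\), and the dimension gap are illustrative assumptions, not quantities from the paper) shows that \(n^{k/2}\bigl (\sigma ^2/(\sigma ^2 + c)\bigr )^{n/2}\) collapses to zero as \(n\) grows, no matter how large the polynomial exponent \(k\) is:

```python
def ratio_bound(n, k, sigma2=1.0, c=0.5):
    """Polynomial factor n^(k/2) times the exponentially decaying
    factor (sigma^2 / (sigma^2 + c))^(n/2), with c > 0."""
    return n ** (k / 2) * (sigma2 / (sigma2 + c)) ** (n / 2)

# Even a large positive dimension gap k cannot rescue the ratio:
# the geometric factor eventually dominates any power of n.
for n in (10, 100, 1000):
    print(n, ratio_bound(n, k=5))
```

The printed values rise briefly for small \(n\) (where the polynomial factor still dominates) and then decay to zero, mirroring the exponentially fast convergence claimed in the proof.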
(b)
If \(M_{\gamma } \subseteq M_{\gamma '}\), it is immediate from the result of Fernández et al. (2001) that we have
$$\begin{aligned} {\biggl (\frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma })\mathbf{Y}/n}{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}\biggr )^{n/2}} \mathop {\longrightarrow }\limits ^{D} \exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_{\gamma }}}{2}\biggr )}, \end{aligned}$$where \(\mathop {\longrightarrow }\limits ^{D}\) denotes convergence in distribution. As \(n\) approaches infinity, the limit of Eq. (30) becomes
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_4\mathop {\hbox {plim}}\limits _{n\rightarrow \infty }{n^{(m_\gamma -m_{\gamma '})/2}}\exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_\gamma }}{2}\biggr )} \\&= 0, \end{aligned}$$because \(m_\gamma - m_{\gamma '} < 0\) for \(M_\gamma \subseteq M_{\gamma '}\). This completes the proof.
\(\square \)
Proof of Theorem 3
We consider the following two situations.
(a)
When \(M_\gamma = M_N\), it follows directly from the consistency of the least squares estimators that \(\Vert \hat{\varvec{\beta }}_\gamma \Vert \rightarrow 0\), and therefore the consistency of the BMA estimates follows.
(b)
When \(M_\gamma \ne M_N\), it follows from Theorem 2 that \(\mathop {\hbox {plim}}\limits _{n\rightarrow \infty } P(M_\gamma \mid \mathbf{Y}) = 1\) when \(M_\gamma \) is the true model. In addition, following the result of Liang et al. (2008), we have
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\,\pi (g \mid M_\gamma , \mathbf{Y})\,\mathrm{d}g = \frac{\hat{g}}{1 + \hat{g}}\biggl (1 + O\Bigl (\frac{1}{n}\Bigr )\biggr ), \end{aligned}$$where \(\hat{g}\) can be obtained by maximizing the function of the form
$$\begin{aligned} L(g) = (1+g)^{(n-m_\gamma +a_0)/2} \bigl (1 + g(1-\tilde{R}^2_\gamma )\bigr )^{-(n+a_0)/2}. \end{aligned}$$Setting the first derivative of \(\log L(g)\) with respect to \(g\) equal to zero yields
$$\begin{aligned} \hat{g} = \max \biggl \{\frac{\tilde{R}^2_\gamma /m_\gamma }{(1-\tilde{R}^2_\gamma )/(n-m_\gamma +a_0)}-1, 0 \biggr \}. \end{aligned}$$Note that \(\hat{g}\) approaches infinity under the true model \(M_\gamma \) as \(n\) tends to infinity, and therefore, we obtain
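The closed-form maximizer \(\hat{g}\) above is straightforward to compute. The sketch below implements it directly; the specific values of \(\tilde{R}^2_\gamma \), \(m_\gamma \), \(n\), and \(a_0\) used in the example are hypothetical, chosen only to illustrate that \(\hat{g}\) grows with \(n\) under a model whose fit \(\tilde{R}^2_\gamma \) stays bounded away from zero:

```python
def g_hat(R2, m_gamma, n, a0):
    """Maximizer of L(g):
    g_hat = max{ (R2/m_gamma) / ((1 - R2)/(n - m_gamma + a0)) - 1, 0 }."""
    signal = R2 / m_gamma
    noise = (1.0 - R2) / (n - m_gamma + a0)
    return max(signal / noise - 1.0, 0.0)

# Illustrative values: R2 fixed at 0.6 as n grows, so g_hat
# increases roughly linearly in n, as used in the proof.
print(g_hat(0.6, 3, 100, 1))
print(g_hat(0.6, 3, 10000, 1))
```

The truncation at zero matters: when \(\tilde{R}^2_\gamma \) is small enough that the ratio falls below one, \(\hat{g} = 0\), which is consistent with the \(\max \{\cdot , 0\}\) in the displayed formula.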
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\,\pi (g \mid M_\gamma , \mathbf{Y})\,\mathrm{d}g = 1. \end{aligned}$$Using the consistency property of the least squares estimators, it is easy to show that
$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\hat{Y}_f = \mathbb {E}[Y_f] = \alpha {\mathbf{1}_m} + \mathbf{X}_f \varvec{\beta }_\gamma . \end{aligned}$$This completes the proof.\(\square \)
Wang, M., Sun, X. & Lu, T. Bayesian structured variable selection in linear regression models. Comput Stat 30, 205–229 (2015). https://doi.org/10.1007/s00180-014-0529-7