
Bayesian structured variable selection in linear regression models

  • Original Paper · Computational Statistics

Abstract

In this paper we consider the Bayesian approach to the problem of variable selection in normal linear regression models with related predictors. We adopt a generalized singular \(g\)-prior distribution for the unknown model parameters and the beta-prime prior for the scaling factor \(g\), which yields a closed-form expression for the marginal posterior distribution without an integral representation. A special prior on the model space is then advocated to reflect and maintain the hierarchical or structural relationships among predictors. It is shown that under some nominal assumptions, the proposed approach is consistent in terms of model selection and prediction. Simulation studies show that the proposed approach performs well for structured variable selection in linear regression models. Finally, a real-data example is analyzed for illustrative purposes.


References

  • Baragatti M, Pommeret D (2012) A study of variable selection using g-prior distribution with ridge parameter. Comput Stat Data Anal 56:1920–1934

  • Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897

  • Bartlett M (1957) A comment on D.V. Lindley’s statistical paradox. Biometrika 44:533–534

  • Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–619

  • Brown PJ, Vannucci M, Fearn T (1998) Multivariate Bayesian variable selection and prediction. J R Stat Soc Ser B 60:627–641

  • Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101:157–167

  • Chib S (1995) Marginal likelihood from the Gibbs output. J Am Stat Assoc 90:1313–1321

  • Chipman H (1996) Bayesian variable selection with related predictors. Can J Stat 24:17–36

  • Chipman H, Hamada M, Wu C (1997) A Bayesian variable-selection approach for analyzing designed experiments with complex aliasing. Technometrics 39:372–381

  • Cui W, George EI (2008) Empirical Bayes vs. fully Bayes variable selection. J Stat Plan Inference 138:888–900

  • Farcomeni A (2010) Bayesian constrained variable selection. Stat Sin 20:1043–1062

  • Fernández C, Ley E, Steel MFJ (2001) Benchmark priors for Bayesian model averaging. J Econom 100:381–427

  • Foster DP, George EI (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975

  • George E, McCulloch R (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889

  • George E, McCulloch R (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373

  • George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747

  • Geweke J (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bayesian statistics 4 (Peñíscola, 1991). Oxford University Press, New York, pp 169–193

  • Guo R, Speckman PL (2009) Bayes factor consistency in linear models. In: 2009 international workshop on objective Bayes methodology, Philadelphia, June 5–9, 2009

  • Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795

  • Lamnisos D, Griffin JE, Steel MFJ (2009) Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations. J Comput Graph Stat 18:592–612

  • Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423

  • Maruyama Y (2009) A Bayes factor with reasonable model selection consistency for ANOVA model. arXiv:0906.4329v1 [stat.ME]

  • Maruyama Y, George EI (2011) Fully Bayes factors with a generalized g-prior. Ann Stat 39:2740–2765

  • Maruyama Y, Strawderman WE (2010) Robust Bayesian variable selection with sub-harmonic priors. arXiv:1009.1926v3 [stat.ME]

  • Nelder J (1994) The statistics of linear models: back to basics. Stat Comput 4:221–234 (with discussion in vol 5 (1995), pp 84–111)

  • Panagiotelis A, Smith M (2008) Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. J Econom 143:291–316

  • Raftery A, Madigan D, Hoeting J (1997) Bayesian model averaging for linear regression models. J Am Stat Assoc 92:179–191

  • Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci 7:493–497

  • Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343

  • Song X, Lu Z (2011) Response to “Comments on ‘Bayesian variable selection for disease classification using gene expression data’ ”. Bioinformatics 27:2169–2170

  • Wang M, Sun X (2013) Bayes factor consistency for unbalanced ANOVA models. Statistics 47:1104–1115

  • West M (2003) Bayesian factor regression models in the “large p, small n” paradigm. Bayesian Stat 7:723–732

  • Yang A, Song X (2010) Bayesian variable selection for disease classification using gene expression data. Bioinformatics 26:215–222

  • Yuan M, Joseph V, Zou H (2009) Structured variable selection and estimation. Ann Appl Stat 3:1738–1757

  • Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with \(g\)-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques, Studies in Bayesian Econometrics and Statistics, vol 6. North-Holland, Amsterdam, pp 233–243


Acknowledgments

We would like to thank the editor, the associate editor, and referees for their constructive comments that led to a marked improvement of the article. The first author was partially supported by the New Faculty Start-Up Fund at Michigan Technological University.

Author information


Corresponding author

Correspondence to Min Wang.

Appendix

Proof of Theorem 2

Stirling’s formula gives the asymptotic relation

$$\begin{aligned} \Gamma {(c_{1}x +c_{2})} \approx \sqrt{2\pi }e^{-c_{1}x}(c_{1}x)^{c_{1}x + c_{2} - 1/2}, \end{aligned}$$
(28)

when \(x\) is sufficiently large. Here, ‘\(f \approx g\)’ means that the ratio of the two sides approaches 1 as \(x\) goes to infinity. For our problem, when \(n\) approaches infinity, it may be verified that

$$\begin{aligned} \frac{\Gamma {\big ((n+a_{0} - m_{\gamma '})/2\big )}}{\Gamma {\big ((n+a_{0} - m_{\gamma })/2\big )}} \approx \biggl (\frac{n}{2}\biggr )^{(m_{\gamma }-m_{\gamma '})/2}. \end{aligned}$$
(29)
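The accuracy of the ratio approximation in (29) is easy to check numerically. A minimal sketch in Python, where the values of \(a_0\), \(m_\gamma \), and \(m_{\gamma '}\) are arbitrary illustrative choices:

```python
import math

def gamma_ratio(n, a0, m_g, m_gp):
    # Exact value of Gamma((n + a0 - m_gp)/2) / Gamma((n + a0 - m_g)/2),
    # computed on the log scale to avoid overflow for large n.
    return math.exp(math.lgamma((n + a0 - m_gp) / 2)
                    - math.lgamma((n + a0 - m_g) / 2))

def approx_ratio(n, m_g, m_gp):
    # Right-hand side of (29): (n/2)^((m_g - m_gp)/2)
    return (n / 2) ** ((m_g - m_gp) / 2)

# Illustrative values: a0 = 2, a 3-predictor model versus a 5-predictor rival
n, a0, m_g, m_gp = 10_000, 2, 3, 5
exact = gamma_ratio(n, a0, m_g, m_gp)
approx = approx_ratio(n, m_g, m_gp)
print(exact / approx)  # approaches 1 as n grows
```

The ratio of the two sides tends to 1 as \(n\) increases, consistent with the definition of \(\approx \) above.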

For comparing an arbitrary model \(M_{\gamma '}\) with the true model \(M_\gamma \), we assume that the design matrix of the linear models satisfies the assumption in (22). Posterior consistency for model selection then means that, when sampling from \(M_\gamma \), it follows that

$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }{\frac{p(M_{\gamma '} \mid \mathbf{Y})}{p(M_{\gamma } \mid \mathbf{Y})}} = \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}= 0, \end{aligned}$$
(30)

where \(M_{\gamma '} \ne M_{\gamma }\) and \(f(\gamma \mid \mathbf{Y})\) is given by Eq. (16).

In what follows, for simplicity of notation, let \(c_i\), \(i = 1, \ldots , 4\), denote constants independent of the sample size \(n\). It can be seen from Lemma A.1 of Fernández et al. (2001) that when sampling from the model \(M_\gamma \), which is nested within or equal to the model \(M_{\gamma '}\), we have

$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } \frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}}{n} = \sigma ^2, \end{aligned}$$

and that when the model \(M_{\gamma '}\) does not nest \(M_{\gamma }\), we have

$$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } \frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}}{n} = \sigma ^2 + c_{\gamma '}, \end{aligned}$$

where \(c_{\gamma '}\) is given by Eq. (22). To show the consistency of model selection, we now consider the following two situations.

  1. (a)

    If \(M_\gamma \not \subseteq M_{\gamma '}\), then by using the relationship in (29), as \(n\) approaches infinity, Eq. (30) can be written as

    $$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= \frac{\Gamma {\Bigl (\frac{m_{\gamma '} + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_{\gamma '}}{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_{\gamma '}\bigr )^{-(n + a_0 - m_{\gamma '})/2 + a + 1}\pi ({\gamma '})}{\Gamma {\Bigl (\frac{m_\gamma + 2a + 2}{2}\Bigr )}\Gamma {\Bigl (\frac{n + a_0 - m_\gamma }{2}\Bigr )}\bigl (1 - \tilde{R}^{2}_\gamma \bigr )^{-(n + a_0 - m_\gamma )/2 + a + 1}\pi (\gamma )}\\&= c_1 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{1 - \tilde{R}^2_{\gamma '}}{1 - \tilde{R}^2_{\gamma }}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}}{b_0 + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}}\biggr )^{-n/2}\\&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_3 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2}\biggl (\frac{\sigma ^2}{\sigma ^2 + c_{\gamma '}}\biggr )^{n/2} \\&= 0, \end{aligned}$$

    because \(c_{\gamma '} >0\) and the term \(\big (\sigma ^2/(\sigma ^2 + c_{\gamma '})\big )^{n/2}\) converges to 0 in probability exponentially fast as \(n\) approaches infinity, regardless of the value of \(m_{\gamma }-m_{\gamma '}\). Thus, Eq. (30) holds in this case.

  2. (b)

    If \(M_{\gamma } \subseteq M_{\gamma '}\), it is immediate from the result of Fernández et al. (2001) that we have

    $$\begin{aligned} {\biggl (\frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma })\mathbf{Y}/n}{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}\biggr )^{n/2}} \mathop {\longrightarrow }\limits ^{D} \exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_{\gamma }}}{2}\biggr )}, \end{aligned}$$

    where \(\mathop {\longrightarrow }\limits ^{D}\) means convergence in distribution. As \(n\) approaches infinity, the limit of Eq. (30) becomes

    $$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\frac{f(\gamma '\mid \mathbf{Y})}{f(\gamma \mid \mathbf{Y})}&= c_2 \mathop {\hbox {plim}}\limits _{n\rightarrow \infty } n^{(m_{\gamma }-m_{\gamma '})/2} \biggl (\frac{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_{\gamma '})\mathbf{Y}/n}{b_0/n + \mathbf{Y}'(\mathbf{I}_n - \mathbf{H}_\gamma )\mathbf{Y}/n}\biggr )^{-n/2}\\&= c_4\mathop {\hbox {plim}}\limits _{n\rightarrow \infty }{n^{(m_\gamma -m_{\gamma '})/2}}\exp {\biggl (\frac{\chi ^{2}_{m_{\gamma '} - m_\gamma }}{2}\biggr )} \\&= 0, \end{aligned}$$

    because \(m_\gamma - m_{\gamma '} < 0\) for \(M_\gamma \subseteq M_{\gamma '}\) with \(M_{\gamma '} \ne M_{\gamma }\), while the exponential term is bounded in probability. This completes the proof.

\(\square \)
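The two probability limits quoted above from Lemma A.1 of Fernández et al. (2001) can be illustrated by simulation. A sketch in Python, where the model sizes, coefficients, and \(\sigma \) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 20_000, 1.5

# True model M_gamma: two active predictors with coefficients (1, -2);
# the third column is spurious (coefficient 0).
X = rng.standard_normal((n, 3))
beta = np.array([1.0, -2.0, 0.0])
Y = X @ beta + sigma * rng.standard_normal(n)

def mean_rss(X_g, Y):
    # Y'(I_n - H_gamma')Y / n, computed via least-squares residuals
    # rather than forming the hat matrix H_gamma' explicitly.
    resid = Y - X_g @ np.linalg.lstsq(X_g, Y, rcond=None)[0]
    return resid @ resid / len(Y)

nesting = mean_rss(X, Y)        # M_gamma' nests the truth: tends to sigma^2
wrong = mean_rss(X[:, :1], Y)   # drops an active predictor: sigma^2 + c > sigma^2
print(nesting, wrong)
```

With these choices, `nesting` settles near \(\sigma ^2 = 2.25\) while `wrong` settles near \(\sigma ^2 + c_{\gamma '}\) with \(c_{\gamma '} = 4\), the variance contribution of the omitted predictor.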

Proof of Theorem 3

We consider the following two situations.

  1. (a)

    When \(M_\gamma = M_N\), it follows directly from the consistency of the least squares estimators that \(\Vert \hat{\varvec{\beta }}_\gamma \Vert \rightarrow 0\), and therefore the consistency of the BMA estimates follows.

  2. (b)

    When \(M_\gamma \ne M_N\), it follows from Theorem 2 that \(\mathop {\hbox {plim}}\limits \nolimits _{n\rightarrow \infty } P(M_\gamma \mid \mathbf{Y}) = 1\) when \(M_\gamma \) is the true model. In addition, following the result of Liang et al. (2008), we have

    $$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\pi (g \mid M_\gamma , \mathbf{Y})\,dg = \frac{\hat{g}}{1 + \hat{g}}\biggl (1 + O\Bigl (\frac{1}{n}\Bigr )\biggr ), \end{aligned}$$

    where \(\hat{g}\) is obtained by maximizing a function of the form

    $$\begin{aligned} L(g) = (1+g)^{(n-m_\gamma +a_0)/2} \bigl (1 + g(1-\tilde{R}^2_\gamma )\bigr )^{-(n+a_0)/2}. \end{aligned}$$

    Taking the derivative of \(\log L(g)\) with respect to \(g\) and setting it equal to zero yields

    $$\begin{aligned} \hat{g} = \max \biggl \{\frac{\tilde{R}^2_\gamma /m_\gamma }{(1-\tilde{R}^2_\gamma )/(n-m_\gamma +a_0)}-1, 0 \biggr \}. \end{aligned}$$

    Note that \(\hat{g}\) approaches infinity under the true model \(M_\gamma \) as \(n\) tends to infinity, and therefore, we obtain

    $$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\int _0^\infty \frac{g}{1+g}\pi (g \mid M_\gamma , \mathbf{Y})\,dg = 1. \end{aligned}$$

    Using the consistency property of the least squares estimators, it is easy to show that

    $$\begin{aligned} \mathop {\hbox {plim}}\limits _{n\rightarrow \infty }\hat{Y}_f = \mathbb {E}[Y_f] = \alpha {\mathbf{1}_m} + \mathbf{X}_f \varvec{\beta }_\gamma . \end{aligned}$$

    This completes the proof.\(\square \)
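The closed form for \(\hat{g}\) derived above can be verified against a direct numerical maximization of \(L(g)\). A sketch in Python, where the values of \(\tilde{R}^2_\gamma \), \(m_\gamma \), \(n\), and \(a_0\) are arbitrary illustrative choices:

```python
import numpy as np

def g_hat(R2, m, n, a0):
    # Closed-form maximizer of L(g) from the proof of Theorem 3.
    return max((R2 / m) / ((1 - R2) / (n - m + a0)) - 1.0, 0.0)

def log_L(g, R2, m, n, a0):
    # log of L(g) = (1+g)^{(n-m+a0)/2} * (1 + g(1-R2))^{-(n+a0)/2}
    return (0.5 * (n - m + a0) * np.log1p(g)
            - 0.5 * (n + a0) * np.log1p(g * (1.0 - R2)))

R2, m, n, a0 = 0.6, 3, 200, 2
closed_form = g_hat(R2, m, n, a0)

# Grid search over [0, 5*g_hat] as an independent check on the maximizer.
grid = np.linspace(0.0, 5.0 * closed_form, 200_001)
numerical = grid[np.argmax(log_L(grid, R2, m, n, a0))]
print(closed_form, numerical)  # the two maximizers agree
```

Note also that \(\hat{g}\) grows without bound as \(n\rightarrow \infty \) with \(\tilde{R}^2_\gamma \) bounded away from 0, which is the fact used in the final step of the proof.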

Cite this article

Wang, M., Sun, X. & Lu, T. Bayesian structured variable selection in linear regression models. Comput Stat 30, 205–229 (2015). https://doi.org/10.1007/s00180-014-0529-7
