Negotiating multicollinearity with spike-and-slab priors

Abstract

In multiple regression under the normal linear model, multicollinearity is well known to produce unreliable and unstable maximum likelihood estimates. This is particularly troublesome for variable selection, where it becomes more difficult to distinguish between subset models. Here we show how adding a spike-and-slab prior mitigates this difficulty by filtering the likelihood surface into a posterior distribution that allocates the relevant likelihood information to each of the subset model modes. To identify promising high posterior models in this setting, we consider three EM algorithms: the fast closed-form EMVS version of Ročková and George (J Am Stat Assoc, 2014) and two new versions designed for variants of the spike-and-slab formulation. For a multimodal posterior under multicollinearity, we compare the regions of convergence of these three algorithms. Deterministic annealing versions of the EMVS algorithm are seen to substantially mitigate this multimodality. A single simple running example is used for illustration throughout.
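The interplay between a closed-form EM update, its starting value, and deterministic annealing can be sketched in a few lines. The block below is only an illustrative sketch, not the authors' implementation. It assumes a normal-mixture spike-and-slab prior with hypothetical variances `v0` (spike) and `v1` (slab), and holds the error scale `sigma` and the prior inclusion weight `theta` fixed, whereas a fuller treatment would estimate them. The function name `emvs_sketch` and all parameter names are invented for illustration. The `temperature` argument tempers the inclusion probabilities in the general spirit of deterministic annealing EM (Ueda and Nakano 1998); whether this matches the paper's exact formulation is an assumption.

```python
import numpy as np


def emvs_sketch(X, y, v0=0.01, v1=10.0, theta=0.5, sigma=1.0,
                temperature=1.0, beta_init=None, n_iter=500, tol=1e-10):
    """Hypothetical EM iteration for a normal-mixture spike-and-slab prior.

    Assumptions (not taken from the source): sigma and theta are fixed, and
    beta_i | gamma_i ~ N(0, sigma^2 * (v1 if gamma_i == 1 else v0)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    beta = np.ones(p) if beta_init is None else np.array(beta_init, dtype=float)
    XtX, Xty = X.T @ X, X.T @ y
    p_star = np.full(p, theta)
    for _ in range(n_iter):
        # E-step: log weights of the slab (a) and spike (b) at the current beta.
        log_a = np.log(theta) - 0.5 * np.log(v1) - beta**2 / (2 * sigma**2 * v1)
        log_b = np.log1p(-theta) - 0.5 * np.log(v0) - beta**2 / (2 * sigma**2 * v0)
        # Tempered inclusion probability a^t / (a^t + b^t); t = 1 is plain EM.
        p_star = 1.0 / (1.0 + np.exp(temperature * (log_b - log_a)))
        # Expected prior precision of each coefficient (in units of 1/sigma^2).
        d_star = (1.0 - p_star) / v0 + p_star / v1
        # M-step: closed-form generalized ridge update.
        beta_new = np.linalg.solve(XtX + np.diag(d_star), Xty)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # p_star comes from the last E-step, i.e. the beta entering the final M-step.
    return beta, p_star


# Synthetic, nearly collinear example (made-up data, for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)  # x2 is almost a copy of x1
X_demo = np.column_stack([x1, x2])
y_demo = x1 + rng.normal(size=100)     # only x1 enters the data-generating model

for start in ([1.0, 0.0], [0.0, 1.0]):
    for t in (1.0, 0.2):
        b, prob = emvs_sketch(X_demo, y_demo, beta_init=start, temperature=t)
        print(start, t, np.round(b, 3), np.round(prob, 3))
```

Running the loop from the two starting values, with and without tempering, gives a rough feel for how the end point of the iteration can depend on where it begins under collinearity. No particular output is claimed here, since it depends on the simulated data.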

References

  1. Bar, H., Booth, J., Wells, M.: An empirical Bayes approach to variable selection and QTL analysis. Proceedings of the 25th International Workshop on Statistical Modelling, pp. 63–68. Glasgow, Scotland (2010)

  2. Figueiredo, M.A.: Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1150–1159 (2003)

  3. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)

  4. George, E.I., McCulloch, R.E.: Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997)

  5. George, E., Rockova, V., Lesaffre, E.: Faster spike-and-slab variable selection with dual coordinate ascent EM. In: Proceedings of the 28th Workshop on Statistical Modelling, vol. 1, pp. 165–170 (2013)

  6. Griffin, J., Brown, P.: Alternative prior distributions for variable selection with very many more variables than observations. In: Technical report, University of Warwick, University of Kent (2005)

  7. Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOS with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)

  8. Hayashi, T., Iwata, H.: EM algorithm for Bayesian estimation of genomic breeding values. BMC Genetics 11, 1–9 (2010)

  9. Kiiveri, H.: A Bayesian approach to variable selection when the number of variables is very large. Institute of Mathematical Statistics Lecture Notes—Monograph Series 40, 127–143 (2003)

  10. Rockova, V., George, E.: EMVS: the EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 361 (2014, forthcoming)

  11. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)

  12. Ueda, N., Nakano, R.: Deterministic annealing EM algorithm. Neural Netw. 11, 271–282 (1998)

  13. Zellner, A.: On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel, P.K., Zellner, A. (eds.) Bayesian inference and decision techniques, pp. 233–243. Elsevier, North-Holland, Amsterdam

Author information

Corresponding author

Correspondence to Edward I. George.

Additional information

The authors would like to thank the reviewers for very helpful suggestions. This work was supported by AHRQ Grant R21-HS021854.

Cite this article

Ročková, V., George, E.I. Negotiating multicollinearity with spike-and-slab priors. METRON 72, 217–229 (2014). https://doi.org/10.1007/s40300-014-0047-y

Keywords

  • Deterministic annealing
  • EM algorithm
  • EMVS
  • \(g\)-prior
  • Variable selection

Mathematics Subject Classification

  • 62F15
  • 62J05