Volume 72, Issue 2, pp 217–229

Negotiating multicollinearity with spike-and-slab priors

  • Veronika Ročková
  • Edward I. George


In multiple regression under the normal linear model, the presence of multicollinearity is well known to lead to unreliable and unstable maximum likelihood estimates. This can be particularly troublesome for the problem of variable selection, where it becomes more difficult to distinguish between subset models. Here we show how adding a spike-and-slab prior mitigates this difficulty by filtering the likelihood surface into a posterior distribution that allocates the relevant likelihood information to each of the subset model modes. For identification of promising high posterior models in this setting, we consider three EM algorithms: the fast closed-form EMVS version of Rockova and George (J Am Stat Assoc, 2014) and two new versions designed for variants of the spike-and-slab formulation. For a multimodal posterior under multicollinearity, we compare the regions of convergence of these three algorithms. Deterministic annealing versions of the EMVS algorithm are seen to substantially mitigate this multimodality. A single simple running example is used for illustration throughout.
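As a rough illustration of the kind of EM iteration the abstract refers to, the following is a minimal sketch of a closed-form EM scheme for a continuous spike-and-slab prior of the form β_i | γ_i ~ N(0, σ² v0) if γ_i = 0 and N(0, σ² v1) if γ_i = 1. It is not the authors' implementation. The function name `em_spike_slab`, the parameter names (`v0`, `v1`, `theta`, `sigma2`, `temperature`), and the default values are all assumptions made for this example. For simplicity the error variance and the prior inclusion probability are held fixed, and the tempering of the E-step by the exponent 1/temperature follows the general deterministic annealing idea rather than any specific formula from the paper.

```python
import numpy as np


def em_spike_slab(X, y, v0=0.01, v1=10.0, theta=0.5, sigma2=1.0,
                  temperature=1.0, beta_init=None, n_iter=200):
    """Illustrative EM for a continuous spike-and-slab prior (a sketch).

    Assumptions: sigma2 (error variance) and theta (prior inclusion
    probability) are treated as known and fixed; v0 < v1 are the spike and
    slab variance multipliers. temperature > 1 flattens the E-step
    probabilities (deterministic annealing); temperature = 1 is plain EM.
    """
    n, p = X.shape
    beta = np.ones(p) if beta_init is None else np.asarray(beta_init, float).copy()
    XtX, Xty = X.T @ X, X.T @ y
    p_star = np.full(p, theta)
    for _ in range(n_iter):
        # E-step: log posterior odds of slab versus spike for each coefficient,
        # given the current beta, divided by the temperature.
        log_odds = (np.log(theta / (1.0 - theta))
                    + 0.5 * np.log(v0 / v1)
                    + 0.5 * beta**2 / sigma2 * (1.0 / v0 - 1.0 / v1))
        # Numerically stable logistic function via tanh.
        p_star = 0.5 * (1.0 + np.tanh(0.5 * log_odds / temperature))
        # Expected prior precision multiplier for each coefficient.
        d_star = (1.0 - p_star) / v0 + p_star / v1
        # M-step: generalized ridge solution with coefficient-specific penalties.
        beta = np.linalg.solve(XtX + np.diag(d_star), Xty)
    return beta, p_star
```

Calling `em_spike_slab(X, y)` with a design matrix `X` and response `y` returns the final coefficient estimate and the conditional inclusion probabilities from the last E-step. Under strong collinearity the result may depend on `beta_init`, which is the multimodality the abstract describes; running with `temperature` greater than 1 is one way to explore how annealing changes the solution reached.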


Deterministic annealing EM algorithm EMVS \(g\)-prior Variable selection 

Mathematics Subject Classification

62F15 62J05 


References

  1. Bar, H., Booth, J., Wells, M.: An empirical Bayes approach to variable selection and QTL analysis. In: Proceedings of the 25th International Workshop on Statistical Modelling, pp. 63–68. Glasgow, Scotland (2010)
  2. Figueiredo, M.A.: Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1150–1159 (2003)
  3. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)
  4. George, E.I., McCulloch, R.E.: Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997)
  5. George, E., Rockova, V., Lesaffre, E.: Faster spike-and-slab variable selection with dual coordinate ascent EM. In: Proceedings of the 28th Workshop on Statistical Modelling, vol. 1, pp. 165–170 (2013)
  6. Griffin, J., Brown, P.: Alternative prior distributions for variable selection with very many more variables than observations. Technical report, University of Warwick, University of Kent (2005)
  7. Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOS with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)
  8. Hayashi, T., Iwata, H.: EM algorithm for Bayesian estimation of genomic breeding values. BMC Genetics 11, 1–9 (2010)
  9. Kiiveri, H.: A Bayesian approach to variable selection when the number of variables is very large. Institute of Mathematical Statistics Lecture Notes, Monograph Series 40, 127–143 (2003)
  10. Rockova, V., George, E.: EMVS: the EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 361 (2014, forthcoming)
  11. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)
  12. Ueda, N., Nakano, R.: Deterministic annealing EM algorithm. Neural Netw. 11, 271–282 (1998)
  13. Zellner, A.: On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel, P.K., Zellner, A. (eds.) Bayesian inference and decision techniques, pp. 233–243. Elsevier, North-Holland, Amsterdam

Copyright information

© Sapienza Università di Roma 2014

Authors and Affiliations

  1. Department of Statistics, University of Pennsylvania, Philadelphia, USA
