Variational Bayes for Hierarchical Mixture Models

  • Muting Wan
  • James G. Booth
  • Martin T. Wells
Part of the Springer Handbooks of Computational Statistics book series (SHCS)


In recent years, sparse classification problems have emerged in many fields of study. Finite mixture models have been developed to facilitate Bayesian inference where parameter sparsity is substantial. Classification with finite mixture models is based on the posterior expectation of latent indicator variables. These quantities are typically estimated using the expectation-maximization (EM) algorithm in an empirical Bayes approach or Markov chain Monte Carlo (MCMC) in a fully Bayesian approach. MCMC is limited in applicability where high-dimensional data are involved because its sampling-based nature leads to slow computations and hard-to-monitor convergence. In this chapter, we investigate the feasibility and performance of variational Bayes (VB) approximation in a fully Bayesian framework. We apply the VB approach to fully Bayesian versions of several finite mixture models that have been proposed in bioinformatics, and find that it achieves desirable speed and accuracy in sparse classification with finite mixture models for high-dimensional data.


Keywords: Bayesian inference; Generalized linear mixed models; Large p small n problems; Linear mixed models; Markov chain Monte Carlo; Statistical bioinformatics



We would like to thank John T. Ormerod, who provided supplementary materials for the GBVA implementation in Ormerod (2011), and Haim Y. Bar for helpful discussions.

Professors Booth and Wells acknowledge the support of NSF-DMS 1208488 and NIH U19 AI111143.


  1. Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750
  2. Attias H (2000) A variational Bayesian framework for graphical models. Adv Neural Inf Process Syst 12(1–2):209–215
  3. Bar H, Schifano E (2010) Lemma: Laplace approximated EM microarray analysis. R package version 1.3-1.
  4. Bar H, Booth J, Schifano E, Wells M (2010) Laplace approximated EM microarray analysis: an empirical Bayes approach for comparative microarray experiments. Stat Sci 25(3):388–407
  5. Beal M (2003) Variational algorithms for approximate Bayesian inference. PhD thesis, University of London
  6. Bishop C (1999) Variational principal components. In: Proceedings of ninth international conference on artificial neural networks, ICANN’99, vol 1. IET, pp 509–514
  7. Bishop C (2006) Pattern recognition and machine learning. Springer Science+Business Media, New York
  8. Bishop C, Spiegelhalter D, Winn J (2002) VIBES: a variational inference engine for Bayesian networks. Adv Neural Inf Process Syst 15:777–784
  9. Blei D, Jordan M (2006) Variational inference for Dirichlet process mixtures. Bayesian Anal 1(1):121–143
  10. Booth J, Eilertson K, Olinares P, Yu H (2011) A Bayesian mixture model for comparative spectral count data in shotgun proteomics. Mol Cell Proteomics 10(8):M110-007203
  11. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
  12. Callow M, Dudoit S, Gong E, Speed T, Rubin E (2000) Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Res 10(12):2022–2029
  13. Christensen R, Johnson WO, Branscum AJ, Hanson TE (2011) Bayesian ideas and data analysis: an introduction for scientists and statisticians. CRC, Boca Raton
  14. Consonni G, Marin J (2007) Mean-field variational approximate Bayesian inference for latent variable models. Comput Stat Data Anal 52(2):790–798
  15. Corduneanu A, Bishop C (2001) Variational Bayesian model selection for mixture distributions. In: Jaakkola TS, Richardson TS (eds) Artificial intelligence and statistics 2001. Morgan Kaufmann, Waltham, pp 27–34
  16. Cowles MK, Carlin BP (1996) Markov chain Monte Carlo convergence diagnostics: a comparative review. J Am Stat Assoc 91(434):883–904
  17. De Freitas N, Højen-Sørensen P, Jordan M, Russell S (2001) Variational MCMC. In: Breese J, Koller D (eds) Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, San Francisco, pp 120–127
  18. Efron B (2008) Microarrays, empirical Bayes and the two-groups model. Stat Sci 23(1):1–22
  19. Faes C, Ormerod J, Wand M (2011) Variational Bayesian inference for parametric and nonparametric regression with missing data. J Am Stat Assoc 106(495):959–971
  20. Friston K, Ashburner J, Kiebel S, Nichols T, Penny W (2011) Statistical parametric mapping: the analysis of functional brain images. Academic, London
  21. Gelman A, Carlin JB, Stern HS, Rubin DB (2003) Bayesian data analysis. Chapman & Hall/CRC, London/Boca Raton
  22. Ghahramani Z, Beal M (2000) Variational inference for Bayesian mixtures of factor analysers. Adv Neural Inf Process Syst 12:449–455
  23. Goldsmith J, Wand M, Crainiceanu C (2011) Functional regression via variational Bayes. Electr J Stat 5:572
  24. Grimmer J (2011) An introduction to Bayesian inference via variational approximations. Polit Anal 19(1):32–47
  25. Honkela A, Valpola H (2005) Unsupervised variational Bayesian learning of nonlinear models. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17. MIT, Cambridge, pp 593–600
  26. Jaakkola TS (2000) Tutorial on variational approximation methods. In: Opper M, Saad D (eds) Advanced mean field methods: theory and practice. MIT, Cambridge, pp 129–159
  27. Li Z, Sillanpää M (2012) Estimation of quantitative trait locus effects with epistasis by variational Bayes algorithms. Genetics 190(1):231–249
  28. Li J, Das K, Fu G, Li R, Wu R (2011) The Bayesian lasso for genome-wide association studies. Bioinformatics 27(4):516–523
  29. Logsdon B, Hoffman G, Mezey J (2010) A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis. BMC Bioinf 11(1):58
  30. Luenberger D, Ye Y (2008) Linear and nonlinear programming. International series in operations research & management science, vol 116. Springer, New York
  31. Marin J-M, Robert CP (2007) Bayesian core: a practical approach to computational Bayesian statistics. Springer, New York
  32. Martino S, Rue H (2009) R package: INLA. Department of Mathematical Sciences, NTNU, Norway. Available at
  33. McGrory C, Titterington D (2007) Variational approximations in Bayesian model selection for finite mixture distributions. Comput Stat Data Anal 51(11):5352–5367
  34. McLachlan G, Peel D (2004) Finite mixture models. Wiley, New York
  35. Minka T (2001a) Expectation propagation for approximate Bayesian inference. In: Breese J, Koller D (eds) Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, San Francisco, pp 362–369
  36. Minka T (2001b) A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology
  37. Ormerod J (2011) Grid based variational approximations. Comput Stat Data Anal 55(1):45–56
  38. Ormerod J, Wand M (2010) Explaining variational approximations. Am Stat 64(2):140–153
  39. Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 71(2):319–392
  40. Salter-Townshend M, Murphy T (2009) Variational Bayesian inference for the latent position and cluster model. In: NIPS 2009 (Workshop on analyzing networks & learning with graphs)
  41. Sing T, Sander O, Beerenwinkel N, Lengauer T (2007) ROCR: visualizing the performance of scoring classifiers. R package version 1.0-2.
  42. Smídl V, Quinn A (2005) The variational Bayes method in signal processing. Springer, Berlin
  43. Smyth G (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3(1):1–25. Article 3
  44. Smyth G (2005) Limma: linear models for microarray data. In: Bioinformatics and computational biology solutions using R and bioconductor. Springer, New York, pp 397–420
  45. Teschendorff A, Wang Y, Barbosa-Morais N, Brenton J, Caldas C (2005) A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data. Bioinformatics 21(13):3025–3033
  46. Tzikas D, Likas A, Galatsanos N (2008) The variational approximation for Bayesian inference. IEEE Signal Process Mag 25(6):131–146
  47. Wand MP, Ormerod JT, Padoan SA, Frühwirth R (2011) Mean field variational Bayes for elaborate distributions. Bayesian Anal 6(4):1–48
  48. Wang B, Titterington DM (2005) Inadequacy of interval estimates corresponding to variational Bayesian approximations. In: Cowell RG, Ghahramani Z (eds) Proceedings of the tenth international workshop on artificial intelligence and statistics. Society for Artificial Intelligence and Statistics, pp 373–380
  49. Zhang M, Montooth K, Wells M, Clark A, Zhang D (2005) Mapping multiple quantitative trait loci by Bayesian classification. Genetics 169(4):2305–2318

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. New York Life Insurance Company, New York, USA
  2. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
  3. Department of Statistical Science, Cornell University, Ithaca, USA
