On Bayesian Analysis of Parsimonious Gaussian Mixture Models

  • Original Research
Journal of Classification

Abstract

Cluster analysis is the task of grouping a set of objects so that objects in the same cluster are similar to one another. It is widely used in many fields, including machine learning, bioinformatics, and computer graphics. In all of these applications, the partition is an inference goal, along with the number of clusters and their distinguishing characteristics. The mixture of factor analyzers (MFA) model is a special case of model-based clustering which assumes that the covariance matrix of each cluster arises from a factor analysis model. It simplifies the Gaussian mixture model through parameter dimension reduction and conceptually represents the variables as coming from a lower-dimensional subspace in which the clusters are separated. In this paper, we introduce a new reversible-jump Markov chain Monte Carlo (RJMCMC) inferential procedure for the family of constrained MFA models.
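To make the parameter dimension reduction concrete, the sketch below (illustrative only, not code from the paper; the dimensions p and q are arbitrary choices) builds a factor-analytic covariance matrix Sigma = Lambda Lambda' + Psi and compares its free-parameter count with that of an unconstrained covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 10, 4  # observed dimension and number of latent factors (illustrative)

# Factor-analytic covariance: Sigma = Lambda @ Lambda.T + Psi, where
# Lambda is a p x q loading matrix and Psi is a diagonal noise matrix.
Lambda = rng.normal(size=(p, q))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))
Sigma = Lambda @ Lambda.T + Psi

# Sigma is symmetric positive definite by construction.
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

# Parameter savings: an unconstrained covariance has p(p+1)/2 free
# parameters, while the factor model needs only pq + p (before the
# usual identifiability constraints on Lambda, which reduce it further).
full_params = p * (p + 1) // 2
fa_params = p * q + p
print(full_params, fa_params)  # → 55 50
```

The gap widens quickly with p: for fixed q, the factor model grows linearly in p while the unconstrained covariance grows quadratically, which is the sense in which the clusters live in a lower-dimensional subspace.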

The three goals of inference here are the partition of the objects, the number of clusters, and the identification and estimation of the covariance structure of the clusters; each therefore has a posterior distribution. RJMCMC is the main sampling tool, as it allows the dimension of the parameter space to be estimated. We present simulations comparing this inferential technique with previous methods on the estimation of the clustering parameters and the partition. Finally, we illustrate these new methods with a dataset of DNA methylation measures for subjects with different brain tumor types. Our method uses four latent factors to correctly discover the five brain tumor types without assuming a constant covariance structure, and it classifies subjects with excellent accuracy.
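Agreement between a recovered partition and known labels is commonly summarized with the adjusted Rand index (a standard metric for comparing partitions; whether it is the exact criterion used in this paper is not stated in the abstract). The following is a minimal self-contained sketch of that index, written here for illustration rather than taken from the paper or its accompanying package.

```python
from math import comb
from collections import Counter

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(a)
    pair_counts = Counter(zip(a, b))  # contingency table counts n_ij
    row_counts = Counter(a)           # cluster sizes in partition a
    col_counts = Counter(b)           # cluster sizes in partition b
    sum_comb = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_counts.values())
    sum_b = sum(comb(c, 2) for c in col_counts.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-expected agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_comb - expected) / (max_index - expected)

# Label-invariant: identical partitions up to relabeling score 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

Because the index is corrected for chance, a random partition scores near 0 regardless of the number of clusters, which makes it suitable for comparing methods that may infer different numbers of clusters.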



Author information


Corresponding author

Correspondence to Tanzy Love.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

(PDF 424 KB)


About this article


Cite this article

Lu, X., Li, Y. & Love, T. On Bayesian Analysis of Parsimonious Gaussian Mixture Models. J Classif 38, 576–593 (2021). https://doi.org/10.1007/s00357-021-09391-8
