A new Bayesian approach for determining the number of components in a finite mixture

Abstract

This article evaluates a new Bayesian approach to determining the number of components in a finite mixture. Using simulation studies, we evaluate it on mixtures of normals and on latent class mixtures of Bernoulli responses. For normal mixtures we use a “gold standard” set of population models based on a well-known “testbed” data set, the galaxy recession velocity data of Roeder (J Am Stat Assoc 85:617–624, 1990). For Bernoulli latent class mixtures we consider the models for psychiatric diagnosis of Berkhof et al. (Stat Sin 13:423–442, 2003). The new approach compares models with different numbers of components through their posterior deviance distributions, computed under non-informative or diffuse priors. The simulations show that even large numbers of closely spaced normal components can be identified with sufficiently large samples, while for latent classes with Bernoulli responses identification is more complex, though it again improves with increasing sample size.
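
As a rough illustration of what comparing models through their posterior deviance distributions can look like for normal mixtures, the Python sketch below evaluates the deviance, minus twice the log-likelihood, at a single posterior draw of the mixture parameters, and then summarizes two sets of deviance draws by the estimated probability that one model's deviance is smaller than the other's. The function names are hypothetical, the posterior draws are assumed to come from a sampler that is not shown, and the pairwise-probability summary is only one plausible way to compare the two distributions, not necessarily the procedure used in the article.

```python
import numpy as np


def normal_mixture_deviance(y, weights, means, sds):
    """Deviance -2 log L of a K-component normal mixture at ONE parameter draw.

    y       : observations, shape (n,)
    weights : mixing proportions, shape (K,), assumed to sum to 1
    means   : component means, shape (K,)
    sds     : component standard deviations, shape (K,)
    """
    y = np.asarray(y, dtype=float)[:, None]  # shape (n, 1) for broadcasting
    weights, means, sds = (
        np.asarray(a, dtype=float) for a in (weights, means, sds)
    )
    # log of weight_k * Normal(y_i | mean_k, sd_k), shape (n, K)
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(sds)
        - 0.5 * ((y - means) / sds) ** 2
    )
    # Numerically stable log-sum-exp over the K components
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -2.0 * log_lik.sum()


def prob_smaller_deviance(dev_a, dev_b):
    """Estimate P(D_a < D_b) over all pairs of independent deviance draws.

    This pairwise summary is an assumption made for illustration.
    """
    dev_a = np.asarray(dev_a, dtype=float)[:, None]
    dev_b = np.asarray(dev_b, dtype=float)[None, :]
    return float((dev_a < dev_b).mean())
```

In use, `normal_mixture_deviance` would be applied to every posterior draw of each candidate model to build that model's deviance sample, and `prob_smaller_deviance` would then be called on the samples of two competing models. A value near 1 indicates that the first model's deviance distribution lies mostly below the second's.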

Notes

  1. Full details can be found in their paper.

  2. This may seem counter-intuitive, since the ML estimate of the saturated model always has the smallest frequentist deviance. The single observation in each “class”, however, gives a very diffuse likelihood for each \(p_{ij}\), and this leads to a very diffuse and large deviance distribution.
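
The effect described in Note 2 can be illustrated with a toy saturated Bernoulli model in which every binary observation has its own success probability. The Python sketch below is a minimal illustration under assumptions that are not taken from the article: a flat Beta(1, 1) prior on each probability, one observation per parameter, and a hypothetical function name. Under these assumptions the posterior of each probability is Beta(1 + y, 2 − y), and the deviance is evaluated at each joint posterior draw.

```python
import numpy as np


def saturated_bernoulli_deviance_draws(y, n_draws=4000, seed=0):
    """Posterior deviance draws for a toy saturated Bernoulli model.

    Each observation y_i in {0, 1} has its own probability p_i.
    ASSUMPTION (not from the article): flat Beta(1, 1) prior on each p_i,
    so p_i | y_i ~ Beta(1 + y_i, 2 - y_i).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=int)
    # One row per posterior draw, one column per observation
    p = rng.beta(1 + y, 2 - y, size=(n_draws, y.size))
    # Probability assigned to the outcome that was actually observed
    q = np.where(y == 1, p, 1.0 - p)
    # Deviance = -2 * log-likelihood at each posterior draw
    return -2.0 * np.log(q).sum(axis=1)
```

At the ML estimate, where each probability equals its own observation, the deviance of this toy model is exactly 0. Under the flat prior, by contrast, each observation contributes a posterior deviance term with mean 1 and variance 1, so the deviance draws for n observations are centred near n with a spread that grows with n.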

References

  1. Aitkin, M.: The calibration of p-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with discussion). Stat. Comput. 7, 253–272 (1997)

  2. Aitkin, M.: Likelihood and Bayesian analysis of mixtures. Stat. Model. 1, 287–304 (2001)

  3. Aitkin, M.: Statistical Inference: an Integrated Bayesian/Likelihood Approach. Chapman and Hall/CRC Press, Boca Raton (2010)

  4. Aitkin, M.: How many components in a finite mixture? In: Mengersen, K.L., Robert, C.P., Titterington, D.M. (eds.) Mixtures Estimation and Applications. Wiley, Chichester (2011)

  5. Bartlett, M.S.: A comment on D. V. Lindley’s statistical paradox. Biometrika 44, 533–534 (1957)

  6. Berkhof, J., van Mechelen, I., Gelman, A.: A Bayesian approach to the selection and testing of mixture models. Stat. Sin. 13, 423–442 (2003)

  7. Celeux, G., Forbes, F., Robert, C.P., Titterington, D.M.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651–674 (2006)

  8. Dempster, A.P.: The direct use of likelihood in significance testing. Stat. Comput. 7, 247–252 (1997)

  9. Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90, 577–588 (1995)

  10. Garcia-Escudero, L.A., Gordaliza, A., Matran, C., Mayo-Iscar, A.: Avoiding spurious local maximizers in mixture modeling. Stat. Comput. (2015). doi:10.1007/s11222-014-9455-3

  11. Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)

  12. Lindley, D.V.: A statistical paradox. Biometrika 44, 187–192 (1957)

  13. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  14. Nylund, K.L., Asparouhov, T., Muthen, B.O.: Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct. Equ. Model. 14, 535–569 (2007)

  15. Phillips, D.B., Smith, A.F.M.: Bayesian model comparison via jump diffusions. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC Press, Boca Raton (1996)

  16. Postman, M., Huchra, J.P., Geller, M.J.: Probes of large-scale structures in the Corona Borealis region. Astron. J. 92, 1238–1247 (1986)

  17. Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. B 59, 731–792 (1997)

  18. Roeder, K.: Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. J. Am. Stat. Assoc. 85, 617–624 (1990)

  19. Roeder, K., Wasserman, L.: Practical Bayesian density estimation using mixtures of normals. J. Am. Stat. Assoc. 92, 894–902 (1997)

  20. Spiegelhalter, D.J., Best, N., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–639 (2002)

  21. Stephens, M.: Bayesian analysis of mixtures with an unknown number of components–an alternative to reversible jump methods. Ann. Stat. 28, 40–74 (2000)

  22. Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550 (1987)

  23. van Mechelen, I., De Boeck, P.: Implicit taxonomy in psychiatric diagnosis: a case study. J. Soc. Clin. Psychol. 8, 276–287 (1989)

Acknowledgments

We are grateful to the Australian Research Council for research support under project DP120102902, which supported Duy Vu for the period of this research (2012–2015) and visits by Brian Francis from the University of Lancaster.

Author information

Corresponding author

Correspondence to Murray Aitkin.

About this article

Cite this article

Aitkin, M., Vu, D. & Francis, B. A new Bayesian approach for determining the number of components in a finite mixture. METRON 73, 155–176 (2015). https://doi.org/10.1007/s40300-015-0068-1

  • DOI: https://doi.org/10.1007/s40300-015-0068-1

Keywords

  • Finite mixture
  • Number of components
  • Deviance distribution