
Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion

METRON

Abstract

The integrated completed likelihood (ICL) criterion has proven to be a very popular approach in model-based clustering for automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However, for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution here is to illustrate that, through the use of conjugate priors, one can derive an exact expression for the ICL and so avoid any approximation. Moreover, we illustrate how one can find both the number of clusters and the best allocation of observations in one algorithmic framework. The performance of our algorithm is presented on several simulated and real examples.
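To make the idea of an exact ICL concrete, the following is a minimal sketch and not the article's implementation. It assumes univariate Gaussian components with a Normal-Inverse-Gamma prior on each component's mean and variance and a symmetric Dirichlet prior on the mixture weights, so that log p(x, z | K) = log p(x | z, K) + log p(z | K) is available in closed form. The function names, the hyperparameter values (`m0`, `kappa0`, `a0`, `b0`, `alpha`) and the simple greedy reallocation sweep are illustrative assumptions. Here `K` acts as a fixed upper bound on the number of clusters and clusters are allowed to become empty.

```python
import math


def log_marginal_gaussian(xs, m0=0.0, kappa0=0.01, a0=1.0, b0=1.0):
    """Log marginal likelihood of one cluster under a Normal-Inverse-Gamma
    prior (hyperparameter values are illustrative assumptions)."""
    n = len(xs)
    if n == 0:
        return 0.0  # an empty cluster contributes nothing
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    kappa_n = kappa0 + n
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * ss + kappa0 * n * (mean - m0) ** 2 / (2.0 * kappa_n)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            + math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n))


def log_prior_allocations(counts, alpha=1.0):
    """log p(z | K) with the weights integrated out under a symmetric
    Dirichlet(alpha) prior."""
    k = len(counts)
    n = sum(counts)
    return (math.lgamma(k * alpha) - k * math.lgamma(alpha)
            + sum(math.lgamma(c + alpha) for c in counts)
            - math.lgamma(n + k * alpha))


def exact_icl(xs, z, K):
    """Exact ICL: log p(x | z, K) + log p(z | K), labels z in 0..K-1."""
    groups = [[] for _ in range(K)]
    for x, label in zip(xs, z):
        groups[label].append(x)
    return (sum(log_marginal_gaussian(g) for g in groups)
            + log_prior_allocations([len(g) for g in groups]))


def greedy_icl(xs, z, K, max_sweeps=20):
    """Hypothetical greedy sweep: move one observation at a time to the
    cluster that most increases the exact ICL, until no move helps."""
    z = list(z)
    best = exact_icl(xs, z, K)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(xs)):
            current = z[i]
            for k in range(K):
                if k == current:
                    continue
                z[i] = k
                candidate = exact_icl(xs, z, K)
                if candidate > best + 1e-12:
                    best, current, improved = candidate, k, True
            z[i] = current  # keep the best label found for observation i
        if not improved:
            break
    return z, best


if __name__ == "__main__":
    data = [-5.1, -4.9, -5.0, 5.0, 4.9, 5.1]
    labels, value = greedy_icl(data, [0, 1, 2, 0, 1, 2], K=3)
    print(labels, value)
```

Because the allocation vector is part of the criterion, the number of non-empty clusters in the returned labels serves as the estimate of the number of clusters, which is one way to read the "one algorithmic framework" mentioned above. A greedy search of this kind can stop at a local optimum, so results may depend on the starting allocation.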



Acknowledgments

The authors would like to thank Gilles Celeux for some helpful comments on an earlier draft of this paper. Marco Bertoletti completed this work while visiting the School of Mathematical Sciences, UCD, as part of an M.Sc. project. The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant No. SFI/12/RC/2289. Nial Friel and Riccardo Rastelli’s research was also supported by Science Foundation Ireland under Grant No. 12/IP/1424.

Author information

Corresponding author

Correspondence to Riccardo Rastelli.

Cite this article

Bertoletti, M., Friel, N. & Rastelli, R. Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion. METRON 73, 177–199 (2015). https://doi.org/10.1007/s40300-015-0064-5

  • DOI: https://doi.org/10.1007/s40300-015-0064-5
