Abstract
The integrated completed likelihood (ICL) criterion is a popular approach in model-based clustering for automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However, for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution here is to illustrate that, through the use of conjugate priors, one can derive an exact expression for ICL and so avoid any approximation. Moreover, we illustrate how one can find both the number of clusters and the best allocation of observations in one algorithmic framework. The performance of our algorithm is presented on several simulated and real examples.
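To make the role of conjugacy concrete, the following is a minimal sketch of one term that becomes available in closed form: the log marginal probability of the allocations, log f(z | K), once the mixture weights are integrated out. It assumes a symmetric Dirichlet(alpha) prior on the weights, which is a standard conjugate choice but an assumption here, since the abstract does not state the prior used. The function name `log_allocation_marginal` and the 0-based cluster labels are illustrative only. An exact ICL would add to this the log of the integrated data term, log f(x | z, K), whose form depends on the component model and its conjugate prior.

```python
from collections import Counter
from math import lgamma, log


def log_allocation_marginal(z, K, alpha=1.0):
    """Log marginal probability of allocations z given K clusters.

    Assumes (hypothetically) a symmetric Dirichlet(alpha) prior on the K
    mixture weights, so the weights integrate out analytically:

        f(z | K) = Gamma(K*alpha) / Gamma(alpha)^K
                   * prod_k Gamma(alpha + n_k) / Gamma(n + K*alpha)

    where n_k is the number of observations allocated to cluster k.
    Labels in z are expected to be integers in 0, ..., K-1.
    """
    if any(label < 0 or label >= K for label in z):
        raise ValueError("labels must lie in 0, ..., K-1")
    n = len(z)
    counts = Counter(z)
    # Normalising constants of the prior and of the posterior Dirichlet.
    out = lgamma(K * alpha) - K * lgamma(alpha) - lgamma(n + K * alpha)
    # Empty clusters contribute Gamma(alpha), matching n_k = 0.
    for k in range(K):
        out += lgamma(alpha + counts.get(k, 0))
    return out


# Usage: the same allocation scored under two candidate values of K.
z = [0, 0, 1]
score_k2 = log_allocation_marginal(z, K=2)
score_k3 = log_allocation_marginal(z, K=3)
```

Because this term is a function of the allocation z and of K alone, it can be evaluated for any candidate pair without estimating the weights, which is what allows the number of clusters and the allocation to be compared on a common scale.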
Acknowledgments
The authors would like to thank Gilles Celeux for some helpful comments on an earlier draft of this paper. Marco Bertoletti completed this work while visiting the School of Mathematical Sciences, UCD, as part of an M.Sc. project. The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant No. SFI/12/RC/2289. Nial Friel and Riccardo Rastelli’s research was also supported by Science Foundation Ireland under Grant No. 12/IP/1424.
Cite this article
Bertoletti, M., Friel, N. & Rastelli, R. Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion. METRON 73, 177–199 (2015). https://doi.org/10.1007/s40300-015-0064-5
DOI: https://doi.org/10.1007/s40300-015-0064-5