Improved model-based clustering performance using Bayesian initialization averaging
The expectation–maximization (EM) algorithm is a commonly used method for finding maximum likelihood estimates of the parameters of a mixture model via coordinate ascent. A serious pitfall of the algorithm is that, when the likelihood function is multimodal, it can become trapped at a local maximum. This problem often arises when sub-optimal starting values are used to initialize the algorithm. Bayesian initialization averaging (BIA) is proposed as an ensemble method for generating high-quality starting values for the EM algorithm. Competing sets of trial starting values are combined as a weighted average, which is then used as the starting position for a full EM run. The method also extends to variational Bayes methods, a class of algorithms similar to EM that is based on an approximation of the model posterior. The BIA method is demonstrated on real continuous, categorical and network data sets, and the convergent log-likelihoods and associated clustering solutions are presented. These compare favorably with the output of competing initialization methods such as random starts, hierarchical clustering and deterministic annealing: the highest available maximum likelihood estimate is obtained in a higher percentage of cases, at reasonable computational cost. For the stochastic block model for network data, promising results are demonstrated even when the likelihood is unavailable. The implications of the different clustering solutions obtained at local maxima are also discussed.
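The idea described above — combine competing trial starting values as a weighted average, then launch one full EM run from the result — can be sketched in Python for a one-dimensional Gaussian mixture. This is a minimal illustration, not the authors' implementation: the exponentiated log-likelihood weights stand in for the Bayesian model-averaging weights of the paper, the short-run length and number of trials are arbitrary, and label switching between runs is handled by a crude sort on the component means (valid only in one dimension).

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D data from two well-separated Gaussian clusters.
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])

def mix_dens(x, mu, sigma, pi):
    """Per-component weighted Gaussian densities, shape (n, K)."""
    z = (x[:, None] - mu) / sigma
    return pi * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def em_gmm(x, mu, sigma, pi, n_iter):
    """Plain EM for a K-component 1-D Gaussian mixture."""
    for _ in range(n_iter):
        dens = mix_dens(x, mu, sigma, pi)
        r = dens / dens.sum(axis=1, keepdims=True)      # E-step
        nk = r.sum(axis=0)                              # M-step
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    ll = np.log(mix_dens(x, mu, sigma, pi).sum(axis=1)).sum()
    return mu, sigma, pi, ll

K, n_short, n_runs = 2, 5, 10
params, lls = [], []
for _ in range(n_runs):
    mu0 = rng.choice(x, K)                  # random trial starting values
    m, s, p, ll = em_gmm(x, mu0, np.ones(K), np.ones(K) / K, n_short)
    order = np.argsort(m)                   # crude 1-D label alignment
    params.append((m[order], s[order], p[order]))
    lls.append(ll)

# Weight each short-run solution; exp(ll - max) is a stand-in for
# posterior model weights in genuine Bayesian model averaging.
w = np.exp(np.array(lls) - max(lls))
w /= w.sum()
mu_bar = sum(wi * m for wi, (m, s, p) in zip(w, params))
sig_bar = sum(wi * s for wi, (m, s, p) in zip(w, params))
pi_bar = sum(wi * p for wi, (m, s, p) in zip(w, params))
pi_bar /= pi_bar.sum()

# One full EM run from the averaged starting position.
mu_f, sig_f, pi_f, ll_f = em_gmm(x, mu_bar, sig_bar, pi_bar, 100)
print(np.round(np.sort(mu_f), 1))
```

Averaging only helps if the trial solutions are aligned to a common labeling first; the paper's acknowledgment of label-switching machinery reflects exactly this, and the `argsort` shortcut here would need replacing in higher dimensions.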
Keywords: Bayesian model averaging · Expectation–maximization algorithm · Finite mixture models · Hierarchical clustering · Model-based clustering · Multimodal likelihood
The authors would like to acknowledge the contribution of Dr. Jason Wyse, who provided many helpful insights as well as C++ code for the label-switching methodology employed.