EM for mixtures
Maximum likelihood through the EM algorithm is widely used to estimate the parameters of hidden structure models such as Gaussian mixture models. But the EM algorithm has well-documented drawbacks: its solution can depend heavily on its initial position, and it may fail as a result of degeneracies. We stress the practical dangers of these limitations and how carefully they should be dealt with. Our main conclusion is that no method addresses them satisfactorily in all situations. Improvements are nonetheless introduced by, first, using a penalized log-likelihood of Gaussian mixture models in a Bayesian regularization perspective and, second, choosing the best among several relevant initialization strategies. In this perspective, we also propose new recursive initialization strategies, which prove helpful. They are compared with standard initialization procedures through numerical experiments, and their effects on model selection criteria are analyzed.
Keywords: Gaussian mixture models, EM algorithm, Initialization strategies, Recursive initialization, Regularized likelihood, Model selection criteria
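To make the two ideas in the abstract concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture that is run from several random starts and keeps the run with the highest log-likelihood. It is an illustration under stated assumptions, not the authors' method: the variance floor `var_floor` is a crude stand-in for the penalized log-likelihood, the random starts are plain random data points rather than the recursive strategies proposed in the paper, and the names `em_1d` and `best_of_starts` are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_1d(data, k, rng, n_iter=100, var_floor=1e-3):
    """Run EM once from a random start.

    Returns (log_likelihood, weights, means, variances), where the
    log-likelihood is the one evaluated in the last E-step.
    """
    n = len(data)
    means = rng.sample(data, k)  # random start: k data points as initial means
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, var_floor)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        resp = []
        loglik = 0.0
        for x in data:
            logs = [
                math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                for j in range(k)
            ]
            total = logsumexp(logs)
            loglik += total
            resp.append([math.exp(v - total) for v in logs])
        # M-step: weighted updates of proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Assumption: a variance floor replaces the paper's penalized
            # likelihood as a simple guard against degenerate solutions.
            variances[j] = max(var, var_floor)
    return loglik, weights, means, variances


def best_of_starts(data, k, n_starts=10, seed=0):
    """Run EM from several random starts and keep the highest log-likelihood."""
    rng = random.Random(seed)
    runs = [em_1d(data, k, rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[0])


# Synthetic data: two well-separated Gaussian clusters.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(150)]
data += [data_rng.gauss(6.0, 1.0) for _ in range(150)]
loglik, weights, means, variances = best_of_starts(data, k=2)
```

Without the floor, a component that collapses onto a single point drives its variance to zero and the likelihood to infinity, which is the kind of degeneracy the abstract warns about. Keeping the best of several starts reduces, but does not remove, the dependence on the initial position.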