Abstract
The classical EM algorithm for restoring a mixture of normal distributions cannot determine the number of components in the mixture. We propose an algorithm called ARD EM that determines the number of components automatically and is based on the relevance vector machine. The idea is to start with a redundant number of mixture components and then identify the relevant ones by maximizing the evidence. Experiments on model problems show that the number of clusters determined in this way either coincides with the true number or slightly exceeds it. In addition, the clustering produced by ARD EM is closer to the true clustering than the clusterings produced by analogous methods based on cross-validation and on the minimum description length principle.
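The abstract describes the strategy only at a high level: begin with more components than needed and discard the irrelevant ones. The sketch below illustrates that "start redundant, then prune" idea for one-dimensional data. It is not the ARD EM algorithm. Instead of maximizing the evidence, it uses a simpler sparsity-inducing weight update, π_k ∝ max(0, N_k − c), where N_k is the effective number of points assigned to component k. Any component whose effective count falls to the penalty c is removed. The function `pruned_em` and its parameters `k_init`, `penalty`, and `min_var` are hypothetical names chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pruned_em(data, k_init=10, penalty=30.0, n_iter=100, min_var=1e-3, seed=0):
    """EM for a 1-D Gaussian mixture that starts with k_init (redundant)
    components and drops any component whose penalized effective count
    max(0, N_k - penalty) reaches zero. Hypothetical helper, NOT ARD EM:
    the fixed penalty stands in for the paper's evidence maximization."""
    rng = random.Random(seed)
    n = len(data)
    mu0 = sum(data) / n
    var0 = max(sum((x - mu0) ** 2 for x in data) / n, min_var)
    # Redundant initialization: random data points as means, one shared broad variance.
    means = rng.sample(data, k_init)
    variances = [var0] * k_init
    weights = [1.0 / k_init] * k_init

    for _ in range(n_iter):
        # E-step: responsibility of component k for point i.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300  # guard against total underflow
            resp.append([pk / s for pk in p])

        # Effective number of points assigned to each component.
        counts = [sum(r[k] for r in resp) for k in range(len(means))]
        # Sparsity-inducing weight update: subtract a fixed penalty from every count.
        shrunk = [max(0.0, c - penalty) for c in counts]
        keep = [k for k, s_k in enumerate(shrunk) if s_k > 0.0]
        if not keep:
            break  # penalty too large for this data; keep the previous fit
        total = sum(shrunk[k] for k in keep)

        # M-step, carried out for the surviving components only.
        new_w, new_m, new_v = [], [], []
        for k in keep:
            nk = counts[k]
            mk = sum(r[k] * x for r, x in zip(resp, data)) / nk
            vk = sum(r[k] * (x - mk) ** 2 for r, x in zip(resp, data)) / nk
            new_w.append(shrunk[k] / total)
            new_m.append(mk)
            new_v.append(max(vk, min_var))  # floor avoids degenerate variances
        weights, means, variances = new_w, new_m, new_v

    return weights, means, variances
```

For instance, on 400 points drawn from two well-separated normal distributions, `pruned_em(data, k_init=10, penalty=30.0)` should return fewer than ten components. How many survive depends on `penalty`, a hand-tuned constant. ARD EM, as the abstract describes it, avoids such a constant by deciding relevance through evidence maximization.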
Additional information
Original Russian Text © D.P. Vetrov, D.A. Kropotov, A.A. Osokin, 2010, published in Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 2010, Vol. 50, No. 4, pp. 770–783.
About this article
Cite this article
Vetrov, D.P., Kropotov, D.A. & Osokin, A.A. Automatic determination of the number of components in the EM algorithm of restoration of a mixture of normal distributions. Comput. Math. and Math. Phys. 50, 733–746 (2010). https://doi.org/10.1134/S0965542510040147
DOI: https://doi.org/10.1134/S0965542510040147