Automatic determination of the number of components in the EM algorithm of restoration of a mixture of normal distributions

Vetrov, D. P.; Kropotov, D. A.; Osokin, A. A.

doi:10.1134/S0965542510040147

Published: 05 May 2010

Volume 50, pages 733–746, (2010)

Computational Mathematics and Mathematical Physics

D. P. Vetrov¹,
D. A. Kropotov² &
A. A. Osokin¹

Abstract

The classical EM algorithm for fitting a mixture of normal probability distributions cannot determine the number of components in the mixture. An algorithm called ARD EM, based on the relevance vector machine, is proposed for determining the number of components automatically. The idea is to start with a redundant number of mixture components and then identify the relevant ones by maximizing the evidence. Experiments on model problems show that the number of clusters found in this way either coincides with the true number or slightly exceeds it. In addition, the clustering produced by ARD EM is closer to the true clustering than the clusterings produced by analogous methods based on cross-validation and the minimum description length principle.
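The "start redundant, then keep only the relevant components" idea can be illustrated with a minimal sketch. The abstract does not give the ARD EM update equations, so the code below does not implement evidence maximization. It swaps in a simple shrink-and-prune rule on the mixing weights as a stand-in. The function names and the `penalty` and `var_floor` parameters are hypothetical and chosen only for illustration, and the data are one-dimensional to keep the example short.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def em_with_pruning(data, k_init=10, n_iter=200, penalty=1.0, var_floor=1e-3):
    """EM for a 1-D normal mixture that starts with k_init components and
    drops any component whose effective count is at most `penalty`.

    Assumption: this pruning rule is a stand-in heuristic, NOT the
    evidence maximization used by ARD EM.
    """
    lo, hi = min(data), max(data)
    # Deliberately redundant start: means spread evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k_init for j in range(k_init)]
    variances = [max(((hi - lo) / k_init) ** 2, var_floor)] * k_init
    weights = [1.0 / k_init] * k_init

    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            unnorm = [math.exp(lg - top) for lg in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step with shrinkage: subtract `penalty` from each effective count
        # and discard components whose shrunk count is not positive.
        counts = [sum(r[j] for r in resp) for j in range(len(weights))]
        shrunk = [max(0.0, c - penalty) for c in counts]
        keep = [j for j, s in enumerate(shrunk) if s > 0.0]
        if not keep:
            # Degenerate case: retain the single heaviest component.
            best = max(range(len(counts)), key=counts.__getitem__)
            keep, shrunk[best] = [best], 1.0
        norm = sum(shrunk[j] for j in keep)

        new_w, new_m, new_v = [], [], []
        for j in keep:
            mean = sum(r[j] * x for r, x in zip(resp, data)) / counts[j]
            var = sum(r[j] * (x - mean) ** 2
                      for r, x in zip(resp, data)) / counts[j]
            new_w.append(shrunk[j] / norm)
            new_m.append(mean)
            new_v.append(max(var, var_floor))  # guard against collapse
        weights, means, variances = new_w, new_m, new_v

    return weights, means, variances


# Usage: two well-separated synthetic clusters, ten initial components.
random.seed(0)
sample = ([random.gauss(-5.0, 1.0) for _ in range(100)]
          + [random.gauss(5.0, 1.0) for _ in range(100)])
weights, means, variances = em_with_pruning(sample)
print(len(weights), "components kept out of 10")
```

How many components survive depends on the data and on the `penalty` value. The sketch only shows the mechanics of pruning from a redundant start and makes no claim to reproduce the results reported for ARD EM.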

References

A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Roy. Stat. Soc. B 39, 1–38 (1977).
C. M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2006).
M. E. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine,” J. Mach. Learn. Res. 1, 211–244 (2001).
D. J. C. MacKay, “Bayesian Interpolation,” Neural Comput. 4, 415–447 (1992).
I. O. Kyrgyzov, O. O. Kyrgyzov, H. Maitre, and M. Campedel, “Kernel MDL to Determine the Number of Clusters,” in Proc. Int. Conf. Mach. Learning Data Mining, Leipzig, 2007.
L. Xu and M. I. Jordan, “On Convergence Properties of the EM Algorithm for Gaussian Mixtures,” Neural Comput. 8, 129–151 (1996).
J. Rissanen, “Modeling by Shortest Data Description,” Automatica 14, 465–471 (1978).
Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J. Comp. Syst. Sci. 55(1), 119–139 (1997).
H. A. Akaike, “A New Look at the Statistical Model Identification,” IEEE Trans. Autom. Control 19, 716–723 (1974).
L. Hubert and P. Arabie, “Comparing Partitions,” J. Classif. 2, 193–218 (1985).
N. Vlassis and A. Likas, “A Greedy EM Algorithm for Gaussian Mixture Learning,” Neural Proc. Lett., 77–87 (2000).
J. J. Verbeek, N. Vlassis, and B. Krose, “Efficient Greedy Learning of Gaussian Mixture Models,” Neural Comput. (2003).
L. I. Kuncheva and D. P. Vetrov, “Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 1798–1808 (2005).
V. V. Ryazanov, “On the Synthesis of Classifying Algorithms on Finite Sets of Classification (Taxonomy) Algorithms,” Zh. Vychisl. Mat. Mat. Fiz. 22, 429–440 (1982).

Author information

Authors and Affiliations

Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow, 119992, Russia
D. P. Vetrov & A. A. Osokin
Dorodnicyn Computing Center, Russian Academy of Sciences, ul. Vavilova 40, Moscow, 119333, Russia
D. A. Kropotov

Corresponding author

Correspondence to D. P. Vetrov.

Additional information

Original Russian Text © D.P. Vetrov, D.A. Kropotov, A.A. Osokin, 2010, published in Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 2010, Vol. 50, No. 4, pp. 770–783.

About this article

Cite this article

Vetrov, D.P., Kropotov, D.A. & Osokin, A.A. Automatic determination of the number of components in the EM algorithm of restoration of a mixture of normal distributions. Comput. Math. and Math. Phys. 50, 733–746 (2010). https://doi.org/10.1134/S0965542510040147

Received: 24 July 2009
Accepted: 11 November 2009
Issue Date: April 2010
