Automatic determination of the number of components in the EM algorithm of restoration of a mixture of normal distributions

Computational Mathematics and Mathematical Physics

Abstract

The classical EM algorithm for restoring a mixture of normal probability distributions cannot determine the number of components in the mixture. An algorithm called ARD EM, based on the relevance vector machine, is proposed for determining the number of components automatically. The idea is to start with a redundant number of mixture components and then identify the relevant ones by maximizing the evidence. Experiments on model problems show that the number of clusters determined in this way either coincides with the actual number or slightly exceeds it. In addition, the clustering produced by ARD EM is closer to the actual clustering than that produced by counterparts based on cross-validation and the minimum description length principle.
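The "start redundant, then keep only the relevant components" idea can be illustrated with a minimal sketch. The abstract does not give the ARD EM update equations, so the code below is not the authors' algorithm: it runs ordinary maximum-likelihood EM on one-dimensional data and uses a simple mixing-weight threshold as a stand-in for the evidence-based relevance test. The function names `normal_pdf` and `em_with_pruning` and the parameters `k_init`, `min_weight`, and `var_floor` are all hypothetical choices made for this illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_with_pruning(data, k_init=8, n_iter=100, min_weight=0.02, seed=0):
    """EM for a 1-D Gaussian mixture that starts with k_init components and
    drops every component whose mixing weight falls below min_weight.

    ASSUMPTION: the weight threshold is a crude stand-in for relevance
    determination by evidence maximization; it is not the ARD EM criterion.
    """
    rng = random.Random(seed)
    n = len(data)
    data_mean = sum(data) / n
    data_var = sum((x - data_mean) ** 2 for x in data) / n
    means = rng.sample(data, k_init)      # start the means at random data points
    variances = [data_var] * k_init
    weights = [1.0 / k_init] * k_init
    var_floor = 1e-6 * data_var           # keeps components from collapsing

    for _ in range(n_iter):
        k = len(weights)
        # E-step: resp[i][j] is the posterior probability of component j for x_i.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0.0 else [1.0 / k] * k)

        # Effective number of points per component.
        n_eff = [sum(r[j] for r in resp) for j in range(k)]

        # Pruning: keep only components that still carry enough weight.
        keep = [j for j in range(k) if n_eff[j] / n >= min_weight]

        # M-step on the surviving components.
        means = [sum(r[j] * x for r, x in zip(resp, data)) / n_eff[j] for j in keep]
        variances = [
            max(sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / n_eff[j],
                var_floor)
            for j, m in zip(keep, means)
        ]
        kept_mass = sum(n_eff[j] for j in keep)
        weights = [n_eff[j] / kept_mass for j in keep]

    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic model problem: two well-separated normal clusters.
    data = ([rng.gauss(-5.0, 1.0) for _ in range(150)]
            + [rng.gauss(5.0, 1.0) for _ in range(150)])
    weights, means, variances = em_with_pruning(data, k_init=8)
    print("surviving components:", len(weights))
    for w, m, v in zip(weights, means, variances):
        print(f"weight={w:.3f} mean={m:.2f} var={v:.2f}")
```

Because this sketch maximizes the likelihood rather than the evidence, it may leave more than one component per cluster; the threshold only removes components that have lost nearly all of their data.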

Author information

Corresponding author

Correspondence to D. P. Vetrov.

Additional information

Original Russian Text © D.P. Vetrov, D.A. Kropotov, A.A. Osokin, 2010, published in Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 2010, Vol. 50, No. 4, pp. 770–783.

About this article

Cite this article

Vetrov, D.P., Kropotov, D.A. & Osokin, A.A. Automatic determination of the number of components in the EM algorithm of restoration of a mixture of normal distributions. Comput. Math. and Math. Phys. 50, 733–746 (2010). https://doi.org/10.1134/S0965542510040147

  • DOI: https://doi.org/10.1134/S0965542510040147
