Data Mining and Knowledge Discovery, Volume 13, Issue 3, pp 291–307

Accelerated EM-based clustering of large data sets

  • Jakob J. Verbeek
  • Jan R. J. Nunnink
  • Nikos Vlassis


Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal re-estimation of the model parameters based on average sufficient statistics computed over groups of data points. The proposed method naturally generalizes to mixtures of other members of the exponential family. Experimental results show the potential of the proposed method over other state-of-the-art acceleration techniques.
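The two alternating steps described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes one-dimensional data, a fixed partition of the points into groups given by hand (the keywords suggest kd-trees for this, which are not reproduced here), and a variance floor added only for numerical safety. All function names are hypothetical. Each group shares one set of responsibilities, proportional to the mixing weight times the exponential of the group-averaged log-density, and the parameters are re-estimated from the cached group averages of x and x².

```python
import math

def group_stats(groups):
    """Cache per-group size, average of x, and average of x^2 (computed once)."""
    return [(len(g), sum(g) / len(g), sum(x * x for x in g) / len(g)) for g in groups]

def avg_log_gauss(m1, m2, mu, var):
    """Average of log N(x; mu, var) over a group with <x> = m1 and <x^2> = m2."""
    return -0.5 * math.log(2 * math.pi * var) - (m2 - 2 * mu * m1 + mu * mu) / (2 * var)

def e_step(stats, weights, means, variances):
    """Shared responsibilities per group, plus the value of the lower bound."""
    resp, bound = [], 0.0
    for n, m1, m2 in stats:
        logs = [math.log(w) + avg_log_gauss(m1, m2, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp for numerical stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
        resp.append([math.exp(v - log_norm) for v in logs])
        bound += n * log_norm  # bound value at the optimal assignments
    return resp, bound

def m_step(stats, resp, min_var=1e-6):
    """Re-estimate parameters from group averages weighted by group size."""
    total = sum(n for n, _, _ in stats)
    weights, means, variances = [], [], []
    for s in range(len(resp[0])):
        # Assumes every component keeps non-zero mass (no underflow handling).
        mass = sum(n * r[s] for (n, _, _), r in zip(stats, resp))
        mu = sum(n * r[s] * m1 for (n, m1, _), r in zip(stats, resp)) / mass
        ex2 = sum(n * r[s] * m2 for (n, _, m2), r in zip(stats, resp)) / mass
        weights.append(mass / total)
        means.append(mu)
        variances.append(max(ex2 - mu * mu, min_var))  # floor is an added assumption
    return weights, means, variances

def fit(groups, weights, means, variances, iters=20):
    """Alternate the two steps; the history holds the bound before each update."""
    stats = group_stats(groups)
    history = []
    for _ in range(iters):
        resp, bound = e_step(stats, weights, means, variances)
        history.append(bound)
        weights, means, variances = m_step(stats, resp)
    return weights, means, variances, history

# Toy usage: four hand-made groups, two components.
groups = [[0.0, 0.2, 0.1], [0.9, 1.1], [4.8, 5.0, 5.3], [5.9, 6.1]]
w, mu, var, hist = fit(groups, [0.5, 0.5], [0.0, 6.0], [1.0, 1.0])
print(w, mu, var)
```

Because each iteration touches only the cached statistics of the groups, its cost in this sketch depends on the number of groups rather than the number of points. With every point in its own group, the bound coincides with the exact data log-likelihood and the updates reduce to standard EM.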


Gaussian mixtures, EM algorithm, Free energy, kd-trees, Large data sets



We would like to thank the reviewers for their useful comments, which helped to improve this manuscript. We are indebted to Tijn Schmits for part of the experimental work. JJV is supported by the Technology Foundation STW (project AIF 4997), applied science division of NWO, and the technology program of the Dutch Ministry of Economic Affairs.



Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Jakob J. Verbeek (1)
  • Jan R. J. Nunnink (2)
  • Nikos Vlassis (2)

  1. INRIA Rhone-Alpes, Montbonnot Saint-Martin, France
  2. Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
