An improved clustering algorithm based on finite Gaussian mixture model

  • Zhilin He
  • Chun-Hsing Ho


The finite Gaussian mixture model (FGMM) is the most commonly used model for describing mixed density distributions in cluster analysis. An important feature of the FGMM is that it can approximate any continuous distribution arbitrarily closely, provided the model contains enough components. In FGMM-based cluster analysis, the parameters of the model are usually estimated with the EM algorithm, whose advantages are stable computation and fast convergence. However, the EM algorithm relies heavily on estimation from the incomplete data and does not use any information to reduce the uncertainty of the missing data. To solve this problem, an EM algorithm based on entropy-penalized maximum likelihood estimation is proposed. The new algorithm constructs a conditional entropy model between the incomplete data and the missing data, and uses the incomplete data to reduce the uncertainty of the missing data. Theoretical analysis and experimental results show that the new algorithm adapts effectively to the FGMM, improves the clustering results, and improves the efficiency of the algorithm.
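For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal sketch of the standard EM algorithm for a univariate Gaussian mixture, not the entropy-penalized variant proposed in the paper, whose penalty term is not given in this abstract. The function names, the quantile-based initialization, the fixed iteration count, and the variance floor are all assumptions made for illustration. The helper `mean_posterior_entropy` shows one common way to quantify the uncertainty of the missing data (the component labels) given the observed data; it may differ from the conditional entropy model the authors construct.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the univariate normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, n_iter=100, var_floor=1e-6):
    """Standard (unpenalized) EM for a univariate Gaussian mixture with k components.

    Returns (weights, means, variances, responsibilities).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: place the means at evenly spaced sample quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, var_floor)] * k
    weights = [1.0 / k] * k

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component label (the missing data)
        # given each observation (the incomplete data).
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            if total > 0.0:
                resp.append([p / total for p in joint])
            else:
                # Numerical underflow: fall back to a uniform posterior.
                resp.append([1.0 / k] * k)

        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)

    return weights, means, variances, resp


def mean_posterior_entropy(resp):
    """Average Shannon entropy (in nats) of the posterior label distributions."""
    total = 0.0
    for row in resp:
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(resp)
```

Calling `em_gmm_1d(data, k=2)` on a list of floats returns the fitted parameters together with the posterior label probabilities, and passing those probabilities to `mean_posterior_entropy` gives a value between 0 (every point assigned with certainty) and `log(k)` (labels completely uncertain). A penalized method in the spirit of the abstract would presumably trade off the likelihood against a quantity of this kind, but that connection is an assumption here.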


Gaussian mixture model, EM algorithm, Cluster analysis



This research was supported by the National Natural Science Foundation of China (11241005).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Mathematics and Information Technology School of Yuncheng University, Yuncheng, China
  2. Department of Civil Engineering, Construction Management & Environmental Engineering, Northern Arizona University, Flagstaff, USA
