Combining Mixture Models and Spectral Clustering for Data Partitioning

  • Conference paper
  • In: Image Analysis and Recognition (ICIAR 2020)

Abstract

Gaussian Mixture Models are widely used, thanks to the simplicity and efficiency of the Expectation-Maximization algorithm. However, determining the optimal number of components is difficult and, in the context of data partitioning, that number may differ from the actual number of clusters. We propose a post-processing step based on Spectral Clustering: similar Gaussians are merged according to their Bhattacharyya distance, so that clusters of arbitrary shape are discovered automatically. The proposed method shows a significant improvement over the classical Gaussian Mixture clustering approach and promising results against well-known partitioning algorithms with respect to the number of parameters.
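The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the affinity is built or how the number of clusters is chosen. The sketch assumes that the component means and covariances have already been estimated (for example by EM), that the affinity between two components is exp(-D_B), where D_B is the standard closed-form Bhattacharyya distance between two Gaussians, and that a simple two-way spectral split (sign of the second eigenvector of the normalized Laplacian) is enough for illustration. The function names are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = 0.5 * (cov1 + cov2)  # averaged covariance
    diff = mu1 - mu2
    # Mahalanobis-like term: diff^T cov^{-1} diff
    maha = diff @ np.linalg.solve(cov, diff)
    # Log-determinants computed stably
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.125 * maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))


def component_affinity(means, covs):
    """Pairwise affinity exp(-D_B) between components (assumed form), zero diagonal."""
    k = len(means)
    W = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = bhattacharyya_distance(means[i], covs[i], means[j], covs[j])
            W[i, j] = W[j, i] = np.exp(-d)
    return W


def spectral_bipartition(W):
    """Split components into two groups using the normalized graph Laplacian.

    Assumes every component has non-zero total affinity (connected graph).
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt    # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)     # group label per Gaussian component


# Illustrative components: two close Gaussians and one far away.
means = [[0.0, 0.0], [1.0, 0.0], [6.0, 0.0]]
covs = [np.eye(2), np.eye(2), np.eye(2)]
labels = spectral_bipartition(component_affinity(means, covs))
# Points would then inherit the group of their most likely component.
```

With these illustrative components, the two overlapping Gaussians end up in one group and the distant one in the other, so a cluster made of several Gaussians can take a non-elliptical shape. A k-way version would typically run k-means on the first k eigenvectors instead of thresholding a single one.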

Supported by Auvergne-Rhône-Alpes region.

Notes

  1. https://github.com/deric/clustering-benchmark

Author information

Corresponding author

Correspondence to Julien Muzeau.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Muzeau, J., Oliver-Parera, M., Ladret, P., Bertolino, P. (2020). Combining Mixture Models and Spectral Clustering for Data Partitioning. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science, vol 12132. Springer, Cham. https://doi.org/10.1007/978-3-030-50516-5_6

  • DOI: https://doi.org/10.1007/978-3-030-50516-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-50515-8

  • Online ISBN: 978-3-030-50516-5

  • eBook Packages: Computer Science, Computer Science (R0)
