Abstract
Gaussian Mixture Models are widely used, thanks to the simplicity and efficiency of the Expectation-Maximization algorithm. However, determining the optimal number of components is difficult and, in the context of data partitioning, this number may differ from the actual number of clusters. We propose a post-processing step based on Spectral Clustering: similar Gaussians are merged according to their Bhattacharyya distance, so that clusters of any shape are discovered automatically. The proposed method shows a significant improvement over the classical Gaussian Mixture clustering approach and promising results against well-known partitioning algorithms with respect to the number of parameters.
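The merging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the affinity `exp(-distance)`, the fixed `n_clusters` argument, the small k-means helper and the names `bhattacharyya_distance` and `merge_components` are all assumptions made for illustration. The sketch takes the means and full covariance matrices of an already fitted mixture and groups the components with a normalized spectral embedding.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mahalanobis_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Log-determinants are used instead of determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return mahalanobis_term + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))


def _kmeans(points, k, n_iter=50):
    """Tiny deterministic k-means (farthest-first initialisation), illustration only."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def merge_components(means, covs, n_clusters):
    """Assign each Gaussian component to one of n_clusters merged groups."""
    k = len(means)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = bhattacharyya_distance(
                means[i], covs[i], means[j], covs[j]
            )
    # Assumed affinity: exp(-distance), i.e. the Bhattacharyya coefficient.
    # The diagonal stays at 1, so node degrees never vanish.
    affinity = np.exp(-dist)
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    # eigh sorts eigenvalues in ascending order: keep the n_clusters largest.
    _, vecs = np.linalg.eigh(normalized)
    embedding = vecs[:, -n_clusters:]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return _kmeans(embedding, n_clusters)


# Toy mixture: two pairs of overlapping Gaussians, the pairs lying far apart.
means = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
covs = np.array([np.eye(2)] * 4)
labels = merge_components(means, covs, n_clusters=2)
print(labels)
```

If the mixture is fitted with scikit-learn's `GaussianMixture` using `covariance_type="full"`, its `means_` and `covariances_` attributes could be passed in directly, and `labels[gmm.predict(X)]` would then map each data point to its merged cluster.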
Supported by Auvergne-Rhône-Alpes region.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Muzeau, J., Oliver-Parera, M., Ladret, P., Bertolino, P. (2020). Combining Mixture Models and Spectral Clustering for Data Partitioning. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science, vol 12132. Springer, Cham. https://doi.org/10.1007/978-3-030-50516-5_6
DOI: https://doi.org/10.1007/978-3-030-50516-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50515-8
Online ISBN: 978-3-030-50516-5
eBook Packages: Computer Science, Computer Science (R0)