Modal Clustering Using Semiparametric Mixtures and Mode Flattening

Hu, Shengwei; Wang, Yong

doi:10.1007/s11222-020-09985-z


Published: 12 January 2021

Volume 31, article number 5 (2021)

Statistics and Computing


Abstract

Modal clustering has a clear population goal, where density estimation plays a critical role. In this paper, we study how to provide better density estimation so as to serve the objective of modal clustering. In particular, we use semiparametric mixtures for density estimation, aided by a novel mode-flattening technique. The use of semiparametric mixtures helps to produce better density estimates, especially in the multivariate situation, and the mode-flattening technique is intended to identify and smooth out spurious and minor modes. With mode flattening, the number of clusters can be sequentially reduced until there is only one mode left. In addition, we adopt the likelihood function in a coherent manner to measure the relative importance of a mode and let the current least important mode disappear in each step. For both simulated and real-world data sets, the proposed method performs very well, as compared with some well-known clustering methods in the literature, and can successfully solve some fairly difficult clustering problems.
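To make the general idea of modal clustering concrete, the following is a minimal sketch under simplifying assumptions; it is not the authors' semiparametric estimator and it does not implement mode flattening. It assumes a one-dimensional Gaussian mixture density with a shared standard deviation whose weights and means are already given, moves each observation uphill to a local mode by a mean-shift-type fixed-point iteration, and groups observations that reach the same mode. The function names (`ascend_to_mode`, `modal_clusters`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def ascend_to_mode(x, weights, means, sigma, tol=1e-10, max_iter=1000):
    """Climb from x to a local mode of sum_j w_j * N(x; mu_j, sigma^2).

    Setting the derivative of the mixture density to zero gives the
    fixed-point equation x = sum_j p_j(x) * mu_j, where p_j(x) is the
    posterior probability of component j at x. Iterating it is an
    EM-type ascent (assumption: all components share the same sigma).
    """
    for _ in range(max_iter):
        # Work with log-weights so far-away components do not underflow.
        logs = [math.log(w) - 0.5 * ((x - m) / sigma) ** 2
                for w, m in zip(weights, means)]
        top = max(logs)
        resp = [math.exp(v - top) for v in logs]
        x_new = sum(r * m for r, m in zip(resp, means)) / sum(resp)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(points, weights, means, sigma, merge_tol=1e-4):
    """Label each point by the mode its ascent path converges to."""
    modes, labels = [], []
    for p in points:
        m = ascend_to_mode(p, weights, means, sigma)
        for k, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(k)
                break
        else:
            # No previously found mode is close enough: record a new one.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical density: three components, but the first two are close
# enough (distance <= 2 * sigma) to share a single mode.
weights = [0.3, 0.3, 0.4]
means = [0.0, 0.5, 5.0]
sigma = 0.5
points = [-0.3, 0.1, 0.6, 4.8, 5.2]

modes, labels = modal_clusters(points, weights, means, sigma)
print(modes, labels)
```

In this toy density the number of clusters is the number of modes (two), not the number of mixture components (three), which is the distinction modal clustering relies on.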




Acknowledgements

The authors thank the associate editor and two referees for their constructive and insightful suggestions, which led to many improvements in the manuscript.

Author information

Authors and Affiliations

Department of Statistics, The University of Auckland, Private Bag 92019, Auckland, 1142, New Zealand
Shengwei Hu & Yong Wang


Corresponding author

Correspondence to Yong Wang.


About this article

Cite this article

Hu, S., Wang, Y. Modal Clustering Using Semiparametric Mixtures and Mode Flattening. Stat Comput 31, 5 (2021). https://doi.org/10.1007/s11222-020-09985-z


Received: 16 August 2019
Accepted: 17 September 2020
Published: 12 January 2021
DOI: https://doi.org/10.1007/s11222-020-09985-z
