
Modal Clustering Using Semiparametric Mixtures and Mode Flattening

Statistics and Computing

Abstract

Modal clustering has a clear population goal, in which density estimation plays a critical role. In this paper, we study how to provide better density estimates to serve the objective of modal clustering. In particular, we use semiparametric mixtures for density estimation, aided by a novel mode-flattening technique. Semiparametric mixtures help to produce better density estimates, especially in the multivariate setting, and mode flattening is intended to identify and smooth out spurious and minor modes. With mode flattening, the number of clusters can be reduced sequentially until only one mode is left. In addition, we use the likelihood function in a coherent manner to measure the relative importance of each mode, and at each step the currently least important mode is flattened out. On both simulated and real-world data sets, the proposed method performs very well compared with several well-known clustering methods in the literature, and it can successfully solve some fairly difficult clustering problems.
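
The abstract describes clustering observations by the modes of an estimated density. As a hedged illustration of only the generic assignment step shared by modal clustering methods (not the authors' semiparametric mixture fit, mode flattening or likelihood-based ranking of modes, none of which the abstract specifies), the sketch below assumes the density has already been estimated as an isotropic Gaussian mixture with a common bandwidth `h`. Each observation is moved uphill with the standard fixed-point (modal EM / mean-shift) update, and observations that reach the same mode share a cluster. The function names, `h`, the tolerances and the toy data are illustrative assumptions.

```python
import numpy as np


def climb_to_mode(x, means, weights, h, tol=1e-8, max_iter=1000):
    """Hill-climb from x on f(x) = sum_j w_j N(x; mu_j, h^2 I).

    With a common isotropic covariance, the modal EM / mean-shift
    fixed-point update is x <- sum_j r_j(x) mu_j, where r_j(x) is the
    posterior probability of component j at x. The iteration stops at a
    stationary point of f, in practice a local mode.
    """
    x = np.asarray(x, dtype=float)
    log_w = np.log(weights)
    for _ in range(max_iter):
        sq_dist = np.sum((means - x) ** 2, axis=1)
        # Log of unnormalised posteriors; subtract the max for numerical stability.
        log_r = log_w - sq_dist / (2.0 * h ** 2)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        x_new = r @ means
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(data, means, weights, h, merge_tol=1e-3):
    """Label each observation by the mode its ascent path reaches."""
    modes, labels = [], []
    for x in data:
        m = climb_to_mode(x, means, weights, h)
        # Reuse an existing mode if this end point is (numerically) the same one.
        for k, existing in enumerate(modes):
            if np.linalg.norm(m - existing) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)


# Usage: a Gaussian kernel density estimate, i.e. one equal-weight
# component centred at each observation (an illustrative choice).
data = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2],
                 [5.0, 5.0], [5.2, 4.9], [4.9, 5.1], [5.1, 5.2]])
weights = np.full(len(data), 1.0 / len(data))
modes, labels = modal_clusters(data, data, weights, h=0.5)
print(len(modes), labels)  # 2 [0 0 0 0 1 1 1 1]
```

The common isotropic covariance is what reduces each ascent step to a weighted average of component means; a mixture with component-specific covariance matrices needs a modified update. Nothing in this sketch reproduces the paper's mode-flattening step, which would act on the estimated density before or between such assignments.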


Figs. 1–15




Acknowledgements

The authors thank the associate editor and two referees for their constructive and insightful suggestions, which led to many improvements in the manuscript.

Author information

Correspondence to Yong Wang.


About this article


Cite this article

Hu, S., Wang, Y. Modal Clustering Using Semiparametric Mixtures and Mode Flattening. Stat Comput 31, 5 (2021). https://doi.org/10.1007/s11222-020-09985-z


  • DOI: https://doi.org/10.1007/s11222-020-09985-z
