Neighborhood density information in clustering

Abstract

Density-based clustering (DBC) methods can identify arbitrarily shaped clusters in the presence of noise. They rely on local neighborhood density estimation. A major drawback of DBC methods is their poor performance in high dimensions. This work presents a novel DBC method that performs well in high dimensions. Its novelty is twofold: a hybrid first-second order optimization algorithm for identifying high-density data points, and an adaptive scan radius for identifying reachable points. Theoretical results on the validity of the proposed method are presented. Its effectiveness and efficiency are illustrated through rigorous experimental evaluations, comparing it with well-known DBC methods on synthetic and real data from the literature. Both internal and external cluster validation measures are used to evaluate performance.
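
To make the notion of local neighborhood density concrete, the following minimal sketch counts, for each point, how many other points lie within a fixed radius. It is a generic illustration only, not the method proposed in this article: the function name `neighborhood_density`, the fixed (non-adaptive) `radius`, and the toy data are all assumptions made for the example.

```python
import math


def neighborhood_density(points, radius):
    """Return, for each point, the number of OTHER points within `radius`.

    Hypothetical helper for illustration; uses Euclidean distance and a
    fixed radius, whereas the article describes an adaptive scan radius.
    """
    counts = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        counts.append(count)
    return counts


# Toy data (assumed): a tight group of three points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
densities = neighborhood_density(points, radius=0.5)
```

Points with a high count lie in dense regions. The isolated point has no neighbors within the radius, which a density-based method would typically treat as noise.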

Acknowledgements

The author would like to acknowledge the research support provided by King Fahd University of Petroleum & Minerals (KFUPM).

Author information

Corresponding author

Correspondence to Mujahid N. Syed.

About this article

Cite this article

Syed, M.N. Neighborhood density information in clustering. Ann Math Artif Intell (2021). https://doi.org/10.1007/s10472-021-09744-4

Keywords

  • Data clustering
  • Nonlinear optimization
  • Density estimation

Mathematics Subject Classification 2010

  • 90XX
  • 90C30
  • 62H30
  • 91C20