A New Accurate Clustering Approach for Detecting Different Densities in High Dimensional Data

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12925)

Abstract

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Density-based clustering methods have proven effective for arbitrary-shaped clusters, but they have difficulty finding low-density clusters, neighboring clusters with similar densities, and clusters in high-dimensional data. We propose a new clustering algorithm based on spatial density and a probabilistic approach. Sub-clusters are formed using spatial density, represented as the probability density function (p.d.f.) of pairwise distances between points. To agglomerate similar sub-clusters, we combine spatial and probabilistic distances. We show that our approach outperforms the main state-of-the-art density-based clustering methods on a wide variety of datasets.
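
As a rough illustration of the idea of comparing sub-clusters through the distribution of their pairwise distances, the sketch below may help. It is not the authors' algorithm: the abstract does not say how the p.d.f. is estimated, which probabilistic distance is used, or how the two distances are combined. The sketch assumes an empirical distribution of pairwise Euclidean distances, a one-dimensional Wasserstein distance as the probabilistic term, the smallest inter-point gap as the spatial term, and a hypothetical weight `alpha`; all function names are illustrative.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Sorted Euclidean distances between all unordered pairs of points."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)|, where F_u and F_v
    are the empirical cumulative distribution functions.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    i = j = 0
    for left, right in zip(grid, grid[1:]):
        # Count the samples that are <= the left edge of the interval.
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


def spatial_gap(a, b):
    """Smallest Euclidean distance between a point of a and a point of b."""
    return min(math.dist(p, q) for p in a for q in b)


def combined_distance(a, b, alpha=0.5):
    """Assumed convex mix of the spatial and probabilistic terms.

    alpha is a hypothetical weight, not a parameter taken from the paper.
    """
    prob = wasserstein_1d(pairwise_distances(a), pairwise_distances(b))
    return alpha * spatial_gap(a, b) + (1 - alpha) * prob


# Two toy sub-clusters: a tight unit square and a looser square far away.
dense = [(0, 0), (0, 1), (1, 0), (1, 1)]
sparse = [(10, 0), (10, 4), (14, 0), (14, 4)]
score = combined_distance(dense, sparse)
```

Under these assumptions, two sub-clusters with the same internal spacing get a probabilistic term of zero wherever they sit, so only the spatial gap separates them, while sub-clusters of different density are pushed apart even when they are adjacent.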

Author information

Corresponding author

Correspondence to Nabil El Malki.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

El Malki, N., Cugny, R., Teste, O., Ravat, F. (2021). A New Accurate Clustering Approach for Detecting Different Densities in High Dimensional Data. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_16

  • DOI: https://doi.org/10.1007/978-3-030-86534-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86533-7

  • Online ISBN: 978-3-030-86534-4

  • eBook Packages: Computer Science (R0)
