A New Accurate Clustering Approach for Detecting Different Densities in High Dimensional Data

El Malki, Nabil; Cugny, Robin; Teste, Olivier; Ravat, Franck

doi:10.1007/978-3-030-86534-4_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12925)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Density-based clustering methods have proven effective for arbitrary-shaped clusters, but they have difficulty finding low-density clusters, distinguishing nearby clusters with similar densities, and clustering high-dimensional data. Our proposal is a new clustering algorithm based on spatial density and a probabilistic approach. Sub-clusters are formed using spatial density, represented as the probability density function (p.d.f.) of pairwise distances between points. To agglomerate similar sub-clusters, we combine spatial and probabilistic distances. We show that our approach outperforms the main state-of-the-art density-based clustering methods on a wide variety of datasets.
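
The idea of describing a group of points by the distribution of its pairwise distances, and then comparing two groups with both a spatial and a probabilistic distance, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say which probabilistic distance is used or how the two distances are combined. The sketch assumes a one-dimensional Wasserstein distance approximated from empirical quantiles, a centroid-to-centroid Euclidean distance as the spatial term, and a weighted sum with a hypothetical parameter `alpha`. All function names are illustrative.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Sorted Euclidean distances between all pairs of points in one group."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def quantile(sorted_values, q):
    """Empirical quantile (linear interpolation) of a sorted sample, 0 <= q < 1."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def wasserstein_1d(sample_a, sample_b, n_quantiles=200):
    """Approximate 1-D Wasserstein-1 distance: mean absolute gap between quantiles."""
    a, b = sorted(sample_a), sorted(sample_b)
    qs = [(i + 0.5) / n_quantiles for i in range(n_quantiles)]
    return sum(abs(quantile(a, q) - quantile(b, q)) for q in qs) / n_quantiles


def centroid(points):
    """Coordinate-wise mean of a group of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def combined_distance(group_a, group_b, alpha=0.5):
    """Assumed weighted sum of a spatial and a probabilistic distance.

    The weighting scheme and `alpha` are illustrative assumptions, not taken
    from the paper.
    """
    spatial = math.dist(centroid(group_a), centroid(group_b))
    probabilistic = wasserstein_1d(
        pairwise_distances(group_a), pairwise_distances(group_b)
    )
    return alpha * spatial + (1 - alpha) * probabilistic


def gaussian_group(rng, center, sigma, n=50):
    """Synthetic group of n points drawn around `center` with spread `sigma`."""
    return [[rng.gauss(c, sigma) for c in center] for _ in range(n)]


# Two groups with the same density and one much sparser group.
rng = random.Random(0)
dense_a = gaussian_group(rng, (0.0, 0.0), 0.1)
dense_b = gaussian_group(rng, (0.5, 0.0), 0.1)
sparse = gaussian_group(rng, (0.5, 0.0), 1.0)

same_density = wasserstein_1d(pairwise_distances(dense_a), pairwise_distances(dense_b))
diff_density = wasserstein_1d(pairwise_distances(dense_a), pairwise_distances(sparse))
print("similar densities:", same_density)
print("different densities:", diff_density)
```

In this toy setting the probabilistic term is small for groups with similar densities regardless of where they sit in space, which is why a spatial term is needed alongside it before deciding whether two sub-clusters should be merged.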

Author information

Authors and Affiliations

IRIT, Toulouse, France
Nabil El Malki, Robin Cugny, Olivier Teste & Franck Ravat

Corresponding author

Correspondence to Nabil El Malki.

Editor information

Editors and Affiliations

University of Bologna, Bologna, Forli/Cesena, Italy
Matteo Golfarelli
Poznań University of Technology, Poznan, Poland
Robert Wrembel
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
TU Wien, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

Cite this paper

El Malki, N., Cugny, R., Teste, O., Ravat, F. (2021). A New Accurate Clustering Approach for Detecting Different Densities in High Dimensional Data. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_16

DOI: https://doi.org/10.1007/978-3-030-86534-4_16
Published: 05 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86533-7
Online ISBN: 978-3-030-86534-4
eBook Packages: Computer Science, Computer Science (R0)
