MDCUT²: a multi-density clustering algorithm with automatic detection of density variation in data with noise

Louhichi, Soumaya; Gzara, Mariem; Ben-Abdallah, Hanêne

doi:10.1007/s10619-018-7253-1

Published: 16 October 2018

Volume 37, pages 73–99 (2019)

Distributed and Parallel Databases

Soumaya Louhichi¹,
Mariem Gzara¹,² &
Hanêne Ben-Abdallah³

Abstract

Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varied density and with imbricated and/or adjacent clusters. Clusters of lower density may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this weakness, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest neighbors function. A clustering subroutine is launched for each density level to form the clusters of that level. Compared to well-known density-based clustering algorithms, MDCUT recorded good results on artificial datasets. The main drawback of MDCUT is its sensitivity to the parameter p of the interpolation technique and to the parameter k, the number of nearest neighbors. In this paper, we propose MDCUT², a new extension of the MDCUT algorithm that automatically detects pairs of values (k_i, ε_i) characterizing the density levels in the data, where k_i and ε_i stand respectively for the number of neighbors and the radius threshold of the ith density level. We study the performance of the MDCUT² algorithm on well-known data sets by comparison with reference density-based clustering algorithms. The extension improves on the classification results previously obtained with MDCUT.
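
The general idea of clustering one density level at a time can be illustrated with a minimal sketch. This is not the MDCUT or MDCUT² procedure itself: the abstract does not give the details, so the sketch replaces the spline-based analysis of the k-nearest neighbors function with a simple jump test on the sorted k-distance curve, and it keeps k fixed instead of detecting a k_i per level. The function names and the `ratio` parameter are hypothetical and chosen only for illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def density_levels(kdist, ratio=3.0):
    """Assumed stand-in for level detection: cut the sorted k-distance curve
    wherever it jumps by more than `ratio`; return one eps per level, ascending."""
    curve = sorted(kdist)
    levels = []
    for prev, cur in zip(curve, curve[1:]):
        if prev > 0 and cur / prev > ratio:
            levels.append(prev)
    levels.append(curve[-1])  # the sparsest level ends at the largest k-distance
    return levels

def dbscan_pass(points, idx, eps, min_pts, labels, next_id):
    """Plain DBSCAN restricted to the still-unlabelled points in `idx`."""
    neigh = {
        i: [j for j in idx if j != i and math.dist(points[i], points[j]) <= eps]
        for i in idx
    }
    for i in idx:
        if labels[i] != -1 or len(neigh[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = next_id
        stack = [i]
        while stack:
            cur = stack.pop()
            if len(neigh[cur]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for j in neigh[cur]:
                if labels[j] == -1:
                    labels[j] = next_id
                    stack.append(j)
        next_id += 1
    return next_id

def multi_density_cluster(points, k=2, ratio=3.0):
    """Cluster the densest level first, then each sparser level on what is left.
    Points that never join a cluster keep the label -1 (noise)."""
    labels = [-1] * len(points)
    next_id = 0
    for eps in density_levels(k_distances(points, k), ratio):
        idx = [i for i, lab in enumerate(labels) if lab == -1]
        next_id = dbscan_pass(points, idx, eps, k, labels, next_id)
    return labels

# Toy data: a dense 3x3 grid, a sparse 3x3 grid, and one isolated point.
dense = [(i, j) for i in range(3) for j in range(3)]
sparse = [(100 + 5 * i, 100 + 5 * j) for i in range(3) for j in range(3)]
labels = multi_density_cluster(dense + sparse + [(50, 50)])
```

On this toy input the dense grid is clustered at the first level, the sparse grid at the second, and the isolated point stays labelled as noise. Removing already-clustered points before moving to a sparser level is what keeps a large radius from absorbing the dense cluster's neighbours; it is a design choice of this sketch, not a claim about how the paper's subroutine works.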

References

Louhichi, S., Gzara, M., Ben-Abdallah, H.: Unsupervised varied density based clustering algorithm using spline. Pattern Recogn. Lett. 93, 48–57 (2017)
Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. In: New Directions in Statistical Physics, pp. 273–309. Springer, Berlin Heidelberg (2004)
Ashour, W., Murtaja, M.: Finding within cluster dense regions using distance based technique. I.J. Intell. Syst. Appl. 14, 42–48 (2012)
Parimala, M., Lophne, D., Senthilkumar, N.C.: A survey on density based clustering algorithms for mining large spatial databases. Int. J. Adv. Sci. Technol. 31, 59 (2011)
Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231 (1996)
Karypis, G., Han, E., Kumar, V.: Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)
Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 58–65 (1998)
Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2nd SIAM International Conference on Data Mining, San Francisco, pp. 1–12 (2003)
Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD, pp. 49–60 (1999)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, Burlington (2006)
Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia (2007)
Popat, S.K., Emmanuel, M.: Review and comparative study of clustering techniques. Int. J. Comput. Sci. Inf. Technol. 5(1), 805–812 (2014)
Xiong, C., Yufang, M., Yan, Z., Ping, W.: GMDBSCAN: multi-density DBSCAN cluster based on grid. In: IEEE International Conference on e-Business Engineering (2008)
Borah, B., Bhattacharyya, D.K.: DDSC: a density differentiated spatial clustering technique. J. Comput. 3, 72–79 (2008)
Ram, A., Jalal, S., Jalal, A.S., Kumar, M.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3, 1 (2010)
Borah, B., Bhattacharyya, D.K.: A clustering technique using density difference. In: Proceedings of International Conference on Signal Processing, Communications and Networking, pp. 585–588 (2007)
Tsai, C., Huang, Y.: DDCT: detecting density differences using a novel clustering technique. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems and Signal Processing, pp. 243–248 (2009)
Wang, W., Zhou, S., Xiao, Q.: Optimum VDBSCAN (O-VDBSCAN) for identifying downtown areas. Int. J. Digit. Inf. Wirel. Commun. 66–71 (2013)
Obula Reddy, B.G., Ussenaiah, M.: Literature survey on clustering techniques. IOSR J. Comput. Eng. 3(1), 01–12 (2012)
Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering 178 (2009)
Estivill-Castro, V., Lee, I.: AUTOCLUST: automatic clustering via boundary extraction for massive point-data sets. In: Proceedings of the 5th International Conference on Geocomputation (2000)
Gold, C.M.: Problems with handling spatial data - the Voronoi approach. CISM J. 45, 65–80 (1991)
Späth, H.: Exponential spline interpolation. Computing 4, 225–233 (1969)
Greville, T.N.E.: Spline functions, interpolation and numerical quadrature. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, vol. II, pp. 156–168. Wiley, New York (1967)
Rentrop, P.: An algorithm for the computation of the exponential spline. Numer. Math. 35, 81–93 (1980)
Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: IEEE SMC International Conference on Systems, Man and Cybernetics (2004)

Author information

Authors and Affiliations

MIRACL: Multimedia Information Systems and Advanced Computing Laboratory, BP 1030, Sfax, 3018, Tunisia
Soumaya Louhichi & Mariem Gzara
Higher School of Computer Science and Mathematics at the University of Monastir, Avenue de la Korniche, B.P. 223, Monastir, 5000, Tunisia
Mariem Gzara
Higher Institute of Technology, DBC, Dubai, United Arab Emirates
Hanêne Ben-Abdallah

Corresponding author

Correspondence to Soumaya Louhichi.

About this article

Cite this article

Louhichi, S., Gzara, M. & Ben-Abdallah, H. MDCUT²: a multi-density clustering algorithm with automatic detection of density variation in data with noise. Distrib Parallel Databases 37, 73–99 (2019). https://doi.org/10.1007/s10619-018-7253-1

Published: 16 October 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10619-018-7253-1
