Abstract
Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varied density or with imbricated and/or adjacent clusters. Clusters of lower density may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this weakness, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest neighbors function. A clustering subroutine is launched for each density level to form the clusters of that level. Compared to well-known density-based clustering algorithms, MDCUT recorded good results on artificial datasets. The main drawback of MDCUT is its sensitivity to the parameter p of the interpolation technique and to the parameter k, the number of nearest neighbors. In this paper, we propose a new extension of the MDCUT algorithm, MDCUT2, which automatically detects pairs of values (ki, εi) that characterize the density levels in the data, where ki and εi stand respectively for the number of neighbors and the radius threshold for the ith density level. We study the performance of the MDCUT2 algorithm on well-known data sets by comparison with reference density-based clustering algorithms. This extension improves on the previous classification results.
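The general idea of clustering one density level at a time can be illustrated with a small sketch. It is not the MDCUT or MDCUT2 procedure: instead of analyzing an interpolated k-nearest neighbors function, it assumes a much cruder rule (a new level starts wherever consecutive values of the sorted k-distance curve grow by more than a factor `jump`), and it keeps a single fixed k rather than detecting (ki, εi) pairs. All function names and the `jump` parameter are hypothetical and chosen for illustration only.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return result

def density_levels(kdist, jump=2.0):
    """Assumed heuristic: cut the sorted k-distance curve where a value is more
    than `jump` times its predecessor. Returns one radius per level, densest first."""
    s = sorted(kdist)
    levels = [a for a, b in zip(s, s[1:]) if a > 0 and b / a > jump]
    levels.append(s[-1])
    return levels

def dbscan(points, active, eps, min_pts, labels, next_label):
    """DBSCAN-style expansion restricted to the indices in `active`."""
    def neighbours(i):
        return [j for j in active if math.dist(points[i], points[j]) <= eps]

    for i in active:
        if labels[i] != -1:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            continue  # not a core point at this density level
        labels[i] = next_label
        while seeds:
            j = seeds.pop()
            if labels[j] != -1:
                continue
            labels[j] = next_label
            around_j = neighbours(j)
            if len(around_j) >= min_pts:
                seeds.extend(around_j)
        next_label += 1
    return next_label

def cluster_by_levels(points, k, jump=2.0):
    """Run one clustering pass per density level, densest level first.
    Points never assigned to a cluster keep the label -1 (noise)."""
    labels = [-1] * len(points)
    next_label = 0
    for eps in density_levels(k_distances(points, k), jump):
        active = [i for i, lab in enumerate(labels) if lab == -1]
        next_label = dbscan(points, active, eps, k + 1, labels, next_label)
    return labels

# Toy data: a dense 5x5 grid, a sparse 5x5 grid, and one isolated point.
dense = [(float(i), float(j)) for i in range(5) for j in range(5)]
sparse = [(100.0 + 10 * i, 100.0 + 10 * j) for i in range(5) for j in range(5)]
points = dense + sparse + [(1000.0, 1000.0)]
labels = cluster_by_levels(points, k=2)
```

On this toy data the dense grid and the sparse grid end up in separate clusters and the isolated point stays labeled as noise, whereas a single global radius would either miss the sparse grid or be far too permissive for the dense one.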
References
Louhichi, S., Gzara, M., Ben-Abdallah, H.: Unsupervised varied density based clustering algorithm using spline. Pattern Recogn. Lett. 93, 48–57 (2017)
Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. New Directions in Statistical Physics, pp. 273–309. Springer, Berlin Heidelberg (2004)
Ashour, W., Murtaja, M.: Finding within cluster dense regions using distance based technique. I.J. Intell. Syst. Appl. 14, 42–48 (2012)
Parimala, M., Lophne, D., Senthilkumar, N.C.: A survey on density based clustering algorithms for mining large spatial databases. Int. J. Adv. Sci. Technol. 31, 59 (2011)
Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231 (1996)
Karypis, G., Han, E., Kumar, V.: Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)
Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 58–65 (1998)
Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2nd SIAM International Conference on Data Mining, San Francisco, pp. 1–12 (2003)
Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD, pp. 49–60 (1999)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, Burlington (2006)
Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia (2007)
Popat, S.K., Emmanuel, M.: Review and comparative study of clustering techniques. Int. J. Comput. Sci. Inf. Technol. 5(1), 805–812 (2014)
Xiong, C., Yufang, M., Yan, Z., Ping, W.: GMDBSCAN: multi-density DBSCAN cluster based on grid. In: IEEE International Conference on e-Business Engineering (2008)
Borah, B., Bhattacharyya, D.K.: DDSC: a density differentiated spatial clustering technique. J. Comput. 3, 72–79 (2008)
Ram, A., Jalal, S., Jalal, A.S., Kumar, M.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3, 1 (2010)
Borah, B., Bhattacharyya, D.K.: A clustering technique using density difference. In: Proceedings of International Conference on Signal Processing, Communications and Networking, pp. 585–588 (2007)
Tsai, C., Huang, Y.: DDCT: detecting density differences using a novel clustering technique. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems and Signal Processing, pp. 243–248 (2009)
Wang, W., Zhou, S., Xiao, Q.: Optimum VDBSCAN (O-VDBSCAN) for identifying downtown areas. Int. J. Digit. Inf. Wirel. Commun. 66–71 (2013)
Obula Reddy, B.G., Ussenaiah, M.: Literature survey on clustering techniques. IOSR J. Comput. Eng. 3(1), 01–12 (2012)
Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering 178 (2009)
Estivill-Castro, V., Lee, I.: AUTOCLUST: automatic clustering via boundary extraction for massive point-data sets. In: Proceedings of the 5th International Conference on Geocomputation (2000)
Gold, C.M.: Problems with handling spatial data—the Voronoi approach. CISM J. 45, 65–80 (1991)
Späth, H.: Exponential spline interpolation. Computing 4, 225–233 (1969)
Greville, T.N.E.: Spline functions, interpolation and numerical quadrature. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, vol. II, pp. 156–168. Wiley, New York (1967)
Rentrop, P.: An algorithm for the computation of the exponential spline. Numer. Math. 35, 81–93 (1980)
Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: IEEE SMC International Conference on Systems, Man and Cybernetics (2004)
Cite this article
Louhichi, S., Gzara, M. & Ben-Abdallah, H. MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise. Distrib Parallel Databases 37, 73–99 (2019). https://doi.org/10.1007/s10619-018-7253-1
DOI: https://doi.org/10.1007/s10619-018-7253-1