Distributed and Parallel Databases, Volume 37, Issue 1, pp 73–99

MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise

  • Soumaya Louhichi
  • Mariem Gzara
  • Hanêne Ben-Abdallah
Part of the following topical collections:
  1. Special Issue on Scientific and Statistical Data Management

Abstract

Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varied density and on imbricated and/or adjacent clusters. Clusters of lower density may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this weakness, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest neighbors function. A clustering subroutine is launched for each density level to form the clusters of that level. Compared to well-known density-based clustering algorithms, MDCUT achieved good results on artificial datasets. Its main drawback is its sensitivity to the parameter p of the interpolation technique and to the parameter k, the number of nearest neighbors. In this paper, we propose an extension of the MDCUT algorithm, MDCUT2, that automatically detects pairs of values (k_i, ε_i) characterizing the density levels in the data, where k_i and ε_i denote, respectively, the number of neighbors and the radius threshold for the i-th density level. We study the performance of MDCUT2 on well-known data sets by comparison with reference density-based clustering algorithms. This extension improves on the previous classification results.
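
The idea of reading density levels off a k-nearest neighbors function can be illustrated with a small sketch. The code below is not the MDCUT or MDCUT2 procedure, which relies on spline interpolation and a mathematical analysis that the abstract does not detail. It is a simplified stand-in that sorts the k-th nearest neighbor distances and treats each unusually large jump in that curve as a boundary between density levels. The function names and the `jump_factor` parameter are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sorted_knn_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small illustration).
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)

def candidate_radii(curve, jump_factor=3.0):
    """Hypothetical heuristic: one radius per plateau of the sorted curve.

    A plateau ends where the increase between consecutive values exceeds
    jump_factor times the mean increase. This is an assumption made for
    illustration, not the analysis used by MDCUT.
    """
    steps = np.diff(curve)
    cuts = np.flatnonzero(steps > jump_factor * steps.mean())
    # Largest value of each plateau, plus the end of the last plateau.
    return np.append(curve[cuts], curve[-1])
```

On data made of one dense and one sparse group, the sorted curve shows two plateaus, and each returned radius could then serve as the ε_i of a DBSCAN-style pass restricted to that level.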

Keywords

Clustering · MDCUT · Arbitrary shaped clusters · Spline interpolation · Varied density

References

  1. Louhichi, S., Gzara, M., Ben-Abdallah, H.: Unsupervised varied density based clustering algorithm using spline. Pattern Recogn. Lett. 93, 48–57 (2017)
  2. Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. In: New Directions in Statistical Physics, pp. 273–309. Springer, Berlin Heidelberg (2004)
  3. Ashour, W., Murtaja, M.: Finding within cluster dense regions using distance based technique. I.J. Intell. Syst. Appl. 14, 42–48 (2012)
  4. Parimala, M., Lophne, D., Senthilkumar, N.C.: A survey on density based clustering algorithms for mining large spatial databases. Int. J. Adv. Sci. Technol. 31, 59 (2011)
  5. Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231 (1996)
  6. Karypis, G., Han, E., Kumar, V.: Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)
  7. Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 58–65 (1998)
  8. Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2nd SIAM International Conference on Data Mining, San Francisco, pp. 1–12 (2003)
  9. Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD, pp. 49–60 (1999)
  10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, Burlington (2006)
  11. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia (2007)
  12. Popat, S.K., Emmanuel, M.: Review and comparative study of clustering techniques. Int. J. Comput. Sci. Inf. Technol. 5(1), 805–812 (2014)
  13. Xiong, C., Yufang, M., Yan, Z., Ping, W.: GMDBSCAN: multi-density DBSCAN cluster based on grid. In: IEEE International Conference on e-Business Engineering (2008)
  14. Borah, B., Bhattacharyya, D.K.: DDSC: a density differentiated spatial clustering technique. J. Comput. 3, 72–79 (2008)
  15. Ram, A., Jalal, S., Jalal, A.S., Kumar, M.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3, 1 (2010)
  16. Borah, B., Bhattacharyya, D.K.: A clustering technique using density difference. In: Proceedings of International Conference on Signal Processing, Communications and Networking, pp. 585–588 (2007)
  17. Tsai, C., Huang, Y.: DDCT: detecting density differences using a novel clustering technique. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems and Signal Processing, pp. 243–248 (2009)
  18. Wang, W., Zhou, S., Xiao, Q.: Optimum VDBSCAN (O-VDBSCAN) for identifying downtown areas. Int. J. Digit. Inf. Wirel. Commun. 66–71 (2013)
  19. Obula Reddy, B.G., Ussenaiah, M.: Literature survey on clustering techniques. IOSR J. Comput. Eng. 3(1), 01–12 (2012)
  20. Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering 178 (2009)
  21. Estivill-Castro, V., Lee, I.: AUTOCLUST: automatic clustering via boundary extraction for massive point-data sets. In: Proceedings of the 5th International Conference on Geocomputation (2000)
  22. Gold, C.M.: Problems with handling spatial data – the Voronoi approach. CISM J. 45, 65–80 (1991)
  23. Späth, H.: Exponential spline interpolation. Computing 4, 225–233 (1969)
  24. Greville, T.N.E.: Spline functions, interpolation and numerical quadrature. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, vol. II, pp. 156–168. Wiley, New York (1967)
  25. Rentrop, P.: An algorithm for the computation of the exponential spline. Numer. Math. 35, 81–93 (1980)
  26. Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: IEEE SMC International Conference on Systems, Man and Cybernetics (2004)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Soumaya Louhichi (1, corresponding author)
  • Mariem Gzara (1, 2)
  • Hanêne Ben-Abdallah (3)
  1. MIRACL: Multimedia InfoRmation Systems and Advanced Computing Laboratory, Sfax, Tunisia
  2. Higher School of Computer Science and Mathematics at the University of Monastir, Monastir, Tunisia
  3. Higher Institute of Technology, DBC, Dubai, United Arab Emirates
