MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise

Distributed and Parallel Databases

Abstract

Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varied density or with imbricated and/or adjacent clusters. Clusters of lower density may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this weakness, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest neighbors function. A clustering subroutine is launched for each density level to form the clusters of that level. Compared to well-known density-based clustering algorithms, MDCUT recorded good results on artificial datasets. Its main drawback is its sensitivity to the parameter p of the interpolation technique and to the parameter k, the number of nearest neighbors. In this paper, we propose an extension of MDCUT, called MDCUT2, that automatically detects pairs of values (ki, εi) characterizing the density levels in the data, where ki and εi denote, respectively, the number of neighbors and the radius threshold for the ith density level. We study the performance of MDCUT2 on well-known data sets in comparison with reference density-based clustering algorithms. The extension improves on the classification results previously obtained with MDCUT.
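The general idea of clustering one density level at a time can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses a single fixed k rather than detected pairs (ki, εi), and it replaces the interpolation-based analysis of the k-nearest neighbors function with a simple jump test on the sorted k-distance curve. The function names and the `jump_ratio` parameter are hypothetical and chosen only for this illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def density_levels(kdist, k, jump_ratio=3.0):
    """ASSUMPTION: a level ends where the sorted k-distance curve jumps by more
    than jump_ratio. This stands in for the paper's interpolation-based analysis.
    Returns one radius per level, densest level first."""
    curve = sorted(kdist)
    levels, start = [], 0
    for i in range(1, len(curve) + 1):
        if i == len(curve) or curve[i] > jump_ratio * curve[i - 1]:
            if i - start > k:  # segments of k points or fewer are left as noise
                levels.append(curve[i - 1])
            start = i
    return levels

def dbscan_pass(points, labels, eps, k, next_id):
    """One DBSCAN-style pass restricted to points that were unassigned (-1)
    when the pass started. Returns the next unused cluster id."""
    free = [i for i, label in enumerate(labels) if label == -1]

    def neighbours(i):
        return [j for j in free
                if j != i and math.dist(points[i], points[j]) <= eps]

    for i in free:
        if labels[i] != -1:
            continue
        seeds = neighbours(i)
        if len(seeds) < k:  # not a core point at this density level
            continue
        labels[i] = next_id
        while seeds:
            j = seeds.pop()
            if labels[j] != -1:
                continue
            labels[j] = next_id
            reachable = neighbours(j)
            if len(reachable) >= k:  # j is a core point: keep expanding
                seeds.extend(reachable)
        next_id += 1
    return next_id

def multi_density_cluster(points, k=3, jump_ratio=3.0):
    """Cluster the densest level first, then reuse leftovers at sparser levels."""
    levels = density_levels(k_distances(points, k), k, jump_ratio)
    labels = [-1] * len(points)
    next_id = 0
    for eps in levels:
        next_id = dbscan_pass(points, labels, eps, k, next_id)
    return labels, levels

# Toy data: a dense 3x3 grid, a sparse 3x3 grid, and two isolated noise points.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(3) for y in range(3)]
noise = [(500, 500), (-300, 400)]
points = dense + sparse + noise
labels, levels = multi_density_cluster(points, k=3)
```

On this toy data the sketch finds two density levels, assigns the dense and sparse grids to separate clusters, and leaves the two isolated points labelled -1. A single global radius would either miss the sparse grid or risk merging neighbouring structures, which is the failure mode the abstract describes.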

References

  1. Louhichi, S., Gzara, M., Ben-Abdallah, H.: Unsupervised varied density based clustering algorithm using spline. Pattern Recogn. Lett. 93, 48–57 (2017)

  2. Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. In: New Directions in Statistical Physics, pp. 273–309. Springer, Berlin Heidelberg (2004)

  3. Ashour, W., Murtaja, M.: Finding within cluster dense regions using distance based technique. Int. J. Intell. Syst. Appl. 14, 42–48 (2012)

  4. Parimala, M., Lopez, D., Senthilkumar, N.C.: A survey on density based clustering algorithms for mining large spatial databases. Int. J. Adv. Sci. Technol. 31, 59 (2011)

  5. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231 (1996)

  6. Karypis, G., Han, E., Kumar, V.: Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)

  7. Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 58–65 (1998)

  8. Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2nd SIAM International Conference on Data Mining, San Francisco, pp. 1–12 (2003)

  9. Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD, pp. 49–60 (1999)

  10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, Burlington (2006)

  11. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia (2007)

  12. Popat, S.K., Emmanuel, M.: Review and comparative study of clustering techniques. Int. J. Comput. Sci. Inf. Technol. 5(1), 805–812 (2014)

  13. Xiong, C., Yufang, M., Yan, Z., Ping, W.: GMDBSCAN: multi-density DBSCAN cluster based on grid. In: IEEE International Conference on e-Business Engineering (2008)

  14. Borah, B., Bhattacharyya, D.K.: DDSC: a density differentiated spatial clustering technique. J. Comput. 3, 72–79 (2008)

  15. Ram, A., Jalal, S., Jalal, A.S., Kumar, M.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3, 1 (2010)

  16. Borah, B., Bhattacharyya, D.K.: A clustering technique using density difference. In: Proceedings of International Conference on Signal Processing, Communications and Networking, pp. 585–588 (2007)

  17. Tsai, C., Huang, Y.: DDCT: detecting density differences using a novel clustering technique. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems and Signal Processing, pp. 243–248 (2009)

  18. Wang, W., Zhou, S., Xiao, Q.: Optimum Vdbscan (O-VDBSCAN) for identifying downtown areas. Int. J. Digit. Inf. Wirel. Commun. 66–71 (2013)

  19. Obula Reddy, B.G., Ussenaiah, M.: Literature survey on clustering techniques. IOSR J. Comput. Eng. 3(1), 01–12 (2012)

  20. Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering 178 (2009)

  21. Estivill-Castro, V., Lee, I.: AUTOCLUST: automatic clustering via boundary extraction for massive point-data sets. In: Proceedings of the 5th International Conference on Geocomputation (2000)

  22. Gold, C.M.: Problems with handling spatial data: the Voronoi approach. CISM J. 45, 65–80 (1991)

  23. Späth, H.: Exponential spline interpolation. Computing 4, 225–233 (1969)

  24. Greville, T.N.E.: Spline functions, interpolation and numerical quadrature. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, vol. II, pp. 156–168. Wiley, New York (1967)

  25. Rentrop, P.: An algorithm for the computation of the exponential spline. Numer. Math. 35, 81–93 (1980)

  26. Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: IEEE SMC International Conference on Systems, Man and Cybernetics (2004)

Author information

Corresponding author

Correspondence to Soumaya Louhichi.

Cite this article

Louhichi, S., Gzara, M. & Ben-Abdallah, H. MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise. Distrib Parallel Databases 37, 73–99 (2019). https://doi.org/10.1007/s10619-018-7253-1

  • DOI: https://doi.org/10.1007/s10619-018-7253-1
