
The Journal of Supercomputing, Volume 75, Issue 1, pp 142–169

AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

  • Jeong-Hun Kim
  • Jong-Hyeok Choi
  • Kwan-Hee Yoo
  • Aziz Nasridinov

Abstract

Clustering is a typical data mining technique that partitions a dataset into multiple subsets of similar objects according to similarity metrics. In particular, density-based algorithms can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, finds clusters by defining a density criterion with two global parameters, the \( \varepsilon \)-distance and \( MinPts \). However, most density-based algorithms, including DBSCAN, fail to find clusters of varying densities correctly, because a single density criterion fixed by the global parameters is applied to every cluster. Studies that determine optimal parameters or improve clustering performance by introducing additional parameters and computations have significantly increased the running time of clustering, particularly for large datasets. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive \( \varepsilon \)-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define the density layers of a dataset. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, which achieve clustering performance similar to that of existing algorithms for finding clusters with varying densities while significantly reducing the running time required to perform clustering. We evaluate the proposed algorithms, AA-DBSCAN and kAA-DBSCAN, through extensive experiments against state-of-the-art algorithms. Experimental results demonstrate that the proposed algorithms improve clustering performance and reduce running time.
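The problem with a single global density criterion can be made concrete with a small sketch. The code below is plain textbook DBSCAN with one global \( \varepsilon \) and \( MinPts \); it is not AA-DBSCAN or kAA-DBSCAN, whose quadtree-based procedure is not reproduced here. The helper names (`region_query`, `dbscan`, `grid`) and the toy dataset (two dense clusters 1.0 apart plus one sparse cluster) are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """Textbook DBSCAN with a single global eps and MinPts.

    Returns one label per point: a cluster id (0, 1, ...) or NOISE (-1).
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


def grid(x0: float, y0: float, step: float, n: int = 5) -> List[Point]:
    """Hypothetical toy data: an n x n grid starting at (x0, y0) with the given spacing."""
    return [(x0 + i * step, y0 + j * step) for i in range(n) for j in range(n)]


dense_a = grid(0.0, 0.0, 0.1)     # dense cluster
dense_b = grid(1.4, 0.0, 0.1)     # second dense cluster, 1.0 away from dense_a
sparse_c = grid(10.0, 10.0, 1.0)  # sparse cluster
data = dense_a + dense_b + sparse_c

small = dbscan(data, eps=0.15, min_pts=4)  # separates A and B, but C becomes noise
large = dbscan(data, eps=1.5, min_pts=4)   # finds C, but merges A and B

print(small[:25].count(0), small[25:50].count(1), small[50:].count(NOISE))  # 25 25 25
print(large[:50].count(0), large[50:].count(1))  # 50 25
```

With the small \( \varepsilon \), both dense clusters are found and every point of the sparse cluster is labeled noise; with the large \( \varepsilon \), the sparse cluster is found but the two dense clusters collapse into one. No single global \( \varepsilon \) recovers all three clusters in this toy setting, which is the gap that a per-density (adaptive) \( \varepsilon \) is meant to close.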

Keywords

Density-based clustering, DBSCAN, Approximation, Adaptation, Partitioning

Notes

Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A3B03035729).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Jeong-Hun Kim (1)
  • Jong-Hyeok Choi (1)
  • Kwan-Hee Yoo (1)
  • Aziz Nasridinov (1), corresponding author
  1. Department of Computer Science, Chungbuk National University, Cheongju-si, Korea
