K-DBSCAN: An improved DBSCAN algorithm for big data

The Journal of Supercomputing

Abstract

Storing and processing big data are among the most important challenges today. Among data mining algorithms, DBSCAN is a common clustering method, and one of its main drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can handle big datasets in an acceptable amount of time. To this end, an initial grouping is first applied to the data using the K-means++ algorithm. DBSCAN is then run on each group separately, which reduces the computational burden of DBSCAN and significantly increases clustering speed. Finally, border clusters are merged if necessary. According to the results, the proposed algorithm greatly reduced DBSCAN execution time (by 98% in the best case) with no significant change in the clustering quality evaluation criteria.
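The three steps named in the abstract (K-means++ grouping, per-group DBSCAN, merging of border clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the merge criterion, so the sketch assumes that two clusters from different groups are merged when any pair of their points lies within `eps` of each other. The function names, parameters, and brute-force distance computations are likewise hypothetical and chosen for readability rather than speed.

```python
import math
import random

def kmeanspp_groups(points, k, rng, iters=10):
    """Partition points into k groups: k-means++ seeding, then Lloyd updates."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Sample the next seed with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    def assign():
        return [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
    for _ in range(iters):
        groups = assign()
        for j in range(k):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign()

def dbscan(points, eps, min_pts):
    """Plain DBSCAN with brute-force neighbour search; label -1 means noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1  # provisional noise, may later become a border point
            continue
        labels[i] = cid
        stack = list(neigh[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid  # former noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(neigh[j]) >= min_pts:
                stack.extend(neigh[j])  # only core points expand the cluster
        cid += 1
    return labels

def k_dbscan(points, k, eps, min_pts, seed=0):
    """Sketch: group with K-means++, run DBSCAN per group, merge across groups."""
    groups = kmeanspp_groups(points, k, random.Random(seed))
    labels = [-1] * len(points)
    next_id = 0
    for g in range(k):
        idx = [i for i, gl in enumerate(groups) if gl == g]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, l in zip(idx, local):
            if l != -1:
                labels[i] = next_id + l  # make cluster ids globally unique
        next_id += max(local, default=-1) + 1
    parent = list(range(next_id))  # union-find over global cluster ids
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    # ASSUMED merge rule: clustered points from different groups within eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if (groups[i] != groups[j] and labels[i] != -1 and labels[j] != -1
                    and math.dist(points[i], points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    return [-1 if l == -1 else find(l) for l in labels]

# Toy data: a chain of 20 evenly spaced points plus one distant outlier.
data = [(float(x), 0.0) for x in range(20)] + [(100.0, 100.0)]
labels = k_dbscan(data, k=2, eps=1.5, min_pts=3)
```

The all-pairs merge loop here is quadratic and would cancel the speed-up on real data. A practical version would presumably compare only points near group boundaries. The sketch also shows a limitation of partitioning: a group that receives too few points from a dense region may label them as noise, which is one reason a merge or repair step is needed.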

Author information

Corresponding author

Correspondence to Hamid Saadatfar.

About this article

Cite this article

Gholizadeh, N., Saadatfar, H. & Hanafi, N. K-DBSCAN: An improved DBSCAN algorithm for big data. J Supercomput 77, 6214–6235 (2021). https://doi.org/10.1007/s11227-020-03524-3

  • DOI: https://doi.org/10.1007/s11227-020-03524-3
