K-DBSCAN: An improved DBSCAN algorithm for big data

Gholizadeh, Nahid; Saadatfar, Hamid; Hanafi, Nooshin

doi:10.1007/s11227-020-03524-3

Published: 26 November 2020

The Journal of Supercomputing, Volume 77, pages 6214–6235 (2021)

Abstract

Big data storage and processing are among the most important challenges today. Among data mining algorithms, DBSCAN is a widely used clustering method, but one of its most important drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can process big datasets in an acceptable amount of time. To this end, the data are first partitioned into groups using the K-means++ algorithm, and DBSCAN is then applied to each group separately. This reduces the computational burden of DBSCAN and significantly increases clustering speed. Finally, border clusters are merged where necessary. Experimental results show that the proposed algorithm greatly reduces DBSCAN execution time (by 98% in the best case) without significant changes in the qualitative clustering evaluation criteria.
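The abstract outlines three stages: K-means++ grouping, DBSCAN within each group, and a final merge of border clusters. The Python sketch below illustrates that pipeline on small in-memory data. It is not the authors' implementation: the function names (`kmeans_pp`, `dbscan`, `k_dbscan`) and their parameters are hypothetical, neighbourhood queries are brute force, and the merge rule is an assumption, since the abstract only says border clusters are merged "if necessary". Here, two clusters from different groups are joined when a point of one lies within `eps` of a point of the other and that point is a core point with respect to the full dataset.

```python
import math
import random


def kmeans_pp(points, k, iters=20, seed=0):
    """Lloyd's k-means with k-means++ (D^2-weighted) seeding; returns a group index per point."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        if sum(d2) == 0:  # fewer distinct points than k
            break
        centres.append(rng.choices(points, weights=d2)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        for j in range(len(centres)):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:  # keep the old centre if a group empties
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with a brute-force O(n^2) region query. Label -1 marks noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neigh]  # a point counts itself
    labels, cid = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i], stack = cid, [i]
        while stack:
            q = stack.pop()
            if not is_core[q]:
                continue  # border points join a cluster but do not expand it
            for r in neigh[q]:
                if labels[r] == -1:
                    labels[r] = cid
                    stack.append(r)
        cid += 1
    return labels


def k_dbscan(points, k, eps, min_pts, seed=0):
    """Sketch: K-means++ grouping, DBSCAN per group, then merge clusters across groups."""
    groups = kmeans_pp(points, k, seed=seed)
    labels, next_id = [-1] * len(points), 0
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab in zip(idx, local):
            if lab != -1:
                labels[i] = next_id + lab  # make local cluster ids globally unique
        next_id += max(local, default=-1) + 1

    parent = list(range(next_id))  # union-find over global cluster ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def globally_core(i):
        return sum(math.dist(points[i], q) <= eps for q in points) >= min_pts

    # Assumed merge rule (not specified in the abstract): join two clusters from
    # different groups when a point of one is within eps of a point of the other
    # and the first point is a core point in the full dataset.
    n = len(points)
    for i in range(n):
        for j in range(n):
            if (groups[i] != groups[j] and labels[i] != -1 and labels[j] != -1
                    and math.dist(points[i], points[j]) <= eps and globally_core(i)):
                parent[find(labels[i])] = find(labels[j])
    return [find(lab) if lab != -1 else -1 for lab in labels]
```

For example, 100 evenly spaced points on a line (spacing 0.1, `eps=0.15`, `min_pts=3`) are split into two groups by k-means and clustered separately, and the merge step rejoins the two halves into one cluster, matching plain DBSCAN on the whole line. Because brute-force DBSCAN compares every pair of points, running it on k groups of roughly n/k points costs about k·(n/k)² = n²/k distance computations instead of n², which suggests where the speedup from grouping comes from. The merge loop in this sketch still scans all cross-group pairs for clarity; a practical version would restrict it to points near group boundaries, and this sketch also does not revisit points that are noise within their own group.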

Author information

Authors and Affiliations

University of Birjand, Birjand, South Khorasan, Iran
Nahid Gholizadeh, Hamid Saadatfar & Nooshin Hanafi

Corresponding author

Correspondence to Hamid Saadatfar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 3051 kb)

About this article

Cite this article

Gholizadeh, N., Saadatfar, H. & Hanafi, N. K-DBSCAN: An improved DBSCAN algorithm for big data. J Supercomput 77, 6214–6235 (2021). https://doi.org/10.1007/s11227-020-03524-3

Accepted: 16 November 2020
Published: 26 November 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11227-020-03524-3
