Abstract
Distributed clustering is an important approach to large-scale data mining problems. It still suffers from several problems, such as a performance bottleneck at the master node and network congestion caused by global broadcasting. This paper proposes a decentralized clustering method based on density clustering and the content-addressable network (CAN) technique. The method forms clusters using only several surrounding nodes, which gives it excellent scalability and load balancing. In addition, a method is presented for optimizing how clustering results are gathered in different application scenarios. In extensive experiments, the proposed approach is three times more efficient than the benchmark algorithms and maintains a stable expanding ratio of about 0.6 on large-scale data sets.
Data availability
The data are available from the author upon request.
Acknowledgements
This work was supported by Shandong University of Finance and Economics, Jinan, China. The author also thanks the anonymous referees for their valuable time.
Funding
This study was supported by the Science and Technology Plan for Colleges and Universities in Shandong Province (KJ2018BAN046).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest. The work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.
Informed consent
All authors have read this manuscript and agree to its publication.
Ethical approval
The study in this manuscript does not require ethical approval.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zou, L. DDCM: a decentralized density clustering and its results gathering approach. Neural Comput & Applic 35, 24743–24754 (2023). https://doi.org/10.1007/s00521-023-08392-5