DDCM: a decentralized density clustering and its results gathering approach

  • S.I.: Applications and Techniques in Cyber Intelligence (ATCI2022)
Neural Computing and Applications

Abstract

Distributed clustering is an important method for solving large-scale data mining problems. It still has open problems, however, such as a performance bottleneck on the master node and network congestion caused by global broadcasting. This paper proposes a decentralized clustering method based on density clustering and the content-addressable network technique. It can form a cluster based on several surrounding nodes, which provides excellent scalability and load balancing. In addition, a method is presented for optimizing how clustering results are gathered in different application scenarios. In our extensive experiments, the proposed approach achieves three times the efficiency of the benchmark algorithms and maintains a stable expanding ratio of about 0.6 on large-scale data sets.
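To make the idea of clustering from only surrounding nodes concrete, the following minimal Python sketch partitions the unit square into a fixed grid of zones, loosely in the spirit of a content-addressable network, and finds DBSCAN-style core points by consulting only a point's own zone and the zones adjacent to it. It is an illustration under stated assumptions, not the DDCM algorithm itself: the fixed grid, the two-dimensional data, and the names zone_of, adjacent_zones, and core_points are all hypothetical, and a real content-addressable network splits zones dynamically as nodes join.

```python
from collections import defaultdict
from math import dist


def zone_of(point, cells_per_side):
    """Map a point in [0, 1]^2 to the (col, row) grid zone that owns it.

    Assumption: a fixed square grid stands in for CAN zones.
    """
    x, y = point
    last = cells_per_side - 1
    return (min(int(x * cells_per_side), last),
            min(int(y * cells_per_side), last))


def adjacent_zones(zone, cells_per_side):
    """Return the zone itself plus its up to 8 surrounding zones."""
    cx, cy = zone
    return [
        (cx + dx, cy + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if 0 <= cx + dx < cells_per_side and 0 <= cy + dy < cells_per_side
    ]


def core_points(points, eps, min_pts, cells_per_side):
    """Find DBSCAN core points using only own and adjacent zones."""
    # If eps were larger than a zone, neighbors could sit two zones away.
    if eps > 1.0 / cells_per_side:
        raise ValueError("eps must not exceed the zone width")

    # Each zone owner stores the points that hash into its zone.
    by_zone = defaultdict(list)
    for p in points:
        by_zone[zone_of(p, cells_per_side)].append(p)

    cores = []
    for p in points:
        home = zone_of(p, cells_per_side)
        candidates = [
            q
            for z in adjacent_zones(home, cells_per_side)
            for q in by_zone[z]
        ]
        # DBSCAN counts the point itself in its eps-neighborhood.
        if sum(1 for q in candidates if dist(p, q) <= eps) >= min_pts:
            cores.append(p)
    return cores


if __name__ == "__main__":
    sample = [(0.24, 0.24), (0.26, 0.26), (0.26, 0.24), (0.9, 0.9)]
    print(core_points(sample, eps=0.05, min_pts=3, cells_per_side=4))
```

Because eps is required not to exceed the zone width, every eps-neighbor of a point lies in its own or an adjacent zone, so this sketch needs no global broadcast and no master node to decide whether a point is a core point.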

Data availability

Data are available to researchers on request.

Acknowledgements

Support for this work has been provided by Shandong University of Finance and Economics, Jinan, China. Furthermore, the author wishes to express his deep appreciation for the valuable time that was spent by the anonymous referees.

Funding

This study was supported by the Science and Technology Plan for Colleges and Universities in Shandong Province (KJ2018BAN046).

Author information

Corresponding author

Correspondence to Lida Zou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest. They further declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.

Informed consent

A copy of this manuscript has been read by all authors, and they are willing to proceed with its publication.

Ethical approval

The study in this manuscript does not require ethical approval.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zou, L. DDCM: a decentralized density clustering and its results gathering approach. Neural Comput & Applic 35, 24743–24754 (2023). https://doi.org/10.1007/s00521-023-08392-5

  • DOI: https://doi.org/10.1007/s00521-023-08392-5
