Parallel gravitational clustering based on grid partitioning for large-scale data

Applied Intelligence

Abstract

The gravitational clustering algorithm is a dynamic clustering model that performs well at uncovering the hidden clusters of a complex dataset of arbitrary shape, density and distribution. It is well suited to mining irregular and unbalanced clusters from large-scale, noisy datasets. However, its prohibitive time overhead makes it impractical at large scale. This paper therefore develops a parallel gravitational clustering algorithm based on grid partitioning (PGCGP). First, a grid partitioning strategy divides a large-scale dataset into multiple grids as evenly as possible. Second, a neighbourhood repair strategy works with the gravitational clustering algorithm to accurately mine the clusters of a single grid. Finally, a border point alignment strategy decides whether two small clusters located in different grids should be merged, so that the real clusters of the original large dataset are recovered when the grids are combined. Extensive experiments on multiple artificial and real-world datasets verify that the PGCGP approach achieves good performance.
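As a rough illustration of what dividing a dataset into grids "as evenly as possible" could look like, the sketch below cuts each axis at equal-frequency (quantile) positions and groups point indices by grid cell. This is not the PGCGP partitioning procedure, which this preview does not describe; the equal-frequency rule and the names `quantile_edges`, `partition_into_grids` and `cells_per_axis` are assumptions made purely for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_edges(values, cells):
    """Return cells - 1 interior cut points that split values into
    bins of roughly equal size (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // cells] for k in range(1, cells)]


def partition_into_grids(points, cells_per_axis):
    """Map each grid cell (a tuple of per-axis bin indices) to the
    indices of the points that fall inside it.

    Assumption: cuts are chosen independently per axis, so cells are
    only approximately balanced when the axes are correlated.
    """
    dims = len(points[0])
    edges = [
        quantile_edges([p[d] for p in points], cells_per_axis)
        for d in range(dims)
    ]
    grids = defaultdict(list)
    for idx, p in enumerate(points):
        # bisect_right gives the bin index of the coordinate on each axis.
        key = tuple(bisect_right(edges[d], p[d]) for d in range(dims))
        grids[key].append(idx)
    return dict(grids)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    print(partition_into_grids(sample, cells_per_axis=2))
```

Each resulting cell could then be clustered by a separate worker, with clusters that touch a cell border reconsidered afterwards, in the spirit of the three stages the abstract lists.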

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 62103143 and 61702180); the Hunan Provincial Natural Science Foundation of China (Nos. 2020JJ5199 and 2021JJ40214); the National Defense Basic Research Program of China (JCKY2019403D006); the National Key Research and Development Program (No. 2019YFE0105300); the Scientific Research Fund of Hunan Provincial Education Department (Nos. 20C0786 and 20C0781); and the Hunan Province Science and Technology Project Funds (No. 2018TP1036).

Author information

Corresponding author

Correspondence to Lei Chen.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interest regarding the publication of the article.

Additional information

Availability of data and materials

The MAGIC, SHUTTLE, SKIN, and Poker Hand datasets are public and are available in the UCI machine learning repository (http://archive.ics.uci.edu/ml/). The DS1, DS2, DS3, and DS4 datasets are available on request from the corresponding author.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Fadong Chen, Zhaohua Liu, Mingyang Lv, Tingqin He and Shiwen Zhang contributed equally to this work.

About this article

Cite this article

Chen, L., Chen, F., Liu, Z. et al. Parallel gravitational clustering based on grid partitioning for large-scale data. Appl Intell 53, 2506–2526 (2023). https://doi.org/10.1007/s10489-022-03661-7

  • DOI: https://doi.org/10.1007/s10489-022-03661-7
