Two-stage clustering algorithm based on evolution and propagation patterns

Applied Intelligence

Abstract

Popular clustering algorithms typically require the number of clusters and other hyperparameters to be set from prior knowledge. To remove this requirement, we use the average nearest neighbour distance, a statistic that characterises how samples aggregate in the data space, and propose a two-stage clustering algorithm based on evolution and propagation patterns (EPC). In the evolution stage, EPC draws a small random sample from the data space and evolves it incrementally to obtain the initial clustering result and the number of clusters. In the propagation stage, EPC propagates the cluster labels of the initial clustering result to the unlabelled samples according to the nearest neighbour principle. EPC also uses a correction mechanism. In the evolution stage, it runs multiple Monte Carlo simulations to improve the stability of the clustering results obtained by random sampling. Experiments on datasets and applications to image segmentation datasets show that EPC outperforms current popular clustering algorithms. Finally, ablation experiments provide a systematic and comprehensive analysis of EPC and show that it is robust.
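The two ingredients named in the abstract, the average nearest neighbour distance and label propagation by the nearest neighbour principle, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_nn_distance` and `propagate_labels` are hypothetical, Euclidean distance is an assumption, the propagation is done in a single pass from a fixed set of labelled seeds, and the evolution stage and Monte Carlo repetition are not reproduced because the abstract does not specify them.

```python
import math


def average_nn_distance(points):
    """Mean Euclidean distance from each point to its nearest other point.

    Assumes at least two points; illustrative only.
    """
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def propagate_labels(labelled, labels, unlabelled):
    """Give each unlabelled point the label of its nearest labelled point.

    Single-pass simplification of nearest-neighbour label propagation.
    """
    result = []
    for p in unlabelled:
        nearest = min(range(len(labelled)),
                      key=lambda k: math.dist(p, labelled[k]))
        result.append(labels[nearest])
    return result


# Hypothetical toy data: two labelled seeds and three unlabelled samples.
seeds = [(0.0, 0.0), (10.0, 10.0)]
seed_labels = [0, 1]
rest = [(1.0, 0.0), (9.0, 10.0), (0.0, 2.0)]

propagated = propagate_labels(seeds, seed_labels, rest)
spacing = average_nn_distance([(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)])
```

In this toy setting each unlabelled sample inherits the label of the closest seed, and `spacing` summarises how tightly the three collinear points are packed.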

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61825305.

Author information

Corresponding author

Correspondence to Haibin Xie.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, P., Xie, H. Two-stage clustering algorithm based on evolution and propagation patterns. Appl Intell 52, 11555–11568 (2022). https://doi.org/10.1007/s10489-021-03016-8

  • DOI: https://doi.org/10.1007/s10489-021-03016-8
