Two-stage clustering algorithm based on evolution and propagation patterns

Li, Peng; Xie, Haibin

doi:10.1007/s10489-021-03016-8


Published: 27 January 2022

Volume 52, pages 11555–11568 (2022)

Applied Intelligence

Abstract

Popular clustering algorithms typically require the number of clusters and other hyperparameters to be set from prior knowledge. To address this problem, we use the average nearest neighbour distance, a statistic that characterises how samples aggregate in the data space, and propose a two-stage clustering algorithm based on evolution and propagation patterns (EPC). In the evolution stage, EPC incrementally evolves a small number of samples drawn at random from the data space, which yields the initial clustering result and the number of clusters. In the propagation stage, EPC propagates the cluster labels of the initial clustering result to the unlabelled samples according to the nearest neighbour principle. EPC also includes a correction mechanism: in the evolution stage it runs multiple Monte Carlo simulations to improve the stability of clustering results obtained from random sampling. Experiments on several datasets, together with an application to image segmentation datasets, show that EPC outperforms current popular clustering algorithms. Finally, a systematic and comprehensive analysis of EPC through ablation experiments shows that the algorithm is robust.
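The abstract describes the pipeline only at a high level. As a rough illustration, the sketch below strings the stages together under our own assumptions, not the authors': (i) the average nearest neighbour distance of the random sample, scaled by a hypothetical factor `alpha`, sets a merge radius; (ii) the evolution stage inserts sampled points one at a time and merges every cluster a new point touches within that radius, so the cluster count is an output rather than an input; (iii) propagation gives each point the label of its nearest sampled point; (iv) the Monte Carlo correction repeats the sampling `n_runs` times and keeps a run with the most frequent cluster count. All names (`epc_sketch`, `evolve`, `propagate`, `alpha`, `sample_size`, `n_runs`) are hypothetical; this is not the published EPC implementation, whose exact evolution rule and correction mechanism are not given in the abstract.

```python
import numpy as np
from collections import Counter


def average_nn_distance(X):
    """Mean Euclidean distance from each point to its nearest other point.

    Brute force with O(n^2) memory; acceptable here because it is only
    applied to a small random sample."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    return float(d.min(axis=1).mean())


def evolve(S, threshold):
    """Hypothetical evolution stage: insert sampled points one at a time.

    A point with no earlier point within `threshold` starts a new cluster;
    otherwise it joins, and merges, every cluster it touches."""
    labels = np.full(len(S), -1)
    next_label = 0
    for i in range(len(S)):
        d = np.linalg.norm(S[:i] - S[i], axis=1)
        touched = np.unique(labels[:i][d <= threshold])
        if touched.size == 0:
            labels[i] = next_label
            next_label += 1
        else:
            keep = touched.min()
            seen = labels[:i]  # a view, so writes go back into `labels`
            seen[np.isin(seen, touched)] = keep
            labels[i] = keep
    _, labels = np.unique(labels, return_inverse=True)  # relabel as 0..k-1
    return labels, int(labels.max()) + 1


def propagate(X, S, sample_labels):
    """Propagation stage: each point takes the label of its nearest sample."""
    d = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=-1)
    return sample_labels[d.argmin(axis=1)]


def epc_sketch(X, sample_size=60, n_runs=10, alpha=5.0, seed=0):
    """Illustrative two-stage pipeline with a Monte Carlo correction step."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_runs):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        S = X[idx]
        # Merge radius: assumed to be alpha times the sample's average NN distance.
        labels, k = evolve(S, alpha * average_nn_distance(S))
        runs.append((k, S, labels))
    # Correction (our assumption): trust the most frequent cluster count.
    k_star = Counter(k for k, _, _ in runs).most_common(1)[0][0]
    _, S, labels = next(run for run in runs if run[0] == k_star)
    return propagate(X, S, labels), k_star
```

With `X` an `(n, d)` NumPy array, `labels, k = epc_sketch(X)` returns one label per row and the inferred cluster count `k`. Note that this sketch still depends on `alpha`: too small a value fragments clusters and too large a value merges neighbouring ones. The abstract does not say how EPC itself sets its merge criterion.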

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61825305.

Author information

Authors and Affiliations

College of Intelligence Science and Technology, National University of Defense Technology, Changsha, 410000, China
Peng Li & Haibin Xie

Corresponding author

Correspondence to Haibin Xie.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, P., Xie, H. Two-stage clustering algorithm based on evolution and propagation patterns. Appl Intell 52, 11555–11568 (2022). https://doi.org/10.1007/s10489-021-03016-8

Accepted: 03 November 2021
Published: 27 January 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10489-021-03016-8
