Abstract
Spectral clustering algorithms have become increasingly popular for data clustering in recent years because they treat clustering as the problem of optimally partitioning a graph. However, the performance of spectral clustering depends on the quality of the similarity matrix. In addition, traditional spectral clustering is unstable because it uses the K-means algorithm in the final clustering stage. We therefore propose a spectral clustering algorithm based on fast diffusion search for natural neighbors and affinity propagation (FDAP-SC). The algorithm obtains neighbor information more efficiently by changing the way the number of neighbors is determined. It then constructs the similarity matrix from the shared nearest neighbors and the shared reverse neighbors of each pair of points. Finally, it treats all data points as nodes in a network and determines the cluster center of each sample through message passing between nodes. In this paper, we first verify experimentally on real datasets that our method for determining the number of neighbors outperforms the traditional natural nearest neighbor algorithm. We then show on synthetic datasets that FDAP-SC handles datasets with complex shapes well. Finally, we compare FDAP-SC with several existing classical and recent algorithms on real datasets and on the Olivetti face datasets, demonstrating that FDAP-SC performs well and stably. FDAP-SC performs best on five of the seven real datasets and achieves more than 87.5% accuracy on the Olivetti face datasets.
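The abstract describes the pipeline only at a high level. It does not give the fast diffusion search or the exact similarity formula. The NumPy sketch below shows the general shape of such a pipeline under several assumptions, and it is not the authors' implementation. It uses a classic natural-neighbor stopping rule in the spirit of Zhu et al. instead of FDAP-SC's fast diffusion search. It uses a hypothetical similarity that simply adds shared-kNN and shared-reverse-neighbor counts. It uses the Ng-Jordan-Weiss spectral embedding with an assumed, known number of eigenvectors `c`. For the final stage it uses standard affinity propagation updates (Frey and Dueck) in place of K-means. All function names are illustrative.

```python
import numpy as np


def knn_order(X):
    """Row i lists every other point, sorted by distance to point i."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :-1]   # drop self (distance inf, sorted last)


def natural_neighbor_k(order):
    """Classic-style natural-neighbor search (assumed stopping rule): widen k
    until the number of points nobody lists as a neighbor is 0 or unchanged."""
    n = order.shape[0]
    prev = None
    for k in range(1, n):
        rev = np.bincount(order[:, :k].ravel(), minlength=n)  # reverse-neighbor counts
        orphans = int(np.sum(rev == 0))
        if orphans == 0 or orphans == prev:
            return k
        prev = orphans
    return n - 1


def shared_neighbor_similarity(order, k):
    """HYPOTHETICAL similarity: shared kNN count + shared reverse-neighbor count,
    scaled to [0, 1]. Not the paper's formula."""
    n = order.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), order[:, :k].ravel()] = 1.0  # A[i, j] = 1 iff j in kNN(i)
    S = A @ A.T + A.T @ A   # (A A^T)_ij: shared kNN; (A^T A)_ij: shared reverse neighbors
    np.fill_diagonal(S, 0.0)
    return S / S.max() if S.max() > 0 else S


def spectral_embedding(S, c):
    """Ng-Jordan-Weiss: top-c eigenvectors of D^-1/2 S D^-1/2, rows normalized."""
    d = S.sum(axis=1) + 1e-12               # guard against isolated points
    M = S / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    U = vecs[:, -c:]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)


def affinity_propagation(S, damping=0.9, iters=300, seed=0):
    """Frey-Dueck message passing; the diagonal of S holds the preferences."""
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    S = S + 1e-12 * rng.standard_normal((n, n))   # break exact ties
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: column sums of positive responsibilities, r(k,k) kept as is
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return np.zeros(n, dtype=int)
    labels = np.argmax(S[:, exemplars], axis=1)   # assign to most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)
    return labels


def fdap_sc_sketch(X, c):
    """Illustrative pipeline: neighbors -> similarity -> embedding -> AP."""
    order = knn_order(X)
    k = natural_neighbor_k(order)
    W = shared_neighbor_similarity(order, k)
    U = spectral_embedding(W, c)
    sim = -np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    off = sim[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(sim, np.median(off))   # preference = median similarity
    return affinity_propagation(sim)
```

In this sketch, `c` fixes only the dimension of the embedding. Affinity propagation then chooses its own exemplars. The median-similarity preference, which Frey and Dueck suggest as a default, controls how many exemplars it selects.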
Change history
24 August 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11227-022-04743-6
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61972179) and the Guangdong Basic and Applied Basic Research Foundation (Grant Nos. 2020A1515011476 and 2021B1515120048).
Additional information
The original online version of this article was revised: the headers in Table 3 were interchanged.
Appendix: Notation
See Table 9.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Q., Li, Z., Han, G. et al. An improvement of spectral clustering algorithm based on fast diffusion search for natural neighbor and affinity propagation. J Supercomput 78, 14597–14625 (2022). https://doi.org/10.1007/s11227-022-04456-w
DOI: https://doi.org/10.1007/s11227-022-04456-w