Abstract
Spectral clustering (SC) is currently one of the most popular clustering techniques because of its advantages over conventional approaches such as K-means and hierarchical clustering. However, SC requires computing eigenvectors, which makes it time consuming. To overcome this limitation, Lin and Cohen proposed the power iteration clustering (PIC) technique (Lin and Cohen in Proceedings of the 27th International Conference on Machine Learning, pp. 655–662, 2010), a simple and fast version of SC. Instead of computing the eigenvectors, PIC computes, in linear time, a single pseudo-eigenvector that is a linear combination of the eigenvectors. However, in certain critical situations, a single pseudo-eigenvector is not enough for clustering because of the inter-class collision problem. In this paper, we propose a novel method based on the deflation technique to compute multiple orthogonal pseudo-eigenvectors (orthogonality is used to avoid redundancy). Our method is more accurate than PIC but has the same computational complexity. Experiments on synthetic and real datasets demonstrate the improvement of our approach.
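To make the notion of a pseudo-eigenvector concrete, the following is a minimal sketch of a truncated power iteration on a row-normalized affinity matrix, in the spirit of PIC. It is not the authors' implementation and it does not include the deflation step proposed in the paper. The function name `pseudo_eigenvector`, the random starting vector, the stopping threshold, and the toy affinity matrix are all assumptions made for illustration.

```python
import numpy as np

def pseudo_eigenvector(A, max_iter=50, tol=1e-5, seed=0):
    """Return one pseudo-eigenvector of W = D^{-1} A (illustrative sketch).

    A is a symmetric, non-negative affinity matrix with positive row sums.
    The iteration is stopped early, before v collapses onto the constant
    dominant eigenvector, so v still mixes several eigenvectors.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)   # row-normalize: each row sums to 1

    rng = np.random.default_rng(seed)      # assumed: random starting vector
    v = rng.random(n)
    v /= np.abs(v).sum()                   # L1 normalization

    prev_delta = np.inf
    for _ in range(max_iter):
        v_new = W @ v                      # one matrix-vector product per step
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v).max()    # size of this step's change
        v = v_new
        # Assumed early-stopping rule: stop once the change itself has
        # nearly stopped changing from one step to the next.
        if abs(prev_delta - delta) < tol / n:
            break
        prev_delta = delta
    return v

# Toy affinity: two groups of three points, strong within, weak across.
A = np.full((6, 6), 0.01)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

v = pseudo_eigenvector(A)
# A threshold on the 1-D embedding stands in for K-means in this toy case.
labels = (v > v.mean()).astype(int)
```

With a dense matrix each step costs O(n^2); the linear-time behavior mentioned in the abstract presumes a sparse affinity matrix, where a matrix-vector product costs time proportional to the number of nonzero entries. When two different clusters end up with nearly the same value in `v`, a single vector cannot separate them, which is the inter-class collision problem that motivates computing several orthogonal pseudo-eigenvectors.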
References
Cai D MNIST dataset. URL http://www.cad.zju.edu.cn/home/dengcai/Data/MNIST/10kTrain.mat
Cai D TDT2 dataset. URL http://www.cad.zju.edu.cn/home/dengcai/Data/TDT2/TDT2.mat
Chen X, Cai D (2011) Large scale spectral clustering with landmark-based representation. In: Proceedings of the twenty-fifth AAAI conference on artificial intelligence, San Francisco, California, pp 313–318
Chen L, Mao X, Wei P, Xue Y, Ishizuka M (2012) Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37(4):602–612
Drineas P, Mahoney MW (2005) On the Nyström method for approximating a gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175
Durrett R (2010) Some features of the spread of epidemics and information on a random graph. Proc Natl Acad Sci 107(10):4491–4498
Erdős P, Rényi A (1960) On the evolution of random graphs. Magy Tud Akad Mat Kut Intéz Közl 5:17–61
Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 18:507–514
He J, Tong H, Carbonell J (2010) Rare category characterization. In: Proceedings of the 10th IEEE international conference on data mining, Sydney, Australia, pp 226–235
Hu X, Zhang X, Lu C, Park EK, Zhou X (2009) Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, pp 389–396
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666
Keshet J, Bengio S (2009) Automatic speech and speaker recognition: large margin and kernel methods. Wiley Online Library
Lehoucq RB, Sorensen DC (1996) Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM J Matrix Anal Appl 17(4):789–821
Lewis DD Reuters-21578 dataset. URL http://www.daviddlewis.com/resources/testcollections/reuters21578/
Lin F, Cohen WW (2010) Power iteration clustering. In: Proceedings of the 27th international conference on machine learning, Haifa, Israel, pp 655–662
Lin F, Cohen WW (2010) A very fast method for clustering big text datasets. In: Proceedings of the 19th European conference on artificial intelligence, Lisbon, Portugal, pp 303–308
Luxburg UV (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Mackey L (2008) Deflation methods for sparse PCA. Adv Neural Inf Process Syst 21:1017–1024
Mavroeidis D (2010) Accelerating spectral clustering with partial supervision. Data Min Knowl Discov 21(2):241–258
Mishra N, Schreiber R, Stanton I, Tarjan RE (2007) Clustering social networks. In: Proceedings of the 5th international conference on algorithms and models for the web-graph, San Diego, CA, pp 56–67
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172
Peña JM, Lozano JA, Larrañaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognit Lett 20(10):1027–1040
Peng W, Li T (2011) On the equivalence between nonnegative tensor factorization and tensorial probabilistic latent semantic analysis. Appl Intell 35(2):285–295
Rennie J 20 newsgroups. URL http://qwone.com/~jason/20Newsgroups/
Saha S, Bandyopadhyay S (2011) Automatic MR brain image segmentation using a multiseed based multiobjective clustering approach. Appl Intell 35(3):411–427
Shang F, Jiao LC, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel approach. Pattern Recognit 45(1):474–486
Sheffield Face database. URL http://www.sheffield.ac.uk/eee/research/iel/research/face
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Smola A, Schölkopf B Datasets for benchmarks and applications. URL http://www.kernel-machines.org/data/
Taşdemir K (2012) Vector quantization based approximate spectral clustering of large datasets. Pattern Recognit 45(8):3034–3044
Tung F, Wong A, Clausi DA (2010) Enabling scalable spectral clustering for image segmentation. Pattern Recognit 43(12):4069–4076
Wu S, Chow TWS (2004) Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density. Pattern Recognit 37(2):175–188
Wu M, Schölkopf B (2006) A local learning approach for clustering. Adv Neural Inf Process Syst 19:1529–1536
Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, pp 907–916
Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. Adv Neural Inf Process Syst 17:1601–1608
Zhang K, Kwok JT (2009) Density-weighted Nyström method for computing large kernel eigensystems. Neural Comput 21(1):121–146
Zhang K, Tsang IW, Kwok JT (2008) Improved Nyström low-rank approximation and error analysis. In: Proceedings of the 25th international conference on machine learning, Helsinki, Finland, pp 1232–1239
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0013689).
Cite this article
The, A.P., Thang, N.D., Vinh, L.T. et al. Deflation-based power iteration clustering. Appl Intell 39, 367–385 (2013). https://doi.org/10.1007/s10489-012-0418-0
DOI: https://doi.org/10.1007/s10489-012-0418-0