Deflation-based power iteration clustering

Abstract

Spectral clustering (SC) is currently one of the most popular clustering techniques because of its advantages over conventional approaches such as K-means and hierarchical clustering. However, SC requires computing eigenvectors, which makes it time-consuming. To overcome this limitation, Lin and Cohen proposed the power iteration clustering (PIC) technique (Lin and Cohen in Proceedings of the 27th International Conference on Machine Learning, pp. 655–662, 2010), a simple and fast version of SC. Instead of finding the eigenvectors, PIC computes, in linear time, a single pseudo-eigenvector that is a linear combination of the eigenvectors. However, in certain critical situations, a single pseudo-eigenvector is not enough for clustering because of the inter-class collision problem. In this paper, we propose a novel method based on the deflation technique to compute multiple orthogonal pseudo-eigenvectors (orthogonality is used to avoid redundancy). Our method is more accurate than PIC but has the same computational complexity. Experiments on synthetic and real datasets demonstrate the improvement of our approach.
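The idea of computing several mutually orthogonal pseudo-eigenvectors can be illustrated with a minimal sketch. The abstract does not specify the exact deflation scheme, so the code below makes several assumptions: the affinity matrix is row-normalized as W = D^-1 A, the power iteration is truncated after a fixed number of steps (`n_iter`, a hypothetical parameter) rather than by an early-stopping rule, and deflation is realized as a Gram-Schmidt projection of each iterate against the vectors already found. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def row_normalize(affinity):
    """Return W = D^-1 A, assuming every row of A has a positive sum."""
    degrees = affinity.sum(axis=1, keepdims=True)
    return affinity / degrees


def deflated_pseudo_eigenvectors(affinity, n_vectors=2, n_iter=20, seed=0):
    """Compute n_vectors mutually orthogonal pseudo-eigenvectors.

    Each vector comes from a truncated power iteration on W. After every
    multiplication the iterate is projected onto the orthogonal complement
    of the vectors found so far (an assumed, projection-style deflation)
    and rescaled to unit Euclidean norm.
    """
    W = row_normalize(np.asarray(affinity, dtype=float))
    rng = np.random.default_rng(seed)
    found = []
    for _ in range(n_vectors):
        v = rng.random(W.shape[0])
        for _step in range(n_iter):
            v = W @ v
            # Remove the components along previously found unit vectors.
            for u in found:
                v = v - (u @ v) * u
            v = v / np.linalg.norm(v)
        found.append(v)
    # One column per pseudo-eigenvector; each row embeds one data point.
    return np.column_stack(found)
```

Each row of the returned matrix can then be treated as a low-dimensional embedding of one data point and passed to a standard clustering routine such as K-means. Because the iteration is stopped early, the columns are mixtures of eigenvectors rather than converged eigenvectors, which is the property the term pseudo-eigenvector refers to.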

References

  1. Cai D MNIST dataset. URL http://www.cad.zju.edu.cn/home/dengcai/Data/MNIST/10kTrain.mat
  2. Cai D TDT2 dataset. URL http://www.cad.zju.edu.cn/home/dengcai/Data/TDT2/TDT2.mat
  3. Chen X, Cai D (2011) Large scale spectral clustering with landmark-based representation. In: Proceedings of the twenty-fifth AAAI conference on artificial intelligence, San Francisco, California, pp 313–318
  4. Chen L, Mao X, Wei P, Xue Y, Ishizuka M (2012) Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37(4):602–612
  5. Drineas P, Mahoney MW (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175
  6. Durrett R (2010) Some features of the spread of epidemics and information on a random graph. Proc Natl Acad Sci 107(10):4491–4498
  7. Erdős P, Rényi A (1960) On the evolution of random graphs. Magy Tud Akad Mat Kut Intéz Közl 5:17–61
  8. Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
  9. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 18:507–514
  10. He J, Tong H, Carbonell J (2010) Rare category characterization. In: Proceedings of the 10th IEEE international conference on data mining, Sydney, Australia, pp 226–235
  11. Hu X, Zhang X, Lu C, Park EK, Zhou X (2009) Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, pp 389–396
  12. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666
  13. Keshet J, Bengio S (2009) Automatic speech and speaker recognition: large margin and kernel methods. Wiley Online Library
  14. Lehoucq RB, Sorensen DC (1996) Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM J Matrix Anal Appl 17(4):789–821
  15. Lewis DD Reuters-21578 dataset. URL http://www.daviddlewis.com/resources/testcollections/reuters21578/
  16. Lin F, Cohen WW (2010) Power iteration clustering. In: Proceedings of the 27th international conference on machine learning, Haifa, Israel, pp 655–662
  17. Lin F, Cohen WW (2010) A very fast method for clustering big text datasets. In: Proceedings of the 19th European conference on artificial intelligence, Lisbon, Portugal, pp 303–308
  18. Luxburg UV (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
  19. Mackey L (2008) Deflation methods for sparse PCA. Adv Neural Inf Process Syst 21:1017–1024
  20. Mavroeidis D (2010) Accelerating spectral clustering with partial supervision. Data Min Knowl Discov 21(2):241–258
  21. Mishra N, Schreiber R, Stanton I, Tarjan RE (2007) Clustering social networks. In: Proceedings of the 5th international conference on algorithms and models for the web-graph, San Diego, CA, pp 56–67
  22. Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
  23. Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172
  24. Peña JM, Lozano JA, Larrañaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognit Lett 20(10):1027–1040
  25. Peng W, Li T (2011) On the equivalence between nonnegative tensor factorization and tensorial probabilistic latent semantic analysis. Appl Intell 35(2):285–295
  26. Rennie J 20 newsgroups. URL http://qwone.com/~jason/20Newsgroups/
  27. Saha S, Bandyopadhyay S (2011) Automatic MR brain image segmentation using a multiseed based multiobjective clustering approach. Appl Intell 35(3):411–427
  28. Shang F, Jiao LC, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel approach. Pattern Recognit 45(1):474–486
  29. Sheffield Face database. URL http://www.sheffield.ac.uk/eee/research/iel/research/face
  30. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
  31. Smola A, Schölkopf B Datasets for benchmarks and applications. URL http://www.kernel-machines.org/data/
  32. Taşdemir K (2012) Vector quantization based approximate spectral clustering of large datasets. Pattern Recognit 45(8):3034–3044
  33. Tung F, Wong A, Clausi DA (2010) Enabling scalable spectral clustering for image segmentation. Pattern Recognit 43(12):4069–4076
  34. Wu S, Chow TWS (2004) Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density. Pattern Recognit 37(2):175–188
  35. Wu M, Schölkopf B (2006) A local learning approach for clustering. Adv Neural Inf Process Syst 19:1529–1536
  36. Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, pp 907–916
  37. Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. Adv Neural Inf Process Syst 17:1601–1608
  38. Zhang K, Kwok JT (2009) Density-weighted Nyström method for computing large kernel eigensystems. Neural Comput 21(1):121–146
  39. Zhang K, Tsang IW, Kwok JT (2008) Improved Nyström low-rank approximation and error analysis. In: Proceedings of the 25th international conference on machine learning, Helsinki, Finland, pp 1232–1239

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0013689).

Author information

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

The, A.P., Thang, N.D., Vinh, L.T. et al. Deflation-based power iteration clustering. Appl Intell 39, 367–385 (2013). https://doi.org/10.1007/s10489-012-0418-0

Keywords

  • Spectral clustering
  • Deflation
  • Power iteration