Soft Computing

Volume 21, Issue 19, pp 5815–5827

A Nyström spectral clustering algorithm based on probability incremental sampling

Methodologies and Application

Abstract

Spectral clustering maps the data points of the original space into a low-dimensional eigen-space in which they become linearly separable, so it can handle data with complex structures. However, spectral clustering must store the entire similarity matrix and perform an eigen-decomposition; both steps consume considerable time and space, which limits the application of spectral clustering to large-scale data. To reduce this complexity, the Nyström extension technique can be used to compute approximate eigenvectors from a small sample of data points, trading some clustering accuracy for improved efficiency. To select more representative sample points that better reflect the distribution of the data set, this paper designs a dynamic incremental sampling method for Nyström spectral clustering, in which the data points are sampled according to different probability distributions, and we prove theoretically that increasing the number of sampling rounds effectively decreases the sampling error. The feasibility and effectiveness of the proposed algorithm are analyzed through experiments on UCI machine learning data sets.
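A minimal sketch of the pipeline the abstract describes, assuming an RBF similarity and a simple distance-based re-weighting of the sampling probabilities between rounds; the function and parameter names (rbf_similarity, incremental_landmarks, nystrom_eigenvectors, sigma, n_landmarks, n_rounds) are illustrative assumptions, not the authors' exact algorithm:

import numpy as np

def rbf_similarity(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) similarities between the rows of A and the rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def incremental_landmarks(X, n_landmarks, n_rounds=4, seed=0):
    # Draw landmarks over several rounds; after each round the sampling
    # probabilities are re-weighted by squared distance to the landmarks chosen
    # so far, so later rounds favour regions that are not yet covered.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen = []
    probs = np.full(n, 1.0 / n)                  # first round: uniform sampling
    per_round = max(1, n_landmarks // n_rounds)
    while len(chosen) < n_landmarks:
        k = min(per_round, n_landmarks - len(chosen))
        new = rng.choice(n, size=k, replace=False, p=probs)
        chosen.extend(int(i) for i in new)
        d2 = ((X[:, None, :] - X[chosen][None, :, :])**2).sum(axis=2).min(axis=1)
        probs = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
    return np.array(chosen)

def nystrom_eigenvectors(X, n_landmarks=50, n_clusters=3, sigma=1.0, seed=0):
    # Approximate the leading eigenvectors of the full n x n similarity matrix
    # from the n x m block of similarities to m sampled landmarks (m << n).
    idx = incremental_landmarks(X, n_landmarks, seed=seed)
    C = rbf_similarity(X, X[idx], sigma)         # n x m cross-similarities
    W = C[idx, :]                                # m x m landmark-landmark block
    evals, evecs = np.linalg.eigh(W)
    top = np.argsort(evals)[::-1][:n_clusters]
    U = C @ evecs[:, top] / np.maximum(evals[top], 1e-12)   # Nystrom extension
    return U / np.linalg.norm(U, axis=1, keepdims=True)     # rows go to k-means

The row-normalised matrix returned by nystrom_eigenvectors would then be clustered with an ordinary k-means step, as in standard spectral clustering. Only the n x m cross-similarity block is ever stored, which is the source of the savings in time and space that the Nyström extension provides over the full similarity matrix.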

Keywords

Spectral clustering · Eigen-decomposition · Nyström method · Incremental sampling


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. Jiangsu Key Laboratory of Mine Mechanical and Electrical Equipment, China University of Mining and Technology, Xuzhou, China
