Minimum Similarity Sampling Scheme for Nyström Based Spectral Clustering on Large Scale High-Dimensional Data

  • Zhicheng Zeng
  • Ming Zhu
  • Hong Yu
  • Honglian Ma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8482)


Large-scale spectral clustering in high-dimensional space is among the most popular unsupervised problems. Existing sampling schemes each have limitations on high-dimensional data. This paper proposes an improved Nyström extension based spectral clustering algorithm with a sampling scheme designed for high-dimensional data. We first examine several existing sampling schemes and illustrate their defects, especially in high-dimensional settings. We then provide a theoretical analysis of how the similarity between the sample set and the non-sampled set influences the approximation error, and propose an improved sampling scheme, minimum similarity sampling (MSS), for clustering in high-dimensional space. Experiments on both synthetic and real datasets show that, when applied in Nyström based spectral clustering, the proposed sampling scheme achieves higher accuracy than other algorithms and lowers the time spent on sampling.
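
As a rough illustration of the Nyström extension the abstract refers to, the following sketch approximates a Gaussian similarity matrix from a subset of sampled points and measures the relative approximation error. It uses plain uniform random sampling, not the paper's MSS scheme. The function names, the Gaussian kernel, the bandwidth `gamma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gaussian similarity exp(-gamma * ||x - y||^2) between rows of X and rows of Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_approx(X, sample_idx, gamma):
    # C: similarities between all n points and the m sampled points (n x m).
    C = rbf_kernel(X, X[sample_idx], gamma)
    # W: similarities among the sampled points only (m x m).
    W = C[sample_idx]
    # Nystrom reconstruction of the full similarity matrix: K ~= C W^+ C^T.
    return C @ np.linalg.pinv(W) @ C.T

# Hypothetical high-dimensional data: 200 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
gamma = 1.0 / X.shape[1]  # assumed bandwidth, chosen only for this example

K = rbf_kernel(X, X, gamma)                       # exact similarity matrix
idx = rng.choice(len(X), size=40, replace=False)  # uniform random sample set
K_hat = nystrom_approx(X, idx, gamma)

# Relative Frobenius-norm error of the approximation.
rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
print(f"relative approximation error: {rel_err:.3f}")
```

Under this setup, replacing the line that draws `idx` with a different selection rule is one way to compare how the choice of sample set affects the approximation error.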


Large-scale, High dimensionality, Spectral Clustering, Nyström extension, Sampling





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zhicheng Zeng (1)
  • Ming Zhu (1)
  • Hong Yu (1)
  • Honglian Ma (1)
  1. School of Software, Dalian University of Technology, Dalian, China
