Minimum Similarity Sampling Scheme for Nyström Based Spectral Clustering on Large Scale High-Dimensional Data
Large-scale spectral clustering in high-dimensional space is among the most popular unsupervised learning problems. Existing sampling schemes have various limitations on high-dimensional data. This paper proposes an improved Nyström extension based spectral clustering algorithm with a sampling scheme designed for high-dimensional data. We first examine several existing sampling schemes and illustrate their defects, especially in high-dimensional settings. Furthermore, we provide a theoretical analysis of how the similarity between the sample set and the non-sampled set influences the approximation error, and propose an improved sampling scheme, minimum similarity sampling (MSS), for clustering in high-dimensional space. Experiments on both synthetic and real datasets show that, when applied in Nyström based spectral clustering, the proposed sampling scheme outperforms other algorithms, achieving higher accuracy while lowering the time consumed by sampling.
Keywords: Large-scale, High dimensionality, Spectral clustering, Nyström extension, Sampling
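The abstract ties the Nyström approximation error to the similarity between the sampled and non-sampled points but does not spell out the MSS selection rule. The following is a minimal pure-Python sketch under stated assumptions, not the authors' algorithm. It assumes a Gaussian similarity with bandwidth `sigma`, and a hypothetical greedy rule that repeatedly adds the non-sampled point whose largest similarity to the current sample set is smallest. It then measures the Frobenius error of the standard Nyström estimate of the non-sampled block, K_RR ≈ C W^{-1} C^T, where W = K[S, S] and C = K[R, S]. All function names, the `first` starting index, and the small `ridge` term are illustrative assumptions. The eigenvector extension and k-means steps of the full clustering pipeline are omitted.

```python
import math
import random


def gaussian_similarity(x, y, sigma):
    """Gaussian (RBF) similarity exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def minimum_similarity_sampling(points, m, sigma, first=0):
    """Greedily choose m sample indices. Each new sample is the non-sampled
    point whose largest similarity to the current sample set is smallest.
    Hypothetical rule for illustration, not necessarily the paper's MSS."""
    if not 1 <= m <= len(points):
        raise ValueError("m must satisfy 1 <= m <= len(points)")
    chosen, chosen_set = [first], {first}
    # max_sim[i]: largest similarity between point i and any chosen sample
    max_sim = [gaussian_similarity(p, points[first], sigma) for p in points]
    while len(chosen) < m:
        nxt = min((i for i in range(len(points)) if i not in chosen_set),
                  key=lambda i: max_sim[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i, p in enumerate(points):
            max_sim[i] = max(max_sim[i], gaussian_similarity(p, points[nxt], sigma))
    return chosen


def solve(a, b):
    """Solve a @ X = b (a square, b a list of rows) by Gauss-Jordan
    elimination with partial pivoting; returns X as a list of rows."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def nystrom_error(points, sample_idx, sigma, ridge=1e-8):
    """Frobenius-norm error of the Nystrom estimate K_RR ~ C W^{-1} C^T on
    the non-sampled block R, with W = K[S, S] and C = K[R, S]."""
    s_set = set(sample_idx)
    rest = [i for i in range(len(points)) if i not in s_set]
    if not rest:
        return 0.0  # every point sampled: nothing to extrapolate

    def k(i, j):
        return gaussian_similarity(points[i], points[j], sigma)

    # small ridge on W's diagonal (an assumption) keeps the solve stable
    w = [[k(i, j) + (ridge if i == j else 0.0) for j in sample_idx] for i in sample_idx]
    c = [[k(r, s) for s in sample_idx] for r in rest]
    x = solve(w, [list(col) for col in zip(*c)])  # X = W^{-1} C^T
    err = 0.0
    for a, ra in enumerate(rest):
        for b, rb in enumerate(rest):
            approx = sum(c[a][t] * x[t][b] for t in range(len(sample_idx)))
            err += (k(ra, rb) - approx) ** 2
    return math.sqrt(err)


if __name__ == "__main__":
    rng = random.Random(0)
    # synthetic data: three separated Gaussian blobs in 10 dimensions
    centers = [[0.0] * 10, [5.0] * 10, [-5.0] * 10]
    pts = [[c + rng.gauss(0, 0.5) for c in ctr] for ctr in centers for _ in range(20)]
    mss = minimum_similarity_sampling(pts, m=6, sigma=3.0)
    rnd = rng.sample(range(len(pts)), 6)
    print("MSS samples:", mss, "error:", nystrom_error(pts, mss, sigma=3.0))
    print("random samples:", rnd, "error:", nystrom_error(pts, rnd, sigma=3.0))
```

Because the Gaussian similarity decreases monotonically with distance, this particular greedy rule coincides with farthest-point traversal. Using a different aggregate, such as the sum of similarities to the sample set, would only change how `max_sim` is updated. The script prints both errors for comparison rather than asserting which sampling scheme wins.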