An anchor-based spectral clustering method
Spectral clustering is one of the most popular and important clustering methods in pattern recognition, machine learning, and data mining. However, its high computational complexity limits its use in applications involving truly large-scale datasets: for a clustering problem with n samples, it must compute the eigenvectors of the graph Laplacian, which takes O(n³) time. To address this problem, we propose a novel method called anchor-based spectral clustering (ASC) that employs anchor points of the data. Specifically, m (m ≪ n) anchor points are selected from the dataset so as to largely preserve the intrinsic (manifold) structure of the original data. Then a mapping matrix between the original data and the anchors is constructed. More importantly, it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly with the size of the data while incurring only a small degradation in clustering performance. ASC is compared to classical spectral clustering and to two state-of-the-art accelerated methods, i.e., power iteration clustering and landmark-based spectral clustering, on 10 real-world applications using three evaluation metrics. Experimental results show that ASC is consistently faster than classical spectral clustering with comparable clustering performance, and is comparable with or better than the state-of-the-art methods in both effectiveness and efficiency.
Key words: Clustering; Spectral clustering; Graph Laplacian; Anchors
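The pipeline the abstract describes (select m ≪ n anchor points, build a data-anchor mapping matrix Z, and recover an approximate spectral embedding from Z instead of eigendecomposing the full n × n graph Laplacian) can be sketched as follows. This is a minimal illustration of the general anchor-graph recipe, not the paper's exact construction: the function name, the Gaussian kernel with bandwidth sigma, and keeping only the s nearest anchors per sample are all assumptions made for the sketch, and anchors are simply passed in (the paper may select them differently, e.g. by k-means).

```python
import numpy as np

def anchor_spectral_embedding(X, anchors, k, s=3, sigma=1.0):
    """Approximate the spectral embedding of X via m anchor points.

    X: (n, d) data; anchors: (m, d) anchor points with m << n;
    k: embedding dimension (number of clusters); s: anchors kept per sample.
    Returns an (n, k) embedding whose rows can be fed to k-means.
    """
    n, m = X.shape[0], anchors.shape[0]
    # Squared distances from every sample to every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    # Sparse data-anchor mapping Z: Gaussian weights on the s nearest anchors,
    # zero elsewhere; each row is normalized to sum to one.
    Z = np.zeros((n, m))
    idx = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(n)[:, None]
    Z[rows, idx] = np.exp(-d2[rows, idx] / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    # Anchor graph W = Z diag(Z^T 1)^{-1} Z^T; its top eigenvectors are the
    # left singular vectors of Z diag(Z^T 1)^{-1/2}, so the n x n matrix W
    # never needs to be formed.
    Lam = Z.sum(axis=0)                      # anchor "degrees", all positive
    U, _, _ = np.linalg.svd(Z / np.sqrt(Lam), full_matrices=False)
    return U[:, :k]
```

Because only the n × m matrix is factorized, the dominant cost is O(nm²) rather than the O(n³) eigendecomposition of the full graph Laplacian, which is where the linear scaling in n comes from.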
- Boutsidis C, Gittens A, Kambadur P, 2015. Spectral clustering via the power method—provably. 32nd Int Conf on Machine Learning, p.40–48.
- Chang XJ, Nie FP, Ma ZG, et al., 2015. A convex formulation for spectral shrunk clustering. 29th AAAI Conf on Artificial Intelligence, p.2532–2538.
- Chen XL, Cai D, 2011. Large scale spectral clustering with landmark-based representation. 25th AAAI Conf on Artificial Intelligence, p.313–318.
- Delalleau O, Bengio Y, Le Roux N, 2005. Efficient nonparametric function induction in semi-supervised learning. 10th Int Workshop on Artificial Intelligence and Statistics, p.96–103.
- Lin F, Cohen WW, 2010. Power iteration clustering. 27th Int Conf on Machine Learning, p.655–662.
- Liu JL, Wang C, Danilevsky M, et al., 2013. Large-scale spectral clustering on graphs. 23rd Int Joint Conf on Artificial Intelligence, p.1486–1492.
- Liu W, He JF, Chang SF, 2010. Large graph construction for scalable semi-supervised learning. 27th Int Conf on Machine Learning, p.679–686.
- Ng AY, Jordan MI, Weiss Y, 2002. On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems, p.849–856.
- Tian F, Gao B, Cui Q, et al., 2014. Learning deep representations for graph clustering. 28th AAAI Conf on Artificial Intelligence, p.1293–1299.
- Xia RK, Pan Y, Du L, et al., 2014. Robust multi-view spectral clustering via low-rank and sparse decomposition. 28th AAAI Conf on Artificial Intelligence, p.2149–2155.
- Yang Y, Shen HT, Nie FP, et al., 2011. Nonnegative spectral clustering with discriminative regularization. 25th AAAI Conf on Artificial Intelligence, p.555–560.