
An anchor-based spectral clustering method

  • Qin Zhang
  • Guo-qiang Zhong
  • Jun-yu Dong

Abstract

Spectral clustering is one of the most popular and important clustering methods in pattern recognition, machine learning, and data mining. However, its high computational complexity limits its use in applications involving truly large-scale datasets: for a clustering problem with n samples, it must compute the eigenvectors of the graph Laplacian, which takes O(n^3) time. To address this problem, we propose a novel method called anchor-based spectral clustering (ASC), which employs anchor points of the data. Specifically, m (m ≪ n) anchor points are selected from the dataset such that they largely maintain the intrinsic (manifold) structure of the original data. A mapping matrix between the original data and the anchors is then constructed. More importantly, it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix, the spectral embedding of the original data can be approximated easily. The proposed method scales linearly with the size of the data, with little degradation in clustering performance. ASC is compared with classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics. Experimental results show that ASC is consistently faster than classical spectral clustering with comparable clustering performance, and is at least comparable with, or better than, the state-of-the-art methods in both effectiveness and efficiency.
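
As a rough illustration of the pipeline described in the abstract (select anchors, build a data-to-anchor mapping matrix, approximate the spectral embedding from it), here is a minimal NumPy sketch. It is not the authors' ASC algorithm, whose details are not given on this page. The random anchor selection, the Gaussian weights over the s nearest anchors, the SVD-based embedding (a generic anchor-graph style construction), the small k-means routine, and all names such as anchor_spectral_labels, anchor_idx, and s are assumptions made for illustration only.

```python
import numpy as np


def kmeans(E, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [E[0]]
    for _ in range(1, k):
        # Distance of every row to its closest already-chosen center.
        d = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(E), dtype=int)
    for _ in range(iters):
        d = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels


def anchor_spectral_labels(X, k, m=20, s=3, anchor_idx=None, seed=0):
    """Cluster X into k groups through m anchors (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if anchor_idx is None:
        # Assumption: anchors are sampled uniformly at random from the data.
        anchor_idx = np.random.default_rng(seed).choice(n, size=m, replace=False)
    anchors = X[anchor_idx]
    m = anchors.shape[0]

    # Squared Euclidean distances between samples and anchors (n x m).
    # Dense for brevity; a scalable version would avoid the n x m x d temporary.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)

    # Mapping matrix Z: Gaussian weights on the s nearest anchors, rows sum to 1.
    idx = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(n)[:, None]
    near = d2[rows, idx]
    sigma2 = near.mean() + 1e-12  # assumed bandwidth heuristic
    Z = np.zeros((n, m))
    Z[rows, idx] = np.exp(-near / (2.0 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True)

    # The implied affinity is W = Z diag(colsum)^-1 Z^T; its leading
    # eigenvectors are the leading left singular vectors of Z diag(colsum)^-1/2.
    col = Z.sum(axis=0)
    col[col == 0] = 1.0  # guard against anchors that no sample selected
    U, _, _ = np.linalg.svd(Z / np.sqrt(col), full_matrices=False)
    E = U[:, :k]

    # Row-normalise the embedding, then run k-means on it.
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return kmeans(E, k)
```

In this sketch the only decomposition is the SVD of an n × m matrix, whose cost grows linearly in n for fixed m. That is the kind of scaling the abstract refers to, in contrast to the O(n^3) eigendecomposition of a full n × n graph Laplacian.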

Key words

Clustering; Spectral clustering; Graph Laplacian; Anchors

CLC number

TP311 


Copyright information

© Editorial Office of Journal of Zhejiang University Science and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, Ocean University of China, Qingdao, China
  2. Science and Information College, Qingdao Agricultural University, Qingdao, China
