Statistics and Computing

, Volume 17, Issue 4, pp 395–416 | Cite as

A tutorial on spectral clustering

Article

Abstract

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

Keywords

Spectral clustering Graph Laplacian 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aldous, D., Fill, J.: Reversible Markov chains and random walks on graphs (in preparation). Online version available at http://www.stat.berkeley.edu/users/aldous/RWG/book.html
  2. Bach, F., Jordan, M.: Learning spectral clustering. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16 (NIPS), pp. 305–312. MIT Press, Cambridge (2004) Google Scholar
  3. Bapat, R., Gutman, I., Xiao, W.: A simple method for computing resistance distance. Z. Naturforsch. 58, 494–498 (2003) Google Scholar
  4. Barnard, S., Pothen, A., Simon, H.: A spectral algorithm for envelope reduction of sparse matrices. Numer. Linear Algebra Appl. 2(4), 317–334 (1995) MATHCrossRefMathSciNetGoogle Scholar
  5. Belkin, M.: Problems of learning on manifolds. Ph.D. Thesis, University of Chicago (2003) Google Scholar
  6. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003) MATHCrossRefGoogle Scholar
  7. Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. In: Auer, P., Meir, R. (eds.) Proceedings of the 18th Annual Conference on Learning Theory (COLT), pp. 486–500. Springer, New York (2005) Google Scholar
  8. Ben-David, S., von Luxburg, U., Pal, D.: A sober look on clustering stability. In: Lugosi, G., Simon, H. (eds.) Proceedings of the 19th Annual Conference on Learning Theory (COLT), pp. 5–19. Springer, Berlin (2006) Google Scholar
  9. Bengio, Y., Delalleau, O., Roux, N., Paiement, J., Vincent, P., Ouimet, M.: Learning eigenfunctions links spectral embedding and kernel PCA. Neural Comput. 16, 2197–2219 (2004) MATHCrossRefGoogle Scholar
  10. Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp. 6–17 (2002) Google Scholar
  11. Bhatia, R.: Matrix Analysis. Springer, New York (1997) Google Scholar
  12. Bie, T.D., Cristianini, N.: Fast SDP relaxations of graph cut clustering, transduction, and other combinatorial problems. J. Mach. Learn. Res. 7, 1409–1436 (2006) MathSciNetGoogle Scholar
  13. Bolla, M.: Relations between spectral and classification properties of multigraphs. Technical Report No. DIMACS-91-27, Center for Discrete Mathematics and Theoretical Computer Science (1991) Google Scholar
  14. Brémaud, P.: Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York (1999) MATHGoogle Scholar
  15. Brito, M., Chavez, E., Quiroz, A., Yukich, J.: Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat. Probab. Lett. 35, 33–42 (1997) MATHCrossRefMathSciNetGoogle Scholar
  16. Bui, T.N., Jones, C.: Finding good approximate vertex and edge partitions is NP-hard. Inf. Process. Lett. 42(3), 153–159 (1992) MATHCrossRefMathSciNetGoogle Scholar
  17. Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press, Cambridge (2006) Google Scholar
  18. Chung, F.: Spectral Graph Theory. CBMS Regional Conference Series, vol. 92. Conference Board of the Mathematical Sciences, Washington (1997) MATHGoogle Scholar
  19. Dhillon, I.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–274. ACM Press, New York (2001) CrossRefGoogle Scholar
  20. Dhillon, I., Guan, Y., Kulis, B.: A unified view of kernel k-means, spectral clustering, and graph partitioning. Technical Report No. UTCS TR-04-25, University of Texas at Austin (2005) Google Scholar
  21. Ding, C.: A tutorial on spectral clustering. Talk presented at ICML (2004). Slides available at http://crd.lbl.gov/~cding/Spectral/
  22. Ding, C., He, X., Zha, H., Gu, M., Simon, H.: A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings of the first IEEE International Conference on Data Mining (ICDM), pp. 107–114. IEEE Computer Society, Washington (2001) CrossRefGoogle Scholar
  23. Donath, W.E., Hoffman, A.J.: Lower bounds for the partitioning of graphs. IBM J. Res. Develop. 17, 420–425 (1973) MATHCrossRefMathSciNetGoogle Scholar
  24. Fiedler, M.: Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 298–305 (1973) MathSciNetGoogle Scholar
  25. Fouss, F., Pirotte, A., Renders, J.-M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19, 355–369 (2007) Google Scholar
  26. Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002) MATHCrossRefMathSciNetGoogle Scholar
  27. Giné, E., Koltchinskii, V.: Empirical graph Laplacian approximation of Laplace-Beltrami operators: large sample results. In: Proceedings of the 4th International Conference on High Dimensional Probability, pp. 238–259 (2005) Google Scholar
  28. Golub, G., Van Loan, C.: Matrix Computations. Johns Hopkins University Press, Baltimore (1996) MATHGoogle Scholar
  29. Guattery, S., Miller, G.: On the quality of spectral separators. SIAM J. Matrix Anal. Appl. 19(3), 701–719 (1998) MATHCrossRefMathSciNetGoogle Scholar
  30. Gutman, I., Xiao, W.: Generalized inverse of the Laplacian matrix and some applications. Bull. Acad. Serb. Sci. Arts (Cl. Math. Natur.) 129, 15–23 (2004) MATHMathSciNetGoogle Scholar
  31. Hagen, L., Kahng, A.: New spectral methods for ratio cut partitioning and clustering. IEEE Trans. Comput.-Aided Des. 11(9), 1074–1085 (1992) CrossRefGoogle Scholar
  32. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001) MATHGoogle Scholar
  33. Hein, M.: Uniform convergence of adaptive graph-based regularization. In: Proceedings of the 19th Annual Conference on Learning Theory (COLT), pp. 50–64. Springer, New York (2006) Google Scholar
  34. Hein, M., Audibert, J.-Y., von Luxburg, U.: From graphs to manifolds—weak and strong pointwise consistency of graph Laplacians. In: Auer, P., Meir, R. (eds.) Proceedings of the 18th Annual Conference on Learning Theory (COLT), pp. 470–485. Springer, New York (2005) Google Scholar
  35. Hein, M., Audibert, J.-Y., von Luxburg, U.: Graph Laplacians and their convergence on random neighborhood graphs. J. Mach. Learn. Res. 8, 1325–1370 (2007) MathSciNetGoogle Scholar
  36. Hendrickson, B., Leland, R.: An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16, 452–469 (1995) MATHCrossRefMathSciNetGoogle Scholar
  37. Joachims, T.: Transductive learning via spectral graph partitioning. In: Fawcett, T., Mishra, N. (eds.) Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 290–297. AAAI Press (2003) Google Scholar
  38. Kannan, R., Vempala, S., Vetta, A.: On clusterings: good, bad and spectral. J. ACM 51(3), 497–515 (2004) CrossRefMathSciNetGoogle Scholar
  39. Kempe, D., McSherry, F.: A decentralized algorithm for spectral analysis. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 561–568. ACM Press, New York (2004) Google Scholar
  40. Klein, D., Randic, M.: Resistance distance. J. Math. Chem. 12, 81–95 (1993) CrossRefMathSciNetGoogle Scholar
  41. Koren, Y.: Drawing graphs by eigenvectors: theory and practice. Comput. Math. Appl. 49, 1867–1888 (2005) MATHCrossRefMathSciNetGoogle Scholar
  42. Lafon, S.: Diffusion maps and geometric harmonics. Ph.D. Thesis, Yale University (2004) Google Scholar
  43. Lang, K.: Fixing two weaknesses of the spectral method. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems 18, pp. 715–722. MIT Press, Cambridge (2006) Google Scholar
  44. Lange, T., Roth, V., Braun, M., Buhmann, J.: Stability-based validation of clustering solutions. Neural Comput. 16(6), 1299–1323 (2004) MATHCrossRefGoogle Scholar
  45. Lovász, L.: Random walks on graphs: a survey. In: Combinatorics, Paul Erdös is Eighty, pp. 353–397. János Bolyai Math. Soc., Budapest (1993) Google Scholar
  46. Lütkepohl, H.: Handbook of Matrices. Wiley, Chichester (1997) Google Scholar
  47. Meila, M., Shi, J.: A random walks view of spectral segmentation. In: 8th International Workshop on Artificial Intelligence and Statistics (AISTATS) (2001) Google Scholar
  48. Mohar, B.: The Laplacian spectrum of graphs. In: Graph Theory, Combinatorics, and Applications. Kalamazoo, MI, 1988, vol. 2, pp. 871–898. Wiley, New York (1991) Google Scholar
  49. Mohar, B.: Some applications of Laplace eigenvalues of graphs. In: Hahn, G., Sabidussi, G. (eds.) Graph Symmetry: Algebraic Methods and Applications. NATO ASI Ser. C, vol. 497, pp. 225–275. Kluwer, Dordrecht (1997) Google Scholar
  50. Nadler, B., Lafon, S., Coifman, R., Kevrekidis, I.: Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems 18, pp. 955–962. MIT Press, Cambridge (2006) Google Scholar
  51. Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Dietterich, T., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14, pp. 849–856. MIT Press, Cambridge (2002) Google Scholar
  52. Norris, J.: Markov Chains. Cambridge University Press, Cambridge (1997) MATHGoogle Scholar
  53. Penrose, M.: A strong law for the longest edge of the minimal spanning tree. Ann. Probab. 27(1), 246–260 (1999) MATHCrossRefMathSciNetGoogle Scholar
  54. Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11, 430–452 (1990) MATHCrossRefMathSciNetGoogle Scholar
  55. Saerens, M., Fouss, F., Yen, L., Dupont, P.: The principal components analysis of a graph, and its relationships to spectral clustering. In: Proceedings of the 15th European Conference on Machine Learning (ECML), pp. 371–383. Springer, Berlin (2004) Google Scholar
  56. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000) CrossRefGoogle Scholar
  57. Simon, H.: Partitioning of unstructured problems for parallel processing. Comput. Syst. Eng. 2, 135–148 (1991) CrossRefGoogle Scholar
  58. Spielman, D., Teng, S.: Spectral partitioning works: planar graphs and finite element meshes. In: 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pp. 96–105. CA: IEEE Comput. Soc. Press, Los Alamitos (1996) Google Scholar
  59. Stewart, G., Sun, J.: Matrix Perturbation Theory. Academic, New York (1990) MATHGoogle Scholar
  60. Still, S., Bialek, W.: How many clusters? An information-theoretic perspective. Neural Comput. 16(12), 2483–2506 (2004) MATHCrossRefGoogle Scholar
  61. Stoer, M., Wagner, F.: A simple min-cut algorithm. J. ACM 44(4), 585–591 (1997) MATHCrossRefMathSciNetGoogle Scholar
  62. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc. B 63(2), 411–423 (2001) MATHCrossRefMathSciNetGoogle Scholar
  63. Van Driessche, R., Roose, D.: An improved spectral bisection algorithm and its application to dynamic load balancing. Parallel Comput. 21(1), 29–48 (1995) MATHCrossRefMathSciNetGoogle Scholar
  64. von Luxburg, U., Belkin, M., Bousquet, O.: Consistency of spectral clustering. Ann. Stat. (to appear). See also Technical Report No. 134, Max Planck Institute for Biological Cybernetics (2004) Google Scholar
  65. von Luxburg, U., Bousquet, O., Belkin, M.: On the convergence of spectral clustering on random samples: the normalized case. In: Shawe-Taylor, J., Singer, Y. (eds.) Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 457–471. Springer, New York (2004) Google Scholar
  66. von Luxburg, U., Bousquet, O., Belkin, M.: Limits of spectral clustering. In: Saul, L., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems (NIPS) 17, pp. 857–864. MIT Press, Cambridge (2005) Google Scholar
  67. Wagner, D., Wagner, F.: Between min cut and graph bisection. In: Proceedings of the 18th International Symposium on Mathematical Foundations of Computer Science (MFCS), pp. 744–750. Springer, London (1993) Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. 1.Max Planck Institute for Biological CyberneticsTübingenGermany

Personalised recommendations