Variational perspective on local graph clustering

Abstract

Modern graph clustering applications require the analysis of large graphs, and this can be computationally expensive. In this regard, local spectral graph clustering methods aim to identify well-connected clusters around a given “seed set” of reference nodes without accessing the entire graph. The celebrated Approximate Personalized PageRank (APPR) algorithm in the seminal paper by Andersen et al. (in: FOCS ’06 proceedings of the 47th annual IEEE symposium on foundations of computer science, pp 475–486, 2006) is one such method. APPR was introduced and motivated purely from an algorithmic perspective. In other words, there is no a priori notion of an objective function or of optimality conditions that characterizes the steps taken by APPR. Here, we derive a novel variational formulation which makes explicit the actual optimization problem solved by APPR. In doing so, we draw connections between the local spectral algorithm of Andersen et al. (2006) and an iterative shrinkage-thresholding algorithm (ISTA). In particular, we show that appropriately initialized ISTA, applied to our variational formulation, can recover the sought-after local cluster in a time that depends only on the number of non-zeros of the optimal solution rather than on the size of the entire graph. In the process, we show that an optimization algorithm that apparently requires access to the entire graph can be made to behave in a completely local manner by accessing only a small number of nodes. This viewpoint builds a bridge between the two seemingly disjoint fields of graph processing and numerical optimization, and it allows one to leverage well-studied, numerically robust, and efficient optimization algorithms for processing today’s large graphs.
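To make the ISTA connection concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an l1-regularized quadratic objective of the form 0.5*x^T Q x - b^T x + sum_i lam[i]*|x[i]|, with Q = alpha*I + 0.5*(1 - alpha)*L built from the normalized Laplacian L of a small path graph. The objective form, the helper names (soft_threshold, ista, path_graph_problem), and the parameter values alpha and rho are all assumptions made for illustration; none of them is stated in the abstract above.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of the (weighted) l1 norm: shrink each entry toward 0 by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def ista(Q, b, lam, step, num_iters=500):
    """Minimize 0.5*x^T Q x - b^T x + sum_i lam[i]*|x[i]| by ISTA, starting from x = 0."""
    x = np.zeros_like(b)
    for _ in range(num_iters):
        grad = Q @ x - b  # gradient of the smooth quadratic part
        x = soft_threshold(x - step * grad, step * lam)
    return x


def path_graph_problem(n, seed, alpha, rho):
    """Build (Q, b, lam, d) for a path graph on n >= 2 nodes (assumed objective form)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    d = A.sum(axis=1)                                        # node degrees
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.eye(n) - d_isqrt[:, None] * A * d_isqrt[None, :]  # normalized Laplacian
    Q = alpha * np.eye(n) + 0.5 * (1.0 - alpha) * L          # eigenvalues lie in [alpha, 1]
    s = np.zeros(n)
    s[seed] = 1.0                                            # indicator of the seed node
    b = alpha * d_isqrt * s
    lam = rho * alpha * np.sqrt(d)                           # degree-weighted l1 penalty
    return Q, b, lam, d


Q, b, lam, d = path_graph_problem(n=50, seed=0, alpha=0.15, rho=0.05)
x = ista(Q, b, lam, step=1.0)  # step 1 is valid: the largest eigenvalue of Q is at most 1
support = np.flatnonzero(x)
print("nonzero entries:", support.size, "of", x.size)
```

Starting from the zero vector, the iterates in this sketch stay nonnegative and sparse, so only nodes near the seed ever become nonzero. Note that this dense version still multiplies by the full matrix Q at every iteration; a genuinely local implementation would update only the nodes that are, or are about to become, nonzero.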

Notes

  1. In between global and local algorithms, there is a class of locally-biased algorithms, e.g., [18], whose running time depends on the size of the entire graph but whose solution is locally biased toward some input seed set of reference nodes. We do not consider them in this paper.

  2. Iteration complexity refers to the worst-case number of iterations to satisfy the termination criterion and running time refers to the total amount of work, i.e., the per-iteration cost times iteration complexity.

References

  1. Andersen, R., Chung, F., Lang, K.: Local graph partitioning using pagerank vectors. In: FOCS ’06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486 (2006)

  2. Andersen, R., Lang, K.: An algorithm for improving graph partitions. In: SODA ’08 Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 651–660 (2008)

  3. Arora, S., Rao, S., Vazirani, U.: Expander flows, geometric embeddings and graph partitioning. J. ACM 56(2), 5 (2009)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  5. Cheeger, J.: A lower bound for the smallest eigenvalue of the Laplacian. In: Problems in Analysis, Papers Dedicated to Salomon Bochner, pp. 195–199. Princeton University Press (1969)

  6. Chung, F.: Random walks and local cuts in graphs. Linear Algebra Appl. 423, 22–32 (2007)

  7. Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Nearest neighbor based greedy coordinate descent. In: Advances in Neural Information Processing Systems 24 (NIPS 2011) (2011)

  8. Eiron, N., McCurley, K.S., Tomlin, J.A.: Ranking the web frontier. In: Proceedings of the 13th International Conference on World Wide Web, pp. 309–318 (2004)

  9. Fountoulakis, K., Cheng, X., Shun, J., Roosta-Khorasani, F., Mahoney, M.W.: Exploiting optimization for local graph clustering. Technical report. Preprint arXiv:1602.01886 (2016)

  10. Gleich, D.F.: Pagerank beyond the web. SIAM Rev. 57(3), 321–363 (2015)

  11. Gleich, D.F., Mahoney, M.W.: Anti-differentiating approximation algorithms: a case study with min-cuts, spectral, and flow. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1018–1025 (2014)

  12. Grady, L., Schwartz, E.L.: Isoperimetric partitioning: a new algorithm for graph partitioning. SIAM J. Sci. Comput. 27(6), 1844–1866 (2006)

  13. Hall, K.M.: An r-dimensional quadratic placement algorithm. Manag. Sci. 17(3), 219–229 (1970)

  14. Jeub, L.G.S., Balachandran, P., Porter, M.A., Mucha, P.J., Mahoney, M.W.: Think locally, act locally: the detection of small, medium-sized, and large communities in large networks. Phys. Rev. E 91(1), 012821 (2015)

  15. Kloster, K., Gleich, D.F.: Heat kernel based community detection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1386–1395 (2014)

  16. Leighton, T., Rao, S.: An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In: 29th Annual Symposium on Foundations of Computer Science, pp. 422–431 (1988)

  17. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6(1), 29–123 (2011)

  18. Mahoney, M.W., Orecchia, L., Vishnoi, N.K.: A local spectral method for graphs: with applications to improving graph partitions and exploring data graphs locally. J. Mach. Learn. Res. 13, 2339–2365 (2012)

  19. Orecchia, L., Zhu, Z.A.: Flow-based algorithms for local graph clustering. In: SODA ’14 Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1267–1286 (2014)

  20. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

  21. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  22. Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430–452 (1990)

  23. Spielman, D.A., Teng, S.H.: A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput. 42(1), 1–26 (2013)

  24. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2012)

  25. Veldt, N., Gleich, D.F., Mahoney, M.W.: A simple and strongly-local flow-based method for cut improvement. Accepted to ICML (2016)

Acknowledgements

MM would like to thank the Army Research Office and the Defense Advanced Research Projects Agency for partial support of this work. JS was supported by the Miller Institute for Basic Research in Science at UC Berkeley.

Author information

Corresponding author

Correspondence to Kimon Fountoulakis.

Additional information

A preliminary version of this work appeared with the title “Exploiting Optimization for Local Graph Clustering” as a technical report [9].

Cite this article

Fountoulakis, K., Roosta-Khorasani, F., Shun, J. et al. Variational perspective on local graph clustering. Math. Program. 174, 553–573 (2019). https://doi.org/10.1007/s10107-017-1214-8

Keywords

  • Local spectral graph clustering
  • Variational formulation
  • Approximate Personalized PageRank
  • Iterative shrinkage-thresholding

Mathematics Subject Classification

  • 05C85
  • 90C35
  • 65K10