Graph Clustering Using Early-Stopped Random Walks

  • Małgorzata Lucińska
  • Sławomir T. Wierzchoń
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9842)

Abstract

The rapid growth of empirical graphs demands clustering algorithms with nearly-linear time complexity. We propose a novel approach to clustering based on random walks. The idea is to relax the standard spectral method and replace the eigenvectors with vectors obtained by running early-stopped random walks. Instead of iterating the random walk to convergence, we stop it after a time that is short compared with the mixing time. The computed vectors constitute a local approximation of the leading eigenvectors. The algorithm is competitive with traditional spectral solutions in terms of computational complexity. We empirically evaluate the proposed approach against other exact and approximate methods. Experimental results show that early stopping does not affect the quality of the clustering on the tested real-world data sets.
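
The early-stopping idea can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a minimal example that assumes a lazy random walk started from a single seed vertex, a fixed small number of steps, and a toy graph of two 5-vertex cliques joined by one edge. The names early_stopped_walk and two_cliques are hypothetical. After a few steps the walk has mixed inside the seed's clique but has barely leaked across the bridge, so comparing the degree-normalized probabilities with their stationary value 1/(2m) recovers the seed's cluster.

```python
import numpy as np


def early_stopped_walk(adjacency, seed_vertex, num_steps):
    """Degree-normalized distribution of a lazy random walk after num_steps
    steps, started from seed_vertex (illustrative helper, not from the paper)."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    p = np.zeros(len(A))
    p[seed_vertex] = 1.0
    for _ in range(num_steps):
        # Lazy step: stay put with probability 1/2, otherwise move to a
        # uniformly random neighbor.
        p = 0.5 * p + 0.5 * (A.T @ (p / degrees))
    return p / degrees


def two_cliques(size):
    """Toy graph (an assumption for this sketch): two cliques of the given
    size, connected by a single bridge edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0
    return A


A = two_cliques(5)
# 10 steps is far short of the mixing time of this graph.
scores = early_stopped_walk(A, seed_vertex=0, num_steps=10)
# At stationarity every vertex has p(v) / d(v) = 1 / (2m); A.sum() equals 2m.
stationary = 1.0 / A.sum()
cluster = {int(v) for v in np.flatnonzero(scores > stationary)}
print(sorted(cluster))
```

Vertices whose score is still above the stationary value are those the walk has not yet had time to leave, which is the sense in which a walk stopped early carries local cluster information that a fully mixed walk would have lost.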

Keywords

Graph clustering, Random walks, Convergence rate

Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Małgorzata Lucińska (1)
  • Sławomir T. Wierzchoń (2)
  1. Kielce University of Technology, Kielce, Poland
  2. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland