Knowledge and Information Systems, Volume 14, Issue 3, pp 327–346

Random walk with restart: fast solutions and applications

Regular Paper

Abstract

How closely related are two nodes in a graph? How can this score be computed quickly on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been used successfully in numerous settings, such as automatic captioning of images, generalizations to "connection subgraphs", personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale to large graphs: they require either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman–Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they save several orders of magnitude in pre-computation and storage cost, and they achieve up to a 150× speedup while preserving more than 90% of the quality.
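To make the relevance score concrete, the following is a minimal sketch of one straightforward way to compute RWR: iterating the walk on the fly for a single query node. This is the slow-query baseline, not the fast method proposed in the paper. The function names, the column normalization, the restart probability of 0.15, and the toy graph are all assumptions chosen for illustration.

```python
def column_normalize(adj):
    """Scale each column of a dense adjacency matrix so it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rwr_scores(adj, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Relevance of every node to `query` by iterating the RWR update.

    Assumed update rule: r <- (1 - restart) * W r + restart * e,
    where W is the column-normalized adjacency matrix and e is the
    indicator vector of the query node.
    """
    n = len(adj)
    w = column_normalize(adj)
    e = [0.0] * n
    e[query] = 1.0
    r = e[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * r[j] for j in range(n))
            + restart * e[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

scores = rwr_scores(adj, query=0)
print([round(s, 3) for s in scores])
```

On this toy graph, the nodes in the query's own triangle receive higher scores than the nodes across the bridge, which is the community-like behavior the abstract refers to. Each query costs a full iteration over the graph, which is why this approach has slow response time on large graphs.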

Keywords

Relevance score, Random walk with restart, Graph mining

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
