
Knowledge and Information Systems, Volume 42, Issue 1, pp 127–146

Fast PageRank approximation by adaptive sampling

  • Wenting Liu
  • Guangxia Li
  • James Cheng
Regular Paper

Abstract

PageRank is typically computed from powers of the transition matrix of a Markov chain model. This computation is expensive, so efficient approximation methods are needed to accelerate it, especially for large graphs. In this paper, we propose two sampling algorithms for efficient PageRank approximation: direct sampling and adaptive sampling. Both methods sample the transition matrix and use the sample in the PageRank computation. The direct sampling method samples the transition matrix once and uses that sample directly, whereas the adaptive sampling method samples the transition matrix multiple times with a sample rate that is adjusted iteratively as the computation proceeds. This adaptive sample rate is designed to achieve a good trade-off between accuracy and efficiency. We provide a detailed theoretical analysis of the error bounds of both methods. We also compare them with several state-of-the-art PageRank approximation methods, including power extrapolation and the inner–outer power iteration algorithm. Experimental results on several real-world datasets show that our methods achieve significantly higher efficiency than state-of-the-art methods while attaining comparable accuracy.
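To make the two ingredients concrete, the following Python sketch shows plain PageRank power iteration and a direct-sampling variant that samples the transition matrix once and iterates on the sample. It is an illustration under stated assumptions, not the authors' algorithm: the sampling scheme (keep each nonzero transition probability independently with probability `rate` and rescale kept entries by `1/rate`, so the sampled matrix equals the original in expectation), the damping factor `d = 0.85`, the uniform redistribution of dangling-node mass, and the renormalization after every step are all assumptions, and the function names are hypothetical. The adaptive sample-rate schedule is not specified in the abstract, so it is not reproduced here.

```python
import random

# Hypothetical helper names; an illustrative sketch, not the paper's implementation.


def transition_entries(edges, n):
    """Return the nonzeros (i, j, P_ij) of the row-stochastic transition matrix
    of a directed graph given as (src, dst) pairs, plus the out-degrees."""
    outdeg = [0] * n
    for i, _ in edges:
        outdeg[i] += 1
    return [(i, j, 1.0 / outdeg[i]) for i, j in edges], outdeg


def power_iteration(entries, n, dangling, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- d * M^T r + uniform spread of dangling and teleport mass,
    where `entries` holds the nonzeros of M (exact or sampled). r is
    renormalized to sum to 1 after each step (an assumption of this sketch)."""
    r = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [0.0] * n
        for i, j, m_ij in entries:
            nxt[j] += d * m_ij * r[i]
        # Mass on dangling nodes and the (1 - d) teleport mass are spread uniformly.
        spread = (d * sum(r[v] for v in dangling) + (1.0 - d)) / n
        nxt = [x + spread for x in nxt]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        # Stop when the L1 change between successive iterates is below tol.
        diff = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if diff < tol:
            break
    return r


def direct_sampling_pagerank(edges, n, rate, seed=0, **kw):
    """Sample the transition matrix once: keep each nonzero with probability
    `rate` and rescale kept entries by 1/rate, then run power iteration on the
    sample. Dangling nodes are taken from the original graph; mass lost on rows
    whose entries were all dropped is restored by the renormalization."""
    rng = random.Random(seed)
    entries, outdeg = transition_entries(edges, n)
    sampled = [(i, j, p / rate) for i, j, p in entries if rng.random() < rate]
    dangling = [v for v in range(n) if outdeg[v] == 0]
    return power_iteration(sampled, n, dangling, **kw)


if __name__ == "__main__":
    edges = [(0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
    exact = direct_sampling_pagerank(edges, 4, rate=1.0)
    approx = direct_sampling_pagerank(edges, 4, rate=0.6, seed=1)
    print([round(x, 3) for x in exact])
    print([round(x, 3) for x in approx])
```

With `rate=1.0` every entry is kept and the sketch reduces to exact power iteration; smaller rates touch fewer matrix entries per iteration at the cost of a noisier estimate, which is the accuracy/efficiency trade-off the adaptive method is described as tuning.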

Keywords

PageRank, Adaptive sampling, Power iteration

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. School of Computer Engineering, Nanyang Technological University, Singapore
  2. Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong
