
A heuristic search algorithm based on subspaces for PageRank computation

Abstract

We study a fast algorithm for large-scale PageRank computation. PageRank is the measure the Google search engine uses to model the importance of web pages. It is defined as an eigenvector of a particular stochastic matrix related to the link graph of web pages. The power method is the standard way to compute this eigenvector, while a Krylov subspace method converges faster. The latter can be regarded as a two-step algorithm: the first step predicts the eigenvector, and the second step corrects the prediction. More precisely, the power method is first iterated to compute approximations to the eigenvector. The Krylov subspace spanned by these approximations is then searched for a better approximate eigenvector, namely one that minimizes the residual. To obtain a better approximation efficiently, we consider using subspaces not only in the second step but also in the first. Specifically, a Krylov subspace is first used to compute an approximate eigenvector, which is then used to expand another subspace. This second, non-Krylov subspace is then searched for a better approximate eigenvector that minimizes the residual over the subspace. This paper describes a heuristic search algorithm that alternates these two steps and presents an efficient implementation of it. Experimental results with huge Google matrices illustrate the performance improvements of the algorithm.
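The baseline predict-and-correct scheme summarized in the abstract can be illustrated with a small NumPy sketch. It is not the paper's proposed algorithm or implementation: it only shows power iterations (prediction) followed by residual minimization over the span of the iterates (correction). The function names, the dense matrix representation, and the damping factor 0.85 are assumptions made for illustration, and the correction step uses a singular value decomposition as one possible way to minimize the residual.

```python
import numpy as np


def google_matrix(adj, alpha=0.85):
    """Dense column-stochastic Google matrix (hypothetical helper).

    adj[i, j] = 1 if page j links to page i. Columns of pages without
    outgoing links are replaced by the uniform distribution. The damping
    factor alpha = 0.85 is an assumed, commonly used value.
    """
    n = adj.shape[0]
    out = adj.sum(axis=0)  # number of out-links of each page
    P = np.where(out > 0, adj / np.where(out > 0, out, 1.0), 1.0 / n)
    return alpha * P + (1.0 - alpha) / n * np.ones((n, n))


def predict(A, x, steps):
    """Prediction: power iterations, returning all iterates as columns."""
    vecs = [x]
    for _ in range(steps):
        x = A @ x
        x = x / np.abs(x).sum()  # keep the iterate normalized in the 1-norm
        vecs.append(x)
    return np.column_stack(vecs)


def correct(A, V):
    """Correction: minimize ||A x - x||_2 over unit vectors x in span(V)."""
    Q, _ = np.linalg.qr(V)  # orthonormal basis of the subspace
    R = A @ Q - Q           # residual operator restricted to the subspace
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    x = Q @ Vt[-1]          # right singular vector of the smallest singular value
    return x / x.sum()      # rescale so that the entries sum to one


def relative_residual(A, x):
    """Residual norm ||A x - x||_2 scaled by ||x||_2."""
    return np.linalg.norm(A @ x - x) / np.linalg.norm(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    adj = (rng.random((n, n)) < 0.02).astype(float)  # random toy link graph
    A = google_matrix(adj)
    V = predict(A, np.full(n, 1.0 / n), steps=4)
    x = correct(A, V)
    print("last power iterate:", relative_residual(A, V[:, -1]))
    print("after correction:  ", relative_residual(A, x))
```

Because the last power iterate lies in the searched subspace, the corrected vector cannot have a larger relative residual than that iterate, up to rounding. A large-scale code would use sparse matrix-vector products instead of the dense matrix assumed here.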




Author information


Corresponding author

Correspondence to Takafumi Miyata.

Additional information

This work was supported by KAKENHI Grant No. 18K11343.


About this article


Cite this article

Miyata, T. A heuristic search algorithm based on subspaces for PageRank computation. J Supercomput 74, 3278–3294 (2018). https://doi.org/10.1007/s11227-018-2383-9


  • DOI: https://doi.org/10.1007/s11227-018-2383-9

Keywords

  • PageRank
  • Google matrix
  • Power iteration
  • Krylov subspace
  • Residual minimization
  • Parallel computing