Abstract
PageRank is an algorithm for computing a ranking for every Web page based on the graph of the Web. It plays an important role in Google’s search engine. The core of the PageRank algorithm involves computing the principal eigenvector of the Google matrix. In practice, PageRank problems with high damping factors often need to be solved, and these take considerable time. One possible approach to accelerating the computation is the Arnoldi-type algorithm. However, this algorithm may not be satisfactory when the damping factor is high and the dimension of the Krylov subspace is low. Even worse, it may stagnate in practice. In this paper, we propose two strategies to improve the efficiency of the Arnoldi-type algorithm. Theoretical analysis shows that the new algorithms can accelerate the original Arnoldi-type algorithm considerably and circumvent the drawback of stagnation. Numerical experiments illustrate that the accelerated Arnoldi-type algorithms usually outperform many state-of-the-art acceleration algorithms for PageRank. Applications of the new algorithms to protein function prediction are also discussed.
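To make the role of the damping factor concrete, the following is a minimal sketch of the classical power method applied to a Google matrix. It is a baseline illustration only, not the Arnoldi-type algorithm or the accelerated variants proposed in the paper. The five-node graph `links`, the uniform teleportation vector, the uniform treatment of dangling nodes, and the function names `google_matvec` and `pagerank_power` are all assumptions made for this example.

```python
def google_matvec(links, x, alpha):
    """Apply the Google matrix to x without forming it.

    Assumed model: column-stochastic link matrix, dangling nodes spread
    their mass uniformly, and a uniform teleportation vector.
    """
    n = len(links)
    y = [0.0] * n
    dangling_mass = 0.0
    for j, outs in enumerate(links):
        if outs:
            share = x[j] / len(outs)  # page j splits its score over its out-links
            for i in outs:
                y[i] += share
        else:
            dangling_mass += x[j]     # page j has no out-links
    total = sum(x)
    return [alpha * (y[i] + dangling_mass / n) + (1.0 - alpha) * total / n
            for i in range(n)]


def pagerank_power(links, alpha=0.85, tol=1e-10, max_iter=100000):
    """Power iteration; returns the approximate PageRank vector and iteration count."""
    n = len(links)
    x = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        x_new = google_matvec(links, x, alpha)
        residual = sum(abs(a - b) for a, b in zip(x_new, x))  # 1-norm of the update
        x = x_new
        if residual < tol:
            return x, k
    return x, max_iter


# Hypothetical toy graph: links[j] lists the pages that page j points to.
links = [[1, 2], [2], [0], [2], []]

# Compare a moderate and a high damping factor on the same graph.
results = {alpha: pagerank_power(links, alpha) for alpha in (0.85, 0.99)}
for alpha, (x, iters) in results.items():
    print(f"alpha={alpha}: {iters} iterations")
```

On this toy graph the run with the higher damping factor needs more iterations to reach the same tolerance, which is the kind of slowdown that motivates Krylov subspace approaches such as the Arnoldi-type algorithm.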
Acknowledgments
Gang Wu: This author is supported by the National Science Foundation of China, the Qing-Lan Project of Jiangsu Province, and the 333 Project of Jiangsu Province. Ying Zhang: The work of this author is partially supported by the National Science Foundation of China under grant 10901132. Yimin Wei: This author is supported by the National Natural Science Foundation of China under Grant 11271084, 973 Program Project (No. 2010CB327900), Shanghai Education Committee under Dawn Project 08SG01, and Shanghai Science and Technology Committee. We would like to express our sincere thanks to two anonymous reviewers for their invaluable comments and constructive suggestions, which clarified and improved several sections of this paper. We are also grateful to Dr. David Gleich and Professor Tim Davis for providing us with the data files of the Web matrices, and to Professor Chen Greif for providing us with the MATLAB codes of the inner-outer power iterations. Finally, Gang Wu would like to thank the School of Mathematics and Statistics of Jiangsu Normal University for the use of its facilities during the development of this project.
Cite this article
Wu, G., Zhang, Y. & Wei, Y. Accelerating the Arnoldi-Type Algorithm for the PageRank Problem and the ProteinRank Problem. J Sci Comput 57, 74–104 (2013). https://doi.org/10.1007/s10915-013-9696-x
DOI: https://doi.org/10.1007/s10915-013-9696-x