PageRank Computation Using a Multiple Implicitly Restarted Arnoldi Method for Modeling Epidemic Spread

Abstract

A parallel implementation based on the multiple implicitly restarted Arnoldi method (MIRAM) is proposed for computing the dominant eigenpair of stochastic matrices derived from very large real networks. The high damping factor of these matrices makes many existing algorithms less efficient, whereas MIRAM appears promising. We also apply the method to epidemic modeling: we describe a stochastic model based on PageRank to simulate epidemic spread, in which a PageRank-like infection vector computed by MIRAM helps establish an efficient vaccination strategy. MIRAM is implemented within the Trilinos framework and targets big data, namely sparse matrices representing scale-free (also known as power-law) networks. A hypergraph partitioning approach is employed to minimize communication overhead. The algorithm is tested on Grid'5000, a nationwide cluster of clusters. Experiments are conducted on very large networks, such as Twitter and Yahoo, with over 1 billion nodes. Our parallel implementation achieves a speedup of 27× over the sequential solver.
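
The core computation described above, the dominant eigenpair of a damped stochastic (Google) matrix, can be illustrated with a single, sequential implicitly restarted Arnoldi method. The sketch below assumes SciPy: `scipy.sparse.linalg.eigs` wraps ARPACK, whose nonsymmetric solver is an IRAM. It is not the paper's MIRAM (which, as we understand it, combines several IRAM processes with different subspace sizes), and it leaves out the Trilinos implementation and hypergraph partitioning entirely. The function names, the toy graph, and the uniform treatment of teleportation and dangling nodes are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import LinearOperator, eigs


def google_operator(edges, n, alpha=0.85):
    """LinearOperator applying G^T, with G the (assumed) Google matrix.

    G = alpha * S + (1 - alpha) * e e^T / n, where S is the row-stochastic
    link matrix whose dangling rows (no out-links) are replaced by e^T / n.
    G is never formed explicitly, so only the sparse link matrix is stored.
    """
    src, dst = np.asarray(edges).T
    A = csr_matrix((np.ones(len(src)), (src, dst)), shape=(n, n))
    outdeg = np.asarray(A.sum(axis=1)).ravel()
    dangling = outdeg == 0
    inv = np.zeros(n)
    inv[~dangling] = 1.0 / outdeg[~dangling]
    PT = (diags(inv) @ A).T.tocsr()  # transpose of the row-stochastic link part

    def matvec(x):
        x = np.asarray(x, dtype=float).ravel()
        y = alpha * (PT @ x)
        # Dangling-node correction and teleportation are both rank-one terms.
        y += (alpha * x[dangling].sum() + (1.0 - alpha) * x.sum()) / n
        return y

    return LinearOperator((n, n), matvec=matvec, dtype=np.float64)


def pagerank_iram(edges, n, alpha=0.85, tol=1e-12):
    """Dominant eigenpair of G^T via ARPACK's implicitly restarted Arnoldi."""
    vals, vecs = eigs(google_operator(edges, n, alpha), k=1, which="LM", tol=tol)
    v = vecs[:, 0].real
    return vals[0].real, v / v.sum()  # fix sign and scale: probability vector


def pagerank_power(edges, n, alpha=0.85, tol=1e-13, max_iter=10_000):
    """Reference power iteration x <- G^T x, used only for comparison."""
    G = google_operator(edges, n, alpha)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = G.matvec(x)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    # Hypothetical toy graph; node 5 is dangling (it has no out-links).
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 4), (3, 5), (4, 2), (4, 3)]
    lam, pi = pagerank_iram(edges, 6)
    print(f"lambda_1 = {lam:.12f}")
    print("PageRank:", np.round(pi, 4))
    print("max |IRAM - power|:", np.abs(pi - pagerank_power(edges, 6)).max())
```

Applying G^T as one sparse product plus two rank-one corrections keeps memory proportional to the number of links, which is what makes networks of this size tractable at all. The power iteration is included because its convergence rate is governed by the damping factor (the subdominant eigenvalue of G has modulus at most alpha), which is why a high damping factor slows it down. Sorting nodes by their score, e.g. `np.argsort(-pi)`, is one plausible way to shortlist vaccination candidates, but that heuristic is an assumption and not the strategy defined in the paper.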

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Acknowledgments

We would like to thank Fabrício Benevenuto from the Federal University of Ouro Preto for providing the Twitter network, and Kim Capps from Yahoo! Labs for his help in obtaining access to the AltaVista web network.

Author information

Corresponding author

Correspondence to Zifan Liu.

About this article

Cite this article

Liu, Z., Emad, N., Amor, S.B. et al. PageRank Computation Using a Multiple Implicitly Restarted Arnoldi Method for Modeling Epidemic Spread. Int J Parallel Prog 43, 1028–1053 (2015). https://doi.org/10.1007/s10766-014-0344-3

Keywords

  • Epidemic
  • PageRank
  • Scale-free networks
  • Power law
  • IRAM
  • Big data
  • Hypergraph partitioning