Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons Learned

  • Joyce Jiyoung Whang
  • Andrew Lenharth
  • Inderjit S. Dhillon
  • Keshav Pingali
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9233)

Abstract

Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling, and we investigate the impact of different choices along each axis. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms are able to achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. The implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
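To make the terms "data-driven" and "push-based" concrete, the following is a minimal single-threaded sketch of one common residual-push formulation of PageRank. It is not taken from the paper: the function name `push_pagerank`, the parameters `alpha` and `eps`, the FIFO worklist, and the unnormalized scaling (ranks sum to roughly the number of vertices when every vertex has an out-edge) are all assumptions made for illustration. Work activation is data-driven because only vertices whose residual reaches `eps` are placed on the worklist, and the access pattern is push-based because the active vertex writes to its out-neighbours rather than reading from its in-neighbours.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-8):
    """Illustrative data-driven, push-based PageRank (hypothetical helper).

    out_edges maps each vertex to a list of its out-neighbours; every
    neighbour is assumed to appear as a key. Vertices without out-edges
    simply drop their pushed mass (a simplifying assumption).
    """
    rank = {v: 0.0 for v in out_edges}
    residual = {v: 1.0 - alpha for v in out_edges}

    # Scheduling axis: a plain FIFO worklist, chosen here only for simplicity.
    worklist = deque(out_edges)
    queued = set(out_edges)

    while worklist:
        v = worklist.popleft()
        queued.discard(v)

        # Absorb the pending residual into the rank of v.
        r = residual[v]
        residual[v] = 0.0
        rank[v] += r

        targets = out_edges[v]
        if not targets:
            continue

        # Push: distribute the damped residual to the out-neighbours.
        share = alpha * r / len(targets)
        for w in targets:
            residual[w] += share
            # Data-driven activation: only vertices with enough residual
            # are scheduled for further work.
            if residual[w] >= eps and w not in queued:
                worklist.append(w)
                queued.add(w)

    return rank


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    print(push_pagerank(graph))
```

A topology-driven variant would instead sweep over every vertex in each round regardless of whether its value is still changing; the worklist is what lets the sketch above skip vertices that have already converged.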

Keywords

Scalable computing; Graph analytics; PageRank; Multi-threaded programming; Data-driven algorithm

Notes

Acknowledgments

This research was supported by NSF grants CCF-1117055 and CCF-1320746 to ID, and by NSF grants CNS-1111766 and XPS-1337281 to KP.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Joyce Jiyoung Whang (1)
  • Andrew Lenharth (1)
  • Inderjit S. Dhillon (1)
  • Keshav Pingali (1)
  1. University of Texas at Austin, Austin, USA
