GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

  • Yogesh Simmhan
  • Alok Kumbhare
  • Charith Wickramaarachchi
  • Soonil Nagarkar
  • Santosh Ravi
  • Cauligi Raghavendra
  • Viktor Prasanna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8632)

Abstract

Vertex centric models for large-scale graph processing are gaining traction due to their simple distributed programming abstraction. However, pure vertex centric algorithms under-perform due to large communication overheads and slow iterative convergence. We introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large-scale graph analytics on commodity clusters, offering the added natural flexibility of shared-memory sub-graph computation. We map the Connected Components, Single-Source Shortest Path (SSSP) and PageRank algorithms to this model and empirically analyze them for several real-world graphs, demonstrating, in some cases, orders-of-magnitude improvements over Apache Giraph’s vertex centric framework.
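
To make the sub-graph centric idea concrete, the following is a minimal single-process sketch of Connected Components in that style. It is not GoFFish's API or the authors' implementation: the function names, the `part_of` partition map, and the in-memory message loop are all hypothetical and chosen only for illustration. The sketch assumes a sub-graph is a connected set of vertices within one partition, that each sub-graph first resolves its label locally in shared memory, and that only labels crossing partition boundaries are exchanged in synchronized supersteps.

```python
from collections import defaultdict


def local_subgraphs(vertices, edges, part_of):
    """Assign each vertex a sub-graph id: a connected set inside one partition.

    Hypothetical helper; only edges whose endpoints share a partition are used.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if part_of[u] == part_of[v]:
            adj[u].add(v)
            adj[v].add(u)
    sub_of, next_id = {}, 0
    for start in vertices:
        if start in sub_of:
            continue
        sub_of[start] = next_id
        stack = [start]
        while stack:  # plain in-memory traversal, no messages needed
            u = stack.pop()
            for w in adj[u]:
                if w not in sub_of:
                    sub_of[w] = next_id
                    stack.append(w)
        next_id += 1
    return sub_of


def connected_components(vertices, edges, part_of):
    """Label every vertex with the smallest vertex id in its component."""
    sub_of = local_subgraphs(vertices, edges, part_of)

    # First superstep: each sub-graph takes its smallest vertex id as label.
    label = {}
    for v in vertices:
        s = sub_of[v]
        label[s] = min(label.get(s, v), v)

    # Remote edges link sub-graphs that live in different partitions.
    remote = defaultdict(set)
    for u, v in edges:
        su, sv = sub_of[u], sub_of[v]
        if su != sv:
            remote[su].add(sv)
            remote[sv].add(su)

    supersteps = 1
    active = set(label)
    while active:
        # Sub-graphs whose label changed send it to remote neighbours.
        inbox = defaultdict(list)
        for s in active:
            for t in remote[s]:
                inbox[t].append(label[s])
        # Receivers keep the smallest label seen and stay active if it changed.
        active = set()
        for t, msgs in inbox.items():
            best = min(msgs)
            if best < label[t]:
                label[t] = best
                active.add(t)
        supersteps += 1

    return {v: label[sub_of[v]] for v in vertices}, supersteps


# Toy input: a path 0-1-2-3-4-5 split over two partitions, plus isolated vertex 6.
vertices = list(range(7))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
part_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
components, supersteps = connected_components(vertices, edges, part_of)
print(components, supersteps)
```

In this toy run, the number of supersteps depends on how many sub-graphs a label must hop across rather than on how many vertices lie along the path, which is the intuition behind the reduced communication and faster convergence that the abstract attributes to the sub-graph centric model.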



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yogesh Simmhan (1)
  • Alok Kumbhare (2)
  • Charith Wickramaarachchi (2)
  • Soonil Nagarkar (2)
  • Santosh Ravi (2)
  • Cauligi Raghavendra (2)
  • Viktor Prasanna (2)

  1. Indian Institute of Science, Bangalore, India
  2. University of Southern California, Los Angeles, USA
