Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures

  • Scott Sallinen
  • Abdullah Gharaibeh
  • Matei Ripeanu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)


Large scale-free graphs are famously difficult to process efficiently: their skewed vertex degree distribution makes balanced partitioning hard to obtain. Our research instead aims to turn this skew into an advantage by partitioning the workload to match the strengths of the individual computing elements in a hybrid, GPU-accelerated architecture. As a proof of concept, we focus on the direction-optimized breadth-first search algorithm. We present the key graph partitioning, workload allocation, and communication strategies required for massive concurrency and good overall performance. We show that exploiting specialization enables gains as high as 2.4x in time-to-solution and 2.0x in energy efficiency when adding 2 GPUs to a 2-CPU-only baseline, for synthetic graphs with up to 16 billion undirected edges as well as for large real-world graphs. We also show that, for a capped energy envelope, it is more efficient to add a GPU than an additional CPU. Finally, our performance would place us at the top of today’s [Green]Graph500 challenges for Scale29 graphs.
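For readers unfamiliar with the algorithm named above, the following is a minimal single-threaded sketch of direction-optimized breadth-first search. It is not the authors' hybrid CPU/GPU implementation, and it does not cover the partitioning or communication strategies the paper describes. The function name `direction_optimized_bfs`, the adjacency-list representation, and the switching threshold `alpha` are assumptions made for illustration. Each level either expands the frontier outward (top-down) or lets every unvisited vertex look for a parent in the frontier (bottom-up), choosing bottom-up when the frontier's edge count is large relative to the edges still unexplored.

```python
from typing import Dict, List

# Undirected graph as an adjacency list (illustrative representation).
Graph = Dict[int, List[int]]


def direction_optimized_bfs(graph: Graph, source: int, alpha: float = 14.0) -> Dict[int, int]:
    """Return a parent map for every vertex reachable from `source`.

    `alpha` is a hypothetical tuning knob: a level runs bottom-up when
    frontier_edges * alpha > unexplored_edges, and top-down otherwise.
    """
    parent = {source: source}
    frontier = {source}
    # Sum of degrees of vertices that have not been visited yet.
    unexplored_edges = sum(len(nbrs) for nbrs in graph.values()) - len(graph[source])

    while frontier:
        frontier_edges = sum(len(graph[v]) for v in frontier)
        next_frontier = set()

        if frontier_edges * alpha > unexplored_edges:
            # Bottom-up step: each unvisited vertex scans its neighbours and
            # stops at the first one that is in the current frontier.
            for v in graph:
                if v in parent:
                    continue
                for u in graph[v]:
                    if u in frontier:
                        parent[v] = u
                        next_frontier.add(v)
                        break
        else:
            # Top-down step: each frontier vertex claims its unvisited neighbours.
            for u in frontier:
                for v in graph[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)

        unexplored_edges -= sum(len(graph[v]) for v in next_frontier)
        frontier = next_frontier

    return parent


if __name__ == "__main__":
    example = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(direction_optimized_bfs(example, 0))
```

The bottom-up step pays off on scale-free graphs because, once the frontier contains the high-degree vertices, most unvisited vertices find a parent after checking only a few neighbours. Setting `alpha` to 0 forces top-down at every level, which is a convenient way to compare the two strategies on the same input.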


Keywords (machine-generated, not supplied by the authors): Breadth First Search; Bulk Synchronous Parallel; Heterogeneous Platform; High Degree Vertex; Bulk Synchronous Parallel Model



This work was supported in part by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Scott Sallinen (1)
  • Abdullah Gharaibeh (1)
  • Matei Ripeanu (1)
  1. University of British Columbia, Vancouver, Canada
