
Improving vertex-frontier based GPU breadth-first search

Journal of Central South University

Abstract

Breadth-first search (BFS) is an important graph traversal kernel used by many graph processing applications, and extensive studies have been devoted to boosting its performance. GPU acceleration is the most effective of these approaches and has achieved a state-of-the-art result of 3.3×10⁹ traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier based GPU BFS algorithm is proposed, with three main features. First, to obtain better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex-frontier expansion. Second, a global duplicate-detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Third, a GPU-based bottom-up BFS approach is employed to process large frontiers. Experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it achieves a 2–3 times speedup over the state-of-the-art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10⁹ edges/s.
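To make the three ingredients concrete, here is a minimal serial Python sketch of a hybrid top-down/bottom-up BFS. It is not the authors' GPU implementation, and several choices are assumptions: the graph is stored in CSR form as hypothetical `row`/`col` arrays; the top-down step balances work by giving every frontier edge its own work item via a prefix sum over vertex degrees, a generic edge-balancing technique used here as a stand-in for the paper's virtual-queue scheme; de-duplication is a first-discovery check on a global `depth` array; and the switch to bottom-up uses a hypothetical frontier-size threshold `switch_fraction`, not a value from the paper.

```python
from bisect import bisect_right

# CSR graph (assumed layout): neighbors of v are col[row[v]:row[v + 1]].
# The graph is assumed undirected, with each edge stored in both directions.


def top_down_step(row, col, frontier, depth, level):
    """Expand the frontier with an edge-balanced work mapping.

    A prefix sum over frontier degrees numbers every frontier edge
    0..total-1, so each work item is one edge rather than one vertex.
    On a GPU, work item e would be one thread; here the loop is serial.
    """
    offsets = [0]
    for v in frontier:
        offsets.append(offsets[-1] + row[v + 1] - row[v])
    nxt = []
    for e in range(offsets[-1]):
        i = bisect_right(offsets, e) - 1  # frontier slot that owns edge e
        v = frontier[i]
        u = col[row[v] + e - offsets[i]]
        # Global de-duplication: u joins the next frontier only when it is
        # first discovered (a GPU version would need an atomic compare-and-swap).
        if depth[u] == -1:
            depth[u] = level + 1
            nxt.append(u)
    return nxt


def bottom_up_step(row, col, depth, level):
    """Each unvisited vertex searches its neighbors for a parent at `level`."""
    nxt = []
    for v in range(len(row) - 1):
        if depth[v] != -1:
            continue
        for k in range(row[v], row[v + 1]):
            if depth[col[k]] == level:  # one frontier neighbor is enough
                depth[v] = level + 1
                nxt.append(v)
                break
    return nxt


def hybrid_bfs(row, col, source, switch_fraction=0.05):
    """Return BFS depths, going bottom-up whenever the frontier is large.

    switch_fraction is a hypothetical tuning knob: bottom-up is used when
    the frontier holds more than switch_fraction * n vertices.
    """
    n = len(row) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        if len(frontier) > switch_fraction * n:
            frontier = bottom_up_step(row, col, depth, level)
        else:
            frontier = top_down_step(row, col, frontier, depth, level)
        level += 1
    return depth


# Usage: undirected graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
row = [0, 2, 4, 6, 9, 10]
col = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
print(hybrid_bfs(row, col, 0, switch_fraction=0.3))  # [0, 1, 1, 2, 3]
```

With `switch_fraction=0.3` the example alternates directions (top-down, bottom-up, then top-down again) and still produces the same depths as a pure top-down search. Vertex 3 is reached from both 1 and 2 in the second level, and the depth check keeps it from entering the next frontier twice. The bottom-up step pays off when the frontier is large, because each unvisited vertex stops scanning as soon as it finds one parent.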




Author information


Corresponding author

Correspondence to Kai Lu (卢凯).

Additional information

Foundation item: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; Project supported by the Program for New Century Excellent Talents in University of China; Projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China


About this article


Cite this article

Yang, B., Lu, K., Gao, Yh. et al. Improving vertex-frontier based GPU breadth-first search. J. Cent. South Univ. 21, 3828–3836 (2014). https://doi.org/10.1007/s11771-014-2368-7



  • DOI: https://doi.org/10.1007/s11771-014-2368-7
