Abstract
Breadth-first search (BFS) is an important graph-traversal kernel and is used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. GPU acceleration is the most effective of these approaches, and the state-of-the-art result is 3.3×10⁹ traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. First, to obtain better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex-frontier expansion. Second, a global duplicate-detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, on an NVIDIA Tesla K20c GPU it achieves a 2–3 times speedup over the state of the art on low-diameter and scale-free graphs, reaching a peak traversal rate of 11.2×10⁹ edges/s.
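The abstract names three ingredients: balanced frontier expansion, removal of duplicate frontier vertices, and a bottom-up phase for large frontiers. The following is a minimal sequential sketch of how these pieces can fit together in a level-synchronous BFS, under stated assumptions. It is not the authors' GPU implementation. The helper `partition_edges` (a hypothetical name) uses a generic prefix-sum edge partitioning as a stand-in for the virtual-queue strategy. The duplicate filter is a plain visited-array pass, not the paper's global scheme. The switching rule, with its hypothetical constant `alpha`, follows the spirit of direction-optimizing BFS (Beamer et al.). The graph is assumed to be undirected and stored in CSR arrays `offsets`/`targets`.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_edges(frontier, offsets, num_workers):
    """Split the frontier's outgoing edges into near-equal work lists.

    An exclusive prefix sum over frontier degrees gives every frontier edge a
    global index; each worker takes one contiguous slice of those indices, so a
    single high-degree vertex may be spread over several workers.
    """
    degrees = [offsets[v + 1] - offsets[v] for v in frontier]
    starts = [0, *accumulate(degrees)]  # starts[i] = first global index of frontier[i]
    total = starts[-1]
    chunk = -(-total // num_workers)  # ceiling division
    tasks = []
    for w in range(num_workers):
        work = []
        for g in range(w * chunk, min((w + 1) * chunk, total)):
            i = bisect_right(starts, g) - 1  # frontier slot owning global edge g
            v = frontier[i]
            work.append((v, offsets[v] + g - starts[i]))  # (source vertex, CSR edge index)
        tasks.append(work)
    return tasks


def hybrid_bfs(offsets, targets, source, num_workers=4, alpha=14):
    """Level-synchronous BFS over an undirected CSR graph.

    Returns the BFS level of every vertex (-1 if unreachable). `alpha` is an
    illustrative tuning constant, not a value taken from the paper.
    """
    n = len(offsets) - 1
    degree = [offsets[v + 1] - offsets[v] for v in range(n)]
    level = [-1] * n
    level[source] = 0
    frontier, depth = [source], 0
    while frontier:
        frontier_edges = sum(degree[v] for v in frontier)
        unvisited_edges = sum(degree[v] for v in range(n) if level[v] == -1)
        next_frontier = []
        if frontier_edges * alpha > unvisited_edges:
            # Bottom-up: each unvisited vertex searches its neighbours for a
            # parent on the current frontier and stops at the first hit.
            for v in range(n):
                if level[v] == -1:
                    for e in range(offsets[v], offsets[v + 1]):
                        if level[targets[e]] == depth:
                            level[v] = depth + 1
                            next_frontier.append(v)
                            break
        else:
            # Top-down: workers expand their edge slices against the visited
            # state from the start of the level, mimicking concurrent GPU
            # threads, so the same vertex can be emitted more than once.
            candidates = [targets[e]
                          for work in partition_edges(frontier, offsets, num_workers)
                          for _, e in work
                          if level[targets[e]] == -1]
            # Duplicate removal: keep only the first copy of each vertex.
            for u in candidates:
                if level[u] == -1:
                    level[u] = depth + 1
                    next_frontier.append(u)
        frontier, depth = next_frontier, depth + 1
    return level


if __name__ == "__main__":
    # Undirected graph with edges 0-1, 0-2, 1-3, 2-3, 3-4 and isolated vertex 5.
    offsets = [0, 2, 4, 6, 9, 10, 10]
    targets = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
    print(hybrid_bfs(offsets, targets, 0))           # [0, 1, 1, 2, 3, -1]
    print(hybrid_bfs(offsets, targets, 0, alpha=0))  # top-down only: [0, 1, 1, 2, 3, -1]
```

Passing `alpha=0` forces the top-down path at every level. This makes the duplicate emission visible: when level 1 of the example graph is expanded, vertices 1 and 2 both produce vertex 3, and the filter keeps a single copy. With the default `alpha`, this small graph is traversed entirely bottom-up, because the frontier touches more edges than remain unvisited.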
Additional information
Foundation item: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; project supported by the Program for New Century Excellent Talents in University of China; Projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China.
Cite this article
Yang, B., Lu, K., Gao, Yh. et al. Improving vertex-frontier based GPU breadth-first search. J. Cent. South Univ. 21, 3828–3836 (2014). https://doi.org/10.1007/s11771-014-2368-7