Improving vertex-frontier based GPU breadth-first search

Yang, Bo; Lu, Kai; Gao, Ying-hui; Xu, Kai; Wang, Xiao-ping; Cheng, Zhi-quan

doi:10.1007/s11771-014-2368-7

Published: 17 October 2014

Volume 21, pages 3828–3836 (2014)

Journal of Central South University

Bo Yang (杨博)^1,2,
Kai Lu (卢凯)^1,2,
Ying-hui Gao (高颖慧)^3,
Kai Xu (徐凯)^1,2,
Xiao-ping Wang (王小平)^1,2 &
Zhi-quan Cheng (程志权)^4

Abstract

Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph-processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10⁹ traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. First, to obtain a better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global duplicate detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, on an NVIDIA Tesla K20c GPU it exhibits a 2–3 times speedup over the state of the art on low-diameter and scale-free graphs, reaching a peak traversal rate of 11.2×10⁹ edges/s.
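
To make the terms in the abstract concrete, the following is a minimal sequential sketch of level-synchronous, vertex-frontier BFS that switches between a top-down step and a bottom-up step when the frontier grows large. It is not the authors' GPU implementation, which is not reproduced on this page: the function name `bfs_levels`, the parameter `bottom_up_fraction`, and the switching rule are hypothetical choices made only for illustration, and the graph is assumed to be undirected with every vertex present as a key of `adj`. The virtual-queue task decomposition is a GPU workload-mapping strategy and is not modeled here.

```python
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], source: int,
               bottom_up_fraction: float = 0.25) -> Dict[int, int]:
    """Return a map vertex -> BFS depth for all vertices reachable from source.

    Assumptions (illustrative, not from the paper): adj is a symmetric
    adjacency list, and the bottom-up step is used whenever the frontier
    holds more than bottom_up_fraction of all vertices.
    """
    depth = {source: 0}      # doubles as the "visited" record
    frontier = [source]      # current vertex frontier
    level = 0
    n = len(adj)

    while frontier:
        level += 1
        next_frontier = []
        if len(frontier) > bottom_up_fraction * n:
            # Bottom-up step: every unvisited vertex checks whether any
            # neighbor lies in the current frontier.
            in_frontier = set(frontier)
            for v in adj:
                if v not in depth and any(u in in_frontier for u in adj[v]):
                    depth[v] = level
                    next_frontier.append(v)
        else:
            # Top-down step: expand the neighbors of each frontier vertex.
            for u in frontier:
                for v in adj[u]:
                    # Duplicate check: a vertex reachable from several
                    # frontier vertices is enqueued only once.
                    if v not in depth:
                        depth[v] = level
                        next_frontier.append(v)
        frontier = next_frontier
    return depth
```

For example, `bfs_levels({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}, 0)` visits vertex 3 once even though both vertices 1 and 2 reach it. In this sequential sketch the `v not in depth` test is enough to keep the frontier free of duplicates; on a GPU, many threads may run that test for the same vertex concurrently, which is presumably the situation the abstract's duplicate detection scheme addresses.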

Author information

Authors and Affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, 410073, China
Bo Yang (杨博), Kai Lu (卢凯), Kai Xu (徐凯) & Xiao-ping Wang (王小平)
College of Computer, National University of Defense Technology, Changsha, 410073, China
Bo Yang (杨博), Kai Lu (卢凯), Kai Xu (徐凯) & Xiao-ping Wang (王小平)
Department of Electronic Science and Engineering, National University of Defense Technology, Changsha, 410073, China
Ying-hui Gao (高颖慧)
Avatar Science Company, Guangzhou, 510001, China
Zhi-quan Cheng (程志权)

Corresponding author

Correspondence to Kai Lu (卢凯).

Additional information

Foundation item: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; Project supported by the Program for New Century Excellent Talents in University of China; Projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China.

About this article

Cite this article

Yang, B., Lu, K., Gao, Yh. et al. Improving vertex-frontier based GPU breadth-first search. J. Cent. South Univ. 21, 3828–3836 (2014). https://doi.org/10.1007/s11771-014-2368-7

Received: 30 December 2013
Accepted: 10 June 2014
Published: 17 October 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s11771-014-2368-7
