A BSP model graph processing system on many cores

Lai, Siyan; Lai, Guangda; Lu, Fangzhou; Shen, Guojun; Jin, Jing; Lin, Xiaola

doi:10.1007/s10586-017-0829-0

Published: 24 March 2017

Cluster Computing, Volume 20, pages 1359–1377 (2017)

Abstract

Large-scale graph processing plays an increasingly important role in many data-related applications. Recently, GPUs have been adopted to accelerate various graph processing algorithms. However, because the GPU architecture differs greatly from the traditional computing model, the learning curve for developing GPU-based applications is steep. In this paper, we propose a GPU-based parallel graph processing system named GPregel to tackle this challenge. GPregel follows the BSP model for graph processing, as Google's Pregel does. It uses a lightweight compiler to hide the underlying details of parallel processing and simplify programming, which greatly reduces the difficulty of using the GPU to solve graph computing problems. Moreover, GPregel includes several performance optimizations: (1) a special storage model for running the BSP model on the GPU, which overcomes execution divergence and irregular memory access through coarse-grained designs; (2) a warp-level optimization strategy, Parallelized-Messages-Sending, and a thread-level optimization strategy, Threads-Merge-Executing, which accelerate the computation of high-degree and low-degree vertices respectively; (3) an optimized message copy mechanism that uses a shared array and a rolling array to speed up message copying. Experiments demonstrate that GPregel achieves high performance with little effort from developers.
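
For readers unfamiliar with the BSP vertex-centric model that the abstract refers to, the following is a minimal, CPU-only Python sketch of single-source shortest paths computed in supersteps. It is not GPregel's API or implementation: the function name `bsp_sssp`, the edge-list input format, and the two swapped message buffers are assumptions made purely for illustration. The buffer swap at each barrier only loosely echoes the idea of reusing message storage across supersteps; the paper's actual shared-array and rolling-array mechanism is not described in the abstract.

```python
import math


def bsp_sssp(num_vertices, edges, source):
    """Hypothetical BSP-style shortest paths; edges is a list of (src, dst, weight)."""
    adjacency = [[] for _ in range(num_vertices)]
    for src, dst, weight in edges:
        adjacency[src].append((dst, weight))

    distance = [math.inf] * num_vertices

    # Two message buffers (an illustrative assumption): vertices read from
    # `inbox` and write to `outbox`; the buffers are swapped at each barrier.
    inbox = [[] for _ in range(num_vertices)]
    outbox = [[] for _ in range(num_vertices)]
    inbox[source].append(0.0)

    supersteps = 0
    while any(inbox):  # run until no vertex has pending messages
        # Compute phase: each vertex with messages runs independently.
        for v in range(num_vertices):
            if not inbox[v]:
                continue  # no messages, so the vertex stays inactive
            best = min(inbox[v])
            if best < distance[v]:
                distance[v] = best
                # Send phase: messages become visible only in the next superstep.
                for dst, weight in adjacency[v]:
                    outbox[dst].append(best + weight)
            inbox[v].clear()
        # Barrier: swap buffers so sent messages become the next inbox.
        inbox, outbox = outbox, inbox
        supersteps += 1
    return distance, supersteps


if __name__ == "__main__":
    demo_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
    print(bsp_sssp(4, demo_edges, source=0))  # ([0.0, 3.0, 1.0, 4.0], 4)
```

In this sketch the inner loop over vertices is sequential; in a GPU setting each vertex's compute step would typically be mapped to threads or warps, which is where the divergence and load-imbalance issues mentioned in the abstract arise.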

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61472454.

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Siyan Lai, Guangda Lai, Fangzhou Lu, Guojun Shen, Jing Jin & Xiaola Lin

Corresponding author

Correspondence to Siyan Lai.

About this article

Cite this article

Lai, S., Lai, G., Lu, F. et al. A BSP model graph processing system on many cores. Cluster Comput 20, 1359–1377 (2017). https://doi.org/10.1007/s10586-017-0829-0

Received: 06 January 2017
Revised: 19 February 2017
Accepted: 14 March 2017
Published: 24 March 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10586-017-0829-0
