
A BSP model graph processing system on many cores

Cluster Computing

Abstract

Large-scale graph processing plays an increasingly important role in many data-related applications. Recently, GPUs have been adopted to accelerate various graph processing algorithms. However, because the GPU architecture differs substantially from the traditional computing model, the learning curve for developing GPU-based applications is steep. In this paper, we propose a GPU-based parallel graph processing system named GPregel to tackle this challenge. GPregel follows the BSP model for graph processing, as does Google's Pregel. It uses a lightweight compiler to hide the underlying complexity of parallel processing and to simplify programming, which greatly reduces the difficulty of using the GPU to solve graph computing problems. Moreover, GPregel includes several performance optimizations: (1) a special storage model for running the BSP model on the GPU, which overcomes execution divergence and irregular memory access through coarse-grained designs; (2) a warp-level optimization, Parallelized-Messages-Sending, and a thread-level optimization, Threads-Merge-Executing, which accelerate computation on high-degree and low-degree vertices, respectively; (3) an optimized message-copy mechanism that uses a shared array and a rolling array to speed up message copying. Experiments demonstrate that GPregel achieves high performance with little programming effort from developers.
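To make the BSP, vertex-centric style mentioned in the abstract concrete, the following is a minimal sketch of single-source shortest paths written as a sequence of supersteps. It is not GPregel's API, which this preview does not show: the function name `bsp_sssp`, the edge-tuple format, and the pair of alternating message buffers (a loose analogy to a rolling array) are all assumptions chosen for illustration, and the code runs sequentially in plain Python rather than on a GPU.

```python
from typing import List, Tuple

INF = float("inf")


def bsp_sssp(num_vertices: int,
             edges: List[Tuple[int, int, float]],
             source: int) -> List[float]:
    """Shortest distances from `source`, computed superstep by superstep.

    Hypothetical helper for illustration only; assumes non-negative weights.
    """
    # Adjacency list: out_edges[u] holds (neighbor, weight) pairs.
    out_edges: List[List[Tuple[int, float]]] = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        out_edges[u].append((v, w))

    dist = [INF] * num_vertices

    # Two message buffers used in alternation: one is read during the
    # current superstep while the other collects messages for the next.
    buffers = [[[] for _ in range(num_vertices)] for _ in range(2)]
    buffers[0][source].append(0.0)
    current = 0

    # Run supersteps until no vertex has pending messages.
    while any(buffers[current]):
        inbox, outbox = buffers[current], buffers[1 - current]
        # Conceptually each vertex is processed in parallel; here we loop.
        for v in range(num_vertices):
            if not inbox[v]:
                continue  # no messages: vertex is inactive this superstep
            best = min(inbox[v])
            if best < dist[v]:
                dist[v] = best
                # Send the improved distance along every outgoing edge.
                for nbr, w in out_edges[v]:
                    outbox[nbr].append(best + w)
            inbox[v].clear()  # buffer must be empty before it is reused
        # Barrier: swap buffer roles before the next superstep begins.
        current = 1 - current

    return dist
```

Calling `bsp_sssp(4, [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)], 0)` propagates distances outward from vertex 0 one superstep at a time. Messages sent in one superstep only become visible in the next, which is the defining property of the BSP model.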




Acknowledgements

This paper is supported in part by the National Natural Science Foundation of China under Grant No. 61472454.

Author information


Corresponding author

Correspondence to Siyan Lai.


About this article


Cite this article

Lai, S., Lai, G., Lu, F. et al. A BSP model graph processing system on many cores. Cluster Comput 20, 1359–1377 (2017). https://doi.org/10.1007/s10586-017-0829-0


  • DOI: https://doi.org/10.1007/s10586-017-0829-0
