iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12728)

Abstract

The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on traditional caching hierarchies. Developed to meet the growing demand of data-centric applications such as machine learning, the IPU pairs each of its numerous cores with a dedicated portion of SRAM, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising platform for experimenting with graph algorithms, since it is their unpredictable, irregular memory accesses that lead to performance losses on traditional processors with pre-caching.

This paper tests the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark: it is a subroutine of a variety of more complex graph algorithms, and it also allows comparison across a wide range of architectures.

We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to \(4\times\) over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about \(1.5\times\) on most test instances.
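
For readers unfamiliar with the benchmark kernel, the following is a minimal sequential sketch of a level-synchronous BFS that returns a parent array, which is the form of output the Graph500 BFS kernel asks for. It is not the IPU implementation described in this paper. The function name `bfs_parents`, the edge-list input format, and the use of `-1` for unreached vertices are assumptions made for illustration only.

```python
from collections import defaultdict


def bfs_parents(num_vertices, edges, root):
    """Level-synchronous BFS over an undirected edge list (illustrative only).

    Returns a list where parent[v] is the predecessor of v in the BFS tree,
    parent[root] == root, and -1 marks vertices not reachable from root.
    """
    # Build an adjacency list; each undirected edge is stored in both directions.
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    parent = [-1] * num_vertices
    parent[root] = root
    frontier = [root]

    # Process one BFS level per iteration of the outer loop.
    while frontier:
        next_frontier = []
        for u in frontier:
            # Neighbor lookups jump around memory; this is the irregular
            # access pattern the abstract refers to.
            for v in adjacency[u]:
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


# Small example: one component {0, 1, 2, 3} and a separate edge (4, 5).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]
parents = bfs_parents(6, edges, root=0)
print(parents)
```

In this small example, vertices 4 and 5 are never reached from the root and keep the `-1` marker. A parallel implementation would distribute the work of each level across cores, but the level-by-level structure stays the same.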

Keywords

  • IPU
  • Graph500
  • BFS
  • Performance optimization

Chapter DOI: 10.1007/978-3-030-78713-4_16. Chapter length: 19 pages.

Notes

  1. Git commit: 5ee3df5, Online: https://github.com/gunrock/gunrock.

  2. Git commit: 426846f, Online: https://github.com/iHeartGraph/Enterprise.

  3. https://en.wikichip.org/wiki/amd/epyc/7302p.

  4. https://en.wikichip.org/wiki/intel/xeon_gold/6130.

Author information

Corresponding author

Correspondence to Luk Burchard.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Burchard, L., Moe, J., Schroeder, D.T., Pogorelov, K., Langguth, J. (2021). iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs. In: Chamberlain, B.L., Varbanescu, A.L., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science, vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_16

  • DOI: https://doi.org/10.1007/978-3-030-78713-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78712-7

  • Online ISBN: 978-3-030-78713-4

  • eBook Packages: Computer Science, Computer Science (R0)