FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Wu, Yu-Wei; Wang, Qing-Gang; Zheng, Long; Liao, Xiao-Fei; Jin, Hai; Jiang, Wen-Bin; Zheng, Ran; Hu, Kan

doi:10.1007/s11390-021-1242-y

FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Regular Paper
Published: 30 September 2021

Volume 36, pages 1051–1070, (2021)
Cite this article

Journal of Computer Science and Technology Aims and scope Submit manuscript

Yu-Wei Wu^1,2,3,
Qing-Gang Wang^1,2,3,
Long Zheng^1,2,3,
Xiao-Fei Liao^1,2,3,
Hai Jin^1,2,3,
Wen-Bin Jiang^1,2,3,
Ran Zheng^1,2,3 &
…
Kan Hu^1,2,3

292 Accesses
7 Citations
Explore all metrics

Abstract

With the rapid growth of real-world graphs, the size of which can easily exceed the on-chip (board) storage capacity of an accelerator, processing large-scale graphs on a single Field Programmable Gate Array (FPGA) becomes difficult. The multi-FPGA acceleration is of great necessity and importance. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center, with minimal hardware engineering efforts. FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We interface FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than a state-of-the-art distributed FPGA-based graph accelerator ForeGraph and a distributed CPU-based graph system Gemini, with better scalability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Distributed large-scale graph processing on FPGAs

Article Open access 04 June 2023

Amin Sahebi, Marco Barbone, … Roberto Giorgi

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

Article 09 November 2023

Shiyang Li, Jingyu Zhu, … Xuqiang Wang

An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy

References

Quick L, Wilkinson P, Hardcastle D. Using Pregel-like large scale graph processing frameworks for social network analysis. In Proc. the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Aug. 2012, pp.457-463. https://doi.org/10.1109/ASONAM.2012.254.
Aridhi S, Montresor A, Velegrakis Y. BLADYG: A novel block-centric framework for the analysis of large dynamic graphs. In Proc. the ACM Workshop on High Performance Graph Processing, May 2016, pp.39-42. https://doi.org/10.1016/j.bdr.2017.05.003.
Wang Y, Davidson A, Pan Y, Wu Y, Riffel A, Owens J D. Gunrock: A high-performance graph processing library on the GPU. In Proc. the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Mar. 2016, Article No. 11. https://doi.org/10.1145/2851141.2851145.
Warnke-Sommer J, Ali H. Graph mining for next generation sequencing: Leveraging the assembly graph for biological insights. BMC Genomics, 2016, 17(1): Article No. 340. https://doi.org/10.1186/s12864-016-2678-2.
Dai G, Chi Y, Wang Y, Yang H. FPGP: Graph processing framework on FPGA—A case study of breadth-first search. In Proc. the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2016, pp.105-110. https://doi.org/10.1145/2847263.2847339.
Engelhardt N, So H K H. GraVF: A vertex-centric distributed graph processing framework on FPGAs. In Proc. the 26th International Conference on Field Programmable Logic and Applications, Aug. 29–Sept. 2, 2016. 10.1109/FPL.2016.7577360.
Nurvitadhi E, Weisz G, Wang Y, Hurkat S, Nguyen M, Hoe J C, Martínez J, Guestrin C. GraphGen: An FPGA framework for vertex-centric graph computation. In Proc. the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, May 2014, pp.25-28. https://doi.org/10.1109/FCCM.2014.15.
Oguntebi T, Olukotun K. GraphOps: A dataflow library for graph analytics acceleration. In Proc. the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2016, pp.111-117. https://doi.org/10.1145/2847263.2847337.
Zhou S, Chelmis C, Prasanna V K. High-throughput and energy-efficient graph processing on FPGA. In Proc. the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, May 2016, pp.103-110. https://doi.org/10.1109/FCCM.2016.35.
Yao P, Zheng L, Liao X, Jin H, He B. An efficient graph accelerator with parallel data conflict management. In Proc. the 27th International Conference on Parallel Architectures and Compilation Techniques, Nov. 2018, Article No. 8. https://doi.org/10.1145/3243176.3243201.
Yang C, Zheng L, Gui C, Jin H. Efficient FPGA-based graph processing with hybrid pull-push computational model. Frontiers Comput. Sci., 2020, 14(4): Article No. 144102. https://doi.org/10.1007/s11704-019-9020-5.
Lv X, Xiao W, Zhang Y, Liao X, Jin H, Hua Q. An effective framework for asynchronous incremental graph processing. Frontiers Comput. Sci., 2019, 13(3): 539-551. https://doi.org/10.1007/s11704-018-7443-z.
Article Google Scholar
Jin H, Yao P, Liao X. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): Article No. 126102. https://doi.org/10.1007/s11432-017-9226-8.
Li Z, Ding Z. Distributed optimization on unbalanced graphs via continuous-time methods. Science China Information Sciences, 2018, 61(12): Article No. 129204. https://doi.org/10.1007/s11432-018-9502-1.
Ahn J, Hong S, Yoo S, Mutlu O, Choi K. A scalable processing-in-memory accelerator for parallel graph processing. In Proc. the 42nd Annual International Symposium on Computer Architecture, Jun. 2015, pp.105-117. https://doi.org/10.1145/2749469.2750386.
McSherry F, Isard M, Murray D G. Scalability! But at what COST? In Proc. the 15th USENIX Conference on Hot Topics in Operating Systems, May 2015.
Dai G, Huang T, Chi Y, Xu N, Wang Y, Yang H. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In Proc. the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp.217-226. https://doi.org/10.1145/3020078.3021739.
Dathathri R, Gill G, Hoang L, Dang H, Brooks A, Dryden N, Snir M, Pingali K. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proc. the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 2018, pp.752-768. https://doi.org/10.1145/3192366.3192404.
Satish N, Sundaram N, Patwary M M A, Seo J, Park J, Hassaan M A, Sengupta S, Yin Z, Dubey P. Navigating the maze of graph analytics frameworks using massive graph datasets. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.979-990. https://doi.org/10.1145/2588555.2610518.
Khorasani F, Gupta R, Bhuyan L N. Scalable SIMD-efficient graph processing on GPUs. In Proc. the 2015 International Conference on Parallel Architectures and Compilation Techniques, Oct. 2015, pp.39-50. https://doi.org/10.1109/P-ACT.2015.15.
Fu H, Liao J, Yang J et al. The Sunway TaihuLight supercomputer: System and applications. Science China Information Sciences, 2016, 59(7): Article No. 072001. https://doi.org/10.1007/s11432-016-5588-7.
Zhang F, Zheng L, Liao X, Lv X, Jin H, Xiao J. An effective 2-dimension graph partitioning for work stealing assisted graph processing on multi-FPGAs. IEEE Transactions on Big Data, https://doi.org/10.1109/TBDATA.2020.3035090.
Engelhardt N, So H K H. GraVF-M: Graph processing system generation for multi-FPGA platforms. ACM Transactions on Reconfigurable Technology and Systems, 2019, 12(4): Article No. 21. https://doi.org/10.1145/3357596.
Shao Z, Li R, Hu D, Liao X, Jin H. Improving performance of graph processing on FPGA-DRAM platform by two-level vertex caching. In Proc. the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2019, pp.320-329. https://doi.org/10.1145/3289602.3293900.
Zhou S, Kannan R, Prasanna V K, Seetharaman G, Wu Q. HitGraph: High-throughput graph processing framework on FPGA. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(10): 2249-2264. https://doi.org/10.1109/TPDS.2019.2910068.
Article Google Scholar
Wang Q, Zheng L, Zhao J, Liao X, Jin H, Xue J. A conflict-free scheduler for high-performance graph processing on multi-pipeline FPGAs. ACM Transactions on Architecture and Code Optimization, 2020, 17(2): Article No. 14. https://doi.org/10.1145/3390523.
Putnam A, Caulfield A M, Chung E S et al. A reconfigurable fabric for accelerating large-scale datacenter services. In Proc. the 41st ACM/IEEE International Symposium on Computer Architecture, Jun. 2014, pp.13-24. https://doi.org/10.1109/ISCA.2014.6853195.
Caulfield A M, Chung E S, Putnam A et al. Configurable clouds. IEEE Micro, 2017, 37(3): 52-61. https://doi.org/10.1109/MM.2017.51.
Article Google Scholar
Zhou S, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In Proc. the 29th International Symposium on Computer Architecture and High Performance Computing, Oct. 2017, pp.137-144. https://doi.org/10.1109/SBAC-PAD.2017.25.
Zhu X, Chen W, Zheng W, Ma X. Gemini: A computation-centric distributed graph processing system. In Proc. the 12th USENIX Symposium on Operating Systems Design and Implementation, Nov. 2016, pp.301-316.
Malewicz G, Austern M H, Bik A, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.135-146. https://doi.org/10.1145/1807167.1807184.
Eskandari N, Tarafdar N, Ly-Ma D, Chow P. A modular heterogeneous stack for deploying FPGAs and CPUs in the data center. In Proc. the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2019, pp.262-271. https://doi.org/10.1145/3289602.3293909.
Gill G, Dathathri R, Hoang L, Pingali K. A study of partitioning policies for graph analytics on large-scale distributed platforms. Proceedings of the VLDB Endowment, 2018, 12(4): 321-334. https://doi.org/10.14778/3297753.3297754.
Article Google Scholar
Zilberman N, Bracha G, Schzukin G. Stardust: Divide and conquer in the data center network. In Proc. the 16th USENIX Symposium on Networked Systems Design and Implementation, Feb. 2019, pp.141-160.
Snir M, Gropp W, Otto S, Huss-Lederman S, Dongarra J, Walker D. MPI-The Complete Reference: Volume 1, the MPI Core (2nd edition). MIT Publishers, 1998.
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17-30.
Boman E G, Devine K D, Rajamanickam S. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In Proc. the International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2013. https://doi.org/10.1145/2503210.2503293.
Slota G M, Rajamanickam S, Devine K D, Madduri K. Partitioning trillion-edge graphs in minutes. In Proc. the 2017 IEEE International Parallel and Distributed Processing Symposium, May 29–June 2, 2017, pp.646-655. https://doi.org/10.1109/IPDPS.2017.95.
Chen R, Shi J, Chen Y, Zang B, Guan H, Chen H. Power-Lyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing, 2018, 5(3): Article No. 13. https://doi.org/10.1145/3298989.

Download references

Author information

Authors and Affiliations

National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Yu-Wei Wu, Qing-Gang Wang, Long Zheng, Xiao-Fei Liao, Hai Jin, Wen-Bin Jiang, Ran Zheng & Kan Hu
Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Yu-Wei Wu, Qing-Gang Wang, Long Zheng, Xiao-Fei Liao, Hai Jin, Wen-Bin Jiang, Ran Zheng & Kan Hu
Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Yu-Wei Wu, Qing-Gang Wang, Long Zheng, Xiao-Fei Liao, Hai Jin, Wen-Bin Jiang, Ran Zheng & Kan Hu

Authors

Yu-Wei Wu
View author publications
You can also search for this author in PubMed Google Scholar
Qing-Gang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Long Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Xiao-Fei Liao
View author publications
You can also search for this author in PubMed Google Scholar
Hai Jin
View author publications
You can also search for this author in PubMed Google Scholar
Wen-Bin Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Ran Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Kan Hu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Long Zheng.

Supplementary Information

ESM 1

(PDF 501 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wu, YW., Wang, QG., Zheng, L. et al. FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers. J. Comput. Sci. Technol. 36, 1051–1070 (2021). https://doi.org/10.1007/s11390-021-1242-y

Download citation

Received: 30 December 2020
Accepted: 06 August 2021
Published: 30 September 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11390-021-1242-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Abstract

Access this article

Similar content being viewed by others

Distributed large-scale graph processing on FPGAs

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy

References

Author information

Authors and Affiliations

Corresponding author

Supplementary Information

ESM 1

Rights and permissions

About this article

Cite this article

Keywords

Navigation

FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Abstract

Access this article

Similar content being viewed by others

Distributed large-scale graph processing on FPGAs

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy

References

Author information

Authors and Affiliations

Corresponding author

Supplementary Information

ESM 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation