Abstract
The HPCG benchmark is a modern complement to the HPL benchmark for evaluating the performance of HPC systems, and it is recognized as more representative of real-world applications. While typical workloads grow ever more demanding, the semiconductor industry is struggling with performance scaling and power efficiency on next-generation technology nodes. As a result, the industry is turning towards more customized compute architectures to meet the latest performance requirements. In this paper, we present the details of the first FPGA-based implementation of HPCG that takes advantage of such customized compute architectures. Our results show that our high-performance multi-FPGA implementation, using one and four Xilinx Alveo U280 cards, achieves up to 108.3 GFlops and 346.5 GFlops, respectively, representing speed-ups of \(104.1\times \) and \(333.2\times \) over software running on a server with an Intel Xeon processor, with no loss of accuracy. We also demonstrate that the FPGA-based solution achieves performance comparable to modern GPUs and up to a \(2.7\times \) improvement in power efficiency over an NVIDIA Tesla V100. Finally, a theoretical evaluation based on Berkeley’s Roofline model demonstrates that our implementation is near-optimally tuned for the Xilinx Alveo U280.
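The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not come from the paper, and the Roofline helper simply encodes the standard bound (attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity). No device-specific peak or bandwidth values are assumed; the caller has to supply them.

```python
# Figures reported in the abstract.
GFLOPS_1_FPGA = 108.3   # one Alveo U280
GFLOPS_4_FPGA = 346.5   # four Alveo U280
SPEEDUP_1_FPGA = 104.1  # versus the Xeon software baseline
SPEEDUP_4_FPGA = 333.2


def implied_baseline(gflops: float, speedup: float) -> float:
    """CPU throughput (GFlops) implied by an accelerator figure and its speed-up."""
    return gflops / speedup


def scaling_efficiency(gflops_n: float, gflops_1: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n devices."""
    return gflops_n / (n * gflops_1)


def roofline_bound(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline model: attainable GFlops = min(peak, bandwidth * intensity).

    intensity is the arithmetic intensity in Flops per byte moved.
    All three inputs are supplied by the caller (hypothetical here).
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    base_1 = implied_baseline(GFLOPS_1_FPGA, SPEEDUP_1_FPGA)
    base_4 = implied_baseline(GFLOPS_4_FPGA, SPEEDUP_4_FPGA)
    eff = scaling_efficiency(GFLOPS_4_FPGA, GFLOPS_1_FPGA, 4)
    print(f"implied CPU baseline from 1-FPGA result: {base_1:.2f} GFlops")
    print(f"implied CPU baseline from 4-FPGA result: {base_4:.2f} GFlops")
    print(f"4-FPGA scaling efficiency: {eff:.1%}")
```

Both speed-ups imply the same CPU baseline of roughly 1.04 GFlops, and the four-card result corresponds to about 80% of ideal linear scaling over the single-card result.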
Acknowledgments
We acknowledge the Xilinx University Program for providing access to the Xilinx Adaptive Compute Cluster (XACC) at ETH Zurich.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeni, A., O’Brien, K., Blott, M., Santambrogio, M.D. (2021). Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_38
DOI: https://doi.org/10.1007/978-3-030-85665-6_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)