Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware

Zeni, Alberto; O’Brien, Kenneth; Blott, Michaela; Santambrogio, Marco D.

doi:10.1007/978-3-030-85665-6_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12820)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The HPCG benchmark is a modern complement to the HPL benchmark for evaluating the performance of HPC systems, and it is recognized as more representative of real-world applications. While typical workloads grow more challenging, the semiconductor industry is struggling with performance scaling and power efficiency on next-generation technology nodes. As a result, the industry is turning towards more customized compute architectures to meet the latest performance requirements. In this paper, we present the details of the first FPGA-based implementation of HPCG that takes advantage of such customized compute architectures. Our results show that our high-performance multi-FPGA implementation, using one and four Xilinx Alveo U280 cards, achieves up to 108.3 GFlops and 346.5 GFlops, respectively, representing speed-ups of \(104.1\times\) and \(333.2\times\) over software running on a server with an Intel Xeon processor, with no loss of accuracy. We also demonstrate that the FPGA-based solution achieves performance comparable to modern GPUs and up to a \(2.7\times\) improvement in power efficiency compared to an NVIDIA Tesla V100. Finally, a theoretical evaluation based on Berkeley’s Roofline model demonstrates that our implementation is near-optimally tuned on the Xilinx Alveo U280.
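The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the software baseline is merely what the reported throughput and speed-up imply, and the Roofline inputs in the last call are placeholder values rather than measured characteristics of any device. The Roofline bound itself follows the standard form, attainable performance = min(peak compute, memory bandwidth × arithmetic intensity).

```python
def implied_baseline(gflops: float, speedup: float) -> float:
    """Software baseline (GFlops) implied by a reported throughput and speed-up."""
    return gflops / speedup


def parallel_efficiency(gflops_1: float, gflops_n: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n devices."""
    return gflops_n / (n * gflops_1)


def roofline_bound(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline upper bound in GFlops.

    intensity is the arithmetic intensity in flops per byte moved from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


# Figures reported in the abstract.
ONE_FPGA_GFLOPS, FOUR_FPGA_GFLOPS = 108.3, 346.5
ONE_FPGA_SPEEDUP, FOUR_FPGA_SPEEDUP = 104.1, 333.2

print("implied baseline (1 FPGA): ",
      round(implied_baseline(ONE_FPGA_GFLOPS, ONE_FPGA_SPEEDUP), 3), "GFlops")
print("implied baseline (4 FPGAs):",
      round(implied_baseline(FOUR_FPGA_GFLOPS, FOUR_FPGA_SPEEDUP), 3), "GFlops")
print("scaling efficiency 1 -> 4: ",
      round(parallel_efficiency(ONE_FPGA_GFLOPS, FOUR_FPGA_GFLOPS, 4), 3))

# Hypothetical placeholder inputs, NOT measured values for any real device.
print("example roofline bound:    ",
      roofline_bound(peak_gflops=1000.0, bandwidth_gbs=400.0, intensity=0.25), "GFlops")
```

Both speed-up figures imply roughly the same software baseline, which suggests they were measured against a single reference run, and the four-card result corresponds to about 80% of ideal linear scaling from one card.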

Notes

1.
https://www.hpcg-benchmark.org/software/index.html.

Acknowledgments

We acknowledge the Xilinx University Program for providing access to the Xilinx Adaptive Compute Cluster (XACC) at ETH Zurich.

Author information

Authors and Affiliations

Research Labs, Xilinx Inc., Dublin, Ireland
Alberto Zeni, Kenneth O’Brien & Michaela Blott
Politecnico di Milano, Milan, Italy
Alberto Zeni & Marco D. Santambrogio

Authors

Alberto Zeni
Kenneth O’Brien
Michaela Blott
Marco D. Santambrogio

Corresponding author

Correspondence to Alberto Zeni.

Editor information

Editors and Affiliations

Universidade de Lisboa, Lisbon, Portugal
Leonel Sousa
Universidade de Lisboa, Lisbon, Portugal
Nuno Roma
Universidade de Lisboa, Lisbon, Portugal
Pedro Tomás

About this paper

Cite this paper

Zeni, A., O’Brien, K., Blott, M., Santambrogio, M.D. (2021). Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_38

DOI: https://doi.org/10.1007/978-3-030-85665-6_38
Published: 25 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)
