Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware

  • Conference paper
Euro-Par 2021: Parallel Processing (Euro-Par 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12820)

Abstract

The HPCG benchmark is a modern complement to the HPL benchmark for evaluating the performance of HPC systems, and it is recognized as more representative of real-world applications. While typical workloads become increasingly demanding, the semiconductor industry is struggling with performance scaling and power efficiency on next-generation technology nodes. As a result, the industry is turning towards more customized compute architectures to meet the latest performance requirements. In this paper, we present the details of the first FPGA-based implementation of HPCG that takes advantage of such customized compute architectures. Our results show that our high-performance multi-FPGA implementation, using 1 and 4 Xilinx Alveo U280 cards, achieves up to 108.3 GFlops and 346.5 GFlops, respectively, representing speed-ups of \(104.1\times\) and \(333.2\times\) over software running on a server with an Intel Xeon processor, with no loss of accuracy. We also demonstrate that the FPGA-based solution achieves performance comparable to modern GPUs and up to a \(2.7\times\) improvement in power efficiency compared to an NVIDIA Tesla V100. Finally, a theoretical evaluation based on Berkeley’s Roofline model demonstrates that our implementation is near-optimally tuned on the Xilinx Alveo U280.
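The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses just the four numbers stated above, and the derived quantities (an implied CPU baseline of roughly 1.04 GFlops and a 4-card scaling efficiency of roughly 80%) are not reported in the abstract itself. The function names are hypothetical.

```python
# Figures quoted in the abstract, keyed by number of Alveo U280 cards.
FPGA_GFLOPS = {1: 108.3, 4: 346.5}
SPEEDUP_VS_CPU = {1: 104.1, 4: 333.2}


def implied_cpu_gflops(n_cards: int) -> float:
    """CPU baseline implied by dividing FPGA throughput by the quoted speed-up.

    This is a derived value (an assumption of this sketch), not a number
    reported in the abstract.
    """
    return FPGA_GFLOPS[n_cards] / SPEEDUP_VS_CPU[n_cards]


def scaling_efficiency(n_cards: int) -> float:
    """Measured throughput relative to ideal linear scaling from one card."""
    return FPGA_GFLOPS[n_cards] / (n_cards * FPGA_GFLOPS[1])


if __name__ == "__main__":
    for n in sorted(FPGA_GFLOPS):
        print(
            f"{n} card(s): implied CPU baseline "
            f"{implied_cpu_gflops(n):.3f} GFlops, "
            f"scaling efficiency {scaling_efficiency(n):.1%}"
        )
```

Both speed-up figures imply nearly the same CPU baseline, which suggests they were measured against the same software run.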

Acknowledgments

We acknowledge the Xilinx University Program for providing access to the Xilinx Adaptive Compute Cluster (XACC) at ETH Zurich.

Author information

Corresponding author

Correspondence to Alberto Zeni.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeni, A., O’Brien, K., Blott, M., Santambrogio, M.D. (2021). Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_38

  • DOI: https://doi.org/10.1007/978-3-030-85665-6_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85664-9

  • Online ISBN: 978-3-030-85665-6

  • eBook Packages: Computer Science (R0)
