
Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12083)

Abstract

The solution of sparse triangular linear systems (sptrsv) is the bottleneck of many numerical methods, so efficient implementations of this kernel are crucial, at least for commonly used platforms. In this context, Field-Programmable Gate Arrays (FPGAs) have evolved greatly in recent years, entering the HPC hardware ecosystem largely due to their superior energy efficiency relative to more established accelerators. Until recently, designing for FPGAs required the use of low-level Hardware Description Languages (HDLs) such as VHDL or Verilog. Nowadays, manufacturers are making a large effort to support High-Level Synthesis (HLS) from languages such as C/C++ and OpenCL, but the gap between the performance of these flows and that of HDLs has not yet been fully studied. This work focuses on the performance offered by FPGAs to compute the sptrsv using OpenCL. For this purpose, we implement different parallel variants of this kernel and experimentally evaluate several setups, varying, among other parameters, the work-group size, the number of compute units, the unroll factor, and the vectorization factor.
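The abstract does not reproduce the OpenCL kernels themselves. As a rough illustration of the kind of parallelism such sptrsv variants can exploit, the sketch below implements level scheduling, a standard technique for parallel sparse triangular solves, in plain Python over a lower-triangular matrix in CSR format. This is a reference sketch under stated assumptions, not the authors' implementation; the function and variable names are illustrative.

```python
def sptrsv_levels(row_ptr, col_idx, val, b):
    """Solve L x = b for a sparse lower-triangular L in CSR format.

    Rows are grouped into 'levels': each row in a level depends only on
    rows of earlier levels, so all rows within one level are independent
    and could be solved in parallel (e.g. by one FPGA work-group).
    """
    n = len(b)

    # depth[i] = level of row i: 1 + max level among its off-diagonal deps.
    depth = [0] * n
    for row in range(n):
        d = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            j = col_idx[k]
            if j < row:
                d = max(d, depth[j] + 1)
        depth[row] = d

    x = [0.0] * n
    for level in range(max(depth) + 1):
        # In a parallel implementation this inner loop is the parallel region.
        for row in (r for r in range(n) if depth[r] == level):
            s = b[row]
            diag = None
            for k in range(row_ptr[row], row_ptr[row + 1]):
                j = col_idx[k]
                if j == row:
                    diag = val[k]          # diagonal entry
                else:
                    s -= val[k] * x[j]     # x[j] solved in an earlier level
            x[row] = s / diag
    return x

# Example: L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]] in CSR form.
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
val = [2.0, 1.0, 3.0, 4.0, 5.0]
b = [2.0, 7.0, 18.0]
print(sptrsv_levels(row_ptr, col_idx, val, b))  # [1.0, 2.0, 2.0]
```

On an FPGA, the per-level inner loop maps naturally onto the tuning knobs the paper varies: the work-group size and number of compute units determine how many independent rows are processed concurrently, while the unroll and vectorization factors control the parallelism of the per-row accumulation.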

Keywords

  • FPGAs
  • Sparse linear algebra
  • sptrsv
  • Power consumption


Notes

  1. https://www.xilinx.com/products/design-tools/vitis/vitis-libraries/vitis-blas.html


Acknowledgments

The researchers were supported by Universidad de la República and the PEDECIBA. We acknowledge the ANII – MPG Independent Research Groups: “Efficient Heterogeneous Computing” with the CSC group.

Author information

Corresponding author: Federico Favaro.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Favaro, F., Dufrechou, E., Ezzatti, P., Oliver, J.P. (2020). Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels. In: Rincón, F., Barba, J., So, H., Diniz, P., Caba, J. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2020. Lecture Notes in Computer Science, vol 12083. Springer, Cham. https://doi.org/10.1007/978-3-030-44534-8_20

  • DOI: https://doi.org/10.1007/978-3-030-44534-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44533-1

  • Online ISBN: 978-3-030-44534-8

  • eBook Packages: Computer Science; Computer Science (R0)