FPGA Based High Performance Double-Precision Matrix Multiplication

International Journal of Parallel Programming

Abstract

We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, optimized for implementation on high-end FPGAs. Matrix multiplication forms the kernel of many important tile-based BLAS algorithms, making it an excellent candidate for acceleration. Both designs are based on the rank-1 update scheme, can handle arbitrary matrix sizes, and sustain their peak performance except during an initial latency period. Through these designs, we demonstrate the trade-offs between local memory and bandwidth in an FPGA implementation and present an analysis for the optimal choice of design parameters. The designs, implemented on a Virtex-5 SX240T FPGA, scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs at 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design II and 5.9 GB/s for design I. This compares favourably with both related art and general-purpose CPU implementations.
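As an illustration of the rank-1 update scheme named in the abstract, the following is a minimal software sketch in Python and not the authors' hardware design. It accumulates C = A·B as a sum of outer products, one column of A with one row of B per step. The function names `matmul_rank1` and `peak_gflops` are hypothetical, and the peak-rate helper assumes that each PE completes one multiply and one add per cycle, an assumption that is consistent with the quoted 29.8 GFLOPS but is not stated in the abstract.

```python
def matmul_rank1(A, B):
    """Compute C = A * B as a sum of rank-1 (outer-product) updates.

    A is m x k and B is k x n, both given as non-empty lists of rows.
    """
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        # Step p adds the outer product of column p of A and row p of B.
        col = [A[i][p] for i in range(m)]
        row = B[p]
        for i in range(m):
            a = col[i]
            for j in range(n):
                C[i][j] += a * row[j]
    return C


def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle=2):
    """Peak rate in GFLOPS.

    Assumption (not from the abstract): each PE performs one multiply
    and one add per clock cycle, i.e. 2 FLOPs per cycle.
    """
    return num_pes * freq_mhz * 1e6 * flops_per_pe_per_cycle / 1e9


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(round(peak_gflops(40, 373), 2))
```

Under the stated assumption, 40 PEs at 373 MHz give 40 × 373 × 10^6 × 2 = 29.84 × 10^9 floating-point operations per second, which agrees with the 29.8 GFLOPS quoted in the abstract.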

Author information

Corresponding author

Correspondence to Vinay B. Y. Kumar.

About this article

Cite this article

Kumar, V.B.Y., Joshi, S., Patkar, S.B. et al. FPGA Based High Performance Double-Precision Matrix Multiplication. Int J Parallel Prog 38, 322–338 (2010). https://doi.org/10.1007/s10766-010-0131-8

  • DOI: https://doi.org/10.1007/s10766-010-0131-8
