FPGA Based High Performance Double-Precision Matrix Multiplication

Kumar, Vinay B. Y.; Joshi, Siddharth; Patkar, Sachin B.; Narayanan, H.

doi:10.1007/s10766-010-0131-8

Published: 18 February 2010

International Journal of Parallel Programming, Volume 38, pages 322–338 (2010)

Abstract

We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, optimized for implementation on high-end FPGAs. Matrix multiplication forms the kernel of many important tile-based BLAS algorithms, which makes it an excellent candidate for acceleration. Both designs are based on the rank-1 update scheme, handle arbitrary matrix sizes, and sustain their peak performance except during an initial latency period. These designs demonstrate the trade-offs between local memory and bandwidth in an FPGA implementation, and we present an analysis of the optimal choice of design parameters. Implemented on a Virtex-5 SX240T FPGA, the designs scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs at 373 MHz, a sustained performance of 29.8 GFLOPS is possible, with a bandwidth requirement of 750 MB/s for design II and 5.9 GB/s for design I. This compares favourably with both related work and general-purpose CPU implementations.
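The rank-1 update scheme named in the abstract computes C = AB as a sum of outer products, C = Σ_k A(:,k) B(k,:): step k reads one column of A and one row of B and performs one multiply-add on every element of C. The sketch below is a plain software illustration of that formulation, not the authors' hardware design. The names `matmul_rank1` and `peak_gflops` are hypothetical, and the figure of two floating-point operations per PE per cycle (one multiply and one add) is an assumption, although it is consistent with the abstract's numbers, since 40 × 373 MHz × 2 ≈ 29.8 GFLOPS.

```python
from typing import List

Matrix = List[List[float]]


def matmul_rank1(A: Matrix, B: Matrix) -> Matrix:
    """Compute C = A * B as a sum of rank-1 (outer-product) updates.

    A is m x n and B is n x p, each stored as a list of rows, so
    arbitrary (including non-square) sizes are accepted.
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("inner dimensions of A and B must match")
    p = len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for k in range(n):
        # Step k: C += (column k of A) outer (row k of B).
        b_row = B[k]
        for i in range(m):
            a_ik = A[i][k]
            c_row = C[i]
            for j in range(p):
                c_row[j] += a_ik * b_row[j]  # one multiply and one add
    return C


def peak_gflops(num_pes: int, clock_mhz: float, flops_per_pe_per_cycle: int = 2) -> float:
    """Peak throughput in GFLOPS.

    Assumes each PE completes one multiply and one add per clock cycle
    (2 flops/cycle); this is an illustrative assumption, not a figure
    taken from the paper.
    """
    return num_pes * clock_mhz * flops_per_pe_per_cycle / 1000.0


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
    print(peak_gflops(40, 373))  # 29.84
```

On an FPGA, the independent multiply-adds inside each rank-1 step would presumably be distributed across the PEs, which is what allows throughput to grow with the number of PEs. How designs I and II partition C and buffer A and B in local memory, and hence why their bandwidth requirements differ, is not modelled here.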

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology, Bombay, Mumbai, 400076, India
Vinay B. Y. Kumar, Siddharth Joshi, Sachin B. Patkar & H. Narayanan

Corresponding author

Correspondence to Vinay B. Y. Kumar.

About this article

Cite this article

Kumar, V.B.Y., Joshi, S., Patkar, S.B. et al. FPGA Based High Performance Double-Precision Matrix Multiplication. Int J Parallel Prog 38, 322–338 (2010). https://doi.org/10.1007/s10766-010-0131-8

Received: 15 July 2009
Accepted: 31 January 2010
Issue Date: June 2010
