Hardware Overhead vs. Performance of Matrix Multiplication on FPGA
Matrix multiplication requires a large number of operations and therefore demands high-performance computing. To complete a matrix multiplication in one clock cycle, a designer can instantiate multiple multipliers in parallel. However, this approach is inefficient in terms of hardware area and power consumption. It is therefore important to find a way to perform the multiplication that is fast yet uses hardware resources economically. In this paper, we introduce a method for reducing the number of multipliers and report the hardware overhead and performance of matrix multiplication on an FPGA.
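The trade-off between multiplier count and latency can be illustrated with a small software model. This is only a sketch under simplifying assumptions, not the design evaluated in the paper: it assumes square n x n matrices, that each multiplier produces one scalar product per clock cycle, and that accumulation adds no extra cycles. The names `matmul_time_shared` and `num_multipliers` are hypothetical and chosen for illustration.

```python
def matmul_time_shared(a, b, num_multipliers):
    """Multiply two n x n matrices while modelling a fixed pool of multipliers.

    Assumption (not from the paper): every multiplier computes one scalar
    product per cycle, and additions are free. Returns (result, cycles).
    """
    if num_multipliers < 1:
        raise ValueError("at least one multiplier is required")

    n = len(a)
    c = [[0] * n for _ in range(n)]

    # An n x n product needs n^3 scalar multiplications in total.
    products = [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]

    cycles = 0
    # Each cycle, issue at most num_multipliers scalar products.
    for start in range(0, len(products), num_multipliers):
        for i, j, k in products[start:start + num_multipliers]:
            c[i][j] += a[i][k] * b[k][j]
        cycles += 1
    return c, cycles


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Fully parallel: 8 multipliers finish a 2 x 2 product in one modelled cycle.
full_result, full_cycles = matmul_time_shared(a, b, 8)

# Time-shared: 2 multipliers give the same result over more cycles.
shared_result, shared_cycles = matmul_time_shared(a, b, 2)
```

In this model the cycle count is the ceiling of n^3 divided by the number of multipliers, so halving the multipliers roughly doubles the latency while the result is unchanged. Real FPGA designs also have to account for adder trees, memory bandwidth, and pipelining, which this sketch ignores.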
Keywords: Matrix multiplication, Digital signal processing, Low-power design, FPGA
This study was supported by Seoul National University of Science and Technology, Korea.