Abstract
Matrix computing plays a vital role in many scientific and engineering applications, but previous FPGA-based designs can handle data of only a single, fixed precision. This study first presents algorithms, data flows, and mapping strategies that match the hardware structure to matrix computing at different precisions. We then propose a unified multi-precision matrix computing unit that supports three precisions and three matrix operation modes, and that can serve as a coprocessor for large-scale matrix computing with low storage requirements and high efficiency. Finally, we build a complete matrix computing acceleration system with 128 processing elements (PEs) and deploy it on an FPGA. Experimental results show that the accelerator reaches a maximum frequency of 180 MHz and delivers 46.1 GFLOPS, 92.1 GFLOPS, and 184.3 GFLOPS for double-, single-, and half-precision floating-point matrix computing respectively, surpassing current designs in both application range and performance.
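The reported figures are consistent with a simple peak-throughput model. Below is a minimal sketch of that arithmetic; the assumptions that each PE completes one fused multiply-add (2 FLOPs) per cycle at double precision, and that the same datapath splits into 2 and 4 SIMD lanes for single and half precision, are ours and are not stated explicitly in the abstract:

```python
# Back-of-envelope peak-throughput check for the accelerator described above.
# Assumed: one fused multiply-add (2 FLOPs) per PE per cycle at double
# precision; 2x / 4x lane splitting for single / half precision.

NUM_PES = 128        # processing elements (from the abstract)
FREQ_HZ = 180e6      # maximum reported frequency, 180 MHz
FLOPS_PER_FMA = 2    # one multiply + one add per fused operation

def peak_gflops(lanes_per_pe: int) -> float:
    """Peak throughput in GFLOPS for a given number of lanes per PE."""
    return NUM_PES * FREQ_HZ * FLOPS_PER_FMA * lanes_per_pe / 1e9

print(peak_gflops(1))  # double precision: 46.08  (~46.1 reported)
print(peak_gflops(2))  # single precision: 92.16  (~92.1 reported)
print(peak_gflops(4))  # half precision:   184.32 (~184.3 reported)
```

The close match between this model and the measured numbers suggests the design sustains near-peak PE utilization during matrix computing.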
Acknowledgments
This work was partially supported by the National Science and Technology Major Project (2017-V-0014-0066).
Copyright information
© 2021 IFIP International Federation for Information Processing
Cite this paper
Zhang, L., Peng, Y., Hu, X., Huang, A., Tian, T. (2021). FPGA-Based Multi-precision Architecture for Accelerating Large-Scale Floating-Point Matrix Computing. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science(), vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_17
DOI: https://doi.org/10.1007/978-3-030-79478-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1