Abstract
Matrix computing plays a vital role in many scientific and engineering applications, but previous FPGA-based designs can handle data of only a single, fixed precision. This study first presents algorithms, data flows, and mapping strategies that match the hardware structure to matrix computing at different precisions. We then propose a unified multi-precision matrix computing unit that supports three precisions and three matrix operation modes, and that can serve as a coprocessor for large-scale matrix computing with low storage requirements and high efficiency. Finally, we build a complete matrix computing acceleration system with 128 processing elements (PEs) and deploy it on an FPGA. Experimental results show that the accelerator reaches a maximum frequency of 180 MHz and delivers 46.1 GFLOPS, 92.1 GFLOPS, and 184.3 GFLOPS for double-, single-, and half-precision floating-point matrix computing respectively, surpassing current designs in both application range and performance.
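The reported figures are consistent with a simple peak-throughput model. Below is a minimal sketch of that arithmetic; the assumptions that each PE completes one fused multiply-add (2 FLOPs) per cycle at double precision, and that the same datapath splits into 2 and 4 SIMD lanes for single and half precision, are ours and are not stated explicitly in the abstract:

```python
# Back-of-envelope peak-throughput check for the accelerator described above.
# Assumed: one fused multiply-add (2 FLOPs) per PE per cycle at double
# precision; 2x / 4x lane splitting for single / half precision.

NUM_PES = 128        # processing elements (from the abstract)
FREQ_HZ = 180e6      # maximum reported frequency, 180 MHz
FLOPS_PER_FMA = 2    # one multiply + one add per fused operation

def peak_gflops(lanes_per_pe: int) -> float:
    """Peak throughput in GFLOPS for a given number of lanes per PE."""
    return NUM_PES * FREQ_HZ * FLOPS_PER_FMA * lanes_per_pe / 1e9

print(peak_gflops(1))  # double precision: 46.08  (~46.1 reported)
print(peak_gflops(2))  # single precision: 92.16  (~92.1 reported)
print(peak_gflops(4))  # half precision:   184.32 (~184.3 reported)
```

The close match between this model and the measured numbers suggests the design sustains near-peak PE utilization during matrix computing.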
Acknowledgments
This work was partially supported by the National Science and Technology Major Project (2017-V-0014-0066).
Copyright information
© 2021 IFIP International Federation for Information Processing
Cite this paper
Zhang, L., Peng, Y., Hu, X., Huang, A., Tian, T. (2021). FPGA-Based Multi-precision Architecture for Accelerating Large-Scale Floating-Point Matrix Computing. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science(), vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_17
DOI: https://doi.org/10.1007/978-3-030-79478-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1