
FPGA-Based Multi-precision Architecture for Accelerating Large-Scale Floating-Point Matrix Computing

  • Conference paper
Network and Parallel Computing (NPC 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12639)


Abstract

Matrix computing plays a vital role in many scientific and engineering applications, but previous FPGA-based designs can only handle data of a single, fixed precision. This study first presents algorithms, data flows, and mapping strategies that match matrix computations of different precisions to the hardware structure. We then propose a unified multi-precision matrix computing unit core that handles three precisions and three matrix operation modes; it can serve as a coprocessor for large-scale matrix computing, offering low storage requirements and high efficiency. Finally, we build a complete matrix computing acceleration system with 128 processing elements (PEs) and deploy it on an FPGA. Experimental results show that the accelerator reaches a maximum frequency of 180 MHz and delivers 46.1 GFLOPS, 92.1 GFLOPS, and 184.3 GFLOPS on double-precision, single-precision, and half-precision floating-point data respectively, surpassing other current designs in both application range and performance.
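
The reported throughput doubles at each step down in precision, which is consistent with one fused multiply-add (FMA) per PE per cycle and SIMD-style lane packing: halving the operand width doubles the number of lanes each PE can process. The back-of-envelope check below is not part of the paper; only the PE count and clock frequency come from the abstract, while the per-PE FMA rate and the lane counts per precision are our assumptions.

    # Plausibility check of the reported peak GFLOPS (a sketch, not the
    # authors' performance model). Assumes each PE completes one FMA
    # (2 FLOPs) per cycle, and that single/half precision pack 2/4
    # lanes per PE.
    PES = 128            # processing elements (from the abstract)
    FREQ_HZ = 180e6      # maximum clock frequency (from the abstract)
    FLOPS_PER_FMA = 2    # one multiply plus one add

    LANES = {"double": 1, "single": 2, "half": 4}  # assumed lanes per PE

    for precision, lanes in LANES.items():
        gflops = PES * FREQ_HZ * FLOPS_PER_FMA * lanes / 1e9
        print(f"{precision:>6}: {gflops:.1f} GFLOPS peak")

    # Prints: double: 46.1, single: 92.2, half: 184.3

Under these assumptions the computed peaks (46.08, 92.16, and 184.32 GFLOPS) match the reported 46.1, 92.1, and 184.3 GFLOPS to within 0.1 GFLOPS, suggesting the reported figures correspond to all 128 PEs kept busy every cycle at the maximum frequency.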



Acknowledgments

This work was partially supported by the National Science and Technology Major Project (2017-V-0014-0066).

Author information


Corresponding author

Correspondence to Yuanxi Peng.



Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper


Cite this paper

Zhang, L., Peng, Y., Hu, X., Huang, A., Tian, T. (2021). FPGA-Based Multi-precision Architecture for Accelerating Large-Scale Floating-Point Matrix Computing. In: He, X., Shao, E., Tan, G. (eds.) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol. 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_17


  • DOI: https://doi.org/10.1007/978-3-030-79478-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79477-4

  • Online ISBN: 978-3-030-79478-1

  • eBook Packages: Computer Science, Computer Science (R0)
