Skip to main content

Haica: A High Performance Computing & Artificial Intelligence Fused Computing Architecture

  • Conference paper
  • First Online:
Algorithms and Architectures for Parallel Processing (ICA3PP 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13777))

Abstract

In recent years, HPC and AI fusion, which refers to applying AI technology to traditional HPC applications, has become a new trend. HPC and AI fusion requires supports for multiple precisions used in both domains. While, supporting multiple precisions in a single computing unit is not easy. Prior research typically targets on modifying a high-precision FMA to support low precisions. However, such efforts suffer from disadvantages of increased latency and limited compute throughput for low-precision operations, high area and power overhead, and limited support of new data formats appeared in AI domain. To address this issue, we propose Haica, a double-precision FMA and low-precision systolic array fused architecture, to achieve HPC and AI fusion. Our work has two innovations. First, we propose a low-cost multiple-low-precision FMA by taking advantage of the commonality among FP16, BF16, and TF32. Second, inspired by the idea of splicing high precision with low precision, we replace the multiply and merge modules in a double-precision FMA with a modified systolic array that is composed of the proposed low-precision FMAs. We implement the logic design of Haica using RTL and evaluate its overhead. Results show that compared to the naive combination of a double-precision FMA and single-half-mixed-precision systolic array, Haica provides extra support for BF16 and TF32 with only 7.87% area and 33.26% power overhead. This demonstrates that Haica achieves HPC and AI fusion in a cost-effective manner.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Arunachalam, V., Raj, A.N.J., Hampannavar, N., Bidul, C.: Efficient dual-precision floating-point fused-multiply-add architecture. Microprocess. Microsyst. 57, 23–31 (2018)

    Article  Google Scholar 

  2. Chen, Z., Wu, T., Liu, X., Zheng, F., Ding, Y., Li, H.: Design and implementation of a multi-precision mixed floating point fused multiply add component. In: Proceedings of HPC China (2018). (in Chinese)

    Google Scholar 

  3. Choquette, J., Gandhi, W., Giroux, O., Stam, N., Krashinsky, R.: Nvidia a100 tensor core GPU: performance and innovation. IEEE Micro 41(2), 29–35 (2021)

    Article  Google Scholar 

  4. Dong, L., Wei, F., Xu, K., Liu, S., Zhou, M.: Adaptive multi-compositionality for recursive neural network models. IEEE/ACM Trans. Audio Speech Lang. Process. 24(3), 422–431 (2015)

    Article  Google Scholar 

  5. Haidar, A., Tomov, S., Dongarra, J., Higham, N.J.: Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 603–613. IEEE (2018)

    Google Scholar 

  6. Han, Y., Zhang, G.J., Huang, X., Wang, Y.: A moist physics parameterization based on deep learning. J. Adv. Model. Earth Syst. 12(9), e2020MS002076 (2020)

    Google Scholar 

  7. Hokenek, E., Montoye, R.K., Cook, P.W.: Second-generation risc floating point with multiply-add fused. IEEE J. Solid-State Circuits 25(5), 1207–1213 (1990)

    Article  Google Scholar 

  8. Jia, W., et al.: Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–14. IEEE (2020)

    Google Scholar 

  9. Jouppi, N.P., et al.: In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 1–12 (2017)

    Google Scholar 

  10. Kalamkar, D., et al.: A study of bfloat16 for deep learning training. arXiv preprint arXiv:1905.12322 (2019)

  11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012)

    Google Scholar 

  12. Kumar, V.P., Tsai, Y.C.: Designing linear systolic arrays. J. Parallel Distrib. Comput. 7(3), 441–463 (1989)

    Article  Google Scholar 

  13. Kurth, T., et al.: Exascale deep learning for climate analytics. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 649–660. IEEE (2018)

    Google Scholar 

  14. Lang, T., Bruguera, J.D.: Floating-point fused multiply-add with reduced latency. In: Proceedings. In: IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 145–150. IEEE (2002)

    Google Scholar 

  15. Mohammadi, F.G., Shenavarmasouleh, F., Amini, M.H., Arabnia, H.R.: Malware detection using artificial bee colony algorithm. In: Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers, pp. 568–572 (2020)

    Google Scholar 

  16. Rajaraman, V.: IEEE standard for floating point numbers. Resonance 21(1), 11–30 (2016)

    Article  Google Scholar 

  17. Tannenbaum, D.C., Iyer, S.: Logic circuitry configurable to perform 32-bit or dual 16-bit floating-point operations, uS Patent 9,465,578 (11 October 2016)

    Google Scholar 

  18. Wu, T.: The research and implementation of high performance vector FMAC unit for LTE. Ph.D. thesis, National University of Defense Technology (2011). (in Chinese)

    Google Scholar 

  19. Xiao, Z., Xu, X., Xing, H., Luo, S., Dai, P., Zhan, D.: RTFN: a robust temporal feature network for time series classification. arXiv preprint arXiv:2011.11829 (2020)

  20. Xiao, Z., Xu, X., Xing, H., Song, F., Wang, X., Zhao, B.: A federated learning system with enhanced feature extraction for human activity recognition. Knowl.-Based Syst. 229, 107338 (2021)

    Article  Google Scholar 

  21. Zhang, H., Chen, D., Ko, S.B.: Efficient multiple-precision floating-point fused multiply-add with mixed-precision support. IEEE Trans. Comput. 68(7), 1035–1048 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  22. Zhang, H., Chen, D., Ko, S.B.: New flexible multiple-precision multiply-accumulate unit for deep neural network training and inference. IEEE Trans. Comput. 69(1), 26–38 (2019)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhengbo Chen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chen, Z., Zheng, F., Guo, F., Yu, Q., Chen, Z. (2023). Haica: A High Performance Computing & Artificial Intelligence Fused Computing Architecture. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-22677-9_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22676-2

  • Online ISBN: 978-3-031-22677-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics