Abstract
Deep Neural Networks (DNNs) have been greatly accelerated by low-precision computing. To bring low-precision computing to general-purpose processors for higher computing power, this paper proposes a novel RISC-V core design with multi-precision capability (the MP-core) that favors low-precision operation. We propose two design styles, SIMD and multilevel, for both integer and floating-point multipliers, and compare them in detail at the circuit, architecture, and performance levels, including applications beyond DNNs. With the proposed instruction extensions, we show that the SIMD MP-core preserves the single-precision core (SP-core) design with few changes but delivers only moderate performance gains. In contrast, the multilevel MP-core favors low-precision by fully exploiting spatial hardware parallelism and temporal instruction-level parallelism. With microarchitectural support in the register file and instruction scheduling, the multilevel MP-core improves the performance of linear equation solving over the SP-core by 8.7\(\times \) and 3\(\times \) at different precisions. Our study demonstrates that a general-purpose processor can also gain substantially from multi-precision computing without loss of generality, favoring low-precision whenever the application permits.
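To make the linear-solver claim concrete, the standard way to exploit low-precision hardware for solving \(Ax = b\) is mixed-precision iterative refinement: factor and solve cheaply in low precision, then recover full accuracy by computing residuals and corrections in high precision. The sketch below is a minimal, self-contained C illustration of that general technique on a toy 3\(\times \)3 system; it is not the paper's code, and the single/double precision pair stands in for whatever precision levels the MP-core supports.

```c
/* Minimal sketch of mixed-precision iterative refinement for Ax = b.
 * Illustrative only: single precision stands in for the "low" precision,
 * double for the "high" precision; the paper's MP-core targets other pairs. */
#include <stdio.h>
#include <math.h>

#define N 3

/* Solve Ax = b in single precision via Gaussian elimination
 * (no pivoting; adequate for this diagonally dominant toy system). */
static void solve_sp(const float A[N][N], const float b[N], float x[N]) {
    float M[N][N], v[N];
    for (int i = 0; i < N; i++) {
        v[i] = b[i];
        for (int j = 0; j < N; j++) M[i][j] = A[i][j];
    }
    for (int k = 0; k < N; k++)             /* forward elimination */
        for (int i = k + 1; i < N; i++) {
            float f = M[i][k] / M[k][k];
            for (int j = k; j < N; j++) M[i][j] -= f * M[k][j];
            v[i] -= f * v[k];
        }
    for (int i = N - 1; i >= 0; i--) {      /* back substitution */
        float s = v[i];
        for (int j = i + 1; j < N; j++) s -= M[i][j] * x[j];
        x[i] = s / M[i][i];
    }
}

int main(void) {
    double A[N][N] = {{4, 1, 0}, {1, 5, 2}, {0, 2, 6}}, b[N] = {1, 2, 3};
    float Af[N][N], bf[N], xf[N];
    double x[N];
    for (int i = 0; i < N; i++) {           /* demote once to low precision */
        bf[i] = (float)b[i];
        for (int j = 0; j < N; j++) Af[i][j] = (float)A[i][j];
    }
    solve_sp(Af, bf, xf);                   /* cheap low-precision solve */
    for (int i = 0; i < N; i++) x[i] = xf[i];

    for (int it = 0; it < 10; it++) {       /* refine in high precision */
        double r[N], nrm = 0.0;
        float rf[N], df[N];
        for (int i = 0; i < N; i++) {       /* r = b - A*x in double */
            r[i] = b[i];
            for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
            rf[i] = (float)r[i];
            nrm += r[i] * r[i];
        }
        if (sqrt(nrm) < 1e-12) break;       /* converged to double accuracy */
        solve_sp(Af, rf, df);               /* correction, again low precision */
        for (int i = 0; i < N; i++) x[i] += df[i];
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %.15f\n", i, x[i]);
    return 0;
}
```

Because the expensive \(O(n^3)\) factorization runs entirely at the low precision while only the \(O(n^2)\) residual updates run at high precision, the bulk of the arithmetic maps onto the fast low-precision multipliers; this is why a core that favors low-precision can speed up a solver whose final answer is still high-precision.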
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, L., Wang, Q., Jiang, J., Jing, N. (2022). Enabling Extreme High-Throughput Multi-precision Computing on General-Purpose Microprocessor. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2021, Volume 3. FTC 2021. Lecture Notes in Networks and Systems, vol 360. Springer, Cham. https://doi.org/10.1007/978-3-030-89912-7_27
DOI: https://doi.org/10.1007/978-3-030-89912-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89911-0
Online ISBN: 978-3-030-89912-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)