A Precision-Aware Neuron Engine for DNN Accelerators

  • Original Research
  • Published in SN Computer Science

Abstract

Deep Neural Networks (DNNs) form the backbone of contemporary deep learning, powering a wide range of artificial intelligence (AI) applications. However, their computational demands, stemming primarily from the resource-intensive Neuron Engine (NE), present a critical challenge. The NE comprises Multiply-and-Accumulate (MAC) and Activation Function (AF) operations, which contribute significantly to the overall computational overhead. To address these challenges, we propose a Precision-aware Neuron Engine (PNE) architecture that supports both low-bit and high-bit precision computations with minimal resource utilization. The PNE's MAC unit pre-loads the accumulator register with the bias value, eliminating the need for an extra adder, multiplexer, and bias register. This design yields significant resource savings: an 8-bit signed fixed-point (sfixed<N, q>) implementation of the MAC achieves 29.23% savings in resource utilization and 32.91% savings in critical delay compared with the IEEE architecture, and 24.91% savings in power-delay product (PDP) compared with the Booth architecture. Our comprehensive evaluation demonstrates the PNE's efficacy in maintaining inference accuracy across quantized and unquantized models. The proposed design achieves precision-awareness with a minimal increase (\(\approx\) 10%) in resource overhead, while delivering a 34.61% increase in throughput and a reduced critical delay (34.37% faster than the conventional design). A software emulator shows accuracy losses of only 0.6% to 1.6%, demonstrating the PNE's versatility across precisions and datasets, including MNIST (on LeNet) and ImageNet (on CaffeNet). The flexibility and configurability of the PNE make it a promising solution for precision-aware neuron processing, particularly in edge AI applications with stringent hardware constraints. This work contributes a pivotal step towards more efficient DNN computation through precision-aware architecture, paving the way for resource-efficient, high-performance AI systems.
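
To make the bias-preloading idea concrete, the following is a minimal Python sketch of the behaviour described above, not the authors' RTL or their emulator: the accumulator is initialised with the scale-aligned bias so that no separate bias adder, multiplexer, or bias register is needed after accumulation. The sfixed<N, q> helper names, bit widths, and quantization parameters below are illustrative assumptions.

```python
def to_sfixed(value, n_bits=8, q_frac=4):
    """Quantize a real value to signed fixed-point sfixed<N, q>, stored as an int."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, int(round(value * (1 << q_frac)))))

def from_sfixed(raw, q_frac=4):
    """Convert the integer representation back to a real value."""
    return raw / (1 << q_frac)

def pne_mac(inputs, weights, bias, n_bits=8, q_frac=4):
    """Bias-preloaded MAC: the accumulator starts at the bias instead of zero."""
    # Pre-load: the bias is written into the (wider) accumulator up front,
    # aligned to the product scale (2*q_frac fractional bits after multiplication),
    # so no post-accumulation bias addition is required.
    acc = to_sfixed(bias, 2 * n_bits, 2 * q_frac)
    for x, w in zip(inputs, weights):
        # Each cycle performs exactly one multiply and one accumulate.
        acc += to_sfixed(x, n_bits, q_frac) * to_sfixed(w, n_bits, q_frac)
    return from_sfixed(acc, 2 * q_frac)

# Tiny 3-input neuron: 0.5*0.75 - 0.25*0.5 - 1.0*0.125 + 0.1 = 0.225
print(pne_mac([0.5, -0.25, 1.0], [0.75, 0.5, -0.125], bias=0.1))  # ~0.2266 after quantization
```

Calling pne_mac with, say, n_bits=16 mimics the high-precision mode of a precision-aware datapath, while n_bits=8 corresponds to the low-bit configuration evaluated in the paper.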

Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study; detailed circuit simulation results are provided in the manuscript.

References

  1. Sim H, Lee J. Cost-Effective Stochastic MAC circuits for Deep Neural Networks. Neural Netw. 2019;117:152–62.

  2. Khalil K, Eldash O, Kumar A, Bayoumi M. An efficient approach for neural network architecture. In: 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2018;745–748. IEEE.

  3. Shawl MS, Singh A, Gaur N, Bathla S, Mehra A. Implementation of Area and Power Efficient Components of a MAC unit for DSP Processors. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018;1155–1159. IEEE.

  4. Machupalli R, Hossain M, Mandal M. Review of ASIC Accelerators for Deep Neural Network. Microprocess Microsyst. 2022;89:104441.

  5. Merenda M, Porcaro C, Iero D. Edge machine learning for AI-enabled IoT devices: A review. Sensors. 2020;20(9):2533.

  6. Shantharama P, Thyagaturu AS, Reisslein M. Hardware-accelerated platforms and infrastructures for network functions: A survey of enabling technologies and research studies. IEEE Access. 2020;8:132021–85.

  7. Hashemi S, Anthony N, Tann H, Bahar RI, Reda S. Understanding the impact of precision quantization on the accuracy and energy of neural networks. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017;1474–1479. IEEE.

  8. Raut G, Rai S, Vishvakarma SK, Kumar A. RECON: Resource-Efficient CORDIC-based Neuron Architecture. IEEE Open Journal of Circuits and Systems. 2021;2:170–81.

  9. Garland J, Gregg D. Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing. ACM Transactions on Architecture and Code Optimization (TACO). 2018;15(3):1–24.

  10. Vishwakarma S, Raut G, Dhakad NS, Vishvakarma SK, Ghai D. A Configurable Activation Function for Variable Bit-Precision DNN Hardware Accelerators. In: IFIP International Internet of Things Conference, 2023;433–441. Springer.

  11. Posewsky T, Ziener D. Efficient deep neural network acceleration through FPGA-based batch processing. In: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2016;1–8. IEEE.

  12. Schmidhuber J. Deep Learning in Neural Networks: An overview. Neural Netw. 2015;61:85–117.

  13. Jelčicová Z, Mardari A, Andersson O, Kasapaki E, Sparsø J. A neural network engine for resource constrained embedded systems. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers, 2020;125–131. IEEE.

  14. Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S, et al. Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016;26–35.

  15. Zhang Y, Suda N, Lai L, Chandra V. Hello edge: Keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128 2017.

  16. Cheng Y, Wang D, Zhou P, Zhang T. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges. IEEE Signal Process Mag. 2018;35(1):126–36.

  17. Masadeh M, Hasan O, Tahar S. Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency. IEEE Access. 2019;7:147129–42.

  18. Krishna AV, Deepthi S, Nirmala Devi M. Design of 32-Bit MAC unit using Vedic Multiplier and XOR Logic. In: Proceedings of International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications, 2021;715–723. Springer.

  19. Farrukh FUD, Zhang C, Jiang Y, Zhang Z, Wang Z, Wang Z, Jiang H. Power Efficient Tiny Yolo CNN using Reduced Hardware Resources based on Booth Multiplier and Wallace Tree Adders. IEEE Open Journal of Circuits and Systems. 2020;1:76–87.

  20. Johansson K. Low power and Low Complexity Shift-and-Add based Computations. PhD thesis, Linköping University Electronic Press 2008.

  21. Gudovskiy DA, Rigazio L. Shiftcnn: Generalized Low-Precision Architecture for inference of Convolutional Neural Networks. arXiv preprint arXiv:1706.02393 2017.

  22. Janveja M, Niranjan V. High performance Wallace tree multiplier using improved adder. ICTACT J Microelectron. 2017;3(01):370–4.

  23. Yuvaraj M, Kailath BJ, Bhaskhar N. Design of optimized MAC unit using integrated vedic multiplier. In: 2017 International Conference on Microelectronic Devices, Circuits and Systems (ICMDCS), 2017;1–6. IEEE.

  24. Sze V, Chen Y-H, Yang T-J, Emer JS. Efficient processing of deep neural networks: A tutorial and survey. Proc IEEE. 2017;105(12):2295–329.

  25. Sharma VP, Vishwakarma SK. Analysis and Implementation of MAC Unit for Different Precisions.

  26. Raut G, Biasizzo A, Dhakad N, Gupta N, Papa G, Vishvakarma SK. Data Multiplexed and Hardware Reused Architecture for Deep Neural Network Accelerator. Neurocomputing. 2022;486:147–59.

  27. Wuraola A, Patel N, Nguang SK. Efficient activation functions for embedded inference engines. Neurocomputing. 2021;442:73–88.

  28. Aggarwal S, Meher PK, Khare K. Concept, design, and implementation of reconfigurable CORDIC. IEEE Trans Very Large Scale Integr VLSI Syst. 2015;24(4):1588–92.

  29. Lee J, et al. Unpu: An energy-efficient deep neural network accelerator with fully variable weight bit precision. IEEE J Solid-State Circuits. 2018;54(1):173–85.

  30. Lin C-H, Wu A-Y. Mixed-scaling-rotation CORDIC (MSR-CORDIC) algorithm and architecture for high-performance vector rotational DSP applications. IEEE Trans Circuits Syst I Regul Pap. 2005;52(11):2385–96.

  31. Mohamed SM, et al. FPGA implementation of reconfigurable CORDIC algorithm and a memristive chaotic system with transcendental nonlinearities. IEEE Trans Circuits Syst I Regul Pap. 2022;69(7):2885–92.

  32. Prashanth H, Rao M. SOMALib: Library of Exact and Approximate Activation Functions for Hardware-efficient Neural Network Accelerators. In: 2022 IEEE 40th International Conference on Computer Design (ICCD), 2022;746–753. IEEE.

  33. Mehra S, Raut G, Das R, Vishvakarma SK, Biasizzo A. An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning. IEEE Access 2023.

  34. Krizhevsky A. Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/kriz/learning-features-2009-TR.pdf 2009.

  35. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

  36. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

  37. Park J-S, Park C, Kwon S, Kim H-S, Jeon T, Kang Y, Lee H, Lee D, Kim J, Lee Y, Park S, Jang J-W, Ha S, Kim M, Bang J, Lim SH, Kang I. A Multi-Mode 8K-MAC HW-Utilization-Aware Neural Processing Unit with a Unified Multi-Precision Datapath in 4nm Flagship Mobile SoC. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022;65:246–248.

  38. Chang J-K, Lee H, Choi C-S. A Power-Aware Variable-Precision Multiply-Accumulate Unit. In: 2009 9th International Symposium on Communications and Information Technology, 2009;1336–1339.

  39. Abadi M, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org 2015.

  40. Raut G, Mukala J, Sharma V, Vishvakarma SK. Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators. Circuits, Systems, and Signal Processing, 2023;1–27.

  41. Multiplier v12.0 LogiCORE IP Product Guide. https://www.xilinx.com/support/documentation/ipdocumentation/multgen/v120/pg108-mult-gen.pdf

  42. Venkataramani G, Goldstein SC. Slack Analysis in the System Design Loop. In: Proceedings of the 6th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2008;231–236.

Acknowledgements

This article is an extended version of our previous conference paper [10].

Author information

Corresponding author

Correspondence to Dhruva Ghai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. No human or animal testing or participation was involved in this research; all data were obtained from public-domain sources.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vishwakarma, S., Raut, G., Jaiswal, S. et al. A Precision-Aware Neuron Engine for DNN Accelerators. SN COMPUT. SCI. 5, 494 (2024). https://doi.org/10.1007/s42979-024-02851-z

