A Precision-Aware Neuron Engine for DNN Accelerators

  • Original Research
  • Published in SN Computer Science

Abstract

Deep Neural Networks (DNNs) form the backbone of contemporary deep learning, powering a wide range of artificial intelligence (AI) applications. However, their computational demands, stemming primarily from the resource-intensive Neuron Engine (NE), present a critical challenge. The NE comprises Multiply-and-Accumulate (MAC) and Activation Function (AF) operations, which contribute significantly to the overall computational overhead. To address these challenges, we propose a Precision-aware Neuron Engine (PNE) architecture that supports both low-bit and high-bit precision computations with minimal resource utilization. The PNE's MAC unit pre-loads the accumulator register with the bias value, eliminating the need for an extra adder, multiplexer, and bias register. This design yields significant resource savings: an 8-bit signed fixed-point (sfixed<N, q>) implementation of the MAC achieves 29.23% savings in resource utilization and 32.91% savings in critical delay compared with the IEEE architecture, and 24.91% savings in power-delay product (PDP) compared with the Booth architecture. Our comprehensive evaluation demonstrates the PNE's efficacy in maintaining inference accuracy across quantized and unquantized models. The proposed design achieves precision-awareness with a minimal increase (\(\approx\) 10%) in resource overhead, while delivering a 34.61% increase in throughput and a reduced critical delay (34.37% faster than the conventional design). A software emulator shows accuracy losses of only 0.6% to 1.6%, demonstrating the PNE's versatility across precisions and datasets, including MNIST (on LeNet) and ImageNet (on CaffeNet). The flexibility and configurability of the PNE make it a promising solution for precision-aware neuron processing, particularly in edge AI applications with stringent hardware constraints. This work contributes a pivotal step towards more efficient DNN computation through precision-aware architecture, paving the way for resource-efficient, high-performance AI systems.
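
To make the bias-preloading idea concrete, the following is a minimal Python sketch of the behaviour described above, not the authors' RTL or their emulator: the accumulator is initialised with the scale-aligned bias so that no separate bias adder, multiplexer, or bias register is needed after accumulation. The sfixed<N, q> helper names, bit widths, and quantization parameters below are illustrative assumptions.

```python
def to_sfixed(value, n_bits=8, q_frac=4):
    """Quantize a real value to signed fixed-point sfixed<N, q>, stored as an int."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, int(round(value * (1 << q_frac)))))

def from_sfixed(raw, q_frac=4):
    """Convert the integer representation back to a real value."""
    return raw / (1 << q_frac)

def pne_mac(inputs, weights, bias, n_bits=8, q_frac=4):
    """Bias-preloaded MAC: the accumulator starts at the bias instead of zero."""
    # Pre-load: the bias is written into the (wider) accumulator up front,
    # aligned to the product scale (2*q_frac fractional bits after multiplication),
    # so no post-accumulation bias addition is required.
    acc = to_sfixed(bias, 2 * n_bits, 2 * q_frac)
    for x, w in zip(inputs, weights):
        # Each cycle performs exactly one multiply and one accumulate.
        acc += to_sfixed(x, n_bits, q_frac) * to_sfixed(w, n_bits, q_frac)
    return from_sfixed(acc, 2 * q_frac)

# Tiny 3-input neuron: 0.5*0.75 - 0.25*0.5 - 1.0*0.125 + 0.1 = 0.225
print(pne_mac([0.5, -0.25, 1.0], [0.75, 0.5, -0.125], bias=0.1))  # ~0.2266 after quantization
```

Calling pne_mac with, say, n_bits=16 mimics the high-precision mode of a precision-aware datapath, while n_bits=8 corresponds to the low-bit configuration evaluated in the paper.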

Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study; detailed circuit simulation results are provided in the manuscript.

References

  1. Sim H, Lee J. Cost-Effective Stochastic MAC circuits for Deep Neural Networks. Neural Netw. 2019;117:152–62.

  2. Khalil K, Eldash O, Kumar A, Bayoumi M. An efficient approach for neural network architecture. In: 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2018;745–748. IEEE.

  3. Shawl MS, Singh A, Gaur N, Bathla S, Mehra A. Implementation of Area and Power Efficient Components of a MAC unit for DSP Processors. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018;1155–1159. IEEE.

  4. Machupalli R, Hossain M, Mandal M. Review of ASIC Accelerators for Deep Neural Network. Microprocess Microsyst. 2022;89:104441.

  5. Merenda M, Porcaro C, Iero D. Edge machine learning for AI-enabled IoT devices: A review. Sensors. 2020;20(9):2533.

  6. Shantharama P, Thyagaturu AS, Reisslein M. Hardware-accelerated platforms and infrastructures for network functions: A survey of enabling technologies and research studies. IEEE Access. 2020;8:132021–85.

  7. Hashemi S, Anthony N, Tann H, Bahar RI, Reda S. Understanding the impact of precision quantization on the accuracy and energy of neural networks. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017;1474–1479. IEEE.

  8. Raut G, Rai S, Vishvakarma SK, Kumar A. RECON: Resource-Efficient CORDIC-based Neuron Architecture. IEEE Open Journal of Circuits and Systems. 2021;2:170–81.

  9. Garland J, Gregg D. Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing. ACM Transactions on Architecture and Code Optimization (TACO). 2018;15(3):1–24.

  10. Vishwakarma S, Raut G, Dhakad NS, Vishvakarma SK, Ghai D. A Configurable Activation Function for Variable Bit-Precision DNN Hardware Accelerators. In: IFIP International Internet of Things Conference, 2023;433–441. Springer.

  11. Posewsky T, Ziener D. Efficient deep neural network acceleration through FPGA-based batch processing. In: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2016;1–8. IEEE.

  12. Schmidhuber J. Deep Learning in Neural Networks: An overview. Neural Netw. 2015;61:85–117.

  13. Jelčicová Z, Mardari A, Andersson O, Kasapaki E, Sparsø J. A neural network engine for resource constrained embedded systems. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers, 2020;125–131. IEEE.

  14. Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S, et al. Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016;26–35.

  15. Zhang Y, Suda N, Lai L, Chandra V. Hello edge: Keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128 2017.

  16. Cheng Y, Wang D, Zhou P, Zhang T. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges. IEEE Signal Process Mag. 2018;35(1):126–36.

  17. Masadeh M, Hasan O, Tahar S. Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency. IEEE Access. 2019;7:147129–42.

  18. Krishna AV, Deepthi S, Nirmala Devi M. Design of 32-Bit MAC unit using Vedic Multiplier and XOR Logic. In: Proceedings of International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications, 2021;715–723. Springer.

  19. Farrukh FUD, Zhang C, Jiang Y, Zhang Z, Wang Z, Wang Z, Jiang H. Power Efficient Tiny Yolo CNN using Reduced Hardware Resources based on Booth Multiplier and Wallace Tree Adders. IEEE Open Journal of Circuits and Systems. 2020;1:76–87.

  20. Johansson K. Low power and Low Complexity Shift-and-Add based Computations. PhD thesis, Linköping University Electronic Press 2008.

  21. Gudovskiy DA, Rigazio L. Shiftcnn: Generalized Low-Precision Architecture for inference of Convolutional Neural Networks. arXiv preprint arXiv:1706.02393 2017.

  22. Janveja M, Niranjan V. High performance Wallace tree multiplier using improved adder. ICTACT J Microelectron. 2017;3(01):370–4.

  23. Yuvaraj M, Kailath BJ, Bhaskhar N. Design of optimized MAC unit using integrated vedic multiplier. In: 2017 International Conference on Microelectronic Devices, Circuits and Systems (ICMDCS), 2017;1–6. IEEE.

  24. Sze V, Chen Y-H, Yang T-J, Emer JS. Efficient processing of deep neural networks: A tutorial and survey. Proc IEEE. 2017;105(12):2295–329.

  25. Sharma VP, Vishwakarma SK. Analysis and Implementation of MAC Unit for Different Precisions.

  26. Raut G, Biasizzo A, Dhakad N, Gupta N, Papa G, Vishvakarma SK. Data Multiplexed and Hardware Reused Architecture for Deep Neural Network Accelerator. Neurocomputing. 2022;486:147–59.

  27. Wuraola A, Patel N, Nguang SK. Efficient activation functions for embedded inference engines. Neurocomputing. 2021;442:73–88.

  28. Aggarwal S, Meher PK, Khare K. Concept, design, and implementation of reconfigurable CORDIC. IEEE Trans Very Large Scale Integr VLSI Syst. 2015;24(4):1588–92.

  29. Lee J, et al. Unpu: An energy-efficient deep neural network accelerator with fully variable weight bit precision. IEEE J Solid-State Circuits. 2018;54(1):173–85.

  30. Lin C-H, Wu A-Y. Mixed-scaling-rotation CORDIC (MSR-CORDIC) algorithm and architecture for high-performance vector rotational DSP applications. IEEE Trans Circuits Syst I Regul Pap. 2005;52(11):2385–96.

  31. Mohamed SM, et al. FPGA implementation of reconfigurable CORDIC algorithm and a memristive chaotic system with transcendental nonlinearities. IEEE Trans Circuits Syst I Regul Pap. 2022;69(7):2885–92.

  32. Prashanth H, Rao M. SOMALib: Library of Exact and Approximate Activation Functions for Hardware-efficient Neural Network Accelerators. In: 2022 IEEE 40th International Conference on Computer Design (ICCD), 2022;746–753. IEEE.

  33. Mehra S, Raut G, Das R, Vishvakarma SK, Biasizzo A. An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning. IEEE Access 2023.

  34. Krizhevsky A. Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/kriz/learning-features-2009-TR.pdf 2009.

  35. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

  36. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

  37. Park J-S, Park C, Kwon S, Kim H-S, Jeon T, Kang Y, Lee H, Lee D, Kim J, Lee Y, Park S, Jang J-W, Ha S, Kim M, Bang J, Lim SH, Kang I. A Multi-Mode 8K-MAC HW-Utilization-Aware Neural Processing Unit with a Unified Multi-Precision Datapath in 4nm Flagship Mobile SoC. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022;65:246–248.

  38. Chang J-K, Lee H, Choi C-S. A Power-Aware Variable-Precision Multiply-Accumulate Unit. In: 2009 9th International Symposium on Communications and Information Technology, 2009;1336–1339.

  39. Abadi M, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org 2015.

  40. Raut G, Mukala J, Sharma V, Vishvakarma SK. Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators. Circuits, Systems, and Signal Processing, 2023;1–27.

  41. Multiplier v12.0 LogiCORE IP Product Guide. https://www.xilinx.com/support/documentation/ipdocumentation/multgen/v120/pg108-mult-gen.pdf

  42. Venkataramani G, Goldstein SC. Slack Analysis in the System Design Loop. In: Proceedings of the 6th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2008;231–236.

Acknowledgements

This article is an extended version of our previous conference paper [10].

Author information

Corresponding author

Correspondence to Dhruva Ghai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. No human or animal testing or participation was involved in this research; all data were obtained from public-domain sources.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vishwakarma, S., Raut, G., Jaiswal, S. et al. A Precision-Aware Neuron Engine for DNN Accelerators. SN COMPUT. SCI. 5, 494 (2024). https://doi.org/10.1007/s42979-024-02851-z

