Abstract
Transcendental nonlinear function design in deep neural accelerators is principally concerned with performance parameters such as area, power, delay, and throughput. Neural hardware demands resource-intensive blocks such as adders, multipliers, and nonlinear activation functions. This work addresses the issues involved in implementing activation functions for deep neural accelerators. The proposed design implements an activation function unit using stochastic computing together with clock gating to reduce active power dissipation in the hardware. A complete deep neural network, however, uses various activation functions in its hidden layers. To avoid implementing separate hardware for each activation function, we have designed the streamlined composite activation function unit for neural accelerators (SCAN), which implements the hyperbolic tangent and ReLU activation functions. The proposed method, combining stochastic computing with clock gating, is compared with other state-of-the-art designs: area is reduced by approximately 74.14% compared with a CORDIC-based design. When implementing a single neuron, both area and power are reduced severalfold, enhancing the performance of deep neural accelerators. Testing accuracy and inference time are evaluated on the AlexNet architecture using the benchmark MNIST dataset; testing accuracy with the proposed implementation increases by 1.08%, and loss is reduced by 40.66%.
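The abstract does not give implementation details, but the general idea behind stochastic-computing activation units can be sketched in software. The snippet below is a minimal illustrative model (not the paper's actual SCAN design): a value in [−1, 1] is encoded as a random bitstream in bipolar format, and the hyperbolic tangent is approximated by the classic K-state saturating-counter finite-state machine, which in hardware costs only a small counter and a comparator — the source of the area savings the abstract reports. All function and parameter names here are hypothetical.

```python
import math
import random

def to_bipolar_stream(x, n, rng):
    """Encode x in [-1, 1] as an n-bit stochastic stream with P(1) = (x + 1) / 2."""
    p = (x + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_bipolar_stream(bits):
    """Decode a bipolar stochastic stream back to a value in [-1, 1]."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def stanh(bits, k=8):
    """K-state saturating-counter FSM (the classic stochastic tanh element).

    Each 1-bit steps the counter up, each 0-bit steps it down, saturating
    at the ends; the output bit is 1 while the counter sits in the upper
    half. For a bipolar input stream encoding x, the output stream
    approximately encodes tanh(k * x / 2).
    """
    state = k // 2
    out = []
    for b in bits:
        state = min(k - 1, state + 1) if b else max(0, state - 1)
        out.append(1 if state >= k // 2 else 0)
    return out
```

For example, with k = 8 and an input encoding x = 0.5, the decoded output stream approximates tanh(2.0) ≈ 0.96 (up to stream-length noise and a small FSM bias). In hardware the same structure needs no multipliers or lookup tables, which is consistent with the area and power reductions claimed for the stochastic approach.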
Data Availability
Data sharing was not applicable to this article as no datasets were generated or analyzed during the current study, and detailed circuit simulation results are given in the manuscript.
Acknowledgements
The authors would like to thank the Council of Scientific and Industrial Research (CSIR), New Delhi, Government of India, for financial support under the JRF scheme, and the Special Manpower Development Program Chip to System Design, Department of Electronics and Information Technology (DeitY), Ministry of Communication and Information Technology, Government of India, for providing the necessary research facilities.
SCAN uses stochastic computing to implement a power-efficient composite activation function unit for deep neural accelerators.
About this article
Cite this article
Rajput, G., Biyani, K.N., Logashree, V. et al. SCAN: Streamlined Composite Activation Function Unit for Deep Neural Accelerators. Circuits Syst Signal Process 41, 3465–3486 (2022). https://doi.org/10.1007/s00034-021-01947-8