Abstract
Deep neural networks (DNNs) have emerged as a powerful and versatile set of techniques enabling unprecedented success on challenging artificial intelligence (AI) problems. However, this success comes at the cost of very large models with high computational complexity, which often require hundreds of megabytes of storage, exaops of computation, and immense bandwidth for data movement. Despite advances in computing systems, it still takes days to weeks to train state-of-the-art DNNs, which directly limits the pace of innovation. Approximate computing is gaining traction as a promising way to alleviate this computational burden. Exploiting the inherent resiliency of DNNs, it relaxes exactness constraints with the goal of obtaining significant gains in computational throughput while maintaining an acceptable quality of results. In this chapter, we review the wide spectrum of approximate computing techniques that have been successfully applied to DNNs.
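To make the accuracy-efficiency trade-off concrete, the sketch below (not from the chapter; a generic NumPy example with hypothetical helper names) applies symmetric 8-bit uniform quantization to a toy weight tensor, one of the simplest approximate computing techniques for DNNs, and measures the resulting approximation error.

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Uniform symmetric quantization: map float weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1      # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax    # one scale factor per tensor
    w_q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return w_q, scale

def dequantize(w_q, scale):
    """Recover an approximate float tensor from the integer representation."""
    return w_q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)  # toy layer weights

w_q, scale = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(w_q, scale)

# 4x storage reduction (float32 -> int8) at the cost of a small, bounded error,
# which resilient DNNs typically absorb with little loss in output quality.
print("mean abs quantization error:", np.mean(np.abs(w - w_hat)))
```

Per-tensor scaling as shown is the simplest variant; finer-grained (per-channel) scales or training-time methods generally recover more accuracy at low bitwidths.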
Cite this chapter
Choi, J., Venkataramani, S. (2019). Approximate Computing Techniques for Deep Neural Networks. In: Reda, S., Shafique, M. (eds) Approximate Circuits. Springer, Cham. https://doi.org/10.1007/978-3-319-99322-5_15