Approximate Computing Techniques for Deep Neural Networks

Abstract

Deep neural networks (DNNs) have emerged as a powerful and versatile class of techniques, enabling unprecedented success on challenging artificial intelligence (AI) problems. This success, however, comes at the cost of very large models with high computational complexity: state-of-the-art networks often require hundreds of megabytes of storage, ExaOps of computation, and immense bandwidth for data movement. Despite advances in computing systems, training such DNNs still takes days to weeks, which directly limits the pace of innovation. Approximate computing is gaining traction as a promising way to alleviate this computational burden. By exploiting the inherent resilience of DNNs, it relaxes exactness constraints to obtain significant gains in computational throughput while maintaining an acceptable quality of results. In this chapter, we review the wide spectrum of approximate computing techniques that have been successfully applied to DNNs.
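
To make the flavor of these techniques concrete, the sketch below illustrates reduced-precision weight quantization, one of the approximate computing approaches surveyed in this chapter. It is a minimal illustration rather than the chapter's specific method: the symmetric per-tensor 8-bit scheme, the function names, and the random weight matrix are assumptions chosen for brevity.

```python
# Minimal sketch of uniform symmetric weight quantization (illustrative only).
import numpy as np

def quantize_symmetric(weights: np.ndarray, num_bits: int = 8):
    """Map float weights to signed integers using a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1                               # e.g. 127 for 8 bits
    scale = max(float(np.max(np.abs(weights))) / qmax, 1e-12)    # avoid divide-by-zero
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate floating-point view of the quantized weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)   # hypothetical layer weights
    q, scale = quantize_symmetric(w, num_bits=8)
    w_approx = dequantize(q, scale)
    # The relative error is small, which is the kind of resilience that
    # approximate computing exploits to trade exactness for efficiency.
    rel_err = np.linalg.norm(w - w_approx) / np.linalg.norm(w)
    print(f"relative quantization error: {rel_err:.4f}")
```

Replacing 32-bit floating-point weights with 8-bit integers reduces storage and memory traffic by roughly 4x and enables cheaper integer arithmetic, typically at a small and tolerable loss in model accuracy.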



Author information

Correspondence to Jungwook Choi.



Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Choi, J., Venkataramani, S. (2019). Approximate Computing Techniques for Deep Neural Networks. In: Reda, S., Shafique, M. (eds) Approximate Circuits. Springer, Cham. https://doi.org/10.1007/978-3-319-99322-5_15

  • DOI: https://doi.org/10.1007/978-3-319-99322-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99321-8

  • Online ISBN: 978-3-319-99322-5

  • eBook Packages: Engineering (R0)
