Approximate Computing Techniques for Deep Neural Networks

  • Jungwook Choi
  • Swagath Venkataramani


Deep neural networks (DNNs) have emerged as a powerful and versatile set of techniques enabling unprecedented success on challenging artificial intelligence (AI) problems. This success, however, comes at the cost of very large models and high computational complexity: state-of-the-art DNNs often require hundreds of megabytes of storage, ExaOps of computation, and immense bandwidth for data movement. Despite advances in computing systems, training such networks still takes days to weeks, which directly limits the pace of innovation. Approximate computing is gaining traction as a promising way to alleviate these demanding computational requirements. By exploiting the inherent resiliency of DNNs, it relaxes exactness constraints to obtain significant gains in computational throughput while maintaining an acceptable quality of results. In this chapter, we review the wide spectrum of approximate computing techniques that have been successfully applied to DNNs.
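
As a concrete illustration of this accuracy-for-efficiency trade-off, the following minimal sketch (not taken from the chapter; the function name, bit-width, and layer sizes are illustrative assumptions) applies one widely used approximate computing technique, uniform fixed-point quantization of weights, to a single layer and measures the resulting output error of a matrix product.

```python
# Minimal sketch of weight quantization as an instance of approximate computing.
# All names and sizes here are illustrative, not the chapter's implementation.
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization of a tensor to `num_bits` bits."""
    qmax = 2 ** (num_bits - 1) - 1             # e.g. 127 for 8-bit weights
    scale = np.max(np.abs(x)) / qmax           # map the largest magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                           # dequantized (approximate) values

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
activations = rng.standard_normal((1, 256)).astype(np.float32)

exact = activations @ weights
approx = activations @ quantize_uniform(weights, num_bits=8)

# The relative output error stays small even though each weight
# now needs only 8 bits instead of 32.
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative output error with 8-bit weights: {rel_err:.4f}")
```

In practice, such quantization is combined with training-time techniques (as surveyed in this chapter) so that the small per-layer errors do not accumulate into a noticeable loss of end-to-end accuracy.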


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
