Approximate Computing Techniques for Deep Neural Networks

Abstract

Deep neural networks (DNNs) have emerged as a powerful and versatile class of techniques, enabling unprecedented success on challenging artificial intelligence (AI) problems. This success, however, comes at the cost of very large models with high computational complexity: state-of-the-art networks often require hundreds of megabytes of storage, ExaOps of computation, and immense bandwidth for data movement. Despite advances in computing systems, training such DNNs still takes days to weeks, which directly limits the pace of innovation. Approximate computing is gaining traction as a promising way to alleviate this computational burden. By exploiting the inherent resilience of DNNs, it relaxes exactness constraints to obtain significant gains in computational throughput while maintaining an acceptable quality of results. In this chapter, we review the wide spectrum of approximate computing techniques that have been successfully applied to DNNs.
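
To make the flavor of these techniques concrete, the sketch below illustrates reduced-precision weight quantization, one of the approximate computing approaches surveyed in this chapter. It is a minimal illustration rather than the chapter's specific method: the symmetric per-tensor 8-bit scheme, the function names, and the random weight matrix are assumptions chosen for brevity.

```python
# Minimal sketch of uniform symmetric weight quantization (illustrative only).
import numpy as np

def quantize_symmetric(weights: np.ndarray, num_bits: int = 8):
    """Map float weights to signed integers using a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1                               # e.g. 127 for 8 bits
    scale = max(float(np.max(np.abs(weights))) / qmax, 1e-12)    # avoid divide-by-zero
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate floating-point view of the quantized weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)   # hypothetical layer weights
    q, scale = quantize_symmetric(w, num_bits=8)
    w_approx = dequantize(q, scale)
    # The relative error is small, which is the kind of resilience that
    # approximate computing exploits to trade exactness for efficiency.
    rel_err = np.linalg.norm(w - w_approx) / np.linalg.norm(w)
    print(f"relative quantization error: {rel_err:.4f}")
```

Replacing 32-bit floating-point weights with 8-bit integers reduces storage and memory traffic by roughly 4x and enables cheaper integer arithmetic, typically at a small and tolerable loss in model accuracy.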



Author information

Correspondence to Jungwook Choi.



Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Choi, J., Venkataramani, S. (2019). Approximate Computing Techniques for Deep Neural Networks. In: Reda, S., Shafique, M. (eds) Approximate Circuits. Springer, Cham. https://doi.org/10.1007/978-3-319-99322-5_15

  • DOI: https://doi.org/10.1007/978-3-319-99322-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99321-8

  • Online ISBN: 978-3-319-99322-5

  • eBook Packages: Engineering (R0)
