Approximate Circuits, pp 289–305

# Lightweight Deep Neural Network Accelerators Using Approximate SW/HW Techniques

## Abstract

Deep neural networks (DNNs) provide state-of-the-art accuracy in many application domains, such as computer vision and speech recognition. At the same time, DNNs require millions of expensive floating-point operations to process each input, which limits their applicability to resource-constrained systems with tight budgets for hardware design area or power consumption. Our goal is to devise lightweight, approximate accelerators for DNNs that use fewer hardware resources with negligible reduction in accuracy. To simplify the hardware requirements, we analyze a spectrum of data precision methods ranging from fixed-point and dynamic fixed-point to powers-of-two and binary data precision. In conjunction, we provide new training methods to compensate for the simpler hardware resources. To boost the accuracy of the proposed lightweight accelerators, we describe ensemble processing techniques that use an ensemble of lightweight DNN accelerators to achieve the same or better accuracy than the original floating-point accelerator, while still using far fewer hardware resources. Using 65 nm technology libraries and an industrial-strength design flow, we demonstrate a custom hardware accelerator design and training procedure that achieve low power and low latency while incurring insignificant accuracy degradation. We evaluate our design and technique on the CIFAR-10 and ImageNet datasets and show that they realize a significant reduction in power and inference latency.
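To make the precision spectrum named in the abstract concrete, the following is a minimal Python sketch of the four quantizers and of one possible way to combine an ensemble. It is an illustration under stated assumptions, not the chapter's implementation: the function names, default bit widths, exponent range, round-to-nearest policy, and the choice of averaging class probabilities across ensemble members are all hypothetical.

```python
import math


def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Round x to a signed fixed-point grid and saturate at the range limits.

    total_bits and frac_bits are illustrative defaults, not values from the chapter.
    """
    scale = 2.0 ** frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale


def quantize_dynamic_fixed_point(values, total_bits=8):
    """Fixed-point with the binary point chosen per group of values.

    Assumption: the split is picked so the largest magnitude in the group
    fits; the group could be, for example, the weights of one layer.
    """
    max_abs = max(abs(v) for v in values)
    # Bits needed left of the binary point (may be negative for small values).
    int_bits = math.floor(math.log2(max_abs)) + 1 if max_abs > 0 else 0
    frac_bits = total_bits - 1 - int_bits  # one bit is reserved for the sign
    quantized = [quantize_fixed_point(v, total_bits, frac_bits) for v in values]
    return quantized, frac_bits


def quantize_power_of_two(x, exp_min=-7, exp_max=0):
    """Map x to sign(x) * 2**e so a multiplication becomes a bit shift.

    The exponent range [exp_min, exp_max] is an assumed example.
    """
    if x == 0:
        return 0.0
    exp = round(math.log2(abs(x)))  # nearest exponent in the log domain
    exp = max(exp_min, min(exp_max, exp))
    return math.copysign(2.0 ** exp, x)


def binarize(x):
    """Binary precision: keep only the sign of x."""
    return 1.0 if x >= 0 else -1.0


def ensemble_predict(prob_vectors):
    """Average the class probabilities of several networks and pick the argmax.

    Averaging is an assumed combination rule chosen for illustration.
    """
    n = len(prob_vectors)
    avg = [sum(col) / n for col in zip(*prob_vectors)]
    return avg.index(max(avg))


if __name__ == "__main__":
    w = 0.3
    print(quantize_fixed_point(w))                  # 0.3125
    print(quantize_dynamic_fixed_point([w, -0.1]))
    print(quantize_power_of_two(w))                 # 0.25
    print(binarize(w))                              # 1.0
    print(ensemble_predict([[0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]))
```

Moving down this list, each scheme needs simpler arithmetic hardware but represents values more coarsely, which is the trade-off that the retraining and ensemble techniques are meant to offset.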

## Notes

### Acknowledgements

We would like to thank Professor R. Iris Bahar and N. Anthony for their contributions to this project [8, 25]. Compared with our two previous publications [8, 25], this chapter provides additional experimental results for various quantization schemes and ensemble deployment. More specifically, the novel contributions in this chapter include implementations of accelerators capable of performing ensemble inference for fixed-point (16,16), (8,8), and power-of-two (6,16). We also evaluate the performance of these accelerators in side-by-side comparisons with those from our previous works in Figs. 14.7 and 14.8. In addition, we generalize our accuracy-boosting ensemble technique to all types of quantized networks, not just dynamic fixed-point. These additional results fill the gaps between our two previous publications, allowing a more complete design space exploration for approximate deep neural network accelerators. This work is supported by NSF grant 1420864 and by the generous GPU hardware donations from NVIDIA Corporation.

## References

- 1. Ba J, Caruana R (2014) Do deep nets really need to be deep? In: Advances in neural information processing systems (NIPS 2014), pp 2654–2662
- 2. Bucilua C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: Proceedings of ACM SIGKDD
- 3. Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O (2014) Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In: Proceedings of ACM ASPLOS. ACM, New York, pp 269–284
- 4. Courbariaux M, Bengio Y, David JP (2014) Low precision arithmetic for deep learning. arXiv:1412.7024
- 5. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323
- 6. Graham B (2014) Fractional max-pooling. arXiv:1412.6071
- 7. Gysel P (2016) Ristretto: hardware-oriented approximation of convolutional neural networks. CoRR, abs/1605.06402
- 8. Hashemi S, Anthony N, Tann H, Bahar RI, Reda S (2017) Understanding the impact of precision quantization on the accuracy and energy of neural networks. In: Proceedings of DATE
- 9. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
- 10. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
- 11. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531
- 12. Huang G, Liu Z, Weinberger KQ, van der Maaten L (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 1, p 3
- 13. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, New York, pp 4107–4115
- 14. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv:1408.5093
- 15. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
- 16. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of NIPS
- 17. Lin M, Chen Q, Yan S (2013) Network in network. arXiv:1312.4400
- 18. Lin Z, Courbariaux M, Memisevic R, Bengio Y (2015) Neural networks with few multiplications. CoRR, abs/1510.03009
- 19. Rastegari M, Ordonez V, Redmon J, Farhadi A (2015) XNOR-net: imagenet classification using binary convolutional neural networks. In: European conference on computer vision. Springer, Berlin, pp 525–542
- 20. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: hints for thin deep nets. CoRR, abs/1412.6550
- 21. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
- 22. Shafique M, Hafiz R, Javed MU, Abbas S, Sekanina L, Vasicek Z, Mrazek V (2017) Adaptive and energy-efficient architectures for machine learning: challenges, opportunities, and research roadmap. In: 2017 IEEE computer society annual symposium on VLSI (ISVLSI). IEEE, Piscataway, pp 627–632
- 23. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Association for the advancement of artificial intelligence, vol 4, p 12
- 24. Tann H, Hashemi S, Bahar RI, Reda S (2016) Runtime configurable deep neural networks for energy-accuracy trade-off. CoRR, abs/1607.05418
- 25. Tann H, Hashemi S, Bahar RI, Reda S (2017) Hardware-software codesign of accurate, multiplier-free deep neural networks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC). IEEE, Piscataway, pp 1–6
- 26. Zagoruyko S, Komodakis N (2016) Wide residual networks. arXiv:1605.07146