A Rank Decomposed Statistical Error Compensation Technique for Robust Convolutional Neural Networks in the Near Threshold Voltage Regime
There has been growing interest in implementing complex machine learning algorithms such as convolutional neural networks (CNNs) on low-power embedded platforms to enable on-device learning and inference. Many of these platforms are to be deployed as low-power sensor nodes with low to medium throughput requirements. Near threshold voltage (NTV) designs are well suited to these applications but suffer from a significant increase in variations. In this paper, we propose a variation-tolerant architecture for CNNs capable of operating in the NTV regime for energy efficiency. A statistical error compensation (SEC) technique referred to as rank decomposed SEC (RD-SEC) is proposed. The key idea is to exploit the inherent redundancy within matrix-vector multiplication (or dot product ensemble), a power-hungry operation in CNNs, to derive low-cost estimators for error detection and compensation. When evaluated in CNNs for both the MNIST and CIFAR-10 datasets, simulation results in 45 nm CMOS show that RD-SEC enables robust CNNs operating in the NTV regime. Specifically, the proposed architecture can achieve up to 11× improvement in variation tolerance and enable up to 113× reduction in the standard deviation of the detection accuracy P_det while incurring marginal degradation in the median detection accuracy.
Keywords: Convolutional neural networks; Statistical error compensation; Rank decomposition; Near threshold voltage regime
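The key idea stated in the abstract, using redundancy within a matrix-vector product to build cheap estimators for detecting and compensating errors, can be illustrated with a toy sketch. This is not the paper's architecture. It assumes that some weight rows are exact linear combinations of a small set of basis rows, that the basis dot products are computed without error, and that an error is flagged when an output differs from its estimate by more than a fixed threshold. The function name `rd_sec_matvec`, the coefficient matrix `coeffs`, the threshold value, and the injected error are all hypothetical choices made for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def rd_sec_matvec(basis_rows, coeffs, x, noisy_dependent, threshold):
    """Correct dependent dot products using estimates built from basis outputs.

    basis_rows      : weight rows whose dot products are ASSUMED error-free
    coeffs          : coeffs[i][k] is the weight of basis row k in dependent row i
    noisy_dependent : possibly erroneous dot products of the dependent rows
    threshold       : hypothetical detection threshold on |output - estimate|
    """
    y_basis = [dot(row, x) for row in basis_rows]
    corrected, flagged = [], []
    for i, y_noisy in enumerate(noisy_dependent):
        # Estimate costs len(basis_rows) multiply-adds instead of len(x).
        estimate = dot(coeffs[i], y_basis)
        if abs(y_noisy - estimate) > threshold:
            corrected.append(estimate)  # compensate with the estimate
            flagged.append(i)           # record which output was replaced
        else:
            corrected.append(y_noisy)
    return y_basis, corrected, flagged


# Toy rank-2 weight matrix: two basis rows plus three dependent rows that
# are exact linear combinations of them (an assumption of this sketch).
basis_rows = [[1, 2, 0, -1], [0, 1, 3, 2]]
coeffs = [[2, 1], [1, -1], [0, 3]]
dependent_rows = [
    [sum(c * row[j] for c, row in zip(cs, basis_rows)) for j in range(4)]
    for cs in coeffs
]
x = [3, -1, 2, 4]

# Error-free dependent outputs, then one large injected error on output 1
# standing in for a variation-induced hardware error.
exact = [dot(row, x) for row in dependent_rows]
noisy = list(exact)
noisy[1] += 64

y_basis, corrected, flagged = rd_sec_matvec(
    basis_rows, coeffs, x, noisy, threshold=8
)
```

In this toy setting the corrupted output is detected and replaced by its estimate, so `corrected` matches the error-free result. With real CNN weights the dependent rows would generally be only approximately spanned by the basis, so the estimate would itself carry an approximation error and the threshold would have to be chosen with that in mind.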
This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.