Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition
Deep-learning neural networks such as convolutional neural networks (CNNs) have shown great potential as a solution for difficult vision problems such as object recognition. Spiking neural network (SNN)-based architectures have shown great potential for realizing ultra-low power consumption on spike-based neuromorphic hardware. This work describes a novel approach for converting a deep CNN into an SNN, which enables mapping the CNN onto spike-based hardware architectures. Our approach first tailors the CNN architecture to fit the requirements of an SNN, then trains the tailored CNN in the same way as a conventional CNN, and finally applies the learned network weights to an SNN architecture derived from the tailored CNN. We evaluate the resulting SNN on the publicly available Defense Advanced Research Projects Agency (DARPA) Neovision2 Tower and CIFAR-10 datasets and show object recognition accuracy similar to that of the original CNN. Our SNN implementation is amenable to direct mapping onto spike-based neuromorphic hardware, such as that being developed under the DARPA SyNAPSE program. Our hardware mapping analysis suggests that the SNN implementation on such spike-based hardware is two orders of magnitude more energy-efficient than the original CNN implementation on off-the-shelf FPGA-based hardware.
Keywords: Deep learning, Machine learning, Convolutional neural networks, Spiking neural networks, Neuromorphic circuits, Object recognition
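The final step described in the abstract, reusing trained CNN weights in a spiking network, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not specify these details: the layer has no bias and uses a ReLU activation, the spiking neurons are simple integrate-and-fire units with a threshold of 1 and reset by subtraction, and inputs are injected as constant currents rather than encoded as spike trains. The function names `relu_layer` and `if_layer_rates` are hypothetical. Under these assumptions, the firing rate of each spiking neuron approximates the ReLU output of its analog counterpart, as long as that output lies between 0 and 1.

```python
def relu_layer(weights, x):
    """Analog layer (assumed form): y_j = max(0, sum_i w[j][i] * x[i]), no bias."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def if_layer_rates(weights, x, steps=1000, threshold=1.0):
    """Integrate-and-fire layer that reuses the same weights.

    Each neuron accumulates its weighted input every time step and emits
    at most one spike per step when its potential reaches the threshold.
    Returns the firing rate (spikes per step) of each neuron.
    """
    potentials = [0.0] * len(weights)
    spike_counts = [0] * len(weights)
    for _ in range(steps):
        for j, row in enumerate(weights):
            potentials[j] += sum(w * xi for w, xi in zip(row, x))
            if potentials[j] >= threshold:
                spike_counts[j] += 1
                potentials[j] -= threshold  # reset by subtraction (assumption)
    return [count / steps for count in spike_counts]


if __name__ == "__main__":
    # Illustrative weights and input, not taken from the paper.
    weights = [[0.5, -0.25], [-1.0, 0.5], [0.25, 0.25]]
    x = [0.8, 0.4]
    print("ReLU outputs:", relu_layer(weights, x))
    print("Spike rates: ", if_layer_rates(weights, x))
```

A neuron whose weighted input is negative never reaches the threshold, so its rate is zero, which mirrors the ReLU. A neuron whose weighted input exceeds the threshold saturates at one spike per step, so the correspondence in this sketch holds only for activations in the range 0 to 1.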
This work was partially supported by the Defense Advanced Research Projects Agency Cognitive Technology Threat Warning System (CT2WS) and SyNAPSE programs (contracts W31P4Q-08-C-0264 and HR0011-09-C-0001). The views expressed in this document are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. We would like to thank Dr. Clement Farabet of New York University for providing the initial CNN structure on which the CNN model outlined in Fig. 1 is based; and the anonymous reviewers for their invaluable comments and recommendations that led to this revised manuscript.