Efficient Object Detection Using Embedded Binarized Neural Networks
Memory performance is a key bottleneck for deep learning systems. Binarizing both activations and weights is a promising approach: because it uses the lowest possible precision, it scales best toward highly energy-efficient systems. In this paper, we apply and analyze a binarized neural network for human detection in infrared images. Our results show that the binarized network achieves algorithmic performance comparable to a 32-bit floating-point network, with the added benefit of greatly simplified computation and reduced memory overhead. In addition, we present a system architecture designed specifically for computation with binary representations; it achieves at least a 4× speedup and three orders of magnitude better energy efficiency than a GPU.
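To make the claims about "simplified computation and reduced memory overhead" concrete, the Python sketch below shows a formulation common in the binarized-network literature. It is not the authors' hardware architecture. The sketch makes three assumptions. First, values are binarized with a deterministic sign function, with 0 mapped to +1. Second, each ±1 value is stored as a single bit, 1 for +1 and 0 for −1, which is 32× smaller than a 32-bit float. Third, a dot product of two ±1 vectors is computed with XNOR and a population count instead of multiply-accumulate. The function names are hypothetical and chosen for illustration.

```python
"""Sketch of binarized dot products via XNOR + popcount.

Assumptions (illustrative, not the paper's design): deterministic sign
binarization with sign(0) = +1, and bit 1 encoding +1, bit 0 encoding -1.
"""
from typing import List, Sequence


def binarize(values: Sequence[float]) -> List[int]:
    """Map real values to +1/-1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs: Sequence[int]) -> int:
    """Pack +1/-1 values into a bitmask: bit i is 1 for +1 and 0 for -1.

    A length-n vector then needs n bits instead of 32*n bits as float32.
    """
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR is 1 where the signs agree. Each agreement contributes +1 and each
    disagreement contributes -1, so dot = matches - (n - matches) = 2*matches - n.
    """
    mask = (1 << n) - 1                      # keep only the n valid bit positions
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


def reference_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Ordinary multiply-accumulate dot product, used for comparison."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    w = binarize([0.7, -1.2, 0.3, -0.05, 2.0, -0.4])   # [1, -1, 1, -1, 1, -1]
    x = binarize([1.5, 0.2, -0.8, -0.3, 0.9, -1.1])    # [1, 1, -1, -1, 1, -1]
    print(xnor_popcount_dot(pack_bits(w), pack_bits(x), len(w)))  # → 2
    print(reference_dot(w, x))                                    # → 2
```

A Python integer is used here as an arbitrary-width bitmask for readability. Dedicated hardware would instead operate on fixed-width words with native XNOR and popcount logic, which is where the speed and energy savings come from.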
Keywords: Deep learning; Embedded computer vision; Binary neural network; Low-power object detection
This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory (AFRL), and NSF (#1526399). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.