Recognizing People in Blind Spots Based on Surrounding Behavior
Abstract
Recent advances in computer vision have achieved remarkable performance improvements. These technologies mainly focus on recognition of visible targets. However, there are many invisible targets in blind spots in real situations. Humans may be able to recognize such invisible targets based on contexts (e.g. visible human behavior and environments) around the targets, and used such recognition to predict situations in blind spots on a daily basis. As the first step towards recognizing targets in blind spots captured in videos, we propose a convolutional neural network that recognizes whether or not there is a person in a blind spot. Based on the experiments that used the volleyball dataset, which includes various interactions of players, with artificial occlusions, our proposed method achieved 90.3% accuracy in the recognition.
Keywords
Action recognition Convolutional Neural NetworksReferences
- 1.Baradad, M., et al.: Inferring light fields from shadows. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6267–6275 (2018)Google Scholar
- 2.Bouman, K.L., et al.: Turning corners into cameras: Principles and methods. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 2289–2297 (2017)Google Scholar
- 3.Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733 (2017)Google Scholar
- 4.Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)Google Scholar
- 5.Gu, C., et al.: AVA: a video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6047–6056, June 2018Google Scholar
- 6.Hara, K., Kataoka, H., Satoh, Y.: Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and imageNet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6546–6555 (2018)Google Scholar
- 7.He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)Google Scholar
- 8.He, Y., Shirakabe, S., Satoh, Y., Kataoka, H.: Human action recognition without human. In: Proceedings of the ECCV Workshop on Brave New Ideas for Motion Representations in Videos, pp. 11–17 (2016)Google Scholar
- 9.Ibrahim, M.S., Muralidharan, S., Deng, Z., Vahdat, A., Mori, G.: A hierarchical deep temporal model for group activity recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1971–1980 (2016)Google Scholar
- 10.Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning, pp. 448–456 (2015)Google Scholar
- 11.Kay, W., et al.: The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
- 12.Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)Google Scholar
- 13.Mak, L.C., Furukawa, T.: Non-line-of-sight localization of a controlled sound source. In: Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 475–480 (2009)Google Scholar
- 14.Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Proceedings of the International Conference on Machine Learning, pp. 807–814. Omnipress (2010)Google Scholar
- 15.Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 4489–4497 (2015)Google Scholar
- 16.Zhao, M., et al.: Through-wall human pose estimation using radio signals. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7356–7365 (2018)Google Scholar