Abstract
Real-time detection of human activities has become essential for the surveillance and security of bank Automated Teller Machines (ATMs) and public offices, owing to the steady rise in criminal activity. Such constrained environments are currently monitored with monocular CCTV cameras, which capture only RGB video. An RGB+D sensor provides depth data of the scene in addition to RGB data. To address the problem of online detection of abnormal activities in bank ATMs, we propose a supervised deep learning framework based on multi-stream CNNs and an RGB+D sensor. Motion templates are created from RGB and depth video segments of the online RGB+D stream, and CNNs are trained on these templates to detect a suspicious event in an ongoing activity. Moreover, because no dataset was available for analyzing human activities in ATMs, we also contribute a novel RGB+D dataset in this paper. The proposed framework is evaluated using qualitative and quantitative statistical measures and detects suspicious events with a precision of 0.932 and an accuracy of 94.2%. Detailed statistical analysis of the results shows that the proposed framework can detect a suspicious event online and in real time, before the abnormal activity is completed.
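The abstract does not say which motion template is used, so the following is only a minimal sketch under the assumption of a classic motion history image (MHI) built by frame differencing. The function name `motion_history_image` and the parameters `diff_threshold` and `tau` are hypothetical and are not taken from the paper. It shows how a short grayscale segment (an RGB segment converted to gray, or a depth segment scaled to 8 bits) might be collapsed into a single image in which more recent motion appears brighter.

```python
import numpy as np


def motion_history_image(frames, diff_threshold=30, tau=None):
    """Collapse a grayscale video segment into one motion history image.

    frames: array shaped (T, H, W) with 8-bit intensities.
    diff_threshold: assumed per-pixel change that counts as motion.
    tau: assumed temporal window; defaults to the number of frame pairs.
    """
    # Signed type so that frame differences do not wrap around.
    frames = np.asarray(frames, dtype=np.int16)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames shaped (T, H, W)")
    if tau is None:
        tau = frames.shape[0] - 1

    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) >= diff_threshold
        # Moving pixels are reset to tau; all others decay by one step.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

    # Scale to 0..255 so the template can be stored as an 8-bit image.
    return (255.0 * mhi / tau).astype(np.uint8)


if __name__ == "__main__":
    # Toy segment: one bright pixel moving left to right along the top row.
    segment = np.zeros((4, 4, 4), dtype=np.uint8)
    for t in range(4):
        segment[t, 0, t] = 255
    print(motion_history_image(segment)[0])
```

In a multi-stream setup, one such template could be computed per modality and passed to a separate CNN stream; how the paper actually builds and fuses its templates is not stated in the abstract.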
Acknowledgements
This research was supported by the Science and Engineering Research Board (SERB) under Project No. ECR/2016/000387, in cooperation with the Department of Science and Technology (DST), Government of India. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DST-SERB or the Government of India. DST-SERB and the Government of India are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.
Cite this article
Khaire, P.A., Kumar, P. RGB+D and deep learning-based real-time detection of suspicious event in Bank-ATMs. J Real-Time Image Proc 18, 1789–1801 (2021). https://doi.org/10.1007/s11554-021-01155-2
DOI: https://doi.org/10.1007/s11554-021-01155-2