Abstract
Surveillance cameras are now widely installed in crowded areas to ensure public safety. In surveillance video, suspicious actions make up only a very small fraction of the stream. Manual monitoring is therefore exhausting, and operator fatigue reduces reliability and response speed during emergencies, which makes automatic detection of suspicious actions important. We first address the detection of suspicious activities in surveillance videos with our proposed CNN-based autoencoder. Features are extracted using a three-dimensional convolutional neural network (C3D) and fed to the proposed autoencoder framework, which localizes suspicious activity based on high reconstruction loss. Normal video clips yield low reconstruction loss, whereas clips containing suspicious actions yield high reconstruction loss. Second, we extract these suspicious clips from long surveillance videos and classify the suspicious actions they contain with our proposed generative adversarial network (GAN). We evaluate our work on three benchmark datasets: UT interaction, hybrid crime action (HCA), and UCF crime. The results show the effectiveness of our work, with accuracies of 97.5%, 89.6%, and 47.34% on the UT interaction, HCA, and UCF crime datasets, respectively.
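The detection step described in the abstract, flagging clips whose reconstruction loss is high, can be illustrated with a minimal sketch. This is not the proposed architecture: a linear (PCA-style) autoencoder stands in for the CNN-based autoencoder, synthetic random vectors stand in for C3D features, and the 99th-percentile threshold, the feature dimension, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def fit_linear_autoencoder(normal_feats, n_components):
    """Fit a PCA-style linear autoencoder on features of normal clips only."""
    mean = normal_feats.mean(axis=0)
    # Rows of vt are principal directions; keep the top n_components.
    _, _, vt = np.linalg.svd(normal_feats - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_loss(feats, mean, components):
    """Mean squared reconstruction error, one value per clip."""
    centered = feats - mean
    # Encode to the low-dimensional code, then decode back to feature space.
    recon = centered @ components.T @ components
    return np.mean((centered - recon) ** 2, axis=1)


def flag_suspicious(losses, threshold):
    """A clip is flagged when its reconstruction loss exceeds the threshold."""
    return losses > threshold


# Synthetic stand-ins for per-clip feature vectors (assumption: 64-dim).
rng = np.random.default_rng(0)
dim, latent = 64, 4
basis = rng.normal(size=(latent, dim))
# "Normal" clips lie close to a low-dimensional subspace.
normal_train = rng.normal(size=(500, latent)) @ basis + 0.05 * rng.normal(size=(500, dim))
normal_test = rng.normal(size=(50, latent)) @ basis + 0.05 * rng.normal(size=(50, dim))
# "Suspicious" clips do not follow the structure seen during training.
suspicious_test = 3.0 * rng.normal(size=(50, dim))

mean, comps = fit_linear_autoencoder(normal_train, latent)
# Hypothetical threshold: 99th percentile of the loss on normal training clips.
threshold = np.percentile(reconstruction_loss(normal_train, mean, comps), 99)

normal_loss = reconstruction_loss(normal_test, mean, comps)
susp_loss = reconstruction_loss(suspicious_test, mean, comps)
print("mean loss, normal clips:    ", normal_loss.mean())
print("mean loss, suspicious clips:", susp_loss.mean())
print("suspicious clips flagged:   ", flag_suspicious(susp_loss, threshold).sum(), "of", len(susp_loss))
```

Because the model is fitted on normal clips only, it reconstructs them well and reconstructs unfamiliar inputs poorly, which is the property the abstract relies on. In practice the threshold would have to be tuned on held-out data rather than fixed at a percentile.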
Acknowledgements
The authors acknowledge funding from the National Centre for Robotics and Automation (NCRA) for this research work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ahmed, W., Yousaf, M.H. A Deep Autoencoder-Based Approach for Suspicious Action Recognition in Surveillance Videos. Arab J Sci Eng 49, 3517–3532 (2024). https://doi.org/10.1007/s13369-023-08038-7
DOI: https://doi.org/10.1007/s13369-023-08038-7