A Deep Autoencoder-Based Approach for Suspicious Action Recognition in Surveillance Videos

  • Research Article-Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

In the recent era of technological advancements, surveillance cameras are installed in crowded areas to ensure public protection. In the video surveillance context, suspicious actions make up only a very small portion of the surveillance stream. Manual monitoring for suspicious actions is therefore exhausting, and monitoring fatigue affects reliability and speed during emergencies, which makes automated suspicious action detection important. We first address the detection of suspicious activities in surveillance videos with our proposed CNN-based autoencoder. Features are extracted using a three-dimensional convolutional neural network (C3D) and fed to the proposed autoencoder framework, which localizes suspicious activity based on high reconstruction loss: normal video clips yield low reconstruction loss, whereas clips containing suspicious actions yield high reconstruction loss. Second, we extract these suspicious clips from the long surveillance videos and classify the various suspicious actions with our proposed generative adversarial network (GAN). We evaluate our work on benchmark datasets, namely UT interaction, hybrid crime action (HCA), and UCF crime. The results show the effectiveness of our work: the achieved accuracies are 97.5%, 89.6%, and 47.34% on the UT interaction, HCA, and UCF crime datasets, respectively.
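The reconstruction-loss idea in the abstract can be illustrated with a minimal sketch. It is not the authors' CNN-based autoencoder: as an assumption for illustration, a PCA-style linear autoencoder stands in for the network, random synthetic vectors stand in for C3D clip features, and the threshold rule (mean plus three standard deviations of the training loss) is hypothetical. The function names are likewise illustrative only.

```python
import numpy as np


def fit_linear_autoencoder(normal_features, n_components):
    """Fit a PCA-style linear autoencoder on features of normal clips only.

    Stand-in for a learned autoencoder (assumption, not the paper's model).
    """
    mean = normal_features.mean(axis=0)
    # Rows of vt are the principal directions of the centred normal data.
    _, _, vt = np.linalg.svd(normal_features - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_loss(features, mean, components):
    """Mean squared reconstruction error for each clip feature vector."""
    centred = features - mean
    codes = centred @ components.T       # encode into the low-dimensional space
    reconstructed = codes @ components   # decode back to feature space
    return np.mean((centred - reconstructed) ** 2, axis=1)


def flag_suspicious(losses, threshold):
    """Mark clips whose reconstruction loss exceeds the threshold."""
    return losses > threshold


# Synthetic stand-ins for clip features (assumption): normal clips lie close
# to a 4-dimensional subspace of a 32-dimensional feature space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(4, 32))
normal_train = rng.normal(size=(200, 4)) @ basis + 0.05 * rng.normal(size=(200, 32))
normal_test = rng.normal(size=(20, 4)) @ basis + 0.05 * rng.normal(size=(20, 32))
suspicious_test = 2.0 * rng.normal(size=(20, 32))  # far from the normal subspace

mean, components = fit_linear_autoencoder(normal_train, n_components=4)

# Hypothetical threshold rule: mean + 3 standard deviations of training loss.
train_losses = reconstruction_loss(normal_train, mean, components)
threshold = train_losses.mean() + 3.0 * train_losses.std()

normal_losses = reconstruction_loss(normal_test, mean, components)
suspicious_losses = reconstruction_loss(suspicious_test, mean, components)
suspicious_flags = flag_suspicious(suspicious_losses, threshold)
```

Because the stand-in model is fitted on normal data only, it reconstructs normal features well and off-distribution features poorly, which is the property the abstract relies on to locate suspicious clips in a long stream.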

Acknowledgements

The authors acknowledge the funding from the National Centre for Robotics and Automation (NCRA) for this research work.

Author information

Corresponding author

Correspondence to Muhammad Haroon Yousaf.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed, W., Yousaf, M.H. A Deep Autoencoder-Based Approach for Suspicious Action Recognition in Surveillance Videos. Arab J Sci Eng 49, 3517–3532 (2024). https://doi.org/10.1007/s13369-023-08038-7

  • DOI: https://doi.org/10.1007/s13369-023-08038-7