An accurate violence detection framework using unsupervised spatial–temporal action translation network

  • Original article
The Visual Computer

Abstract

Automatic human behavior monitoring is essential for surveillance cameras in public and private environments. Detecting violent actions is challenging because the available violence datasets are insufficient for training deep networks. In addition, human behavior exhibits high intra-class variation and inter-class similarity, which makes violence hard to separate from other actions. In this paper, we propose an unsupervised Spatial–Temporal Action Translation (STAT) network to accurately distinguish between behaviors and overcome the shortage of violence data. Our framework comprises a person detector, a motion feature extractor, the STAT network, and an output interpretation stage. The framework performs well in different environments because it detects objects in each frame and removes irrelevant background information. Because violent motion patterns change rapidly and at high velocity, temporal features play a crucial role in recognition, and we use them as the input to the STAT network. The STAT network is trained with normal behavior data and learns to translate normal motion into the corresponding spatial frame. Because violent behavior involves complicated actions, the STAT network cannot reconstruct violent frames correctly. Actions are therefore categorized in the output interpretation stage by comparing the actual and reconstructed frames and measuring the reconstruction error. The proposed unsupervised framework achieved accuracy comparable to previous works and outperformed them in terms of generality.
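The output interpretation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which error measure or decision rule is used, so the mean squared error, the `mean + k * std` threshold rule, and the names `reconstruction_error`, `calibrate_threshold`, and `is_violent` are all assumptions made for illustration. Frames are represented as flattened sequences of pixel intensities, and the reconstructed frame stands in for the STAT network output.

```python
from statistics import mean, pstdev
from typing import Sequence

# A frame is a flattened sequence of pixel intensities in [0, 1] (assumption).
Frame = Sequence[float]


def reconstruction_error(actual: Frame, reconstructed: Frame) -> float:
    """Mean squared error between an actual frame and its reconstruction."""
    if len(actual) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    return mean((a - r) ** 2 for a, r in zip(actual, reconstructed))


def calibrate_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Hypothetical rule: mean + k * std of errors measured on normal clips."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def is_violent(actual: Frame, reconstructed: Frame, threshold: float) -> bool:
    """Flag a frame whose reconstruction error exceeds the threshold."""
    return reconstruction_error(actual, reconstructed) > threshold


# Errors observed on held-out normal behavior (made-up values).
threshold = calibrate_threshold([0.010, 0.012, 0.008, 0.011, 0.009])

actual = [0.2, 0.4, 0.6, 0.8]
good_reconstruction = [0.25, 0.35, 0.6, 0.8]  # network reproduces the frame well
poor_reconstruction = [0.9, 0.0, 0.1, 0.3]    # network fails on unfamiliar motion

print(is_violent(actual, good_reconstruction, threshold))
print(is_violent(actual, poor_reconstruction, threshold))
```

Because the network only sees normal behavior during training, a threshold calibrated on normal clips is one plausible way to turn the reconstruction error into a binary decision; the value of `k` would need tuning on validation data.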

Data availability

Data available on request from the authors.

Author information

Corresponding author

Correspondence to Manoochehr Nahvi.

Ethics declarations

Conflict of interest

The authors certify that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ehsan, T.Z., Nahvi, M. & Mohtavipour, S.M. An accurate violence detection framework using unsupervised spatial–temporal action translation network. Vis Comput 40, 1515–1535 (2024). https://doi.org/10.1007/s00371-023-02865-3

  • DOI: https://doi.org/10.1007/s00371-023-02865-3
