An accurate violence detection framework using unsupervised spatial–temporal action translation network

Ehsan, Tahereh Zarrat; Nahvi, Manoochehr; Mohtavipour, Seyed Mehdi

doi:10.1007/s00371-023-02865-3

Original article
Published: 03 May 2023

The Visual Computer, Volume 40, pages 1515–1535 (2024)

Tahereh Zarrat Ehsan¹, Manoochehr Nahvi¹ (ORCID: 0000-0001-9846-314X) & Seyed Mehdi Mohtavipour²

Abstract

Automatic human behavior monitoring is essential for surveillance cameras in public and private environments. Detecting violent actions is challenging because the available violence datasets are insufficient for training deep networks. In addition, human behavior exhibits high intra-class variation and inter-class similarity, which makes violence detection harder still. In this paper, we propose an unsupervised Spatial–Temporal Action Translation (STAT) network to accurately distinguish between behaviors and overcome the shortage of violence data. Our framework comprises a person detector, a motion feature extractor, the STAT network, and an output interpretation stage. The framework performs well in different environments because it detects objects in each frame and removes irrelevant background information. Because violent motion patterns change rapidly and at high velocity, temporal features play a crucial role in recognition, and we use them as the input of the STAT network. The STAT network is trained with normal behavior data and learns to translate normal motion into the spatial frame. Because violent behavior involves complicated actions, the STAT network cannot reconstruct a violent frame correctly. Actions are therefore categorized in the output interpretation stage by comparing the actual and reconstructed frames and measuring the reconstruction error. The proposed unsupervised framework achieved accuracy comparable to previous works and outperformed them in terms of generality.
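
The output interpretation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which error measure or decision rule is used, so the per-frame mean squared error, the mean-plus-k-standard-deviations threshold, and the function names `reconstruction_error`, `fit_threshold`, and `classify_frames` are all assumptions made for illustration. The reconstructed frames would come from a network trained on normal behavior, which is outside the scope of this sketch.

```python
import numpy as np


def reconstruction_error(actual, reconstructed):
    """Per-frame mean squared error between actual and reconstructed frames.

    Both inputs have shape (num_frames, height, width).
    MSE is an assumed error measure, chosen here for simplicity.
    """
    actual = np.asarray(actual, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if actual.shape != reconstructed.shape:
        raise ValueError("actual and reconstructed must have the same shape")
    diff = actual - reconstructed
    # Flatten each frame and average the squared differences over its pixels.
    return (diff ** 2).reshape(diff.shape[0], -1).mean(axis=1)


def fit_threshold(normal_errors, k=3.0):
    """Hypothetical decision rule: mean + k * std of errors on normal data."""
    normal_errors = np.asarray(normal_errors, dtype=np.float64)
    return float(normal_errors.mean() + k * normal_errors.std())


def classify_frames(errors, threshold):
    """Flag a frame as violent when its error exceeds the threshold."""
    return np.asarray(errors) > threshold


if __name__ == "__main__":
    # Toy data: frame 0 is reconstructed perfectly, frame 1 poorly.
    actual = np.zeros((2, 4, 4))
    reconstructed = np.zeros((2, 4, 4))
    reconstructed[1] = 0.5
    errors = reconstruction_error(actual, reconstructed)
    # Errors measured on held-out normal clips (made-up values).
    threshold = fit_threshold([0.01, 0.02, 0.03], k=3.0)
    print(classify_frames(errors, threshold))
```

Under this assumed rule, a frame that the network reproduces well stays below the threshold and is treated as normal, while a poorly reproduced frame is flagged as violent. The threshold is fitted on normal data only, which matches the unsupervised setting the abstract describes.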

Data availability

Data available on request from the authors.

Author information

Authors and Affiliations

¹ School of Electrical Engineering, University of Guilan, Rasht, Iran: Tahereh Zarrat Ehsan & Manoochehr Nahvi
² School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran: Seyed Mehdi Mohtavipour

Corresponding author

Correspondence to Manoochehr Nahvi.

Ethics declarations

Conflict of interest

The authors certify that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ehsan, T.Z., Nahvi, M. & Mohtavipour, S.M. An accurate violence detection framework using unsupervised spatial–temporal action translation network. Vis Comput 40, 1515–1535 (2024). https://doi.org/10.1007/s00371-023-02865-3

Accepted: 06 April 2023
Published: 03 May 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00371-023-02865-3
