Violence detection in videos using interest frame extraction and 3D convolutional neural network

Mahmoodi, Javad; Nezamabadi-pour, Hossein; Abbasi-Moghadam, Dariush

doi:10.1007/s11042-022-12532-9

Published: 12 March 2022

Volume 81, pages 20945–20961, (2022)

Multimedia Tools and Applications

Javad Mahmoodi ORCID: orcid.org/0000-0003-3758-0451¹,
Hossein Nezamabadi-pour¹ &
Dariush Abbasi-Moghadam¹

Abstract

As violent-behavior detection in surveillance footage develops rapidly, demand for systems that automatically recognize violent events is growing. Violence detection has become an active research field in image processing and machine learning. Existing work in this field falls into two groups: hand-crafted methods and deep learning methods. Although hand-crafted methods are effective, their computational cost may be prohibitive for practical applications. Deep learning techniques, in turn, usually rely on 3D Convolutional Networks (3D ConvNets) for this task. To improve the accuracy of these networks, both meaningful regions and temporal changes in videos should be taken into account. The performance of a 3D ConvNet can therefore be strengthened by selecting significant temporal information and by attending to specific regions in the two spatial dimensions. In this work, we propose a novel 3D ConvNet together with a technique for extracting interest frames. The Structural Similarity Index Measure (SSIM) is used to extract interest frames as the significant temporal information; for this purpose, the SSIM is computed from the statistical features of two consecutive frames. The sixteen video frames with the smallest SSIM are treated as dominant motion frames and are passed to a 3D CNN for classification. In addition, a spatial attention module is used to focus attention on specific regions. Three benchmark datasets are employed to evaluate the proposed method. The results show that, in terms of accuracy, our scheme outperforms existing approaches.
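The interest frame extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-window (global) SSIM with the standard constants K1 = 0.01 and K2 = 0.03, represents each frame as a flat sequence of grayscale intensities to stay dependency-free, and assigns each SSIM score to the later frame of a consecutive pair. The function names `global_ssim` and `select_interest_frames` are hypothetical.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized flat grayscale frames.

    Assumption: one global window instead of the usual sliding window.
    """
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_interest_frames(frames, num_frames=16):
    """Indices of the frames least similar to their predecessor.

    Assumption: the score of the pair (i - 1, i) is credited to frame i.
    Indices are returned in temporal order so a clip can be assembled.
    """
    scores = [
        (global_ssim(frames[i - 1], frames[i]), i)
        for i in range(1, len(frames))
    ]
    scores.sort()  # smallest SSIM first, i.e. largest change first
    return sorted(i for _, i in scores[:num_frames])


if __name__ == "__main__":
    still = [10, 20, 30, 40]
    moving = [200, 5, 90, 0]
    video = [still, still, still, moving, still]
    print(select_interest_frames(video, num_frames=2))
```

With `num_frames=16` this mirrors the sixteen-frame selection in the abstract; a video with fewer than seventeen frames simply returns every frame except the first. A production version would more likely compute windowed SSIM on image arrays with an image-processing library.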

Notes

https://www.sites.google.com/view/javadmahmoodi

Author information

Authors and Affiliations

Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
Javad Mahmoodi, Hossein Nezamabadi-pour & Dariush Abbasi-Moghadam

Corresponding author

Correspondence to Javad Mahmoodi.

About this article

Cite this article

Mahmoodi, J., Nezamabadi-pour, H. & Abbasi-Moghadam, D. Violence detection in videos using interest frame extraction and 3D convolutional neural network. Multimed Tools Appl 81, 20945–20961 (2022). https://doi.org/10.1007/s11042-022-12532-9

Received: 17 March 2021
Revised: 18 September 2021
Accepted: 25 January 2022
Published: 12 March 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11042-022-12532-9
