Violence detection in videos using interest frame extraction and 3D convolutional neural network

Multimedia Tools and Applications

Abstract

With the rapid development of violent-behavior detection for surveillance cameras, demand for systems that automatically recognize violent events has grown. Violence detection has become an active research field in image processing and machine learning. Existing work in this field falls into hand-crafted and deep learning methods. Although hand-crafted methods are effective, their computational cost may be prohibitive for practical applications. Deep learning techniques, in turn, usually rely on 3D Convolutional Networks (3D ConvNets) for this task. To improve the accuracy of these networks, meaningful regions and temporal changes in videos should be considered: the performance of a 3D ConvNet can be reinforced by selecting significant temporal information and by attending to specific regions in the two spatial dimensions. In this work, we propose a novel 3D ConvNet along with a technique for extracting interest frames. The Structural Similarity Index Measure (SSIM) is used to extract interest frames as significant temporal information; it compares the statistical features of two consecutive frames. The sixteen video frames with the smallest SSIM are treated as dominant motion frames and are then passed to the 3D ConvNet for classification. In addition, a spatial attention module focuses the network on specific regions. Three benchmark datasets are used to evaluate the proposed method. The results show that, in terms of accuracy, our scheme outperforms existing approaches.
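The interest-frame step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes grayscale frames stored as NumPy arrays, uses a simplified single-window (global) SSIM instead of the usual locally windowed SSIM, and assumes that the later frame of each consecutive pair is the one kept. The function names `global_ssim` and `select_interest_frames` are hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM computed over the whole frame as one window (a simplification)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def select_interest_frames(frames, num_frames=16):
    """Return indices, in temporal order, of the frames least similar to
    their predecessor (smallest SSIM between consecutive frames)."""
    frames = np.asarray(frames)
    # scores[i - 1] is the SSIM between frame i - 1 and frame i.
    scores = np.array(
        [global_ssim(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    )
    k = min(num_frames, len(scores))
    # Ascending sort: the smallest SSIM values mark the largest changes.
    # Adding 1 maps each pair back to its later frame (an assumption).
    picked = np.argsort(scores, kind="stable")[:k] + 1
    return np.sort(picked)
```

Given a clip of shape `(T, H, W)`, `frames[select_interest_frames(frames)]` yields a stack of at most sixteen frames that could be fed to a 3D ConvNet. Restoring temporal order after selection is a design choice made here so that the network still sees motion in sequence.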

Notes

  1. https://www.sites.google.com/view/javadmahmoodi

Author information

Corresponding author

Correspondence to Javad Mahmoodi.

Cite this article

Mahmoodi, J., Nezamabadi-pour, H. & Abbasi-Moghadam, D. Violence detection in videos using interest frame extraction and 3D convolutional neural network. Multimed Tools Appl 81, 20945–20961 (2022). https://doi.org/10.1007/s11042-022-12532-9

  • DOI: https://doi.org/10.1007/s11042-022-12532-9
