An ensemble based approach for violence detection in videos using deep transfer learning

Multimedia Tools and Applications

Abstract

The detection of violence in videos has become a valuable real-world application for maintaining and protecting people’s safety. Although several approaches have been proposed, the complexity of video data and the abrupt nature of violent actions make consistent performance elusive, especially on challenging real-life datasets. To address this, the paper proposes a Bagging ensemble-based approach comprising three pretrained models, each integrated with stacked Long Short-Term Memory (LSTM) layers, to improve on individual model performance. The ensemble is evaluated on two publicly accessible datasets, RLVS and RWF-2000, achieving accuracies of 96.6% and 92.7% and F1-scores of 96.6% and 93.0%, respectively. Additionally, a cross-dataset analysis demonstrates the model’s ability to generalize across diverse datasets. Furthermore, an ablation study highlights the efficacy and optimal selection of the components in augmenting the proposed ensemble’s efficiency.
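As a rough illustration of how an ensemble of three classifiers can produce a single decision, the sketch below averages per-model class probabilities (soft voting) and takes the most probable class. The abstract does not state how the member outputs are combined, so the averaging rule, the function name `ensemble_predict`, and the probability values are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def ensemble_predict(member_probs):
    """Average class probabilities from several models and pick the top class.

    member_probs: list of arrays, each shaped (n_videos, n_classes).
    Soft voting is an assumed aggregation rule, not one confirmed by the paper.
    """
    stacked = np.stack(member_probs, axis=0)   # (n_models, n_videos, n_classes)
    mean_probs = stacked.mean(axis=0)          # average over the models
    return mean_probs, mean_probs.argmax(axis=1)


# Hypothetical outputs of three CNN+LSTM members for two videos,
# with columns ordered as [non-violent, violent].
probs_a = np.array([[0.9, 0.1], [0.4, 0.6]])
probs_b = np.array([[0.7, 0.3], [0.2, 0.8]])
probs_c = np.array([[0.6, 0.4], [0.6, 0.4]])

mean_probs, labels = ensemble_predict([probs_a, probs_b, probs_c])
print(labels)
```

In this toy input the third member disagrees with the other two on the second video, and the averaged probabilities still favor the majority view; this is the kind of error smoothing an ensemble is meant to provide.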

Data Availability

Data will be made available on request.

Author information

Corresponding author

Correspondence to Gurmeet Kaur.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaur, G., Singh, S. An ensemble based approach for violence detection in videos using deep transfer learning. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19388-1

  • DOI: https://doi.org/10.1007/s11042-024-19388-1
