An ensemble based approach for violence detection in videos using deep transfer learning

Kaur, Gurmeet; Singh, Sarbjeet

doi:10.1007/s11042-024-19388-1

Published: 20 May 2024

Multimedia Tools and Applications

Abstract

Detecting violence in videos has become an extremely valuable real-life application that aims to maintain and protect people’s safety. Despite the complexity of video data and the abrupt nature of violent actions, the field has seen several approaches, yet consistent performance remains elusive, especially on advanced real-life datasets. To address this, the paper proposes a Bagging ensemble-based approach comprising three pretrained models integrated with stacked Long Short-Term Memory (LSTM) to improve on the performance of the individual models. The ensemble is analyzed on two publicly accessible datasets, RLVS and RWF-2000, achieving accuracies of 96.6% and 92.7% and F1-scores of 96.6% and 93.0%, respectively. Additionally, a cross-dataset analysis demonstrates the model’s ability to generalize across diverse datasets. Furthermore, an ablation study highlights the efficacy and optimal selection of the components that augment the proposed ensemble’s efficiency.
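
As a rough illustration of the bagging idea and the two reported metrics, the sketch below draws a bootstrap sample, averages per-video violence probabilities from three ensemble members, and computes accuracy and F1-score. The abstract does not say how member outputs are combined, so the probability averaging (soft voting) and the 0.5 threshold are assumptions. The function names are hypothetical, and the probabilities are invented toy values, not results from the paper.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices with replacement (the bootstrap step of bagging)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def soft_vote(member_probs: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each video's violence probability over all members, then threshold.

    Assumption: averaging is used here for illustration; the paper's actual
    aggregation rule is not given in the abstract.
    """
    n_members = len(member_probs)
    n_videos = len(member_probs[0])
    averaged = [
        sum(member[i] for member in member_probs) / n_members
        for i in range(n_videos)
    ]
    return [1 if p >= threshold else 0 for p in averaged]


def accuracy_and_f1(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with 1 meaning 'violent'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy probabilities from three hypothetical members for four videos.
member_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.4, 0.3, 0.1],
    [0.7, 0.6, 0.7, 0.2],
]
y_true = [1, 0, 1, 1]

y_pred = soft_vote(member_probs)
accuracy, f1 = accuracy_and_f1(y_true, y_pred)

# Each member would be trained on its own resampled index set like this one.
sample = bootstrap_indices(len(y_true), random.Random(0))
```

In a full pipeline each member would be trained on its own bootstrap sample before voting. Here the members are represented only by their output probabilities, so the sketch stays independent of any deep learning framework.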

Data Availability

Data will be made available on request.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, UIET, Panjab University, Chandigarh, India
Gurmeet Kaur & Sarbjeet Singh

Corresponding author

Correspondence to Gurmeet Kaur.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaur, G., Singh, S. An ensemble based approach for violence detection in videos using deep transfer learning. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19388-1

Received: 24 February 2024
Revised: 02 May 2024
Accepted: 07 May 2024
Published: 20 May 2024
DOI: https://doi.org/10.1007/s11042-024-19388-1
