Real-world malicious event recognition in CCTV recording using Quasi-3D network

Jan, Atif; Khan, Gul Muhammad

doi:10.1007/s12652-022-03702-6

Original Research
Published: 28 January 2022

Volume 14, pages 10457–10472 (2023)
Journal of Ambient Intelligence and Humanized Computing

Abstract

Identifying the exact instant of a malicious event in lengthy CCTV recordings depends solely on automatic activity recognition. 3D CNNs have previously been explored for analysing motion in video streams. Studies show that using separate filters to encode spatial and temporal information achieves the same level of efficiency as 3D convolution filters. This study presents a novel approach that introduces independent filters for event recognition in videos, with the aim of learning extended spatio-temporal features using a modified ResNet architecture. A novel 2D block, termed Quasi-3D (Q3D), decouples 3D information by combining 2D filters. The proposed Quasi-3D block encodes not only the spatial information in each frame but also the relative motion of objects along the x-axis and y-axis across a set of frames. Three variations of the Quasi-3D block are introduced to place more emphasis on the features and further enhance performance. A multi-class malicious activity recognition video dataset, CrimesScene (https://drive.google.com/file/d/1omiQG9sxx375HjL97DqXxIX9nnfW3oQ/view?usp=sharing), comprising annotated video segments from 4 different classes of volume crimes, has been developed. Results show that the proposed Q3D ResNet model outperforms all other variants, achieving overall detection accuracies of \(94.9\%\) and \(93.07\%\) on the Hockey Fight and CrimesScene datasets, respectively.
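The idea of decoupling 3D information into 2D filters can be illustrated with a minimal sketch. It is not the authors' Q3D block: the abstract does not give the layer layout, so the three branches below (one 2D filter on each frame, one on each time/x slice, one on each time/y slice) and the choice to sum their responses are assumptions made only for illustration. The names `conv2d_same` and `quasi3d_response` are hypothetical, and the code uses plain Python lists rather than a deep learning framework.

```python
def conv2d_same(plane, kernel):
    """2D cross-correlation with zero padding; output has the input's shape."""
    h, w = len(plane), len(plane[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += plane[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def quasi3d_response(clip, k_xy, k_tx, k_ty):
    """Sum of 2D filter responses over the (y, x), (t, x) and (t, y) planes.

    clip is indexed as clip[t][y][x]. Summing the three branches is an
    assumption of this sketch, not a fusion rule stated in the abstract.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]

    # Spatial branch: filter every frame on its own.
    for t in range(T):
        resp = conv2d_same(clip[t], k_xy)
        for y in range(H):
            for x in range(W):
                out[t][y][x] += resp[y][x]

    # Motion along x: for each image row y, filter the (t, x) slice.
    for y in range(H):
        plane = [[clip[t][y][x] for x in range(W)] for t in range(T)]
        resp = conv2d_same(plane, k_tx)
        for t in range(T):
            for x in range(W):
                out[t][y][x] += resp[t][x]

    # Motion along y: for each image column x, filter the (t, y) slice.
    for x in range(W):
        plane = [[clip[t][y][x] for y in range(H)] for t in range(T)]
        resp = conv2d_same(plane, k_ty)
        for t in range(T):
            for y in range(H):
                out[t][y][x] += resp[t][y]
    return out


# Usage: a temporal-difference kernel on the two motion branches responds
# to change between frames, so a static two-frame clip gives zero at t = 1.
zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
diff = [[0, -1, 0], [0, 1, 0], [0, 0, 0]]
frame = [[1.0, 2.0], [3.0, 4.0]]
static_clip = [frame, frame]
motion = quasi3d_response(static_clip, zero, diff, diff)
```

In a trained network the three kernels would be learned and applied per channel; the fixed kernels here only show which plane of the video volume each 2D filter sees.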

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Electrical Engineering Department, University of Engineering and Technology, Peshawar, 25000, Pakistan
Atif Jan
National Center of Artificial Intelligence, University of Engineering and Technology, Peshawar, 25000, Pakistan
Gul Muhammad Khan

Corresponding author

Correspondence to Atif Jan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jan, A., Khan, G.M. Real-world malicious event recognition in CCTV recording using Quasi-3D network. J Ambient Intell Human Comput 14, 10457–10472 (2023). https://doi.org/10.1007/s12652-022-03702-6

Received: 16 April 2021
Accepted: 10 January 2022
Published: 28 January 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s12652-022-03702-6
