
Robust learning for real-world anomalies in surveillance videos

Multimedia Tools and Applications

Abstract

Anomaly detection is important for developing autonomous surveillance systems. Real-world anomalous events are complex and difficult to capture because of diverse human behaviors and the wide range of anomaly types. A key factor in defining an activity is its temporal length, or duration. The time an anomalous activity needs to become fully understandable and meaningful depends on the nature and speed of the event. Some events are fast enough to be captured within a few frames, whereas slower activities may require several thousand video frames to be defined. Deep learning architectures accept only a limited input temporal sequence length and struggle to learn very long sequences. The problem therefore needs to be re-investigated from the perspective of frame sequences, so that an activity can be adequately represented within a limited temporal length. The contribution of this work is two-fold. First, a novel dynamic frame-skipping strategy is proposed to produce meaningful temporal sequences for model learning. Second, a new deep learning model based on the Inflated 3D ConvNet (I3D), an inflated Inception architecture, is proposed for learning spatial and temporal information from video frames. To evaluate the proposed model, experiments are performed on UCF-Crime, one of the most challenging real-world anomaly datasets. The results confirm that the proposed model is robust and significantly outperforms state-of-the-art methods in terms of accuracy. In addition, the proposed model achieves the highest F1 scores for both fast and slow activities, such as explosions, road accidents, robbery, and stealing, and an AUC of 0.837.
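
The abstract does not say how the skip rate is chosen. The sketch below is an illustration only and assumes one simple rule. The skip step grows with video length, so any video reduces to a fixed-length clip of `clip_len` frame indices. A fast event in a short video keeps nearly every frame, and a slow event in a long video is sampled sparsely across its duration. The function name `dynamic_skip_indices`, the default of 16 frames, and the padding rule for short videos are all hypothetical and are not taken from the paper.

```python
from typing import List


def dynamic_skip_indices(num_frames: int, clip_len: int = 16) -> List[int]:
    """Hypothetical dynamic frame-skipping rule (not the authors' exact method).

    Picks `clip_len` frame indices from a video of `num_frames` frames, using a
    skip step that grows with the video length.
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    # Assumed rule: longer videos get a larger skip step, never smaller than 1.
    step = max(1, num_frames // clip_len)
    # Sample every `step`-th frame from the start, keeping at most clip_len indices.
    indices = list(range(0, num_frames, step))[:clip_len]
    # Videos shorter than clip_len: repeat the last frame index as padding.
    while len(indices) < clip_len:
        indices.append(indices[-1])
    return indices


if __name__ == "__main__":
    print(dynamic_skip_indices(40, 8))
    print(dynamic_skip_indices(5, 8))
```

For example, `dynamic_skip_indices(40, 8)` returns `[0, 5, 10, 15, 20, 25, 30, 35]`. A 5-frame video padded to 8 indices gives `[0, 1, 2, 3, 4, 4, 4, 4]`. A fixed step keeps temporal spacing uniform within each clip. Evenly spaced sampling (`i * num_frames // clip_len`) is an equally plausible alternative that always reaches closer to the final frame.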

Figs. 1–6 (figure images not reproduced here)

Data availability

Not applicable.

Funding

This research is supported by the PDE-GIR project, which has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 778035.

Author information

Contributions

A.M., A.B.S. and Z.H. conceived and designed the research direction; A.M. proposed and implemented the methodology and performed the research experiments; A.B.S. and Z.H. analyzed the data; A.B.S. and A.M. contributed reagents/materials/analysis tools; A.M. wrote the research paper. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Allah Bux Sargano or Zulfiqar Habib.

Ethics declarations

Institutional review board statement

Not applicable.

Informed consent

Not applicable.

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mumtaz, A., Sargano, A.B. & Habib, Z. Robust learning for real-world anomalies in surveillance videos. Multimed Tools Appl 82, 20303–20322 (2023). https://doi.org/10.1007/s11042-023-14425-x

  • DOI: https://doi.org/10.1007/s11042-023-14425-x