STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Applied Intelligence

Abstract

Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The main challenge is that abnormality lacks a clear definition, which restricts the use of supervised methods. To this end, we propose a novel unsupervised anomaly detection method, the Spatio-Temporal Generative Adversarial Network (STemGAN). The framework consists of a generator and a discriminator that learn from the video context, using both spatial and temporal information to predict future frames. The generator follows an Autoencoder (AE) architecture, with a dual-stream encoder that extracts appearance and motion information and a decoder equipped with a Channel Attention (CA) module that focuses on dynamic foreground features. In addition, we provide a transfer-learning method that improves the generalizability of STemGAN. We compare our approach with existing state-of-the-art approaches on benchmark Anomaly Detection (AD) datasets using standard evaluation metrics, i.e., AUC (Area Under the Curve) and EER (Equal Error Rate). The empirical results show that STemGAN outperforms the existing state-of-the-art methods, achieving an AUC score of 97.5% on UCSDPed2, 86.0% on CUHK Avenue, 90.4% on Subway-entrance, and 95.2% on Subway-exit.
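For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how frame-level AUC and EER are commonly computed from per-frame anomaly scores. It is not the authors' evaluation code. The function names (`roc_points`, `auc`, `eer`), the toy scores, and the convention that a higher score means "more anomalous" are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """ROC points obtained by sweeping the threshold from high to low.

    Assumed convention: labels are 1 for anomalous frames and 0 for normal
    frames, and a higher score means a more anomalous frame.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Frames with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points: Sequence[Point]) -> float:
    """Equal error rate: the FPR at which FPR equals FNR (= 1 - TPR).

    The crossing is located by linear interpolation between ROC points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 - (1.0 - y0)  # FPR - FNR at the segment start (negative here)
        d1 = x1 - (1.0 - y1)  # FPR - FNR at the segment end
        if d1 >= 0.0:
            t = -d0 / (d1 - d0)  # d1 > d0 because d0 < 0 <= d1
            return x0 + t * (x1 - x0)
    return 1.0


if __name__ == "__main__":
    # Hypothetical per-frame anomaly scores and ground-truth labels.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
    pts = roc_points(toy_scores, toy_labels)
    print("AUC:", auc(pts), "EER:", eer(pts))
```

A higher AUC and a lower EER both indicate better separation between normal and anomalous frames, which is why the two are usually reported together.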

Data Availability

The datasets used in this work are publicly available datasets.

Author information

Corresponding authors

Correspondence to Rituraj Singh, Krishanu Saini or Anikeit Sethi.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Krishanu Saini and Anikeit Sethi contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, R., Saini, K., Sethi, A. et al. STemGAN: spatio-temporal generative adversarial network for video anomaly detection. Appl Intell 53, 28133–28152 (2023). https://doi.org/10.1007/s10489-023-04940-7

  • DOI: https://doi.org/10.1007/s10489-023-04940-7
