STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Singh, Rituraj; Saini, Krishanu; Sethi, Anikeit; Tiwari, Aruna; Saurav, Sumeet; Singh, Sanjay

doi:10.1007/s10489-023-04940-7

STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Published: 22 September 2023

Volume 53, pages 28133–28152, (2023)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

941 Accesses
2 Citations
Explore all metrics

Abstract

Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The challenges arise from the lack of a clear definition of abnormality, which restricts the usage of supervised methods. To this end, we propose a novel unsupervised anomaly detection method, Spatio-Temporal Generative Adversarial Network (STemGAN). This framework consists of a generator and discriminator that learns from the video context, utilizing both spatial and temporal information to predict future frames. The generator follows an Autoencoder (AE) architecture, having a dual-stream encoder for extracting appearance and motion information, and a decoder having a Channel Attention (CA) module to focus on dynamic foreground features. In addition, we provide a transfer-learning method that enhances the generalizability of STemGAN. We use benchmark Anomaly Detection (AD) datasets to compare the performance of our approach with the existing state-of-the-art approaches using standard evaluation metrics, i.e., AUC (Area Under Curve) and EER (Equal Error Rate). The empirical results show that our proposed STemGAN outperforms the existing state-of-the-art methods achieving an AUC score of 97.5% on UCSDPed2, 86.0% on CUHK Avenue, 90.4% on Subway-entrance, and 95.2% on Subway-exit.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Anomaly detection in surveillance videos: a thematic taxonomy of deep models, review and performance analysis

Article 27 August 2022

Real-time anomaly detection on surveillance video with two-stream spatio-temporal generative model

Article 30 July 2022

VALD-GAN: video anomaly detection using latent discriminator augmented GAN

Article 18 October 2023

Data Availability

The datasets used in this work are publicly available datasets.

References

Li W, Mahadevan V, Vasconcelos N (2013) Anomaly detection and localization in crowded scenes. IEEE transactions on pattern analysis and machine intelligence 36(1):18–32
Google Scholar
Ramachandra B, Jones M, Vatsavai RR (2020) A survey of single-scene video anomaly detection. IEEE transactions on pattern analysis and machine intelligence
Xia X, Pan X, Li N, He X, Ma L, Zhang X, Ding N (2022) Gan-based anomaly detection: A review. Neurocomputing
Wu S, Moore BE, Shah M (2010) Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2054–2060. IEEE
Mahadevan V, Li W, Bhalodia V, Vasconcelos N (2010) Anomaly detection in crowded scenes. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1975–1981. IEEE
Saligrama V, Chen Z (2012) Video anomaly detection based on local statistical aggregates. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 2112–2119. IEEE
Kim J, Grauman K (2009) Observe locally, infer globally: a space-time mrf for detecting abnormal activities with incremental updates. In: 2009 Conference on Computer Vision and Pattern Recognition, pp 2921–2928. IEEE
Cong Y, Yuan J, Liu J (2011) Sparse reconstruction cost for abnormal event detection. In: CVPR 2011, pp 3449–3456. IEEE
Lu C, Shi J, Jia J (2013) Abnormal event detection at 150 fps in matlab. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2720–2727
Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance. In: European Conference on Computer Vision, pp 428–441. Springer
Pan Y (2016) Heading toward artificial intelligence 2.0. Engineering 2(4):409–413
Article Google Scholar
Xing EP, Ho Q, Xie P, Wei D (2016) Strategies and principles of distributed machine learning on big data. Engineering 2(2):179–195
Article Google Scholar
Sharif Razavian A, Azizpour H, Sullivan J, Carlsson S (2014) Cnn features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 806–813
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 779–788
Zhao Z-Q, Zheng P, Xu S-t, Wu X (2019) Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems 30(11):3212–3232
Article Google Scholar
Shen Y, Ji R, Wang C, Li X, Li X (2018) Weakly supervised object detection via object-specific pixel gradient. IEEE transactions on neural networks and learning systems 29(12):5960–5970
Article Google Scholar
Wan Z, He H (2017) Weakly supervised object localization with deep convolutional neural network based on spatial pyramid saliency map. In: 2017 IEEE International Conference on Image Processing (ICIP), pp 4177–4181. IEEE
Ji S, Xu W, Yang M, Yu K (2012) 3d convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence 35(1):221–231
Article Google Scholar
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Advances in neural information processing systems 27
Chen X, Weng J, Lu W, Xu J, Weng J (2017) Deep manifold learning combined with convolutional neural networks for action recognition. IEEE transactions on neural networks and learning systems 29(9):3938–3952
Article Google Scholar
Mao X, Shen C, Yang Y-B (2016) Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Advances in neural information processing systems 29
Hasan M, Choi J, Neumann J, Roy-Chowdhury AK, Davis LS (2016) Learning temporal regularity in video sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 733–742
Chong YS, Tay YH (2017) Abnormal event detection in videos using spatiotemporal autoencoder. In: International Symposium on Neural Networks, pp 189–196. Springer
Luo W, Liu W, Lian D, Tang J, Duan L, Peng X, Gao S (2019) Video anomaly detection with sparse coding inspired deep neural networks. IEEE transactions on pattern analysis and machine intelligence 43(3):1070–1084
Article Google Scholar
Sabokrou M, Fathy M, Hoseini M (2016) Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder. Electron Lett 52(13):1122–1124
Article Google Scholar
Tran HT, Hogg D (2017) Anomaly detection using a convolutional winner-take-all autoencoder. In: Proceedings of the British Machine Vision Conference 2017. British Machine Vision Association
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2020) Generative adversarial networks. Commun ACM 63(11):139–144. https://doi.org/10.1145/3422622
Article MathSciNet Google Scholar
Tran N-T, Tran V-H, Nguyen N-B, Nguyen T-K, Cheung N-M (2021) On data augmentation for gan training. IEEE Trans Image Process 30:1882–1897
Article MathSciNet Google Scholar
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4681–4690
Wu P, Liu J, Shen F (2019) A deep one-class neural network for anomalous event detection in complex scenes. IEEE transactions on neural networks and learning systems 31(7):2609– 2622
Isola P, Zhu J-Y, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1125–1134
Yu J, Lee Y, Yow KC, Jeon M, Pedrycz W (2021) Abnormal event detection and localization via adversarial event prediction. IEEE Transactions on Neural Networks and Learning Systems
Liu W, Luo W, Lian D, Gao S (2018) Future frame prediction for anomaly detection–a new baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6536–6545
Bird N, Atev S, Caramelli N, Martin R, Masoud O, Papanikolopoulos N (2006) Real time, online detection of abandoned objects in public areas. In: Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pp 3775–3780. IEEE
Fan Y, Wen G, Li D, Qiu S, Levine MD, Xiao F (2020) Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder. Comp Vision Image Underst 195:102920
Article Google Scholar
Li N, Chang F (2019) Video anomaly detection and localization via multivariate gaussian fully convolution adversarial autoencoder. Neurocomputing 369:92–105
Article Google Scholar
Li N, Chang F, Liu C (2020) Spatial-temporal cascade autoencoder for video anomaly detection in crowded scenes. IEEE Transactions on Multimedia 23:203–215
Article Google Scholar
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1725–1732
Lin J, Gan C, Han S (2019) Tsm: Temporal shift module for efficient video understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 7083–7093
Li Y, Cai Y, Liu J, Lang S, Zhang X (2019) Spatio-temporal unity networking for video anomaly detection. IEEE Access 7:172425–172432
Article Google Scholar
Lu Y, Kumar KM, shahabeddin Nabavi S, Wang Y (2019) Future frame prediction using convolutional vrnn for anomaly detection. In: 2019 16Th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp 1–8. IEEE
Zhou JT, Du J, Zhu H, Peng X, Liu Y, Goh RSM (2019) Anomalynet: An anomaly detection network for video surveillance. IEEE Transactions on Information Forensics and Security 14(10):2537–2550
Article Google Scholar
Lindemann B, Müller T, Vietz H, Jazdi N, Weyrich M (2021) A survey on long short-term memory networks for time series prediction. Procedia CIRP 99:650–655
Article Google Scholar
Wu Y, He F, Zhang D, Li X (2015) Service-oriented feature-based data exchange for cloud-based design and manufacturing. IEEE Trans Serv Comput 11(2):341–353
Article Google Scholar
Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W-c (2015) Convolutional lstm network: A machine learning approach for precipitation nowcasting. Advances in neural information processing systems 28
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, pp 2048–2057. PMLR
Woo S, Park J., Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Zhou JT, Zhang L, Fang Z, Du J, Peng X, Xiao Y (2019) Attention-driven loss for anomaly detection in video surveillance. IEEE transactions on circuits and systems for video technology 30(12):4639–4647
Article Google Scholar
Bi H-B, Lu D, Zhu H-H, Yang L-N, Guan H-P (2021) Sta-net: spatial-temporal attention network for video salient object detection. Appl Intell 51:3450–3459
Article Google Scholar
Li Y, Guo K, Lu Y, Liu L (2021) Cropping and attention based approach for masked face recognition. Appl Intell 51:3012–3025
Article Google Scholar
Mehran R, Oyama A, Shah M (2009) Abnormal crowd behavior detection using social force model. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 935–942. IEEE
Benezeth Y, Jodoin P-M, Saligrama V, Rosenberger C (2009) Abnormal events detection based on spatio-temporal co-occurences. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 2458–2465. IEEE
Nayak R, Pati UC, Das SK (2021) A comprehensive review on deep learning-based methods for video anomaly detection. Image Vis Comput 106:104078
Article Google Scholar
Nawaratne R, Alahakoon D, De Silva D, Yu X (2019) Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Transactions on Industrial Informatics 16(1):393–402
Article Google Scholar
Schlegl T, Seeböck P, Waldstein SM, Langs G, Schmidt-Erfurth U (2019) f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Med Image Anal 54:30–44
Article Google Scholar
Wang L, Tian J, Zhou S, Shi H, Hua G (2023) Memory-augmented appearance-motion network for video anomaly detection. Pattern Recognit 109335
Wei H, Li K, Li H, Lyu Y, Hu X (2019) Detecting video anomaly with a stacked convolutional lstm framework. In: International Conference on Computer Vision Systems, pp 330–342. Springer
Doshi K, Yilmaz Y (2022) Rethinking video anomaly detection-a continual learning approach. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3961–3970
Chang Y, Tu Z, Xie W, Yuan J (2020) Clustering driven deep autoencoder for video anomaly detection. In: European Conference on Computer Vision, pp 329–345. Springer
Fang Z, Zhou JT, Xiao Y, Li Y, Yang F (2020) Multi-encoder towards effective anomaly detection in videos. IEEE Transactions on Multimedia 23:4106–4116
Article Google Scholar
Zhao Y, Deng B, Shen C, Liu Y, Lu H, Hua X-S (2017) Spatio-temporal autoencoder for video anomaly detection. In: Proceedings of the 25th ACM International Conference on Multimedia, pp 1933–1941
Li D, Nie X, Li X, Zhang Y, Yin Y (2022) Context-related video anomaly detection via generative adversarial network. Pattern Recogn Lett 156:183–189
Article Google Scholar
Doshi K, Yilmaz Y (2021) Online anomaly detection in surveillance videos with asymptotic bound on false alarm rate. Pattern Recognit 114:107865
Article Google Scholar
Hao Y, Li J, Wang N, Wang X, Gao X (2022) Spatiotemporal consistency-enhanced network for video anomaly detection. Pattern Recognit 121:108232
Article Google Scholar
Li C, Li H, Zhang G (2023) Future frame prediction based on generative assistant discriminative network for anomaly detection. Appl Intell 53(1):542–559
Article Google Scholar
Mathieu M, Couprie C, LeCun Y (2015) Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440
Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognit 90:119–133
Article Google Scholar
Lin J, Gan C, Han S (2018) Temporal shift module for efficient video understanding. CoRR abs/1811.08383 (1811)
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141
Zhang Y, Li K, Li K, Wang L, Zhong B, Fu Y (2018) Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 286–301
Li C, Wand M (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. In: European Conference on Computer Vision, pp 702–716. Springer
Denton EL, Chintala S, Fergus R et al (2015) Deep generative image models using a laplacian pyramid of adversarial networks. Advances in neural information processing systems 28
Lu Y, Yu F, Reddy MKK, Wang Y (2020) Few-shot scene-adaptive anomaly detection. In: European Conference on Computer Vision, pp 125–141. Springer
Zenati H, Foo CS, Lecouat B, Manek G, Chandrasekhar VR (2018) Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222
Adam A, Rivlin E, Shimshoni I, Reinitz D (2008) Robust real-time unusual event detection using multiple fixed-location monitors. IEEE transactions on pattern analysis and machine intelligence 30(3):555–560
Article Google Scholar
Zhao B, Fei-Fei L, Xing EP (2011) Online detection of unusual events in videos via dynamic sparse coding. In: CVPR 2011, pp 3313–3320. IEEE
Le V-T, Kim Y-G (2022) Attention-based residual autoencoder for video anomaly detection. Appl Intell 1–15
Ravanbakhsh M, Nabi M, Sangineto E, Marcenaro L, Regazzoni C, Sebe N (2017) Abnormal event detection in videos using generative adversarial nets. In: 2017 IEEE International Conference on Image Processing (ICIP), pp 1577–1581. IEEE
Tang Y, Zhao L, Zhang S, Gong C, Li G, Yang J (2020) Integrating prediction and reconstruction for anomaly detection. Pattern Recogn Lett 129:123–130
Article Google Scholar
Yang Y, Zhan D, Yang F, Zhou X-D, Yan Y, Wang Y (2020) Improving video anomaly detection performance with patch-level loss and segmentation map. In: 2020 IEEE 6th International Conference on Computer and Communications (ICCC), pp 1832–1839. IEEE
Abati D, Porrello A, Calderara S, Cucchiara R (2019) Latent space autoregression for novelty detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 481–490
Gong D, Liu L, Le V, Saha B, Mansour MR, Venkatesh S, Hengel Avd (2019) Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1705–1714
Deepak K, Chandrakala S, Mohan CK (2021) Residual spatiotemporal autoencoder for unsupervised video anomaly detection. SIViP 15(1):215–222
Article Google Scholar
Ravanbakhsh M, Sangineto E, Nabi M, Sebe N (2019) Training adversarial discriminators for cross-channel abnormal event detection in crowds. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1896–1904. IEEE
Luo W, Liu W, Gao S (2017) Remembering history with convolutional lstm for anomaly detection. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp 439–444. IEEE
Tudor Ionescu R, Smeureanu S, Alexe B, Popescu M (2017) Unmasking the abnormal events in video. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2895–2903
Ionescu RT, Smeureanu S, Popescu M, Alexe B (2019) Detecting abnormal events in video using narrowed normality clusters. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1951–1960. https://doi.org/10.1109/WACV.2019.00212
Xu D, Yan Y, Ricci E, Sebe N (2017) Detecting anomalous events in videos by learning deep representations of appearance and motion. Comp Vision Image Underst 156:117–127
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science and Engineering, Indian Institute of Technology Indore, Indore, 452020, Madhya Pradesh, India
Rituraj Singh, Krishanu Saini, Anikeit Sethi & Aruna Tiwari
Intelligent System Groups, CSIR-CEERI, PILANI, Pilani, 333031, Rajasthan, India
Sumeet Saurav & Sanjay Singh

Authors

Rituraj Singh
View author publications
You can also search for this author in PubMed Google Scholar
Krishanu Saini
View author publications
You can also search for this author in PubMed Google Scholar
Anikeit Sethi
View author publications
You can also search for this author in PubMed Google Scholar
Aruna Tiwari
View author publications
You can also search for this author in PubMed Google Scholar
Sumeet Saurav
View author publications
You can also search for this author in PubMed Google Scholar
Sanjay Singh
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Rituraj Singh, Krishanu Saini or Anikeit Sethi.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Krishanu Saini and Anikeit Sethi contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Singh, R., Saini, K., Sethi, A. et al. STemGAN: spatio-temporal generative adversarial network for video anomaly detection. Appl Intell 53, 28133–28152 (2023). https://doi.org/10.1007/s10489-023-04940-7

Download citation

Accepted: 02 August 2023
Published: 22 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10489-023-04940-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Abstract

Access this article

Similar content being viewed by others

Anomaly detection in surveillance videos: a thematic taxonomy of deep models, review and performance analysis

Real-time anomaly detection on surveillance video with two-stream spatio-temporal generative model

VALD-GAN: video anomaly detection using latent discriminator augmented GAN

Data Availability

References

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflicts of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Abstract

Access this article

Similar content being viewed by others

Anomaly detection in surveillance videos: a thematic taxonomy of deep models, review and performance analysis

Real-time anomaly detection on surveillance video with two-stream spatio-temporal generative model

VALD-GAN: video anomaly detection using latent discriminator augmented GAN

Data Availability

References

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflicts of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation