Abstract
Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The main challenge is the lack of a clear definition of abnormality, which restricts the use of supervised methods. To address this, we propose a novel unsupervised anomaly detection method, the Spatio-Temporal Generative Adversarial Network (STemGAN). The framework consists of a generator and a discriminator that learn from the video context, using both spatial and temporal information to predict future frames. The generator follows an Autoencoder (AE) architecture: a dual-stream encoder extracts appearance and motion information, and a decoder with a Channel Attention (CA) module focuses on dynamic foreground features. In addition, we provide a transfer-learning method that enhances the generalizability of STemGAN. We compare our approach with existing state-of-the-art approaches on benchmark Anomaly Detection (AD) datasets using standard evaluation metrics, i.e., AUC (Area Under Curve) and EER (Equal Error Rate). The empirical results show that the proposed STemGAN outperforms the existing state-of-the-art methods, achieving an AUC score of 97.5% on UCSDPed2, 86.0% on CUHK Avenue, 90.4% on Subway-entrance, and 95.2% on Subway-exit.
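To make the two evaluation metrics concrete, the following is a minimal sketch of how AUC and EER can be computed from per-frame anomaly scores and ground-truth labels. It is not the authors' evaluation code: the function names (`roc_points`, `auc`, `eer`) and the toy scores are assumptions for illustration, and how STemGAN turns a predicted frame into an anomaly score is not specified here.

```python
def roc_points(scores, labels):
    """Sweep a threshold over descending scores and return (fpr, tpr) lists.

    scores: hypothetical per-frame anomaly scores (higher = more anomalous).
    labels: 1 for an anomalous frame, 0 for a normal frame.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Frames with identical scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


def eer(fpr, tpr):
    """Equal Error Rate: the error level where FPR equals FNR (1 - TPR).

    Linearly interpolates between the two ROC points that bracket the
    crossing of FPR - FNR through zero.
    """
    for k in range(1, len(fpr)):
        d0 = fpr[k - 1] - (1.0 - tpr[k - 1])
        d1 = fpr[k] - (1.0 - tpr[k])
        if d0 <= 0.0 <= d1:
            if d1 == d0:
                return fpr[k]
            t = -d0 / (d1 - d0)
            return fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0


# Toy example with made-up scores for eight frames.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
print(f"AUC = {auc(toy_fpr, toy_tpr):.3f}, EER = {eer(toy_fpr, toy_tpr):.3f}")
```

A higher AUC and a lower EER both indicate better separation between normal and anomalous frames, which is the sense in which the AUC figures above are compared across methods.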
Data Availability
The datasets used in this work are publicly available.
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest.
Additional information
Krishanu Saini and Anikeit Sethi contributed equally to this work.
About this article
Cite this article
Singh, R., Saini, K., Sethi, A. et al. STemGAN: spatio-temporal generative adversarial network for video anomaly detection. Appl Intell 53, 28133–28152 (2023). https://doi.org/10.1007/s10489-023-04940-7
DOI: https://doi.org/10.1007/s10489-023-04940-7