Abstract
Existing anomaly detection datasets suffer from an imbalance between normal and abnormal samples, and anomalies themselves are difficult to define. To address both problems, we introduce a new dataset named Remote Stop to provide data support for existing algorithms. We also propose an unsupervised video anomaly detection method based on conditional generative adversarial networks. The model is trained to learn the distribution of normal video data, which enables it to identify anomalous events. A spatial attention mechanism improves the model's performance in detecting abnormal behaviors in video frames while maintaining high processing efficiency. Moreover, unlike methods that assess the entire image, our approach uses overlapping image blocks to determine anomalies, which enhances the accuracy and robustness of the model in image segmentation. These innovations address the issues of scarce samples and high-cost labeling, and they provide new perspectives and tools for video anomaly detection in the field of public safety. We validated the model on the Avenue and Ped2 datasets and applied it to our newly created Remote Stop dataset, achieving an AUC of 84.3% and processing 61 video frames per second. This speed enables efficient sequential processing of large-scale video data and contributes to public road safety by providing early warnings and enabling timely preventive measures.
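The idea of scoring overlapping image blocks rather than the whole frame can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the block size, the stride, the error measure, or how block scores are combined, so the mean squared error, the 32-pixel block, the 16-pixel stride, the maximum over blocks, and the names `patch_errors` and `frame_anomaly_score` are all hypothetical choices. The `predicted` frame stands in for whatever the generator produces.

```python
import numpy as np


def patch_errors(predicted, actual, patch=32, stride=16):
    """Mean squared error of every overlapping block (stride < patch gives overlap).

    Assumption: block size, stride, and the MSE metric are illustrative only.
    """
    if predicted.shape != actual.shape:
        raise ValueError("frames must have the same shape")
    # Per-pixel squared error, averaged over colour channels if present.
    err = (predicted.astype(np.float64) - actual.astype(np.float64)) ** 2
    if err.ndim == 3:
        err = err.mean(axis=2)
    height, width = err.shape
    if height < patch or width < patch:
        raise ValueError("frame is smaller than one block")
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    # One mean error per block, laid out on the block grid.
    return np.array(
        [[err[r:r + patch, c:c + patch].mean() for c in cols] for r in rows]
    )


def frame_anomaly_score(predicted, actual, patch=32, stride=16):
    """Score a frame by its worst block (assumed aggregation: maximum)."""
    return float(patch_errors(predicted, actual, patch, stride).max())


if __name__ == "__main__":
    actual = np.zeros((64, 64))
    predicted = actual.copy()
    predicted[:16, :16] = 1.0  # a small, localized generation error
    whole_frame = float(((predicted - actual) ** 2).mean())
    print(whole_frame, frame_anomaly_score(predicted, actual))
```

In this toy case the localized error is diluted when averaged over the whole frame but stands out in the block that contains it, which is the usual motivation for block-level scoring.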
Funding
This study was supported by the Shanghai Key Science and Technology Project (19DZ1208903); the National Natural Science Foundation of China (Grant Nos. 61572325 and 60970012); the Ministry of Education Doctoral Fund of Ph.D. Supervisor of China (Grant No. 20113120110008); the Shanghai Key Science and Technology Project in Information Technology Field (Grant Nos. 14511107902 and 16DZ1203603); the Shanghai Leading Academic Discipline Project (No. XTKX2012); and the Shanghai Engineering Research Center Project (Nos. GCZX14014 and C14001).
Ethics declarations
Conflict of interest
The authors declare that they have no competing financial interests.
Human and/or animal rights
This study analyzed anonymous video data captured by public surveillance cameras. During the research process, there was no direct interaction with any pedestrians nor was any personally identifiable information collected.
About this article
Cite this article
Xi, B., Chen, Q. Real-time anomaly detection for ‘Remote’ bus stop surveillance using unsupervised conditional generative adversarial networks. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09911-8
DOI: https://doi.org/10.1007/s00521-024-09911-8