Abstract
With the spread of smart surveillance devices and growing public security awareness, video anomaly detection has become an important task. However, learning rich multi-scale spatio-temporal information from high-dimensional videos to predict anomalous behaviors is challenging because of the large local redundancy and complex global dependencies among video frames. Although Convolutional Neural Networks (CNNs) have a strong inductive bias, their inherent locality limits their ability to capture long-term spatio-temporal features. Therefore, we propose a Transformer with spatio-temporal representation for video anomaly detection. The network combines convolution operations with Transformer operations: convolution extracts shallow spatial features to facilitate the recovery of sampled images, while the Transformer encodes patches and efficiently captures long-range dependencies through a self-attention mechanism, which also reduces the limitations caused by local redundancy. Experimental results on the UCSD Ped2, CUHK Avenue and ShanghaiTech datasets demonstrate the effectiveness of the proposed network.
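The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the 3x3 averaging kernel, the 2x2 patch size, the identity query/key/value projections, and all function names (`conv2d_same`, `patchify`, `self_attention`) are assumptions chosen only to show a convolution stage feeding patch tokens into self-attention across frames.

```python
import math

def conv2d_same(frame, kernel):
    """3x3 zero-padded cross-correlation; a stand-in for a shallow CNN stage."""
    h, w = len(frame), len(frame[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    rr, cc = r + i, c + j
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += frame[rr][cc] * kernel[i + k][j + k]
            out[r][c] = acc
    return out

def patchify(frame, patch):
    """Split an H x W map into flattened, non-overlapping patch tokens."""
    h, w = len(frame), len(frame[0])
    tokens = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tokens.append([frame[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V (assumption)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(wi * v[c] for wi, v in zip(row, tokens)) for c in range(d)])
    return outputs, weights

# Two toy 4x4 frames standing in for consecutive video frames.
frames = [[[float((r + c + t) % 3) for c in range(4)] for r in range(4)] for t in range(2)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

# Convolution first (local features), then patch tokens from every frame.
features = [conv2d_same(f, mean_kernel) for f in frames]
tokens = [tok for f in features for tok in patchify(f, 2)]

# Every token attends to all tokens in both frames, so the dependency is global.
attended, attn = self_attention(tokens)
```

Because the tokens of both frames share one attention matrix, each patch can draw on patches from any position and any frame, which is the long-range behavior a purely local convolution lacks.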
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China under Grants 61871241, 61971245 and 61976120, in part by the Nantong Science and Technology Program under Grant JC2021131, and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX21_3084.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, X., Chen, J., Shen, X., Li, H. (2022). Transformer with Spatio-Temporal Representation for Video Anomaly Detection. In: Krzyzak, A., Suen, C.Y., Torsello, A., Nobile, N. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2022. Lecture Notes in Computer Science, vol 13813. Springer, Cham. https://doi.org/10.1007/978-3-031-23028-8_22
DOI: https://doi.org/10.1007/978-3-031-23028-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23027-1
Online ISBN: 978-3-031-23028-8
eBook Packages: Computer Science, Computer Science (R0)