
Transformer with Spatio-Temporal Representation for Video Anomaly Detection

  • Conference paper
  • Part of: Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2022)

Abstract

With the popularity of smart surveillance devices and the increase in people's security awareness, video anomaly detection has become an important task. However, learning rich multi-scale spatio-temporal information from high-dimensional videos to predict anomalous behaviors is challenging due to the large local redundancy and complex global dependencies among video frames. Although Convolutional Neural Networks (CNNs) have strong inductive biases, their inherent locality limits their ability to capture long-range spatio-temporal features. Therefore, we propose a Transformer with spatio-temporal representation for video anomaly detection. The network combines convolution with Transformer operations: convolution extracts shallow spatial features that facilitate the recovery of sampled images, while the Transformer encodes patches and efficiently captures long-range dependencies through a self-attention mechanism, reducing the limitations imposed by local redundancy. Experimental results on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets demonstrate the effectiveness of the proposed network.
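
As a rough illustration of the hybrid design described in the abstract, the Python/PyTorch sketch below pairs a convolutional stem with a Transformer encoder over patch tokens to predict the next frame of a short clip; a large prediction error then signals an anomaly. This is a minimal sketch, not the authors' implementation: every module name, layer size, and the simple MSE anomaly score are assumptions made here for illustration.

```python
# Minimal sketch (assumed, not the authors' code): convolutional stem for
# shallow spatial features + Transformer encoder over patch tokens for
# long-range dependencies, in a future-frame-prediction setup.
import torch
import torch.nn as nn


class HybridPredictor(nn.Module):
    def __init__(self, in_frames=4, channels=3, dim=256, heads=8, depth=4, patch=8):
        super().__init__()
        # Convolutional stem: shallow spatial features from the stacked input frames.
        self.stem = nn.Sequential(
            nn.Conv2d(in_frames * channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Patch embedding: non-overlapping patches become tokens for the Transformer.
        self.to_tokens = nn.Conv2d(dim, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        # Self-attention captures global dependencies that local convolutions miss.
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Lightweight decoder upsamples the tokens back to a full-resolution frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 64, kernel_size=patch, stride=patch),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, clip):                      # clip: (B, T*C, H, W)
        feat = self.stem(clip)                    # (B, dim, H, W)
        tok = self.to_tokens(feat)                # (B, dim, H/p, W/p)
        b, d, h, w = tok.shape
        tok = tok.flatten(2).transpose(1, 2)      # (B, h*w, dim) token sequence
        tok = self.encoder(tok)                   # global self-attention over patches
        tok = tok.transpose(1, 2).reshape(b, d, h, w)
        return self.decoder(tok)                  # predicted next frame (B, C, H, W)


# At test time, a large prediction error flags an anomalous frame.
model = HybridPredictor()
clip = torch.randn(1, 4 * 3, 128, 128)            # four stacked RGB frames
target = torch.randn(1, 3, 128, 128)              # ground-truth next frame
score = torch.mean((model(clip) - target) ** 2)   # higher error -> more anomalous
```

The paper's full model presumably uses richer multi-scale features and training objectives than this sketch; the point here is only how a convolutional front end and patch-wise self-attention can be combined in a single frame predictor.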



Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grants 61871241, 61971245, and 61976120, in part by the Nantong Science and Technology Program under Grant JC2021131, and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX21_3084.

Author information

Corresponding author

Correspondence to Hongjun Li.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, X., Chen, J., Shen, X., Li, H. (2022). Transformer with Spatio-Temporal Representation for Video Anomaly Detection. In: Krzyzak, A., Suen, C.Y., Torsello, A., Nobile, N. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2022. Lecture Notes in Computer Science, vol 13813. Springer, Cham. https://doi.org/10.1007/978-3-031-23028-8_22

  • DOI: https://doi.org/10.1007/978-3-031-23028-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23027-1

  • Online ISBN: 978-3-031-23028-8

  • eBook Packages: Computer Science (R0)
