Abstract
Detecting anomalies in video sequences is one of the most popular topics in computer vision. It is considered a challenging task in video analysis because the definition of an anomaly is subjective and context-dependent. Various deep learning models, such as convolutional neural networks (CNNs), have previously been used for this purpose. This paper proposes a novel solution based on a state-of-the-art deep learning model, the Vision Transformer, chosen for its current prominence and its performance. We fine-tune a pre-trained Vision Transformer model on the UCSD dataset to automatically classify video frames as containing normal or abnormal objects. The evaluation shows that the model achieves a good accuracy score.
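As a rough illustration of the evaluation step mentioned in the abstract, the sketch below computes frame-level accuracy for a binary normal/abnormal classifier. It is not the authors' code: the label encoding, the function name `frame_accuracy`, and the eight example labels are all hypothetical and chosen only to show the arithmetic.

```python
from typing import Sequence

# Hypothetical label encoding; the paper does not specify one.
NORMAL, ABNORMAL = 0, 1


def frame_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of frames whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one frame is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for eight frames, for illustration only.
truth = [NORMAL, NORMAL, ABNORMAL, ABNORMAL, NORMAL, ABNORMAL, NORMAL, NORMAL]
preds = [NORMAL, NORMAL, ABNORMAL, NORMAL, NORMAL, ABNORMAL, NORMAL, ABNORMAL]

print(frame_accuracy(truth, preds))  # 6 of the 8 frames match
```

Accuracy alone can look favourable when normal frames greatly outnumber abnormal ones, so it is usually worth checking the class balance of the test split when interpreting such a score.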
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Berroukham, A., Housni, K., Lahraichi, M. (2023). Fine-Tuning Pre-trained Vision Transformer Model for Anomaly Detection in Video Sequences. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_24
DOI: https://doi.org/10.1007/978-3-031-28387-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28386-4
Online ISBN: 978-3-031-28387-1
eBook Packages: Intelligent Technologies and Robotics (R0)