Fine-Tuning Pre-trained Vision Transformer Model for Anomaly Detection in Video Sequences

Berroukham, Abdelhafid; Housni, Khalid; Lahraichi, Mohammed

doi:10.1007/978-3-031-28387-1_24

Conference paper
First Online: 29 March 2023

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 625)

Abstract

Detecting anomalies in video sequences is one of the most popular topics in computer vision. It is considered a challenging video-analysis task because what counts as an anomaly is subjective and context-dependent. Various deep learning models, such as convolutional neural networks (CNNs), have previously been used for this purpose. This paper proposes a novel solution based on a state-of-the-art deep learning model, the Vision Transformer, chosen for its current popularity and strong performance. We fine-tune a pre-trained Vision Transformer model on the UCSD dataset to automatically classify video frames as normal or abnormal, according to whether they contain abnormal objects. The evaluation shows that the model achieves a good accuracy score.
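To make the pipeline in the abstract more concrete, the sketch below illustrates two steps in plain Python: how a single video frame is cut into the fixed-size patches that a Vision Transformer treats as tokens, and how frame-level accuracy is computed from normal/abnormal predictions. It is a minimal illustration under stated assumptions, not the authors' implementation. The 16 x 16 patch size and 224 x 224 input follow the common ViT-Base/16 configuration rather than anything stated in the abstract, frames are treated as single-channel (grayscale) for simplicity, and the function names `patchify`, `sequence_length`, and `frame_accuracy` are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one grayscale frame: H rows of W pixel values


def patchify(frame: Frame, patch: int = 16) -> List[List[float]]:
    """Split an H x W frame into non-overlapping patch x patch tiles.

    Each tile is flattened row by row into one vector; tiles are emitted
    left to right, top to bottom. In a ViT, each such vector would then be
    linearly projected into a token embedding.
    """
    height, width = len(frame), len(frame[0])
    if height % patch or width % patch:
        raise ValueError("frame height and width must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([frame[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def sequence_length(height: int, width: int, patch: int = 16) -> int:
    """Tokens seen by the encoder: one per patch plus one [CLS] token."""
    return (height // patch) * (width // patch) + 1


def frame_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of frames whose predicted label (0 = normal, 1 = abnormal)
    matches the ground-truth label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Assumption: 224 x 224 input and 16 x 16 patches (ViT-B/16 defaults),
    # not taken from the paper. UCSD frames would first need resizing.
    frame = [[float(r * 224 + c) for c in range(224)] for r in range(224)]
    tokens = patchify(frame)
    print(len(tokens), len(tokens[0]))   # 196 256
    print(sequence_length(224, 224))     # 197
    # Hypothetical labels for four frames, only to show the metric.
    print(frame_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```

In the original ViT design, each flattened patch is linearly projected to an embedding and a learnable [CLS] token is prepended, which is why the sequence length is the patch count plus one; the classification head reads the final [CLS] representation. For a binary normal/abnormal task, fine-tuning would typically replace the pre-trained head with a two-way head, though the abstract does not say exactly how this was done.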

Author information

Authors and Affiliations

L@RI Laboratory, MISC Team, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
Abdelhafid Berroukham & Khalid Housni
CRMEF, Casablanca, Morocco
Mohammed Lahraichi

Corresponding author

Correspondence to Abdelhafid Berroukham.

Editor information

Editors and Affiliations

ENSIAS, Mohammed V University, Rabat, Morocco
Mohamed Lazaar
FST, Abdelmalek Essaâdi University, Tangier, Morocco
El Mokhtar En-Naimi
FST, Abdelmalek Essaâdi University, Tangier, Morocco
Abdelhamid Zouhair
ENSA, Abdelmalek Essaâdi University, Tetuan, Morocco
Mohammed Al Achhab
ENSA, Abdelmalek Essaadi University, Tetouan, Morocco
Oussama Mahboub

About this paper

Cite this paper

Berroukham, A., Housni, K., Lahraichi, M. (2023). Fine-Tuning Pre-trained Vision Transformer Model for Anomaly Detection in Video Sequences. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_24

DOI: https://doi.org/10.1007/978-3-031-28387-1_24
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28386-4
Online ISBN: 978-3-031-28387-1
eBook Packages: Intelligent Technologies and Robotics (R0)