Video anomaly detection based on attention and efficient spatio-temporal feature extraction

  • Original article
The Visual Computer

Abstract

An anomaly is a pattern, behavior, or event that occurs infrequently in a given environment. Video anomaly detection remains a challenging task, with applications that include home security, public area monitoring, and quality control in production lines. The main challenges are the spatio-temporal nature of videos, the lack of an exact definition of an anomaly, and the inefficiency of feature extraction from videos. To address these challenges, we propose a method that uses parallel deep structures to extract informative features from videos. The method consists of several units: an attention unit, frame sampling units, spatial and temporal feature extractors, and thresholding. Using these units, we propose a video anomaly detection method that aggregates the results of four parallel structures. Aggregating the results brings generality and flexibility to the algorithm. The proposed method achieves satisfying results on four popular video anomaly detection benchmarks.
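The abstract does not state how the four parallel structures are combined or how the threshold is chosen. The following is therefore only a minimal sketch of one plausible reading, not the authors' implementation: it assumes each branch emits one anomaly score per frame, that scores are min-max normalized per branch, that aggregation is a plain average, and that the threshold is fixed at 0.5. The function names and the sample scores are hypothetical.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale one branch's per-frame scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant branch carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate_branches(branch_scores):
    """Average the normalized scores of all branches, frame by frame.

    Averaging is an assumption; the paper may use a different rule.
    """
    normalized = [min_max_normalize(b) for b in branch_scores]
    return [mean(frame) for frame in zip(*normalized)]


def flag_anomalies(scores, threshold=0.5):
    """Mark a frame as anomalous when its aggregated score exceeds the threshold."""
    return [s > threshold for s in scores]


# Hypothetical per-frame scores from four parallel branches (4 frames each).
branch_scores = [
    [0.1, 0.2, 0.9, 0.1],
    [1.0, 1.5, 4.0, 1.0],
    [0.0, 0.1, 0.8, 0.2],
    [5.0, 5.0, 9.0, 6.0],
]

aggregated = aggregate_branches(branch_scores)
flags = flag_anomalies(aggregated)
print(flags)
```

With these made-up scores, only the third frame is flagged, because all four branches rank it highest. Normalizing each branch before averaging keeps a branch with a large score range from dominating the others.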

Data availability

The datasets analyzed during the current study are publicly available. The Avenue dataset is available at http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html. The UCSD Ped1 and UCSD Ped2 datasets are available at http://www.svcl.ucsd.edu/projects/anomaly/dataset.html. The ShanghaiTech dataset is available at https://svip-lab.github.io/dataset/campus_dataset.html.

Funding

The authors did not receive financial support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to Mohammad Kazemi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent for publication

The authors have written this paper for educational purposes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rahimpour, S.M., Kazemi, M., Moallem, P. et al. Video anomaly detection based on attention and efficient spatio-temporal feature extraction. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03361-y

  • DOI: https://doi.org/10.1007/s00371-024-03361-y
