HN-MUM: heterogeneous video anomaly detection network with multi-united-memory module

Multimedia Tools and Applications

Abstract

In this paper, we present a Heterogeneous Network with a Multi-United-Memory module (HN-MUM), which integrates motion and appearance to address the Video Anomaly Detection (VAD) problem. First, guided by the notion of “specific analysis of particular issues” and the differences between motion and appearance, we design a heterogeneous dual-flow network that processes motion and appearance information independently. Then, motivated by the notion of “view of connection” and the relationships between motion and appearance, we combine the motion and appearance features in the decoding phase. A memory module memorizes and reconstructs the combined representation by matching motion patterns with the appearance stored in memory items. However, we observe that a single memory module cannot adequately capture all typical patterns. We therefore propose the Multi-United-Memory (MUM), which consists of three basic memory modules. Each basic memory module fuses related motion and appearance elements, which helps the memory store the motion-appearance-united representation in a related manner. To the best of our knowledge, this is the first effort to use a multi-level unified-thought memory module to detect anomalies. On UCSD Ped2, CUHK Avenue, and ShanghaiTech, HN-MUM attains AUC values of 97.1%, 88.2%, and 76.2%, respectively. Extensive experiments on these three benchmark datasets show that HN-MUM performs competitively with state-of-the-art methods.
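
The abstract gives no implementation details, so the following is only a rough sketch of the general idea of reading a fused motion-appearance query from several memory banks and scoring a frame by reconstruction error. Everything in it is an assumption made for illustration: the function names, the cosine-similarity addressing, the concatenation used as fusion, and the averaging over three banks are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity; the epsilon guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def memory_read(query, items):
    """Reconstruct the query as an attention-weighted sum of memory items."""
    weights = softmax([cosine(query, item) for item in items])
    return [sum(w * item[d] for w, item in zip(weights, items))
            for d in range(len(query))]

def united_read(motion, appearance, memories):
    """Hypothetical fusion: concatenate the two feature vectors, read the
    result from each memory bank, and average the reads."""
    query = list(motion) + list(appearance)
    reads = [memory_read(query, items) for items in memories]
    return [sum(vals) / len(reads) for vals in zip(*reads)]

def anomaly_score(motion, appearance, memories):
    """Euclidean distance between the fused query and its reconstruction."""
    query = list(motion) + list(appearance)
    recon = united_read(motion, appearance, memories)
    return math.sqrt(sum((q - r) ** 2 for q, r in zip(query, recon)))

# Toy example: three banks (assumed identical here) that each store two
# "normal" motion-appearance pairings.
memories = [[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]] for _ in range(3)]
matched = anomaly_score([1.0, 0.0], [1.0, 0.0], memories)
mismatched = anomaly_score([1.0, 0.0], [0.0, 1.0], memories)
print(matched < mismatched)  # the unseen pairing reconstructs worse
```

In this toy setting, a motion vector paired with an appearance vector that never co-occurred in memory reconstructs poorly, which is the intuition behind memory-based anomaly scoring; a trained model would learn the memory items and features rather than fix them by hand.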

Data availability

We provide original and editable data appearing in the submitted article, including figures, tables and experimental results.

Code availability

We are pleased to share the code used in the work submitted for publication.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grants 61871241, 61971245, and 61976120; in part by the Nanjing University State Key Laboratory for Novel Software Technology under Grant KFKT2019B15; in part by the Nantong Science and Technology Program under Grant JC2021131; and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grants KYCX21_3084 and KYCX22_3340.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Hongjun Li, Yunlong Wang, Mingyi Chen, and Jiaxin Li. The first draft of the manuscript was written by Hongjun Li and Yunlong Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hongjun Li.

Ethics declarations

Conflict of interest

None.

About this article

Cite this article

Li, H., Wang, Y., Chen, M. et al. HN-MUM: heterogeneous video anomaly detection network with multi-united-memory module. Multimed Tools Appl 82, 31521–31538 (2023). https://doi.org/10.1007/s11042-023-15154-x

  • DOI: https://doi.org/10.1007/s11042-023-15154-x
