Abstract
In this paper, we present a Heterogeneous Network with a Multi-United-Memory module (HN-MUM), which integrates motion and appearance to address the Video Anomaly Detection (VAD) problem. First, following the principle of “specific analysis of particular issues” and the differences between motion and appearance, we design a heterogeneous dual-flow network that processes motion and appearance information independently. Then, motivated by the principle of “view of connection” and the relationships between motion and appearance, we combine the motion and appearance features in the decoding phase. This is achieved with a memory module that memorizes and reconstructs the combined representation by matching motion patterns with the appearance stored in memory items. However, we observe that a single memory module cannot adequately capture all typical patterns. We therefore propose the Multi-United-Memory (MUM), which consists of three basic memory modules. Each basic memory module fuses the relevant motion and appearance elements, which helps the memory store the motion-appearance-united representation in a related manner. To the best of our knowledge, this is the first effort to use a multi-level unified-thought memory module to detect abnormalities. On UCSD Ped2, CUHK Avenue, and ShanghaiTech, HN-MUM attains AUC values of 97.1%, 88.2%, and 76.2%, respectively. Extensive experiments on these three benchmark datasets show that HN-MUM performs competitively with state-of-the-art methods.
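The abstract does not specify how the memory items are addressed or how the three basic memory modules are combined, so the following is only a minimal sketch of a generic memory read under stated assumptions: cosine-similarity addressing with a softmax, concatenation as the "united" motion-appearance feature, and simple averaging across memories. The function names (`memory_read`, `united_memory_read`) and the toy values are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, items):
    """Reconstruct a query as a weighted sum of memory items.

    Assumption: addressing weights are a softmax over cosine similarities.
    """
    weights = softmax([cosine(query, item) for item in items])
    read = [
        sum(w * item[i] for w, item in zip(weights, items))
        for i in range(len(query))
    ]
    return read, weights


def united_memory_read(motion, appearance, memories):
    """Read a concatenated motion-appearance feature from several memories.

    Assumption: the per-memory reads are averaged; the paper may fuse
    its three basic memory modules differently.
    """
    united = list(motion) + list(appearance)
    reads = [memory_read(united, items)[0] for items in memories]
    return [sum(values) / len(reads) for values in zip(*reads)]


# Toy example: 2-D motion and appearance features, three small memories.
motion = [1.0, 0.0]
appearance = [0.0, 1.0]
memories = [
    [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0, 1.0]],
]
reconstruction = united_memory_read(motion, appearance, memories)

# Squared reconstruction error; in memory-based VAD a larger error
# typically indicates a pattern the memory has not stored.
error = sum((a - b) ** 2 for a, b in zip(motion + appearance, reconstruction))
```

In this kind of design, features that resemble stored normal patterns are reconstructed closely, while unfamiliar ones are pulled toward the nearest memory items and yield a larger error, which can then serve as an anomaly score.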
Data availability
We provide the original, editable data that appear in the submitted article, including figures, tables, and experimental results.
Code availability
We are pleased to share code that is used in work submitted for publication.
Funding
This work is supported in part by the National Natural Science Foundation of China under Grants 61871241, 61971245, and 61976120; in part by the Nanjing University State Key Laboratory for Novel Software Technology under Grant KFKT2019B15; in part by the Nantong Science and Technology Program under Grant JC2021131; and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grants KYCX21_3084 and KYCX22_3340.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Hongjun Li, Yunlong Wang, Mingyi Chen, and Jiaxin Li. The first draft of the manuscript was written by Hongjun Li and Yunlong Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
None.
About this article
Cite this article
Li, H., Wang, Y., Chen, M. et al. HN-MUM: heterogeneous video anomaly detection network with multi-united-memory module. Multimed Tools Appl 82, 31521–31538 (2023). https://doi.org/10.1007/s11042-023-15154-x
DOI: https://doi.org/10.1007/s11042-023-15154-x