
Vision-based outlier detection techniques in automated surveillance: a survey and future ideas

Multimedia Tools and Applications

Abstract

Outlier detection is an emerging research topic driven by the growth of video annotation. An outlier is anything odd or irregular that deviates from the norm, and outlier detection is inherently subjective because it depends on a variety of contextual factors. This survey examines human motion patterns and their classification as a basis for anomaly detection. Automated surveillance is also prone to noise interference, which corrupts the video stream, obscures detail, and reduces accuracy; such frequent noise interference raises the error rate, particularly in real-time processing models. In addition, motion camouflage must be addressed in intelligent surveillance to obtain a clear view of the frame. This paper offers a focused survey of the three major issues in video anomaly detection: the inadequate utilization of motion patterns, the persistent noise interference that degrades accuracy and increases the error rate, and the elevated false-alarm rate caused by motion camouflage. We briefly analyze the relevant techniques and their limitations, discuss probable challenges in video-based outlier detection for automated surveillance along with ways to mitigate them, and draw attention to promising directions for future research.


Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Aggarwal A, Rani A, Kumar M (2020) A robust method to authenticate car license plates using segmentation and ROI based approach. Smart and Sustainable Built Environment 9(4):737–747


  2. Amato A, Huerta I, Mozerov MG, Roca FX, Gonzàlez J (2014) Moving cast shadows detection methods for video surveillance applications, pp 23–47

  3. Bajaj K, Singh DK, Ansari MA (2020) Autoencoders Based Deep Learner for Image Denoising. Procedia Computer Science 171:1535–1541

  4. Braun M, Krebs S, Flohr F, Gavrila DM (2019) EuroCity persons: A novel benchmark for person detection in traffic scenes. IEEE Trans Pattern Anal Mach Intell 41:1844–1861


  5. Camplani M, Maddalena L, Moyá Alcover G, Petrosino A, Salgado L (2017) A benchmarking framework for background subtraction in RGBD videos. Lecture Notes in Computer Science, vol 10590, pp 219–229

  6. Cheng KW, Chen YT, Fang WH (2015) Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2909–2917

  7. Chen D, Yuan Z, Hua G, Zheng N, Wang J (2015) Similarity learning on an explicit polynomial kernel feature map for person re-identification. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1565–1573

  8. Choudhary C, Singh I, Kumar M (2023) SARWAS: Deep ensemble learning techniques for sentiment based recommendation system. Expert Syst Appl 216:119420

  9. Cuthill IC, Matchette SR, Scott-Samuel NE (2019) Camouflage in a dynamic world. Curr Opin Behav Sci 30:109–115


  10. Deepak K, Chandrakala S, Mohan CK (2021) Residual spatiotemporal autoencoder for unsupervised video anomaly detection. SIViP 15(1):215–222


  11. Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Progress in Artificial Intelligence. Springer 9:85–112


  12. Dollár P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761


  13. Genovese M, Napoli E (2013) FPGA-based architecture for real time segmentation and denoising of HD video. J Real-Time Image Proc 8(4):389–401


  14. Haseeb M, Hancock ER (2012) Unsupervised clustering of human pose using spectral embedding. Lecture Notes in Computer Science, vol 7626, pp 467–473

  15. Havasi L, Szlávik Z, Szirányi T (2007) Detection of gait characteristics for scene registration in video surveillance system. IEEE Trans Image Process 16:503–510


  16. Huang Z, Zhu H, Zhou JT, Peng X (2018) Multiple Marginal Fisher Analysis. IEEE Trans Industr Electron 66(12):9798–9807


  17. Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231


  18. Kadu H, Kuo CC (2014) Automatic human mocap data classification. IEEE Trans Multimedia 16:2191–2202

  19. Kavikuil K, Amudha J (2018) Leveraging deep learning for anomaly detection in video surveillance, vol. 815. Springer Singapore

  20. Kumar M, Srivastava S (2019) Image forgery detection based on physics and pixels: A study. Aust J Forensic Sci 51(2):119–134


  21. Kumar M, Srivastava S, Uddin N (2019) Forgery detection using multiple light sources for synthetic images. Aust J Forensic Sci 51(3):243–250


  22. Kumar MK, Shrestha H, Dhasarathan C, Kumar M, Nidhya R, Shankar A (2022) A Deep Learning Based Convolution Neural Network-DCNN Approach to Detect Brain Tumor. Springer Nature Singapore

  23. Kumar M, Aggarwal J, Rani A, Stephan T, Shankar A, Mirjalili S (2022) Secure video communication using firefly optimization and visual cryptography. Artif Intell Rev 55(4):2997–3017


  24. Lee SW, Maik V, Jang JH, Shin J, Paik J (2005) Noise-adaptive spatio-temporal filter for real-time noise removal in low light level images. IEEE Trans Consum Electron 51(2):648–653


  25. Liu J, Xia Y, Tang Z (2021) Privacy-preserving video fall detection using visual shielding information. Visual Computer 37(2):359–370


  26. Liu W, Luo W, Lian D, Gao S (2018) Future frame prediction for anomaly detection - a new baseline. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 6536–6545

  27. Luo J, Zhao J, Wen B, Zhang Y (2021) Explaining the semantics capturing capability of scene graph generation models. Pattern Recogn 110:107427


  28. Maggioni M, Sánchez-Monge E, Foi A (2014) Joint removal of random and fixed-pattern noise through spatiotemporal video filtering. IEEE Trans Image Process 23(10):4282–4296


  29. Moreau T, Bruna J (2017) Understanding trainable sparse coding via matrix factorization. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, pp. 1–13

  30. Narasimhan MG, Sowmya Kamath S (2018) Dynamic video anomaly detection and localization using sparse denoising autoencoders. Multimed Tools Appl 77:13173–13195


  31. Nasaruddin N, Muchtar K, Afdhal A, Dwiyantoro APJ (2020) Deep anomaly detection through visual attention in surveillance videos. Journal of Big Data 7

  32. Pang S, del Coz JJ, Yu Z, Luaces O, Díez J (2017) Deep learning to frame objects for visual target tracking. Eng Appl Artif Intell 65:406–420


  33. Peng X, Lu C, Yi Z, Tang H (2018) Connections between nuclear-norm and frobenius-norm-based representations. IEEE Transactions on Neural Networks and Learning Systems 29(1):218–224


  34. Peng X, Feng J, Xiao S, Yau WY, Zhou JT, Yang S (2018) Structured autoencoders for subspace clustering. IEEE Trans Image Process 27(10):5076–5086


  35. Pop DO, Rogozan A, Chatelain C, Nashashibi F, Bensrhair A (2019) Multi-task deep learning for pedestrian detection, action recognition and time to cross prediction. IEEE Access 7:149318–149327


  36. Popoola OP, Wang K (2012) Video-based abnormal human behavior recognition: a review. IEEE Trans Syst Man Cybern Part C Appl Rev 42(6):865–878

  37. Pulla Rao C, Guruva Reddy A, Rama Rao CB (2020) Camouflaged object detection for machine vision applications. Int J Speech Technol 23(2):327–335


  38. Raheja S, Obaidat MS, Sadoun B, Malik S, Rani A, Kumar M, Stephan T (2021) Modeling and simulation of urban air quality with a 2-phase assessment technique. Simul Model Pract Theory 109:102281

  39. Rajeshdate A, Kiranshah S (2018) Camouflage Moving Object Detection: A Review. 2017 International Conference on Computing, Communication, Control and Automation, ICCUBEA 2017

  40. Sabokrou M, Fayyaz M, Fathy M, Klette R (2017) Deep-Cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Trans Image Process 26(4):1992–2004


  41. Sabokrou M, Fayyaz M, Fathy M, Moayed Z, Klette R (2018) Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Comput Vis Image Underst 172:88–97

  42. Sedghi M, Geo M, Atia G (2020) A Multi-criteria Approach for Fast and Robust Representative Selection from Manifolds. IEEE Trans Knowl Data Eng

  43. Sharif M, Khan MA, Akram T, Javed MY, Saba T, Rehman A (2017) A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection. Eurasip Journal on Image and Video Processing 2017

  44. Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC (2015) Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv Neural Inf Process Syst, pp 802–810

  45. Shijila B, Tom AJ, George SN (2019) Simultaneous denoising and moving object detection using low rank approximation. Futur Gener Comput Syst 90:198–210


  46. Singh YGS, Chintalacheruvu SCK, Garg S, Kumar M (2021) Efficient face identification and authentication tool for biometric attendance system. 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 2021, pp. 379–383

  47. Singh SK, Dhawale CA, Misra S (2013) Survey of Object Detection Methods in Camouflaged Image. IERI Procedia 4:351–357


  48. Singh V, Singh S, Gupta P (2020) Real-time anomaly recognition through CCTV using neural networks. In Procedia Computer Science vol. 173, pp. 254–263, Elsevier B.V

  49. Sri Preethaa KR, Sabari A (2020) Intelligent video analysis for enhanced pedestrian detection by hybrid metaheuristic approach. Soft Comput 24:12303–12311


  50. Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp 6479–6488

  51. Sun J, Shao J, He C (2019) Abnormal event detection for video surveillance using deep one-class learning. Multimedia Tools and Applications 78(3):3633–3647


  52. Tian Y, Pang G, Chen Y, Singh R, Verjans JW, Carneiro G (2021) Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. IEEE/CVF International Conference on Computer Vision, pp 4955–4966

  53. Tran D, Yuan J, Forsyth D (2014) Video event detection: From subvolume localization to spatiotemporal path search. IEEE Trans Pattern Anal Mach Intell 36:404–416


  54. Walia GS, Kapoor R (2016) Robust object tracking based upon adaptive multi-cue integration for video surveillance. Multimedia Tools and Applications 75(23):15821–15847


  55. Wang Z, Ling Q, Huang TS (2016) Learning deep l0 encoders. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pp 2194–2200

  56. Wang Q, Ma J, Yu S, Tan L (2020) Noise detection and image denoising based on fractional calculus. Chaos, Solitons and Fractals 131

  57. Xu D, Yan Y, Ricci E, Sebe N (2017) Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst 156:117–127


  58. Xu K, Sun T, Jiang X (2020) Video anomaly detection and localization based on an adaptive intra-frame classification network. IEEE Trans Multimedia 22(2):394–406


  59. Yadav DK, Singh K, Kumari S (2017) Challenging issues of video surveillance system using internet of things in cloud environment. Communications in Computer and Information Science 721:471–481


  60. Yeh CH, Lin CY, Muchtar K, Lai HE, Sun MT (2017) Three-Pronged Compensation and Hysteresis Thresholding for Moving Object Detection in Real-Time Video Surveillance. IEEE Trans Industr Electron 64(6):4945–4955


  61. Yuan Y, Ma D, Wang Q (2015) Hyperspectral anomaly detection by graph pixel selection. IEEE Transactions on Cybernetics 46(10):3123–3134


  62. Zhang X, Wu H, Wu M, Wu C (2020) Extended Motion Diffusion-Based Change Detection for Airport Ground Surveillance. IEEE Trans Image Process 29:5677–5686


  63. Zhao B, Fei-Fei L, Xing EP (2011) Online detection of unusual events in videos via dynamic sparse coding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 3313–3320

  64. Zheng Y, Zhang X, Wang F, Cao T, Sun M, Wang X (2019) Detection of people with camouflage pattern via dense deconvolution network. IEEE Signal Process Lett 26(1):29–33


  65. Zhong JX, Li N, Kong W, Liu S, Li TH, Li G (2019) Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1237–1246

  66. Zhong F, Li M, Zhang K, Hu J, Liu L (2021) DSPNet: A low computational-cost network for human pose estimation. Neurocomputing 423:327–335


  67. Zhou JT, Di K, Du J, Peng X, Yang H, Pan SJ, Tsang IW, Liu Y, Qin Z, Goh RSM (2018) Sc2Net: Sparse LSTMs for sparse coding. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 4588–4595

  68. Zhou JT, Du J, Zhu H, Peng X, Liu Y, Goh RSM (2019) AnomalyNet: An anomaly detection network for video surveillance. IEEE Trans Inf Forensics Secur 14(10):2537–2550



Author information


Corresponding author

Correspondence to Ankita Umale.

Ethics declarations

Conflicts of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix provides supplementary material covering some of the mathematical work underlying the methods discussed in this survey.

1.1 A. SC2Net

According to [67], sparse coding (SC) has demonstrated effectiveness in uncovering semantic information from noisy, high-dimensional data. The method develops a novel variant of the \(\ell _1\)-solver by introducing adaptive momentum vectors into ISTA, enabling per-parameter updates and encapsulating historical information during optimization. Given a data matrix \(\textbf{X}=\left[ \textbf{x}_{1}, \textbf{x}_{2}, \cdots , \textbf{x}_{n}\right] \in \mathbb {R}^{d_{x} \times n}\), sparse coding aims to learn a dictionary \(\textbf{B}=\left[ \textbf{b}_{1}, \textbf{b}_{2}, \cdots , \textbf{b}_{d_s}\right] \in \mathbb {R}^{d_{x} \times d_{s}}\) that generates sparse codes \(\textbf{S}=\left[ \textbf{s}_{1}, \textbf{s}_{2}, \cdots , \textbf{s}_{n}\right] \in \mathbb {R}^{d_{s} \times n}\) for the input data \(\textbf{X}\). The optimization problem can be formulated as follows:

$$\begin{aligned} \begin{aligned} \min _{\textbf{S}, \textbf{B}}&\sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2} \\ \text{ s.t. }&\left\| \textbf{s}_{i}\right\| _{0} \le k, \text{ and } \left\| \textbf{b}_{j}\right\| ^{2} \le 1,\ j=1, \cdots , d_{s} \end{aligned} \end{aligned}$$
(10)

The above optimization is hard to solve due to the non-convexity of the \(\ell _{0}\) norm. Therefore, it is often relaxed to the following problem with the \(\ell _{1}\) norm,

$$\begin{aligned} \begin{aligned} \min _{\textbf{S}, \textbf{B}}&\sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2}+\lambda \left\| \textbf{s}_{i}\right\| _{1} \\ \text{ s.t. }&\left\| \textbf{b}_{j}\right\| ^{2} \le 1,\ j=1, \cdots , d_{s}. \end{aligned} \end{aligned}$$
(11)

To solve (11), the conventional way is to alternately optimize B and S, corresponding to two sub-procedures: dictionary learning and sparse approximation. Specifically, by fixing S, (11) reduces to the following \(\ell _{2}\)-constrained optimization problem:

$$\begin{aligned} \begin{aligned} \min _{\textbf{B}} \ &\Vert \textbf{X}-\textbf{B S}\Vert _{F}^{2} \\ \text{ s.t. } \ &\left\| \textbf{b}_{i}\right\| ^{2} \le 1,\ i=1, \cdots , d_{s}. \end{aligned} \end{aligned}$$
(12)

The above problem is the well-known ridge regression problem, which has a closed-form solution. By fixing B instead, (11) reduces to the sparse approximation problem, which aims to represent each input \(\textbf{x}_i\) by a linear combination of the columns of B as follows:

$$\begin{aligned} \min _{\textbf{S}} \sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2}+\lambda \left\| \textbf{s}_{i}\right\| _{1} \end{aligned}$$
(13)

The ISTA updating formula can be expressed as

$$\begin{aligned} \textbf{s}^{(t)}={\text {sh}}_{(\lambda \tau )}\left( \textbf{s}^{(t-1)}-\tau \nabla g\left( \textbf{s}^{(t-1)}\right) \right) \end{aligned}$$
(14)

where the shrinkage function is defined as \({\text {sh}}_{(\lambda \tau )}(\textbf{s})={\text {sign}}(\textbf{s})\left( |\textbf{s}|-\lambda \tau \right) _{+}\) and \(g(\textbf{s})=\frac{1}{2}\left\| \textbf{x}-\textbf{B}\textbf{s}\right\| _{2}^{2}\) is the data-fidelity term. Substituting the gradient of g into (14) yields the update rule

$$\begin{aligned} \begin{aligned} \textbf{s}^{(t)}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{s}^{(t-1)}-\tau \textbf{B}^{T}\left( \textbf{B} \textbf{s}^{(t-1)}-\textbf{x}\right) \right) \\ {}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x}\right) \end{aligned} \end{aligned}$$
(15)

where \(\textbf{W}_{e}=\textbf{I}-\tau \textbf{B}^{T} \textbf{B}\) and \(\textbf{W}_{d}=\tau \textbf{B}^{T}\).

Note that (15) can be treated as a simple RNN: in other words, the \(\ell _{1}\)-oriented optimization algorithm is reformulated as a model-based, learnable network.
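To make this concrete, the following is a minimal NumPy sketch of the recursion (15); the random dictionary, the values of \(\lambda \), the step size \(\tau = 1/\Vert \textbf{B}\Vert _2^2\), and the fixed iteration count are illustrative assumptions, not choices prescribed by [67].

```python
import numpy as np

def ista(x, B, lam=0.05, n_steps=500):
    """Iterate the ISTA update of Eq. (15):
    s_t = sh_{lam*tau}(W_e s_{t-1} + W_d x)."""
    tau = 1.0 / np.linalg.norm(B, 2) ** 2          # step size <= 1/L for convergence
    W_e = np.eye(B.shape[1]) - tau * B.T @ B       # recurrent weight I - tau B^T B
    W_d = tau * B.T                                # input weight tau B^T
    s = np.zeros(B.shape[1])
    for _ in range(n_steps):
        c = W_e @ s + W_d @ x
        s = np.sign(c) * np.maximum(np.abs(c) - lam * tau, 0.0)  # shrinkage sh
    return s

# Toy usage: recover a 3-sparse code under a random 32x64 dictionary.
rng = np.random.default_rng(0)
B = rng.standard_normal((32, 64))
s_true = np.zeros(64)
s_true[[3, 17, 40]] = [1.5, -2.0, 0.7]
s_hat = ista(B @ s_true, B)
```

Unrolling this loop for a fixed number of steps and treating \(\textbf{W}_e\) and \(\textbf{W}_d\) as trainable parameters gives exactly the RNN view noted above.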

1.2 B. AnomalyNet

AnomalyNet [68] is an optimization-inspired network that simultaneously achieves sparse representation and dictionary learning using a novel LSTM network (termed SC2Net). Several algorithms optimize neural networks by incorporating "momentum" into the dynamics of stochastic gradient descent (SGD), and these methods have shown promising improvements in the robustness and convergence speed of SGD. Borrowing the high-level idea of these optimization methods, adaptive momentum vectors \(\textbf{i}^{(t)}\) and \(\textbf{f}^{(t)}\) are introduced into ISTA at time step t as follows:

$$\begin{aligned} \left. \begin{aligned} \tilde{\textbf{c}}^{(t)}&=\textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x} \\ \textbf{c}^{(t)}&=\textbf{f}^{(t)} \odot \textbf{c}^{(t-1)}+\textbf{i}^{(t)} \odot \tilde{\textbf{c}}^{(t)} \\ \textbf{s}^{(t)}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{c}^{(t)}\right) \end{aligned} \right\} \end{aligned}$$
(16)

where \(\odot \) denotes the element-wise product. Here \(\tilde{\textbf{c}}^{(t)}\) is the candidate combination at the current step, while \(\textbf{c}^{(t)}\) and \(\textbf{c}^{(t-1)}\) are the accumulated states at the current and previous steps, respectively. In adaptive ISTA, \(\textbf{c}^{(t)}\) accumulates all the historical information with different weights, similar to the diagonal matrix containing the sum of the squares of past gradients. In these equations, \(\textbf{i}\) and \(\textbf{f}\) denote the input and forget gates, and \(\textbf{s}\) the output. With this notation, the plain ISTA update corresponds to \(\textbf{s}^{(t)}={\text {sh}}_{(\lambda \tau )}\left( \tilde{\textbf{c}}^{(t)}\right) \). Unlike the vanilla LSTM, SLSTM has no output gate. The SLSTM unit is obtained by rewriting the above equations as follows:

$$\begin{aligned} \left. \begin{aligned} \textbf{i}^{(t)}&=\sigma \left( \textbf{W}_{i s} \textbf{s}^{(t-1)}+\textbf{W}_{i x} \textbf{x}\right) \\ \textbf{f}^{(t)}&=\sigma \left( \textbf{W}_{f s} \textbf{s}^{(t-1)}+\textbf{W}_{f x} \textbf{x}\right) \\ \tilde{\textbf{c}}^{(t)}&=\textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x} \\ \textbf{c}^{(t)}&=\textbf{f}^{(t)} \odot \textbf{c}^{(t-1)}+\textbf{i}^{(t)} \odot \tilde{\textbf{c}}^{(t)} \\ \textbf{s}^{(t)}&=h_{(\textbf{D}, \textbf{u})}\left( \textbf{c}^{(t)}\right) \end{aligned} \right\} \end{aligned}$$
(17)

where the \(\textbf{W}\) terms are weight matrices (e.g. \(\textbf{W}_{i s}\) maps the previous sparse code to the input gate), \(\sigma (\textbf{x})=\frac{1}{1+e^{-\textbf{x}}}\), and \(h_{(\textbf{D}, \textbf{u})}(\textbf{x})=\textbf{D}(\tanh (\textbf{x}+\textbf{u})+\tanh (\textbf{x}-\textbf{u}))\), where \(\textbf{u}\) is a trainable vector and \(\textbf{D}\) a trainable diagonal matrix. This smooth, differentiable nonlinear activation, named "double tanh", replaces the shrinkage function to address the vanishing-gradient problem.
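As a rough illustration of (16) and (17), here is a minimal NumPy sketch of a single SLSTM step; the weight matrices, \(\textbf{D}\) and \(\textbf{u}\) are untrained placeholders that SC2Net would learn end-to-end, and the function and argument names are ours rather than from [68].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def double_tanh(c, D, u):
    # Smooth surrogate for the shrinkage function: D (tanh(c+u) + tanh(c-u)).
    return D @ (np.tanh(c + u) + np.tanh(c - u))

def slstm_step(x, s_prev, c_prev, W_is, W_ix, W_fs, W_fx, W_e, W_d, D, u):
    """One SLSTM update: adaptive ISTA with learned input/forget gates
    and no output gate, per Eqs. (16)-(17)."""
    i = sigmoid(W_is @ s_prev + W_ix @ x)   # input gate
    f = sigmoid(W_fs @ s_prev + W_fx @ x)   # forget gate
    c_tilde = W_e @ s_prev + W_d @ x        # plain ISTA candidate
    c = f * c_prev + i * c_tilde            # momentum-style accumulation
    s = double_tanh(c, D, u)                # sparse code for this step
    return s, c
```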

1.3 C. MOCAP

In automatic human motion capture (MOCAP) data classification [18], all training and testing motion sequences are converted to motion strings. A test motion string is compared with each training motion string individually using the suffix-array technique. Suppose the i-th test string has f codeword indices, the j-th training string has g codeword indices, and the largest common sequence of codeword indices between the two strings has length l. For the i-th test string, the following two metrics are calculated with respect to a given motion category k:

  1. Max-parameter: the average of the MLMs over all training motions in category k;

  2. Sim-parameter: the average of the similarity products over all training motions in category k.

For the k-th category, we have

$$\begin{aligned} MAX_{i}^{k}&=\frac{\sum _{j \in k} MLM_{i, j}}{\sum _{j} \underline{1}(j \in k)} \\ SIM_{i}^{k}&=\frac{\sum _{j \in k} SRP_{i, j}}{\sum _{j} \underline{1}(j \in k)} \end{aligned}$$

where \(\underline{1}(\cdot )\) is the indicator function.

Alternatively, for each test motion, all categories are pitted against one another one-on-one, and in each pairing the category with the higher parameter value wins a vote. All category-specific votes are then aggregated and normalized to obtain soft scores. That is,

$$\begin{aligned} \begin{aligned} {\text {Vote}}_{i, MAX}^{k}&=\sum _{m \ne k} \underline{1}\left( MAX_{i}^{k}>MAX_{i}^{m}\right) \\ {\text {Vote}}_{i, SIM}^{k}&=\sum _{m \ne k} \underline{1}\left( SIM_{i}^{k}>SIM_{i}^{m}\right) \end{aligned} \end{aligned}$$
(18)

The full-body level-n codebooks that offer the best classification results are short-listed for fusion to yield the final decision.
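The category averaging and the one-vs-one voting of (18) are straightforward to express in code. The following NumPy sketch assumes the pairwise MLM (or SRP) values against all training motions have already been computed by suffix-array matching; the array layout and names are our assumptions, not from [18].

```python
import numpy as np

def category_averages(pairwise, train_labels, n_cat):
    """MAX_i^k / SIM_i^k: average a pairwise metric (MLM or SRP)
    over the training motions of each category k."""
    return np.stack([pairwise[:, train_labels == k].mean(axis=1)
                     for k in range(n_cat)], axis=1)

def soft_scores(metric):
    """One-vs-one voting of Eq. (18): metric[i, k] is MAX_i^k or SIM_i^k;
    each pairwise win earns category k one vote, then votes are normalized."""
    n_cat = metric.shape[1]
    votes = np.zeros_like(metric, dtype=float)
    for k in range(n_cat):
        for m in range(n_cat):
            if m != k:
                votes[:, k] += metric[:, k] > metric[:, m]   # pairwise wins
    return votes / np.maximum(votes.sum(axis=1, keepdims=True), 1e-12)
```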

1.4 D. Camouflaged object detection

In [37], a constructive approach is presented that combines characterization of object texture with statistical modeling of camouflaged images under texture-smoothing conditions. A camouflage image is the combination of a camouflaged-object image (the target) and a camouflage-pattern background image. The camouflaged-object image intensities are thresholded by a predefined value, and the resulting binary representation is treated as a new target image. The correlation peaks \(\hat{\alpha }^{2}\) and \(\hat{\beta }^{2}\) between the scene and the normalized basis images are evaluated to detect the target in the camouflaged image S. When the camouflaged image is a linear combination of the basis images \(f_{o}(x)\) and \(f_{*}(x)\), normalizing each basis image by its autocorrelation yields \(\hat{f}_{o}(x)\) and \(\hat{f}_{*}(x)\). The correlation peaks from the normalized basis images are:

$$\begin{aligned} \begin{aligned} \hat{\alpha }^{2}&=\left[ S(x) * \hat{f}_{o}(x)\right] ^{2} \\ \hat{\beta }^{2}&=\left[ S(x) * \hat{f}_{*}(x)\right] ^{2} \end{aligned} \end{aligned}$$
(19)

These correlation peaks, obtained by correlating the camouflage image with the basis images of the target and its binary version, are used as filter coefficients to identify the target in the camouflaged image. The resulting filter equation is:

$$\begin{aligned} C(x)=\frac{\left[ S(x) * f_{0}(x)\right] ^{2}}{[S(x) * {f_*} (x)]^{2}-\sqrt{N}\left[ S^{2}(x) * {f_*} (x)\right] } \end{aligned}$$
(20)
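Under one possible reading of (20), in which \(*\) denotes 2-D correlation and N is the number of pixels in the basis image (both assumptions on our part, since they are not defined here), the filter response could be evaluated as in the following sketch:

```python
import numpy as np
from scipy.signal import correlate2d

def filter_response(S, f0, f_star, eps=1e-9):
    """Correlation-peak filter in the spirit of Eq. (20).
    S: camouflage scene; f0: target basis image; f_star: binary basis image.
    Treating '*' as 2-D correlation and N as the basis-image pixel count
    are assumptions, not specified by [37]."""
    N = f_star.size
    num = correlate2d(S, f0, mode='same') ** 2
    den = (correlate2d(S, f_star, mode='same') ** 2
           - np.sqrt(N) * correlate2d(S ** 2, f_star, mode='same'))
    return num / (den + eps)   # peaks indicate candidate target locations
```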

If the degree of camouflage is high, whether for a single or for multiple camouflaged objects, target modeling alone cannot detect the object because of intensity variation and random textures; texture models (regular or irregular) are then needed. Cluster shade (CS) and cluster prominence (CP) summarize the characteristics of the gray-level co-occurrence matrix (GLCM). For object detection, the following statistics are found to be suitable, and they perform satisfactorily both with and without texture smoothing. Each feature depends entirely on the normalized GLCM at different offsets:

$$\begin{aligned} \left. \begin{aligned} \text{ GraylevelCM }&=GC(i, j)=\frac{C(i, j)}{\sum _{i, j=1}^{M} C(i, j)} \\ \text{ Contrast }&=\sum _{i, j=1}^{M}(i-j)^{2}\, GC(i, j) \\ \text{ CS }&=\sum _{i, j=1}^{M}\left( i-D_{x}+j-D_{y}\right) ^{3} GC(i, j) \\ \text{ CP }&=\sum _{i, j=1}^{M}\left( i-D_{x}+j-D_{y}\right) ^{4} GC(i, j) \end{aligned} \right\} \end{aligned}$$
(21)

where \(D_{x}=\sum _{i, j=1}^{M} i\, GC(i, j)\) and \(D_{y}=\sum _{i, j=1}^{M} j\, GC(i, j)\). The target extracted by this approach depends entirely on the selection of a seed block in the image, i.e., in texture terminology, primitive-element extraction. This can be done by summing the normalized-GLCM characteristics such as contrast, cluster shade and cluster prominence.
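As a minimal illustration of (21), the following NumPy sketch accumulates a normalized GLCM at a single offset and computes contrast, cluster shade and cluster prominence; the quantization to 16 gray levels is an illustrative choice, not a parameter from [37].

```python
import numpy as np

def glcm_features(img, dy=0, dx=1, levels=16):
    """Normalized GLCM GC(i, j) at offset (dy, dx) and the Eq. (21)
    statistics: contrast, cluster shade (CS), cluster prominence (CP).
    `img` is a 2-D array with values in [0, 255]."""
    q = np.clip((img.astype(float) / 256.0 * levels).astype(int), 0, levels - 1)
    C = np.zeros((levels, levels))
    rows, cols = q.shape
    for y in range(rows):                      # accumulate co-occurrence counts
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                C[q[y, x], q[y2, x2]] += 1
    GC = C / C.sum()                           # normalize to a distribution
    i, j = np.indices((levels, levels))
    Dx = (i * GC).sum()                        # D_x = sum i GC(i, j)
    Dy = (j * GC).sum()                        # D_y = sum j GC(i, j)
    contrast = ((i - j) ** 2 * GC).sum()
    cs = ((i - Dx + j - Dy) ** 3 * GC).sum()   # cluster shade
    cp = ((i - Dx + j - Dy) ** 4 * GC).sum()   # cluster prominence
    return contrast, cs, cp
```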

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Umale, A., Lal, N. & Goel, C. Vision-based outlier detection techniques in automated surveillance: a survey and future ideas. Multimed Tools Appl 83, 14565–14607 (2024). https://doi.org/10.1007/s11042-023-15911-y
