
Vision-based outlier detection techniques in automated surveillance: a survey and future ideas

Multimedia Tools and Applications

Abstract

Outlier detection is an emerging research topic driven by the growth of video annotation. An outlier is anything odd or irregular that deviates from the norm, and outlier detection is inherently subjective because it depends on a variety of contextual factors. This survey examines human motion patterns and their classification as a basis for anomaly detection. Automated surveillance is also prone to noise interference, which corrupts the video stream, obscures detail, and reduces accuracy; such frequent noise interference raises the error rate, particularly in real-time processing models. In addition, motion camouflage must be addressed in intelligent surveillance to obtain a clear view of the frame. This paper offers a focused survey of the three major issues in video anomaly detection: the inadequate utilization of motion patterns, the persistent noise interference that degrades accuracy and increases the error rate, and the elevated false-alarm rate caused by motion camouflage. We briefly analyze the relevant techniques and their limitations, discuss probable challenges in video-based outlier detection for automated surveillance along with ways to mitigate them, and draw attention to promising directions for future research.


Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Aggarwal A, Rani A, Kumar M (2020) A robust method to authenticate car license plates using segmentation and ROI based approach. Smart and Sustainable Built Environment 9(4):737–747


  2. Amato A, Huerta I, Mozerov MG, Roca FX, Gonzàlez J (2014) Moving cast shadows detection methods for video surveillance applications, pp 23–47

  3. Bajaj K, Singh DK, Ansari MA (2020) Autoencoders Based Deep Learner for Image Denoising. Procedia Computer Science 171:1535–1541

  4. Braun M, Krebs S, Flohr F, Gavrila DM (2019) EuroCity persons: A novel benchmark for person detection in traffic scenes. IEEE Trans Pattern Anal Mach Intell 41:1844–1861


  5. Camplani M, Maddalena L, Moyá Alcover G, Petrosino A, Salgado L (2017) A benchmarking framework for background subtraction in RGBD videos. Lecture Notes in Computer Science, vol 10590, pp 219–229

  6. Cheng KW, Chen YT, Fang WH (2015) Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2909–2917

  7. Chen D, Yuan Z, Hua G, Zheng N, Wang J (2015) Similarity learning on an explicit polynomial kernel feature map for person re-identification. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1565–1573

  8. Choudhary C, Singh I, Kumar M (2023) SARWAS: Deep ensemble learning techniques for sentiment based recommendation system. Expert Syst Appl 216:119420

  9. Cuthill IC, Matchette SR, Scott-Samuel NE (2019) Camouflage in a dynamic world. Curr Opin Behav Sci 30:109–115


  10. Deepak K, Chandrakala S, Mohan CK (2021) Residual spatiotemporal autoencoder for unsupervised video anomaly detection. SIViP 15(1):215–222


  11. Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Progress in Artificial Intelligence. Springer 9:85–112


  12. Dollár P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761


  13. Genovese M, Napoli E (2013) FPGA-based architecture for real time segmentation and denoising of HD video. J Real-Time Image Proc 8(4):389–401


  14. Haseeb M, Hancock ER (2012) Unsupervised clustering of human pose using spectral embedding. Lecture Notes in Computer Science, vol 7626, pp 467–473

  15. Havasi L, Szlávik Z, Szirányi T (2007) Detection of gait characteristics for scene registration in video surveillance system. IEEE Trans Image Process 16:503–510


  16. Huang Z, Zhu H, Zhou JT, Peng X (2018) Multiple Marginal Fisher Analysis. IEEE Trans Industr Electron 66(12):9798–9807


  17. Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231


  18. Kadu H, Kuo CC (2014) Automatic human mocap data classification. IEEE Trans Multimedia 16:2191–2202

  19. Kavikuil K, Amudha J (2018) Leveraging deep learning for anomaly detection in video surveillance, vol. 815. Springer Singapore

  20. Kumar M, Srivastava S (2019) Image forgery detection based on physics and pixels: A study. Aust J Forensic Sci 51(2):119–134


  21. Kumar M, Srivastava S, Uddin N (2019) Forgery detection using multiple light sources for synthetic images. Aust J Forensic Sci 51(3):243–250


  22. Kumar MK, Shrestha H, Dhasarathan C, Kumar M, Nidhya R, Shankar A (2022) A Deep Learning Based Convolution Neural Network-DCNN Approach to Detect Brain Tumor. Springer Nature Singapore

  23. Kumar M, Aggarwal J, Rani A, Stephan T, Shankar A, Mirjalili S (2022) Secure video communication using firefly optimization and visual cryptography. Artif Intell Rev 55(4):2997–3017


  24. Lee SW, Maik V, Jang JH, Shin J, Paik J (2005) Noise-adaptive spatio-temporal filter for real-time noise removal in low light level images. IEEE Trans Consum Electron 51(2):648–653


  25. Liu J, Xia Y, Tang Z (2021) Privacy-preserving video fall detection using visual shielding information. Visual Computer 37(2):359–370


  26. Liu W, Luo W, Lian D, Gao S (2018) Future frame prediction for anomaly detection - a new baseline. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 6536–6545

  27. Luo J, Zhao J, Wen B, Zhang Y (2021) Explaining the semantics capturing capability of scene graph generation models. Pattern Recogn 110:107427


  28. Maggioni M, Sánchez-Monge E, Foi A (2014) Joint removal of random and fixed-pattern noise through spatiotemporal video filtering. IEEE Trans Image Process 23(10):4282–4296


  29. Moreau T, Bruna J (2017) Understanding trainable sparse coding via matrix factorization. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, pp. 1–13

  30. Narasimhan MG, Sowmya Kamath S (2018) Dynamic video anomaly detection and localization using sparse denoising autoencoders. Multimed Tools Appl 77:13173–13195


  31. Nasaruddin N, Muchtar K, Afdhal A, Dwiyantoro APJ (2020) Deep anomaly detection through visual attention in surveillance videos. Journal of Big Data 7

  32. Pang S, del Coz JJ, Yu Z, Luaces O, Díez J (2017) Deep learning to frame objects for visual target tracking. Eng Appl Artif Intell 65:406–420


  33. Peng X, Lu C, Yi Z, Tang H (2018) Connections between nuclear-norm and frobenius-norm-based representations. IEEE Transactions on Neural Networks and Learning Systems 29(1):218–224


  34. Peng X, Feng J, Xiao S, Yau WY, Zhou JT, Yang S (2018) Structured autoencoders for subspace clustering. IEEE Trans Image Process 27(10):5076–5086


  35. Pop DO, Rogozan A, Chatelain C, Nashashibi F, Bensrhair A (2019) Multi-task deep learning for pedestrian detection, action recognition and time to cross prediction. IEEE Access 7:149318–149327


  36. Popoola OP, Wang K (2012) Video-based abnormal human behavior recognition: a review. IEEE Trans Syst Man Cybern Part C Appl Rev 42(6):865–878

  37. Pulla Rao C, Guruva Reddy A, Rama Rao CB (2020) Camouflaged object detection for machine vision applications. Int J Speech Technol 23(2):327–335


  38. Raheja S, Obaidat MS, Sadoun B, Malik S, Rani A, Kumar M, Stephan T (2021) Modeling and simulation of urban air quality with a 2-phase assessment technique. Simul Model Pract Theory 109:102281

  39. Rajeshdate A, Kiranshah S (2018) Camouflage Moving Object Detection: A Review. 2017 International Conference on Computing, Communication, Control and Automation, ICCUBEA 2017

  40. Sabokrou M, Fayyaz M, Fathy M, Klette R (2017) Deep-Cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Trans Image Process 26(4):1992–2004


  41. Sabokrou M, Fayyaz M, Fathy M, Moayed Z, Klette R (2018) Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Comput Vis Image Underst 172:88–97

  42. Sedghi M, Geo M, Atia G (2020) A Multi-criteria Approach for Fast and Robust Representative Selection from Manifolds. IEEE Trans Knowl Data Eng

  43. Sharif M, Khan MA, Akram T, Javed MY, Saba T, Rehman A (2017) A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection. Eurasip Journal on Image and Video Processing 2017

  44. Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC (2015) Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv Neural Inf Process Syst, pp 802–810

  45. Shijila B, Tom AJ, George SN (2019) Simultaneous denoising and moving object detection using low rank approximation. Futur Gener Comput Syst 90:198–210


  46. Singh YGS, Chintalacheruvu SCK, Garg S, Kumar M (2021) Efficient face identification and authentication tool for biometric attendance system. 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 2021, pp. 379–383

  47. Singh SK, Dhawale CA, Misra S (2013) Survey of Object Detection Methods in Camouflaged Image. IERI Procedia 4:351–357


  48. Singh V, Singh S, Gupta P (2020) Real-time anomaly recognition through CCTV using neural networks. In Procedia Computer Science vol. 173, pp. 254–263, Elsevier B.V

  49. Sri Preethaa KR, Sabari A (2020) Intelligent video analysis for enhanced pedestrian detection by hybrid metaheuristic approach. Soft Comput 24:12303–12311


  50. Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp 6479–6488

  51. Sun J, Shao J, He C (2019) Abnormal event detection for video surveillance using deep one-class learning. Multimedia Tools and Applications 78(3):3633–3647


  52. Tian Y, Pang G, Chen Y, Singh R, Verjans JW, Carneiro G (2021) Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. IEEE/CVF International Conference on Computer Vision, pp 4955–4966

  53. Tran D, Yuan J, Forsyth D (2014) Video event detection: From subvolume localization to spatiotemporal path search. IEEE Trans Pattern Anal Mach Intell 36:404–416


  54. Walia GS, Kapoor R (2016) Robust object tracking based upon adaptive multi-cue integration for video surveillance. Multimedia Tools and Applications 75(23):15821–15847


  55. Wang Z, Ling Q, Huang TS (2016) Learning deep l0 encoders. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pp 2194–2200

  56. Wang Q, Ma J, Yu S, Tan L (2020) Noise detection and image denoising based on fractional calculus. Chaos, Solitons and Fractals 131

  57. Xu D, Yan Y, Ricci E, Sebe N (2017) Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst 156:117–127


  58. Xu K, Sun T, Jiang X (2020) Video anomaly detection and localization based on an adaptive intra-frame classification network. IEEE Trans Multimedia 22(2):394–406


  59. Yadav DK, Singh K, Kumari S (2017) Challenging issues of video surveillance system using internet of things in cloud environment. Communications in Computer and Information Science 721:471–481


  60. Yeh CH, Lin CY, Muchtar K, Lai HE, Sun MT (2017) Three-Pronged Compensation and Hysteresis Thresholding for Moving Object Detection in Real-Time Video Surveillance. IEEE Trans Industr Electron 64(6):4945–4955


  61. Yuan Y, Ma D, Wang Q (2015) Hyperspectral anomaly detection by graph pixel selection. IEEE Transactions on Cybernetics 46(10):3123–3134


  62. Zhang X, Wu H, Wu M, Wu C (2020) Extended Motion Diffusion-Based Change Detection for Airport Ground Surveillance. IEEE Trans Image Process 29:5677–5686


  63. Zhao B, Fei-Fei L, Xing EP (2011) Online detection of unusual events in videos via dynamic sparse coding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 3313–3320

  64. Zheng Y, Zhang X, Wang F, Cao T, Sun M, Wang X (2019) Detection of people with camouflage pattern via dense deconvolution network. IEEE Signal Process Lett 26(1):29–33


  65. Zhong JX, Li N, Kong W, Liu S, Li TH, Li G (2019) Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1237–1246

  66. Zhong F, Li M, Zhang K, Hu J, Liu L (2021) DSPNet: A low computational-cost network for human pose estimation. Neurocomputing 423:327–335


  67. Zhou JT, Di K, Du J, Peng X, Yang H, Pan SJ, Tsang IW, Liu Y, Qin Z, Goh RSM (2018) Sc2Net: Sparse LSTMs for sparse coding. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 4588–4595

  68. Zhou JT, Du J, Zhu H, Peng X, Liu Y, Goh RSM (2019) AnomalyNet: An anomaly detection network for video surveillance. IEEE Trans Inf Forensics Secur 14(10):2537–2550



Author information


Corresponding author

Correspondence to Ankita Umale.

Ethics declarations

Conflicts of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix provides supplementary material covering some of the mathematical work underlying the methods discussed in this survey.

1.1 A. SC2Net

According to [67], sparse coding (SC) has demonstrated effectiveness in uncovering semantic information from noisy, high-dimensional data. The method develops a novel variant of the \(\ell _1\)-solver by introducing adaptive momentum vectors into ISTA, enabling per-parameter updates and encapsulating historical information during optimization. Given a data matrix \(\textbf{X}=\left[ \textbf{x}_{1}, \textbf{x}_{2}, \cdots , \textbf{x}_{n}\right] \in \mathbb {R}^{d_{x} \times n}\), sparse coding aims to learn a dictionary \(\textbf{B}=\left[ \textbf{b}_{1}, \textbf{b}_{2}, \cdots , \textbf{b}_{d_s}\right] \in \mathbb {R}^{d_{x} \times d_{s}}\) that generates sparse codes \(\textbf{S}=\left[ \textbf{s}_{1}, \textbf{s}_{2}, \cdots , \textbf{s}_{n}\right] \in \mathbb {R}^{d_{s} \times n}\) for the input data \(\textbf{X}\). The optimization problem can be formulated as follows:

$$\begin{aligned} \begin{aligned} \min _{\textbf{S}, \textbf{B}}&\sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2} \\ \text{ s.t. }&\left\| \textbf{s}_{i}\right\| _{0} \le k, \text{ and } \left\| \textbf{b}_{j}\right\| ^{2} \le 1,\ j=1, \cdots , d_{s} \end{aligned} \end{aligned}$$
(10)

The above optimization is hard to solve due to the non-convexity of the \(\ell _{0}\) norm. Therefore, it is often relaxed to the following problem with the \(\ell _{1}\) norm,

$$\begin{aligned} \begin{aligned} \min _{\textbf{S}, \textbf{B}}&\sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2}+\lambda \left\| \textbf{s}_{i}\right\| _{1} \\ \text{ s.t. }&\left\| \textbf{b}_{j}\right\| ^{2} \le 1,\ j=1, \cdots , d_{s}. \end{aligned} \end{aligned}$$
(11)

To solve (11), the conventional way is to alternately optimize B and S, corresponding to two sub-procedures: dictionary learning and sparse approximation. Specifically, by fixing S, (11) reduces to the following \(\ell _{2}\)-constrained optimization problem:

$$\begin{aligned} \begin{aligned} \min _{\textbf{B}} \ &\Vert \textbf{X}-\textbf{B S}\Vert _{F}^{2} \\ \text{ s.t. } \ &\left\| \textbf{b}_{i}\right\| ^{2} \le 1,\ i=1, \cdots , d_{s}. \end{aligned} \end{aligned}$$
(12)

The above problem is the well-known ridge regression problem, which has a closed-form solution. By fixing B instead, (11) reduces to the sparse approximation problem, which aims to represent each input \(\textbf{x}_i\) by a linear combination of the columns of B as follows:

$$\begin{aligned} \min _{\textbf{S}} \sum _{i}\left\| \textbf{x}_{i}-\textbf{B} \textbf{s}_{i}\right\| _{2}^{2}+\lambda \left\| \textbf{s}_{i}\right\| _{1} \end{aligned}$$
(13)

The ISTA updating formula can be expressed as

$$\begin{aligned} \textbf{s}^{(t)}={\text {sh}}_{(\lambda \tau )}\left( \textbf{s}^{(t-1)}-\tau \nabla g\left( \textbf{s}^{(t-1)}\right) \right) \end{aligned}$$
(14)

where the shrinkage function is defined as \({\text {sh}}_{(\lambda \tau )}(\textbf{s})={\text {sign}}(\textbf{s})\left( |\textbf{s}|-\lambda \tau \right) _{+}\) and \(g(\textbf{s})=\frac{1}{2}\left\| \textbf{x}-\textbf{B}\textbf{s}\right\| _{2}^{2}\) is the data-fidelity term. Substituting the gradient of g into (14) yields the update rule

$$\begin{aligned} \begin{aligned} \textbf{s}^{(t)}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{s}^{(t-1)}-\tau \textbf{B}^{T}\left( \textbf{B} \textbf{s}^{(t-1)}-\textbf{x}\right) \right) \\ {}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x}\right) \end{aligned} \end{aligned}$$
(15)

where \(\textbf{W}_{e}=\textbf{I}-\tau \textbf{B}^{T} \textbf{B}\) and \(\textbf{W}_{d}=\tau \textbf{B}^{T}\).

Note that (15) can be treated as a simple RNN: in other words, the \(\ell _{1}\)-oriented optimization algorithm is reformulated as a model-based, learnable network.
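To make this concrete, the following is a minimal NumPy sketch of the recursion (15); the random dictionary, the values of \(\lambda \), the step size \(\tau = 1/\Vert \textbf{B}\Vert _2^2\), and the fixed iteration count are illustrative assumptions, not choices prescribed by [67].

```python
import numpy as np

def ista(x, B, lam=0.05, n_steps=500):
    """Iterate the ISTA update of Eq. (15):
    s_t = sh_{lam*tau}(W_e s_{t-1} + W_d x)."""
    tau = 1.0 / np.linalg.norm(B, 2) ** 2          # step size <= 1/L for convergence
    W_e = np.eye(B.shape[1]) - tau * B.T @ B       # recurrent weight I - tau B^T B
    W_d = tau * B.T                                # input weight tau B^T
    s = np.zeros(B.shape[1])
    for _ in range(n_steps):
        c = W_e @ s + W_d @ x
        s = np.sign(c) * np.maximum(np.abs(c) - lam * tau, 0.0)  # shrinkage sh
    return s

# Toy usage: recover a 3-sparse code under a random 32x64 dictionary.
rng = np.random.default_rng(0)
B = rng.standard_normal((32, 64))
s_true = np.zeros(64)
s_true[[3, 17, 40]] = [1.5, -2.0, 0.7]
s_hat = ista(B @ s_true, B)
```

Unrolling this loop for a fixed number of steps and treating \(\textbf{W}_e\) and \(\textbf{W}_d\) as trainable parameters gives exactly the RNN view noted above.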

1.2 B. AnomalyNet

AnomalyNet [68] is an optimization-inspired network that simultaneously achieves sparse representation and dictionary learning using a novel LSTM network (termed SC2Net). Several algorithms optimize neural networks by incorporating "momentum" into the dynamics of stochastic gradient descent (SGD), and these methods have shown promising improvements in the robustness and convergence speed of SGD. Borrowing the high-level idea of these optimization methods, adaptive momentum vectors \(\textbf{i}^{(t)}\) and \(\textbf{f}^{(t)}\) are introduced into ISTA at time step t as follows:

$$\begin{aligned} \left. \begin{aligned} \tilde{\textbf{c}}^{(t)}&=\textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x} \\ \textbf{c}^{(t)}&=\textbf{f}^{(t)} \odot \textbf{c}^{(t-1)}+\textbf{i}^{(t)} \odot \tilde{\textbf{c}}^{(t)} \\ \textbf{s}^{(t)}&={\text {sh}}_{(\lambda \tau )}\left( \textbf{c}^{(t)}\right) \end{aligned} \right\} \end{aligned}$$
(16)

where \(\odot \) denotes the element-wise product. Here \(\tilde{\textbf{c}}^{(t)}\) is the candidate combination at the current step, while \(\textbf{c}^{(t)}\) and \(\textbf{c}^{(t-1)}\) are the accumulated states at the current and previous steps, respectively. In adaptive ISTA, \(\textbf{c}^{(t)}\) accumulates all the historical information with different weights, similar to the diagonal matrix containing the sum of the squares of past gradients. In these equations, \(\textbf{i}\) and \(\textbf{f}\) denote the input and forget gates, and \(\textbf{s}\) the output. With this notation, the plain ISTA update corresponds to \(\textbf{s}^{(t)}={\text {sh}}_{(\lambda \tau )}\left( \tilde{\textbf{c}}^{(t)}\right) \). Unlike the vanilla LSTM, SLSTM has no output gate. The SLSTM unit is obtained by rewriting the above equations as follows:

$$\begin{aligned} \left. \begin{aligned} \textbf{i}^{(t)}&=\sigma \left( \textbf{W}_{i s} \textbf{s}^{(t-1)}+\textbf{W}_{i x} \textbf{x}\right) \\ \textbf{f}^{(t)}&=\sigma \left( \textbf{W}_{f s} \textbf{s}^{(t-1)}+\textbf{W}_{f x} \textbf{x}\right) \\ \tilde{\textbf{c}}^{(t)}&=\textbf{W}_{e} \textbf{s}^{(t-1)}+\textbf{W}_{d} \textbf{x} \\ \textbf{c}^{(t)}&=\textbf{f}^{(t)} \odot \textbf{c}^{(t-1)}+\textbf{i}^{(t)} \odot \tilde{\textbf{c}}^{(t)} \\ \textbf{s}^{(t)}&=h_{(\textbf{D}, \textbf{u})}\left( \textbf{c}^{(t)}\right) \end{aligned} \right\} \end{aligned}$$
(17)

where the \(\textbf{W}\) terms are weight matrices (e.g. \(\textbf{W}_{i s}\) maps the previous sparse code to the input gate), \(\sigma (\textbf{x})=\frac{1}{1+e^{-\textbf{x}}}\), and \(h_{(\textbf{D}, \textbf{u})}(\textbf{x})=\textbf{D}(\tanh (\textbf{x}+\textbf{u})+\tanh (\textbf{x}-\textbf{u}))\), where \(\textbf{u}\) is a trainable vector and \(\textbf{D}\) a trainable diagonal matrix. This smooth, differentiable nonlinear activation, named "double tanh", replaces the shrinkage function to address the vanishing-gradient problem.
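As a rough illustration of (16) and (17), here is a minimal NumPy sketch of a single SLSTM step; the weight matrices, \(\textbf{D}\) and \(\textbf{u}\) are untrained placeholders that SC2Net would learn end-to-end, and the function and argument names are ours rather than from [68].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def double_tanh(c, D, u):
    # Smooth surrogate for the shrinkage function: D (tanh(c+u) + tanh(c-u)).
    return D @ (np.tanh(c + u) + np.tanh(c - u))

def slstm_step(x, s_prev, c_prev, W_is, W_ix, W_fs, W_fx, W_e, W_d, D, u):
    """One SLSTM update: adaptive ISTA with learned input/forget gates
    and no output gate, per Eqs. (16)-(17)."""
    i = sigmoid(W_is @ s_prev + W_ix @ x)   # input gate
    f = sigmoid(W_fs @ s_prev + W_fx @ x)   # forget gate
    c_tilde = W_e @ s_prev + W_d @ x        # plain ISTA candidate
    c = f * c_prev + i * c_tilde            # momentum-style accumulation
    s = double_tanh(c, D, u)                # sparse code for this step
    return s, c
```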

1.3 C. MOCAP

In automatic human motion capture (MOCAP) data classification [18], all training and testing motion sequences are converted to motion strings. A test motion string is compared with each training motion string individually using the suffix-array technique. Suppose the i-th test string has f codeword indices, the j-th training string has g codeword indices, and the largest common sequence of codeword indices between the two strings has length l. For the i-th test string, the following two metrics are calculated with respect to a given motion category k:

  1. Max-parameter: the average of the MLMs over all training motions in category k;

  2. Sim-parameter: the average of the similarity products over all training motions in category k.

For the k-th category, we have

$$\begin{aligned} MAX_{i}^{k}&=\frac{\sum _{j \in k} MLM_{i, j}}{\sum _{j} \underline{1}(j \in k)} \\ SIM_{i}^{k}&=\frac{\sum _{j \in k} SRP_{i, j}}{\sum _{j} \underline{1}(j \in k)} \end{aligned}$$

where \(\underline{1}(\cdot )\) is the indicator function.

Alternatively, for each test motion, all categories are pitted against one another one-on-one, and in each pairing the category with the higher parameter value wins a vote. All category-specific votes are then aggregated and normalized to obtain soft scores. That is,

$$\begin{aligned} \begin{aligned} {\text {Vote}}_{i, MAX}^{k}&=\sum _{m \ne k} \underline{1}\left( MAX_{i}^{k}>MAX_{i}^{m}\right) \\ {\text {Vote}}_{i, SIM}^{k}&=\sum _{m \ne k} \underline{1}\left( SIM_{i}^{k}>SIM_{i}^{m}\right) \end{aligned} \end{aligned}$$
(18)

The full-body level-n codebooks that offer the best classification results are short-listed for fusion to yield the final decision.
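The category averaging and the one-vs-one voting of (18) are straightforward to express in code. The following NumPy sketch assumes the pairwise MLM (or SRP) values against all training motions have already been computed by suffix-array matching; the array layout and names are our assumptions, not from [18].

```python
import numpy as np

def category_averages(pairwise, train_labels, n_cat):
    """MAX_i^k / SIM_i^k: average a pairwise metric (MLM or SRP)
    over the training motions of each category k."""
    return np.stack([pairwise[:, train_labels == k].mean(axis=1)
                     for k in range(n_cat)], axis=1)

def soft_scores(metric):
    """One-vs-one voting of Eq. (18): metric[i, k] is MAX_i^k or SIM_i^k;
    each pairwise win earns category k one vote, then votes are normalized."""
    n_cat = metric.shape[1]
    votes = np.zeros_like(metric, dtype=float)
    for k in range(n_cat):
        for m in range(n_cat):
            if m != k:
                votes[:, k] += metric[:, k] > metric[:, m]   # pairwise wins
    return votes / np.maximum(votes.sum(axis=1, keepdims=True), 1e-12)
```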

1.4 D. Camouflaged object detection

In [37], a constructive approach is presented that combines characterization of object texture with statistical modeling of camouflaged images under texture-smoothing conditions. A camouflage image is the combination of a camouflaged-object image (the target) and a camouflage-pattern background image. The camouflaged-object image intensities are thresholded by a predefined value, and the resulting binary representation is treated as a new target image. The correlation peaks \(\hat{\alpha }^{2}\) and \(\hat{\beta }^{2}\) between the scene and the normalized basis images are evaluated to detect the target in the camouflaged image S. When the camouflaged image is a linear combination of the basis images \(f_{o}(x)\) and \(f_{*}(x)\), normalizing each basis image by its autocorrelation yields \(\hat{f}_{o}(x)\) and \(\hat{f}_{*}(x)\). The correlation peaks from the normalized basis images are:

$$\begin{aligned} \begin{aligned} \hat{\alpha }^{2}&=\left[ S(x) * \hat{f}_{o}(x)\right] ^{2} \\ \hat{\beta }^{2}&=\left[ S(x) * \hat{f}_{*}(x)\right] ^{2} \end{aligned} \end{aligned}$$
(19)

These correlation peaks, obtained by correlating the camouflage image with the basis images of the target and its binary version, are used as filter coefficients to identify the target in the camouflaged image. The resulting filter equation is:

$$\begin{aligned} C(x)=\frac{\left[ S(x) * f_{0}(x)\right] ^{2}}{[S(x) * {f_*} (x)]^{2}-\sqrt{N}\left[ S^{2}(x) * {f_*} (x)\right] } \end{aligned}$$
(20)
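Under one possible reading of (20), in which \(*\) denotes 2-D correlation and N is the number of pixels in the basis image (both assumptions on our part, since they are not defined here), the filter response could be evaluated as in the following sketch:

```python
import numpy as np
from scipy.signal import correlate2d

def filter_response(S, f0, f_star, eps=1e-9):
    """Correlation-peak filter in the spirit of Eq. (20).
    S: camouflage scene; f0: target basis image; f_star: binary basis image.
    Treating '*' as 2-D correlation and N as the basis-image pixel count
    are assumptions, not specified by [37]."""
    N = f_star.size
    num = correlate2d(S, f0, mode='same') ** 2
    den = (correlate2d(S, f_star, mode='same') ** 2
           - np.sqrt(N) * correlate2d(S ** 2, f_star, mode='same'))
    return num / (den + eps)   # peaks indicate candidate target locations
```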

If the degree of camouflage is high, whether for a single or for multiple camouflaged objects, target modeling alone cannot detect the object because of intensity variation and random textures; texture models (regular or irregular) are then needed. Cluster shade (CS) and cluster prominence (CP) summarize the characteristics of the gray-level co-occurrence matrix (GLCM). For object detection, the following statistics are found to be suitable, and they perform satisfactorily both with and without texture smoothing. Each feature depends entirely on the normalized GLCM at different offsets:

$$\begin{aligned} \left. \begin{aligned} \text{ GraylevelCM }&=GC(i, j)=\frac{C(i, j)}{\sum _{i, j=1}^{M} C(i, j)} \\ \text{ Contrast }&=\sum _{i, j=1}^{M}(i-j)^{2}\, GC(i, j) \\ \text{ CS }&=\sum _{i, j=1}^{M}\left( i-D_{x}+j-D_{y}\right) ^{3} GC(i, j) \\ \text{ CP }&=\sum _{i, j=1}^{M}\left( i-D_{x}+j-D_{y}\right) ^{4} GC(i, j) \end{aligned} \right\} \end{aligned}$$
(21)

where \(D_{x}=\sum _{i, j=1}^{M} i\, GC(i, j)\) and \(D_{y}=\sum _{i, j=1}^{M} j\, GC(i, j)\). The target extracted by this approach depends entirely on the selection of a seed block in the image, i.e., in texture terminology, primitive-element extraction. This can be done by summing the normalized-GLCM characteristics such as contrast, cluster shade and cluster prominence.
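As a minimal illustration of (21), the following NumPy sketch accumulates a normalized GLCM at a single offset and computes contrast, cluster shade and cluster prominence; the quantization to 16 gray levels is an illustrative choice, not a parameter from [37].

```python
import numpy as np

def glcm_features(img, dy=0, dx=1, levels=16):
    """Normalized GLCM GC(i, j) at offset (dy, dx) and the Eq. (21)
    statistics: contrast, cluster shade (CS), cluster prominence (CP).
    `img` is a 2-D array with values in [0, 255]."""
    q = np.clip((img.astype(float) / 256.0 * levels).astype(int), 0, levels - 1)
    C = np.zeros((levels, levels))
    rows, cols = q.shape
    for y in range(rows):                      # accumulate co-occurrence counts
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                C[q[y, x], q[y2, x2]] += 1
    GC = C / C.sum()                           # normalize to a distribution
    i, j = np.indices((levels, levels))
    Dx = (i * GC).sum()                        # D_x = sum i GC(i, j)
    Dy = (j * GC).sum()                        # D_y = sum j GC(i, j)
    contrast = ((i - j) ** 2 * GC).sum()
    cs = ((i - Dx + j - Dy) ** 3 * GC).sum()   # cluster shade
    cp = ((i - Dx + j - Dy) ** 4 * GC).sum()   # cluster prominence
    return contrast, cs, cp
```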

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Umale, A., Lal, N. & Goel, C. Vision-based outlier detection techniques in automated surveillance: a survey and future ideas. Multimed Tools Appl 83, 14565–14607 (2024). https://doi.org/10.1007/s11042-023-15911-y
