Abstract
In this paper, we present the spatiotemporal wavelet correlogram (STWC), a new feature for human action recognition (HAR) in videos. The proposed feature takes a different approach from bag-of-visual-words, interest point detection, and descriptor representation methods, and requires neither motion estimation (tracking) nor background/foreground subtraction. STWC is generated more efficiently than state-of-the-art HAR methods while achieving comparable results. It exploits the multi-scale, multi-resolution property of the wavelet transform and takes into account the correlation of wavelet coefficients. The feature is generated by computing the spatiotemporal correlogram of quantized wavelet coefficients, which are obtained using a 3D wavelet decomposition followed by a simple quantization method. Based on our findings, we recommend which wavelet subbands are the richest for computing STWC.
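The pipeline described in the abstract (3D wavelet decomposition, quantization, spatiotemporal correlogram) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar wavelet, a single decomposition level, uniform quantization of coefficient magnitudes, axis-aligned neighbour offsets, and the function names `haar_3d`, `quantize`, and `autocorrelogram`.

```python
from itertools import product


def haar_3d(video):
    """One-level 3D Haar decomposition of a T x H x W nested list (even sizes).

    Returns a dict mapping a subband name such as 'LLH' (filter used along
    t, y, x; this naming order is an assumption) to a half-size nested list.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    bands = {}
    for name in product("LH", repeat=3):
        # Low-pass adds the two samples, high-pass subtracts the second one.
        signs = [1 if c == "L" else -1 for c in name]
        band = [[[0.0] * (W // 2) for _ in range(H // 2)] for _ in range(T // 2)]
        for t, y, x in product(range(T // 2), range(H // 2), range(W // 2)):
            acc = 0.0
            for dt, dy, dx in product((0, 1), repeat=3):
                s = ((signs[0] if dt else 1) * (signs[1] if dy else 1)
                     * (signs[2] if dx else 1))
                acc += s * video[2 * t + dt][2 * y + dy][2 * x + dx]
            band[t][y][x] = acc / 8 ** 0.5  # (1/sqrt(2)) per axis, three axes
        bands["".join(name)] = band
    return bands


def quantize(band, n_levels=4):
    """Uniformly quantize coefficient magnitudes into levels 0..n_levels-1.

    Assumed scheme; the paper only says 'a simple quantization method'.
    """
    peak = max(abs(c) for plane in band for row in plane for c in row) or 1.0
    return [[[min(int(abs(c) / peak * n_levels), n_levels - 1) for c in row]
             for row in plane] for plane in band]


def autocorrelogram(q, n_levels=4, distances=(1, 2)):
    """For each distance d and level k, estimate the probability that a voxel
    d steps away along t, y, or x from a level-k voxel is also level k."""
    T, H, W = len(q), len(q[0]), len(q[0][0])
    feature = []
    for d in distances:
        offsets = [(d, 0, 0), (-d, 0, 0), (0, d, 0),
                   (0, -d, 0), (0, 0, d), (0, 0, -d)]
        same, total = [0] * n_levels, [0] * n_levels
        for t, y, x in product(range(T), range(H), range(W)):
            level = q[t][y][x]
            for dt, dy, dx in offsets:
                tt, yy, xx = t + dt, y + dy, x + dx
                if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                    total[level] += 1
                    same[level] += q[tt][yy][xx] == level
        feature += [s / n if n else 0.0 for s, n in zip(same, total)]
    return feature


# Toy 8 x 8 x 8 "video": a bright 2 x 2 block moving one pixel per frame.
video = [[[1.0 if t <= y < t + 2 and 2 <= x < 4 else 0.0 for x in range(8)]
          for y in range(8)] for t in range(8)]
bands = haar_3d(video)
stwc = autocorrelogram(quantize(bands["HLL"]))  # one subband, for illustration
```

In this sketch a full descriptor would concatenate the correlograms of several subbands; which subbands to include is exactly the selection question the abstract says the paper addresses, so the choice of `HLL` above is arbitrary.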
About this article
Cite this article
Abrishami Moghaddam, H., Zare, A. Spatiotemporal wavelet correlogram for human action recognition. Int J Multimed Info Retr 8, 167–180 (2019). https://doi.org/10.1007/s13735-018-00167-2
DOI: https://doi.org/10.1007/s13735-018-00167-2