Spatiotemporal wavelet correlogram for human action recognition

  • Regular Paper
  • International Journal of Multimedia Information Retrieval

Abstract

In this paper, we present the spatiotemporal wavelet correlogram (STWC), a new feature for human action recognition (HAR) in videos. The proposed feature takes a different approach from methods based on bag of visual words, interest point detection, and descriptor representation. The new approach requires neither motion estimation (tracking) nor background/foreground subtraction. STWC is computed more efficiently than state-of-the-art HAR methods and achieves comparable results. It exploits the multi-scale, multi-resolution property of the wavelet transform and captures the correlation among wavelet coefficients. The feature is generated by computing the spatiotemporal correlogram of quantized wavelet coefficients, which are obtained using a 3D wavelet decomposition followed by a simple quantization method. Based on the present findings, recommendations are made for selecting the richest wavelet subbands from which to compute STWC.
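
The pipeline outlined in the abstract (3D wavelet decomposition, quantization, spatiotemporal correlogram) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the wavelet family, the number of decomposition levels, the quantizer, the distance set, or which subbands are used, so everything below is an assumption made for illustration: a single-level Haar transform, uniform quantization into four levels, an autocorrelogram along the three axes at distances 1 and 2, and all seven detail subbands. The function names (`haar_dwt3`, `quantize`, `autocorrelogram`, `stwc`) are hypothetical.

```python
import numpy as np


def haar_step(x, axis):
    """One-level Haar analysis along `axis`; returns (approximation, detail)."""
    x = np.moveaxis(x, axis, 0)
    x = x[: x.shape[0] - x.shape[0] % 2]  # drop a trailing sample if the length is odd
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(approx, 0, axis), np.moveaxis(detail, 0, axis)


def haar_dwt3(volume):
    """Single-level 3D Haar decomposition into 8 subbands keyed 'aaa' .. 'ddd'."""
    bands = {"": np.asarray(volume, dtype=float)}
    for axis in range(3):  # assumed axis order: time, rows, columns
        split = {}
        for key, arr in bands.items():
            approx, detail = haar_step(arr, axis)
            split[key + "a"] = approx
            split[key + "d"] = detail
        bands = split
    return bands


def quantize(coeffs, levels=4):
    """Uniform quantization over [-max|c|, max|c|] (an assumed 'simple' quantizer)."""
    peak = np.abs(coeffs).max()
    if peak == 0:
        return np.zeros(coeffs.shape, dtype=int)
    edges = np.linspace(-peak, peak, levels + 1)[1:-1]  # interior bin edges only
    return np.digitize(coeffs, edges)  # integer labels in 0 .. levels-1


def autocorrelogram(q, levels=4, distances=(1, 2)):
    """Estimate P(neighbor at distance k has level c | voxel has level c)."""
    feat = np.zeros((levels, len(distances)))
    for j, k in enumerate(distances):
        same = np.zeros(levels)
        total = np.zeros(levels)
        for axis in range(q.ndim):
            if q.shape[axis] <= k:
                continue  # subband too short along this axis for distance k
            a = np.moveaxis(q, axis, 0)
            ref, shifted = a[:-k], a[k:]
            total += np.bincount(ref.ravel(), minlength=levels)
            same += np.bincount(ref[ref == shifted], minlength=levels)
        feat[:, j] = np.divide(same, total, out=np.zeros(levels), where=total > 0)
    return feat


def stwc(volume, levels=4, distances=(1, 2)):
    """Concatenate the autocorrelograms of all seven detail subbands."""
    bands = haar_dwt3(volume)
    parts = [
        autocorrelogram(quantize(bands[key], levels), levels, distances).ravel()
        for key in sorted(bands)
        if key != "aaa"  # skip the pure approximation subband
    ]
    return np.concatenate(parts)


if __name__ == "__main__":
    video = np.random.default_rng(0).random((8, 16, 16))  # synthetic (frames, height, width)
    print(stwc(video).shape)
```

With these assumed settings the feature has 7 × 4 × 2 = 56 entries per clip. The sketch uses no tracking or foreground segmentation, in line with the abstract, but the subband selection the paper recommends would replace the "all detail subbands" choice made here.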

Author information

Corresponding author

Correspondence to Hamid Abrishami Moghaddam.

About this article

Cite this article

Abrishami Moghaddam, H., Zare, A. Spatiotemporal wavelet correlogram for human action recognition. Int J Multimed Info Retr 8, 167–180 (2019). https://doi.org/10.1007/s13735-018-00167-2

  • DOI: https://doi.org/10.1007/s13735-018-00167-2
