Multimedia Tools and Applications, Volume 77, Issue 11, pp 13661–13678

Human action and event recognition using a novel descriptor based on improved dense trajectories

  • Snehasis Mukherjee
  • Krit Karan Singh
Article

Abstract

We propose a unified method for recognizing human actions and human-related events in realistic videos. We use an efficient pipeline of (a) a 3D representation of the Improved Dense Trajectory Feature (DTF) and (b) the Fisher Vector (FV). Further, a novel descriptor is proposed, capable of representing human actions and human-related events based on the FV representation of the input video. The proposed unified descriptor is a 168-dimensional vector obtained from each video sequence by statistically analyzing the motion patterns of the 3D joint locations of the human body. The proposed descriptor is trained using a binary Support Vector Machine (SVM) for recognizing human actions or human-related events. We evaluate the proposed approach on two challenging action recognition datasets: the UCF Sports and CMU Mocap datasets. In addition to these two action recognition datasets, the proposed approach is tested on the Hollywood2 event recognition dataset. On all the benchmark datasets, for both action and event recognition, the proposed approach demonstrates its efficacy compared to state-of-the-art techniques.
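The abstract's pipeline, encoding local trajectory descriptors into a fixed-length Fisher Vector that a binary SVM can classify, can be sketched as below. This is a generic illustration, not the authors' implementation: the GMM parameters, the component count K, the per-descriptor dimension D, and the random stand-in descriptors are all assumptions, chosen only so that the resulting vector is 168-dimensional like the descriptor the paper reports.

```python
import numpy as np

# Hypothetical sketch of Fisher Vector encoding of dense-trajectory
# descriptors. A K-component diagonal-covariance GMM is assumed to be
# pre-trained on training descriptors; here its parameters are fixed
# for illustration only.

def fisher_vector(X, weights, means, sigmas):
    """Encode local descriptors X (N x D) into a 2*K*D Fisher vector
    using gradients w.r.t. the GMM means and standard deviations."""
    N, D = X.shape
    K = weights.shape[0]
    # Soft-assignment (posterior) of each descriptor to each component.
    diff = (X[:, None, :] - means[None]) / sigmas[None]          # N x K x D
    log_prob = -0.5 * (diff ** 2 + np.log(2 * np.pi * sigmas[None] ** 2)).sum(-1)
    log_prob += np.log(weights)[None]                            # N x K
    post = np.exp(log_prob - log_prob.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)
    # Unnormalised gradients w.r.t. means and standard deviations.
    g_mu = (post[..., None] * diff).sum(0) / (N * np.sqrt(weights)[:, None])
    g_sig = (post[..., None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    # Power- and L2-normalisation, standard in FV pipelines.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

rng = np.random.default_rng(0)
K, D = 4, 21                       # assumed split: 2*K*D = 168 dimensions
X = rng.normal(size=(100, D))      # stand-in trajectory descriptors
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)                    # (168,)
```

The resulting fixed-length vector can then be fed to any binary SVM trainer, one classifier per action or event class in a one-vs-rest setup.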

Keywords

Event recognition · Action recognition · Dense trajectories · Fisher vector


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

IIIT Chittoor, Sri City, Andhra Pradesh, India
