Human Action Recognition for Depth Cameras via Dynamic Frame Warping

  • Kartik Gupta
  • Arnav Bhavsar
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 460)


Human action recognition using depth cameras is an important and challenging task, in part because different actions can involve highly similar motions. The problem is made harder by the large intra-class variation within a single action class. In this paper, we explore a Dynamic Frame Warping framework, an extension of the Dynamic Time Warping framework from the RGB domain, to address action recognition with depth cameras. We employ intuitively relevant skeleton-joint-based features computed from the depth stream data generated using Microsoft Kinect. We show that the proposed approach achieves better accuracy in cross-subject evaluation than state-of-the-art works, both on complex actions and on simpler actions that are similar to each other.
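For orientation, the following is a minimal sketch of classical Dynamic Time Warping, the baseline that the abstract says Dynamic Frame Warping extends. It is not the authors' method: the function names `dtw_distance` and `classify_by_template`, the Euclidean frame distance, and the nearest-template decision rule are all assumptions made for illustration, and each sequence is assumed to be a list of per-frame feature vectors (for example, skeleton-joint features).

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated classical DTW cost between two sequences of frame features.

    Hypothetical helper: each sequence is a list of per-frame feature vectors.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    # cost[i, j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumption: Euclidean distance between frame feature vectors.
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # A frame may match, or either sequence may be locally stretched.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])

def classify_by_template(query, templates):
    """Return the label whose template sequence has the lowest DTW cost.

    `templates` is a hypothetical dict mapping action label -> sequence.
    """
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

With toy one-dimensional features, `classify_by_template([[0], [1], [1], [0]], {"wave": [[0], [1], [0]], "push": [[0], [5], [10]]})` selects `"wave"`, since the repeated middle frame is absorbed by the warping at zero cost.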


Human action recognition · Depth camera · Skeleton information · Dynamic frame warping · Class templates



Copyright information

© Springer Science+Business Media Singapore 2017

Authors and Affiliations

  1. School of Computing & Electrical Engineering, Indian Institute of Technology Mandi, Mandi, India
