ConvNets-based action recognition from skeleton motion maps

  • Yanfang Chen
  • Liwei Wang
  • Chuankun Li (email author)
  • Yonghong Hou
  • Wanqing Li

Abstract

With the advance of deep learning, deep-learning-based action recognition has become an important research topic in computer vision. To make better use of Convolutional Neural Networks (ConvNets), a skeleton sequence is often encoded into an image, as in Joint Trajectory Maps (JTM). However, this encoding cannot effectively capture long-term temporal information. To solve this problem, this paper presents an effective method to encode the spatial-temporal information of skeleton sequences into color texture images, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs). ConvNets are then applied to learn discriminative features from the TPSMMs for human action recognition. The TPSMMs capture not only short-term temporal information but also the long-term dynamics over the period of an action. The proposed method has been evaluated on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets, where it achieves state-of-the-art results.
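The general idea of turning a skeleton sequence into color images at several temporal scales can be illustrated with a minimal sketch. This is not the TPSMM construction from the paper, whose details are not given in the abstract. It is a generic stand-in that assumes a sequence stored as a NumPy array of shape (frames, joints, 3), takes joint displacements over several frame strides as a simple temporal pyramid, and maps the x, y, z components to the R, G, B channels. The function names and the stride values are hypothetical.

```python
import numpy as np


def sequence_to_image(seq):
    """Map a (T, J, 3) array to a (J, T, 3) uint8 image.

    The x, y, z components become R, G, B after per-channel min-max
    normalization. This is an assumed, generic encoding, not the
    paper's TPSMM definition.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    # Avoid division by zero for channels that are constant.
    span = np.where(hi > lo, hi - lo, 1.0)
    norm = (seq - lo) / span
    img = np.round(norm * 255).astype(np.uint8)
    # Rows index joints, columns index time.
    return img.transpose(1, 0, 2)


def temporal_pyramid_images(seq, strides=(1, 2, 4)):
    """Encode joint displacements over several temporal strides.

    A small stride reflects short-term motion; a large stride reflects
    motion over a longer time span. The strides are illustrative.
    """
    seq = np.asarray(seq, dtype=np.float64)
    images = []
    for s in strides:
        if s >= len(seq):
            raise ValueError("stride must be smaller than the sequence length")
        motion = seq[s:] - seq[:-s]  # displacement of each joint over s frames
        images.append(sequence_to_image(motion))
    return images


if __name__ == "__main__":
    # Synthetic sequence: 20 frames, 15 joints, 3D coordinates.
    rng = np.random.default_rng(0)
    skeleton = rng.normal(size=(20, 15, 3))
    for stride, image in zip((1, 2, 4), temporal_pyramid_images(skeleton)):
        print(stride, image.shape, image.dtype)
```

Each resulting image could then be resized and passed to an image classification ConvNet, one network per pyramid level or with the levels combined. How the paper fuses its maps is not stated in the abstract.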

Keywords

Computer vision, Action recognition, Convolutional neural networks, Skeleton motion maps

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61571325 and in part by the Key Projects in the Tianjin Science and Technology Pillar Program under Grant No. 16ZXHLGX00190.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electronic and Information Engineering, Tianjin University, Tianjin, People’s Republic of China
  2. Advanced Multimedia Research Lab, University of Wollongong, Wollongong, Australia