Learning Discriminative Representation for Skeletal Action Recognition Using LSTM Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10425)

Abstract

Human action recognition based on 3D skeleton data is a rapidly growing research area in computer vision, owing to the robustness of skeletal representations to variations in viewpoint, human body scale, and motion speed. Recent studies suggest that recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are very effective at learning discriminative features from temporal sequences for classification. However, prior RNN-based methods rely on complicated multi-layer hierarchical architectures, and CNN-based methods learn contextual features at fixed temporal scales. In this paper, we propose a simple framework for skeleton-based action recognition that selects temporal scales automatically using a single-layer LSTM. Experimental results on three benchmark datasets show that our approach achieves state-of-the-art performance compared to recent models.
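To make the setting concrete, the following is a minimal sketch (not the authors' exact model) of a single-layer LSTM classifier for skeleton sequences: per-frame 3D joint coordinates are flattened into a feature vector, passed through one LSTM layer, and the final hidden state is mapped to class logits. The joint count, hidden size, and class count below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    """Hypothetical single-layer LSTM classifier for skeleton sequences."""

    def __init__(self, num_joints=25, hidden_size=128, num_classes=60):
        super().__init__()
        # One frame = all joint (x, y, z) coordinates concatenated.
        self.lstm = nn.LSTM(input_size=num_joints * 3,
                            hidden_size=hidden_size,
                            num_layers=1,        # single layer, per the abstract
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints * 3)
        _, (h_n, _) = self.lstm(x)
        # Classify from the last hidden state of the (only) layer.
        return self.fc(h_n[-1])      # logits: (batch, num_classes)

model = SkeletonLSTM()
clip = torch.randn(4, 30, 25 * 3)    # 4 clips, 30 frames, 25 joints
logits = model(clip)
```

The paper's contribution of automatic temporal-scale selection is not reproduced here; this only illustrates the baseline sequence-to-label pipeline on which such a mechanism would sit.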

Keywords

Skeleton-based action recognition · Recurrent neural networks · Long short-term memory


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai, China