
The Visual Computer, Volume 36, Issue 3, pp. 621–631

Skeleton-based action recognition by part-aware graph convolutional networks

  • Yang Qin
  • Lingfei Mo
  • Chenyang Li
  • Jiayi Luo
Original Article

Abstract

This paper proposes an improved graph convolutional network for skeleton-based action recognition. Inspired by approaches that split the skeleton into several parts before feeding it to deep networks, part-aware convolutions are designed to replace the common convolutions that operate uniformly over all neighboring joints. To achieve scale invariance on multi-scale data, an Inception-like structure is introduced, which concatenates feature maps produced by convolution kernels of different sizes. In contrast to LSTM-based methods, the proposed model extracts both temporal and spatial features from the input data. By making full use of the spatial structure of the skeleton, performance is greatly enhanced on various datasets. The model is evaluated on three benchmark skeleton-based datasets: Berkeley MHAD, SBU Kinect Interaction, and NTU RGB+D. Comparing the experimental results of the proposed model with state-of-the-art results demonstrates its effectiveness and robustness. In addition, feature maps from different layers of the trained model are explored, and an explanation of the part-aware convolutions is provided.
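The part-aware convolution described in the abstract can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' exact formulation: the joint grouping into parts, the row normalization, and all dimensions are assumptions chosen for clarity. The idea is that instead of one shared aggregation over every neighboring joint, each body part contributes through its own weight matrix, and the per-part results are summed.

```python
import numpy as np

# Hypothetical 5-joint mini-skeleton: 0 = torso, 1-2 = arms, 3-4 = legs.
# Each limb joint connects to the torso.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
num_joints, in_feats, out_feats = 5, 3, 4

# Adjacency matrix with self-loops.
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Part-aware grouping: each part owns its own weight matrix, and aggregation
# is restricted to neighbors belonging to that part (illustrative choice).
parts = {"torso": [0], "arms": [1, 2], "legs": [3, 4]}

rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((in_feats, out_feats)) * 0.1 for name in parts}

def part_aware_gconv(X, A, parts, weights):
    """Sum of per-part masked graph convolutions: sum_p D_p^{-1} (A * M_p) X W_p."""
    out = np.zeros((A.shape[0], next(iter(weights.values())).shape[1]))
    for name, joints in parts.items():
        mask = np.zeros_like(A)
        mask[:, joints] = 1.0          # aggregate only from joints in this part
        A_p = A * mask                 # part-restricted adjacency
        deg = A_p.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0            # guard rows with no in-part neighbors
        out += (A_p / deg) @ X @ weights[name]
    return out

X = rng.standard_normal((num_joints, in_feats))
Y = part_aware_gconv(X, A, parts, weights)
print(Y.shape)  # (5, 4): one out_feats-dimensional feature per joint
```

If all joints are placed in a single part sharing one weight matrix, this reduces to a standard row-normalized graph convolution, which is the "common convolution performed on all the neighboring joints" that the paper's part-aware variant replaces.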

Keywords

Action recognition · Graph convolutional networks · Human skeleton

Notes

Acknowledgements

(Portions of) the research in this paper used the NTU RGB+D Action Recognition Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.

Funding

This study was funded by the National Natural Science Foundation of China [61603091, Multi-Dimensions Based Physical Activity Assessment for the Human Daily Life].

Compliance with ethical standards

Conflict of interest

The authors, Yang Qin, Lingfei Mo, Chenyang Li, and Jiayi Luo, declare that they have no conflict of interest.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Instrument Science and Engineering, Southeast University, Nanjing, China
