Semantic-guided multi-scale human skeleton action recognition


Abstract

With the development of depth sensors and pose estimation algorithms, action recognition based on the human skeleton has attracted wide attention from researchers. Skeleton action recognition methods that embed semantic information achieve strong performance in terms of computational cost and recognition accuracy by extracting spatio-temporal features of all joints; nevertheless, they introduce information redundancy and are limited in capturing long-range contextual spatio-temporal features. In this work, we propose a semantic-guided multi-scale neural network (SGMSN) for skeleton action recognition. For spatial modeling, the key insight of our approach is to achieve multi-scale graph convolution by manipulating the data level, without adding extra computational cost. For temporal modeling, we build a multi-scale temporal convolutional network with a multi-scale receptive field across the temporal dimension. Experiments have been carried out on two publicly available large-scale skeleton datasets, NTU RGB+D and NTU RGB+D 120. On NTU RGB+D, the accuracy reaches 90.1% (cross-subject) and 95.8% (cross-view). The experimental results show that the proposed network architecture outperforms most current state-of-the-art action recognition models.
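To make the temporal-modeling idea concrete, the sketch below shows a multi-scale temporal convolution block of the kind the abstract describes: parallel temporal branches with different receptive fields, operating on skeleton feature maps shaped (batch, channels, frames, joints). This is a minimal PyTorch illustration under our own assumptions; the class name MultiScaleTCN, the branch layout, and the dilation values are hypothetical and are not taken from the authors' released code.

```python
# Illustrative sketch only: a multi-scale temporal convolution block with
# parallel dilated branches. Names and hyperparameters are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn

class MultiScaleTCN(nn.Module):
    """Temporal convolution over skeleton features of shape (N, C, T, V)."""
    def __init__(self, channels, kernel_size=5, dilations=(1, 2, 3)):
        super().__init__()
        branch_channels = channels // len(dilations)
        self.branches = nn.ModuleList()
        for d in dilations:
            pad = (kernel_size - 1) * d // 2  # keep the temporal length T unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, branch_channels, kernel_size=1),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
                # convolve along the temporal axis only (kernel width 1 over joints)
                nn.Conv2d(branch_channels, branch_channels,
                          kernel_size=(kernel_size, 1),
                          padding=(pad, 0), dilation=(d, 1)),
                nn.BatchNorm2d(branch_channels),
            ))
        self.out = nn.Conv2d(branch_channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):                      # x: (N, C, T, V)
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return self.out(y) + x                 # residual connection

# Usage: a batch of 8 clips, 64 channels, 300 frames, 25 joints
feats = torch.randn(8, 64, 300, 25)
print(MultiScaleTCN(64)(feats).shape)          # torch.Size([8, 64, 300, 25])
```

Stacking such blocks with increasing dilation is one common way to enlarge the temporal receptive field without extra layers; the authors' actual multi-scale design may differ.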


Materials Availability

The datasets are available from the ROSE Lab at https://rose1.ntu.edu.sg/dataset/actionRecognition/.

Code Availability

The code is available from the first author on reasonable request.


Author information

Corresponding author

Correspondence to Jinlin Hu.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Qi, Y., Hu, J., Zhuang, L. et al. Semantic-guided multi-scale human skeleton action recognition. Appl Intell 53, 9763–9778 (2023). https://doi.org/10.1007/s10489-022-03968-5
