Semantic-guided multi-scale human skeleton action recognition

Qi, Yongfeng; Hu, Jinlin; Zhuang, Liqiang; Pei, Xiaoxu

doi:10.1007/s10489-022-03968-5


Published: 12 August 2022

Volume 53, pages 9763–9778, (2023)

Applied Intelligence


Abstract

With the development of depth sensors and pose estimation algorithms, skeleton-based human action recognition has attracted wide attention from researchers. Skeleton action recognition methods that embed semantic information perform well in both computational cost and recognition accuracy by extracting spatio-temporal features from all joints. However, they introduce information redundancy and are limited in extracting long-term contextual spatio-temporal features. In this work, we propose a semantic-guided multi-scale neural network (SGMSN) for skeleton action recognition. For spatial modeling, the key insight of our approach is to achieve multi-scale graph convolution by manipulating the data level, without adding computational cost. For temporal modeling, we build a multi-scale temporal convolutional network with a multi-scale receptive field across the temporal dimension. Experiments were carried out on two publicly available large-scale skeleton datasets, NTU RGB+D and NTU RGB+D 120. On the NTU RGB+D dataset, the accuracy is 90.1% (cross-subject) and 95.8% (cross-view). The experimental results show that the proposed network architecture outperforms most current state-of-the-art action recognition models.
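
The abstract does not give the layer details of the temporal module, so the following is only a generic illustration of what a multi-scale receptive field across the temporal dimension can mean, not the SGMSN implementation. It assumes parallel dilated 1-D convolutions over a single joint coordinate; the function names, the kernel size of 3, the dilations 1, 2, 3, and the all-ones kernel are hypothetical choices made for this sketch.

```python
# Illustrative sketch only: parallel dilated 1-D temporal convolutions.
# Kernel size, dilations, and kernel weights are assumptions, not SGMSN settings.

def receptive_field(kernel_size, dilation):
    """Number of frames spanned by one dilated temporal convolution."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(sequence, kernel, dilation):
    """Same-length 1-D cross-correlation with zero padding at both ends."""
    n = len(sequence)
    half = (len(kernel) - 1) * dilation // 2  # offset of the first tap
    out = []
    for t in range(n):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - half + i * dilation
            if 0 <= idx < n:  # taps outside the sequence read as zero
                acc += w * sequence[idx]
        out.append(acc)
    return out


def multi_scale_temporal(sequence, kernel, dilations):
    """Run one branch per dilation and stack the branch outputs per frame."""
    branches = [dilated_conv1d(sequence, kernel, d) for d in dilations]
    # Each frame gets one value per branch, like a small channel vector.
    return [list(frame) for frame in zip(*branches)]


if __name__ == "__main__":
    # A single impulse at frame 4 of a 9-frame sequence.
    signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    features = multi_scale_temporal(signal, [1.0, 1.0, 1.0], dilations=[1, 2, 3])
    print([receptive_field(3, d) for d in (1, 2, 3)])
    for t, frame in enumerate(features):
        print(t, frame)
```

With a kernel of size k and dilation d, each branch spans (k - 1) * d + 1 frames, so the three branches here cover 3, 5, and 7 frames with the same number of weights. Frames farther from the impulse respond only in the branches with larger dilation, which is the sense in which stacking such branches mixes short-term and longer-term temporal context.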

Materials Availability

The datasets are available from the ROSE Lab at https://rose1.ntu.edu.sg/dataset/actionRecognition/

Code Availability

The code is available from the first author on reasonable request.


Author information

Authors and Affiliations

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, 730070, Gansu, China
Yongfeng Qi, Jinlin Hu, Liqiang Zhuang & Xiaoxu Pei


Corresponding author

Correspondence to Jinlin Hu.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Qi, Y., Hu, J., Zhuang, L. et al. Semantic-guided multi-scale human skeleton action recognition. Appl Intell 53, 9763–9778 (2023). https://doi.org/10.1007/s10489-022-03968-5


Accepted: 05 July 2022
Published: 12 August 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-03968-5
