Abstract
With the development of depth sensors and pose estimation algorithms, skeleton-based human action recognition has attracted wide attention from researchers. Skeleton action recognition methods that embed semantic information perform well in both computational cost and recognition accuracy because they extract spatio-temporal features from all joints; however, they introduce information redundancy and are limited in capturing long-term contextual spatio-temporal features. In this work, we propose a semantic-guided multi-scale neural network (SGMSN) for skeleton-based action recognition. For spatial modeling, the key insight of our approach is to achieve multi-scale graph convolution by operating at the data level, without adding computational cost. For temporal modeling, we build a multi-scale temporal convolutional network whose receptive fields cover multiple scales along the temporal dimension. Experiments were carried out on two publicly available large-scale skeleton datasets, NTU RGB+D and NTU RGB+D 120. On NTU RGB+D, the model achieves accuracies of 90.1% (cross-subject) and 95.8% (cross-view). The experimental results show that the proposed network architecture outperforms most current state-of-the-art action recognition models.
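The abstract does not give the exact configuration of the multi-scale temporal convolutional network. As a rough illustration only, the sketch below assumes a common design for such modules: several parallel temporal branches that share a kernel size but use different dilation rates, so each branch sees a different span of frames. A kernel of size k with dilation d covers (k - 1)·d + 1 frames. The code is plain Python over a single joint-coordinate time series with hand-set weights; the function names, the kernel, and the dilation rates (1, 2, 3) are hypothetical and are not taken from the SGMSN implementation.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames covered by one dilated temporal kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Dilated 1-D convolution along time (cross-correlation, as in deep learning
    frameworks) with zero padding so the output keeps the input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size for symmetric 'same' padding")
    T = len(x)
    out: List[float] = []
    for t in range(T):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t + (i - k // 2) * dilation  # tap position in time
            if 0 <= src < T:  # frames outside the clip count as zero
                acc += w * x[src]
        out.append(acc)
    return out


def multi_scale_temporal_conv(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 3),
) -> List[List[float]]:
    """Hypothetical multi-scale block: one branch per dilation rate.
    Each branch output plays the role of a separate output channel,
    i.e. the branches are 'concatenated' rather than summed."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]


if __name__ == "__main__":
    seq = [0, 0, 0, 1, 0, 0, 0, 0, 0]  # impulse at frame 3 for one joint coordinate
    for d, branch in zip((1, 2, 3), multi_scale_temporal_conv(seq, [1.0, 1.0, 1.0])):
        print(f"dilation={d} covers {receptive_field(3, d)} frames: {branch}")
```

With a 3-frame kernel, the three branches cover 3, 5, and 7 frames respectively while each keeps only three weights, which is one way a multi-scale design can widen the temporal receptive field without enlarging the kernel. In a full network these branches would operate on learned multi-channel features (for example a channels × frames × joints tensor) and their outputs would typically be concatenated along the channel axis.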
Materials Availability
The datasets are available from the ROSE Lab at https://rose1.ntu.edu.sg/dataset/actionRecognition/.
Code Availability
The code is available from the first author upon reasonable request.
Ethics declarations
Conflict of Interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qi, Y., Hu, J., Zhuang, L. et al. Semantic-guided multi-scale human skeleton action recognition. Appl Intell 53, 9763–9778 (2023). https://doi.org/10.1007/s10489-022-03968-5
DOI: https://doi.org/10.1007/s10489-022-03968-5