Human action recognition based on enhanced data guidance and key node spatial temporal graph convolution

Zhang, Chengyu; Liang, Jiuzhen; Li, Xing; Xia, Yunfei; Di, Lan; Hou, Zhenjie; Huan, Zhan

doi:10.1007/s11042-022-11947-8

Published: 02 February 2022

Volume 81, pages 8349–8366, (2022)

Multimedia Tools and Applications

Chengyu Zhang¹, Jiuzhen Liang¹, Xing Li², Yunfei Xia³, Lan Di⁴, Zhenjie Hou¹ & Zhan Huan¹

Abstract

Graph convolutional networks (GCNs) have achieved remarkable performance in action recognition from skeleton videos. However, most existing GCN-based methods improve performance by increasing the number of model parameters, which in turn requires a large amount of training data, so they usually perform poorly on small-sample learning tasks. In this paper, we propose a novel enhanced data guidance algorithm to improve the performance of GCN-based methods on small-sample datasets. The enhanced data are obtained by applying coordinate transformations to the skeleton, which gives robustness to scale, rotation and translation. The proposed guidance algorithm allows the target model to learn the advantages of the enhanced data and reduces the complexity of the task. We also propose a new key node method, which selects key joints in the spatial dimension and key frames in the temporal dimension. This removes redundant information from the skeleton sequence and significantly reduces the computational cost. Furthermore, combining key nodes with enhanced data greatly reduces the demand for training data. The method achieves recognition accuracies of 94.81% and 94.19% on the public MSR Action3D and UTD-MHAD datasets, respectively. These results indicate that our method significantly outperforms mainstream 3D action recognition methods.
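The abstract does not spell out the coordinate transformation or the key node selection rule, so the following is only a minimal illustrative sketch of the two ideas, not the authors' method. It assumes a skeleton sequence stored as a NumPy array of shape (frames, joints, 3). The function names, the choice of root joint and reference bone, and the motion-energy criterion for key frames are all hypothetical. Rotation alignment is omitted; it would need a body-centred frame that this preview does not describe.

```python
import numpy as np


def normalize_skeleton(seq, root=0, ref_a=0, ref_b=1, eps=1e-8):
    """Make a (T, J, 3) skeleton sequence translation- and scale-invariant.

    Hypothetical choices: `root` is the joint used as origin, and the bone
    between joints `ref_a` and `ref_b` sets the unit of length.
    """
    seq = np.asarray(seq, dtype=float)
    # Translation: express every joint relative to the root joint of its frame.
    centered = seq - seq[:, root:root + 1, :]
    # Scale: divide by the mean length of the reference bone over all frames.
    bone = np.linalg.norm(seq[:, ref_a, :] - seq[:, ref_b, :], axis=-1).mean()
    return centered / max(bone, eps)


def select_key_frames(seq, k):
    """Keep the k frames with the largest motion energy, in temporal order.

    Motion energy of frame t is the summed joint displacement from frame t-1
    (an assumed criterion); frame 0 is assigned zero energy.
    """
    seq = np.asarray(seq, dtype=float)
    disp = np.linalg.norm(np.diff(seq, axis=0), axis=-1).sum(axis=-1)
    energy = np.concatenate([[0.0], disp])
    k = min(k, len(seq))
    idx = np.sort(np.argsort(energy)[-k:])
    return seq[idx], idx
```

With these definitions, a sequence and a uniformly scaled and shifted copy of it normalize to the same array, which is the sense of robustness to scale and translation used here. Selecting key joints in the spatial dimension could follow the same pattern by ranking joints instead of frames.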

Author information

Authors and Affiliations

School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, 213164, China
Chengyu Zhang, Jiuzhen Liang, Zhenjie Hou & Zhan Huan
Department of Computer Science and Technology, Hohai University, Nanjing, Jiangsu, 210000, China
Xing Li
Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, 28213, USA
Yunfei Xia
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Lan Di

Corresponding author

Correspondence to Jiuzhen Liang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, C., Liang, J., Li, X. et al. Human action recognition based on enhanced data guidance and key node spatial temporal graph convolution. Multimed Tools Appl 81, 8349–8366 (2022). https://doi.org/10.1007/s11042-022-11947-8

Received: 17 April 2021
Revised: 07 December 2021
Accepted: 03 January 2022
Published: 02 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-11947-8
