
Multi-stream ternary enhanced graph convolutional network for skeleton-based action recognition

  • Original Article
  • Neural Computing and Applications

Abstract

This paper proposes a novel mechanism for skeleton-based action recognition that enhances and fuses diverse skeleton features from distinct levels. Graph convolutional networks (GCNs) have proven effective for skeleton-based action recognition, but most of them capture and fuse discriminative information from different forms of data only within spatial neighborhoods, which limits deeper interactions among these forms of data as well as the extraction of information along the temporal and channel dimensions. To tackle this issue, we propose a ternary adaptive graph convolution (TAGC) module that captures spatiotemporal information by graph convolution. A novel type of skeleton information, called parallax information, is derived from the original joints or bones at little computational cost to further improve recognition performance. In addition, to make better use of multiple streams, a multi-stream feature fusion (MSFF) scheme is proposed to mine deeper-level hybrid features that supplement the original streams, and a graph-based ternary enhancement (GTE) module is proposed to further refine the extracted discriminative features. Extensive experiments show that the proposed multi-stream ternary enhanced graph convolutional network (MS-TEGCN) achieves state-of-the-art results on three challenging skeleton-based action recognition datasets: NTU-60, NTU-120 and Kinetics-Skeleton.
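The exact formulations of the TAGC, MSFF and GTE modules are not reproduced in this abstract, so the snippet below is only a minimal PyTorch-style sketch of the kind of adaptive graph convolution over skeleton data that such multi-stream models build on. The class name, the learnable residual adjacency `B`, and the tensor layout (batch, channels, frames, joints) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    """Hypothetical single-stream adaptive graph convolution sketch.

    Input  x : (N, C_in, T, V)  -- batch, channels, frames, joints
    Output   : (N, C_out, T, V)
    The learned residual adjacency B is an assumption; the paper's
    TAGC module is not reproduced here.
    """
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        # A: fixed skeleton adjacency derived from the body graph, shape (V, V)
        self.register_buffer("A", A)
        # B: learnable refinement of the topology, initialised to zero
        self.B = nn.Parameter(torch.zeros_like(A))
        # 1x1 convolution mixes channels after spatial aggregation
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        adj = self.A + self.B                      # (V, V) adaptive adjacency
        # aggregate features of neighbouring joints along the joint axis
        x = torch.einsum("nctv,vw->nctw", x, adj)
        return self.conv(x)                        # pointwise channel mixing
```

In multi-stream setups of this kind, joint coordinates, bone vectors (differences between connected joints) and their temporal motions are typically fed to separate such streams, whose class scores are then fused; the paper's parallax stream and MSFF/GTE modules extend this idea.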


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61362030 and Grant 61201429, in part by the China Postdoctoral Science Foundation under Grant 2015M581720 and Grant 2016M600360, in part by the Jiangsu Postdoctoral Science Foundation under Grant 1601216C, in part by the Scientific and Technological Aid Program of Xinjiang under Grant 2017E0279, and in part by the 111 Project under Grant B12018.

Author information

Corresponding author

Correspondence to Jun Kong.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose and no competing interests relevant to the content of this article. They have no affiliations with or involvement in any organization or entity with a financial or non-financial interest in the subject matter or materials discussed in this manuscript, and they are responsible for the correctness of the statements provided in the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kong, J., Wang, S., Jiang, M. et al. Multi-stream ternary enhanced graph convolutional network for skeleton-based action recognition. Neural Comput & Applic 35, 18487–18504 (2023). https://doi.org/10.1007/s00521-023-08671-1

