
Multi-stream ternary enhanced graph convolutional network for skeleton-based action recognition

  • Original Article
  • Neural Computing and Applications

Abstract

This paper proposes a novel mechanism for skeleton-based action recognition that enhances and fuses diverse skeleton features from distinct levels. Graph convolutional networks (GCNs) have proven effective for skeleton-based action recognition, but most of them capture and fuse discriminative information from different forms of data only within spatial neighborhoods, which limits deeper interactions among these forms of data as well as the extraction of information along the temporal and channel dimensions. To tackle this issue, we propose a ternary adaptive graph convolution (TAGC) module that captures spatiotemporal information by graph convolution. A novel type of skeleton information, called parallax information, is derived from the original joints or bones at little computational cost to further improve recognition performance. In addition, to make better use of multiple streams, a multi-stream feature fusion (MSFF) scheme is proposed to mine deeper-level hybrid features that supplement the original streams, and a graph-based ternary enhancement (GTE) module is proposed to further refine the extracted discriminative features. Extensive experiments show that the proposed multi-stream ternary enhanced graph convolutional network (MS-TEGCN) achieves state-of-the-art results on three challenging skeleton-based action recognition datasets: NTU-60, NTU-120 and Kinetics-Skeleton.
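The exact formulations of the TAGC, MSFF and GTE modules are not reproduced in this abstract, so the snippet below is only a minimal PyTorch-style sketch of the kind of adaptive graph convolution over skeleton data that such multi-stream models build on. The class name, the learnable residual adjacency `B`, and the tensor layout (batch, channels, frames, joints) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    """Hypothetical single-stream adaptive graph convolution sketch.

    Input  x : (N, C_in, T, V)  -- batch, channels, frames, joints
    Output   : (N, C_out, T, V)
    The learned residual adjacency B is an assumption; the paper's
    TAGC module is not reproduced here.
    """
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        # A: fixed skeleton adjacency derived from the body graph, shape (V, V)
        self.register_buffer("A", A)
        # B: learnable refinement of the topology, initialised to zero
        self.B = nn.Parameter(torch.zeros_like(A))
        # 1x1 convolution mixes channels after spatial aggregation
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        adj = self.A + self.B                      # (V, V) adaptive adjacency
        # aggregate features of neighbouring joints along the joint axis
        x = torch.einsum("nctv,vw->nctw", x, adj)
        return self.conv(x)                        # pointwise channel mixing
```

In multi-stream setups of this kind, joint coordinates, bone vectors (differences between connected joints) and their temporal motions are typically fed to separate such streams, whose class scores are then fused; the paper's parallax stream and MSFF/GTE modules extend this idea.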


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61362030 and Grant 61201429, in part by the China Postdoctoral Science Foundation under Grant 2015M581720 and Grant 2016M600360, in part by the Jiangsu Postdoctoral Science Foundation under Grant 1601216C, in part by the Scientific and Technological Aid Program of Xinjiang under Grant 2017E0279, and in part by the 111 Project under Grant B12018.

Author information

Corresponding author

Correspondence to Jun Kong.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose and no competing interests relevant to the content of this article. They have no affiliations with or involvement in any organization or entity with a financial or non-financial interest in the subject matter or materials discussed in this manuscript, and they are responsible for the correctness of the statements provided in the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kong, J., Wang, S., Jiang, M. et al. Multi-stream ternary enhanced graph convolutional network for skeleton-based action recognition. Neural Comput & Applic 35, 18487–18504 (2023). https://doi.org/10.1007/s00521-023-08671-1

