
Direction-guided two-stream convolutional neural networks for skeleton-based action recognition

  • Neural Networks
Soft Computing

Abstract

In skeleton-based action recognition, treating skeleton data as pseudo-images and processing them with convolutional neural networks (CNNs) has proven effective. However, most existing CNN-based approaches model information only at the joint level and ignore the magnitude and direction of the skeleton edges, which play an important role in action recognition, so these approaches may not be optimal. Moreover, existing approaches rarely exploit the directionality of human motion to characterize how an action changes over time, even though this is a more natural and reasonable way to model action sequences. In this work, we propose a novel direction-guided two-stream convolutional neural network for skeleton-based action recognition. The first stream focuses on directional edge-level information that we define (edge and edge_motion information) to explore the spatiotemporal features of the action. Because motion is directional, the second stream defines different skeleton edge directions and extracts motion information (translation and rotation) along each of them to better exploit the motion features of the action. We further propose describing human motion as a combination of translation and rotation, and we explore how the two can be integrated. Extensive experiments on two challenging datasets, NTU-RGB+D 60 and NTU-RGB+D 120, verify the superiority of the proposed method over state-of-the-art methods. The experimental results demonstrate that the proposed direction-guided edge-level information and motion information complement each other and improve action recognition.
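The abstract does not give exact formulas, so the following is only a minimal sketch of the kinds of features it names, under explicit assumptions. It assumes that an edge is the vector from one joint to an adjacent joint (so it carries both magnitude and direction), that edge_motion is the frame-to-frame difference of an edge, that the two "edge directions" are outward (parent to child) and inward (child to parent), that translation is the displacement of an edge's source joint, and that rotation is the angle an edge sweeps between consecutive frames. The toy 4-joint skeleton (`BONES`) and all function names (`edges`, `edge_motion`, `translation`, `rotation`, `to_pseudo_image`) are hypothetical and are not the authors' definitions or code.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Sequence[Vec3]  # one 3D position per joint

# Hypothetical 4-joint toy skeleton given as (parent, child) pairs.
# NTU RGB+D skeletons have 25 joints; this layout is for illustration only.
BONES = [(0, 1), (1, 2), (1, 3)]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def edges(frames: Sequence[Frame], bones, outward: bool = True) -> List[List[Vec3]]:
    """Edge vectors per frame; outward: parent -> child, inward: child -> parent.

    Each vector keeps both the bone length (magnitude) and its direction.
    """
    result = []
    for joints in frames:
        row = []
        for parent, child in bones:
            src, dst = (parent, child) if outward else (child, parent)
            row.append(_sub(joints[dst], joints[src]))
        result.append(row)
    return result


def edge_motion(edge_seq: Sequence[Sequence[Vec3]]) -> List[List[Vec3]]:
    """Difference of each edge vector between consecutive frames (T - 1 rows)."""
    return [[_sub(e1, e0) for e0, e1 in zip(f0, f1)]
            for f0, f1 in zip(edge_seq, edge_seq[1:])]


def translation(frames: Sequence[Frame], bones, outward: bool = True) -> List[List[Vec3]]:
    """Assumed translation term: displacement of each edge's source joint."""
    sources = [p if outward else c for p, c in bones]
    return [[_sub(j1[s], j0[s]) for s in sources]
            for j0, j1 in zip(frames, frames[1:])]


def rotation(edge_seq: Sequence[Sequence[Vec3]]) -> List[List[float]]:
    """Assumed rotation term: angle (radians) each edge sweeps between frames."""
    result = []
    for f0, f1 in zip(edge_seq, edge_seq[1:]):
        row = []
        for e0, e1 in zip(f0, f1):
            denom = math.sqrt(_dot(e0, e0) * _dot(e1, e1))  # |e0| * |e1|
            cos = _dot(e0, e1) / denom if denom > 0 else 1.0
            row.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error
        result.append(row)
    return result


def to_pseudo_image(seq: Sequence[Sequence[Vec3]]) -> List[List[List[float]]]:
    """Rearrange a T x E list of 3D vectors into a C x T x E 'image' (C = 3)."""
    return [[[vec[c] for vec in frame] for frame in seq] for c in range(3)]


# Usage: two frames in which joint 2 swings from the side to straight up.
frames = [
    [(0, 0, 0), (0, 1, 0), (1, 1, 0), (-1, 1, 0)],
    [(0, 0, 0), (0, 1, 0), (0, 2, 0), (-1, 1, 0)],
]
out_edges = edges(frames, BONES)
print(edge_motion(out_edges))                     # [[(0, 0, 0), (-1, 1, 0), (0, 0, 0)]]
print(rotation(out_edges))                        # [[0.0, 1.5707963267948966, 0.0]]
print(translation(frames, BONES, outward=False))  # [[(0, 0, 0), (-1, 1, 0), (0, 0, 0)]]
```

In this sketch, flipping the edge direction negates every edge vector and changes which joint's displacement counts as translation, while the rotation angle stays the same. Each resulting C × T × E array could then be treated as a pseudo-image input to a CNN branch; how the paper actually arranges and fuses the two streams is not stated in the abstract.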


Figs. 1–9 (images not included)


Availability of data

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported in part by the Leading Talent Team Project of Anhui Province and by the Anqing Normal University and Tongling University Joint Training Postgraduate Research Innovation Fund Project (tlaqsflhy2).

Author information

Contributions

All authors contributed to the study conception and design. Data collection and analysis were performed by Peng Zhang, Manzhen Sun and Min Sheng. The first draft of the manuscript was written by Benyue Su, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Benyue Su.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Su, B., Zhang, P., Sun, M. et al. Direction-guided two-stream convolutional neural networks for skeleton-based action recognition. Soft Comput 27, 11833–11842 (2023). https://doi.org/10.1007/s00500-023-07862-1



  • DOI: https://doi.org/10.1007/s00500-023-07862-1
