Direction-guided two-stream convolutional neural networks for skeleton-based action recognition

Su, Benyue; Zhang, Peng; Sun, Manzhen; Sheng, Min

doi:10.1007/s00500-023-07862-1

Neural Networks
Published: 09 February 2023

Soft Computing, Volume 27, pages 11833–11842 (2023)

Benyue Su (ORCID: orcid.org/0000-0003-1300-2083)^1,3, Peng Zhang^1,2, Manzhen Sun^1,2 & Min Sheng^4

312 Accesses
2 Citations

Abstract

In skeleton-based action recognition, treating skeleton data as pseudoimages and processing them with convolutional neural networks (CNNs) has proven effective. However, most existing CNN-based approaches model information at the joint level and ignore the size and direction of the skeleton edges, which play an important role in action recognition, so these approaches may not be optimal. In addition, existing approaches rarely exploit the directionality of human motion to describe how an action varies over time, although doing so is a more natural and reasonable way to model action sequences. In this work, we propose a novel direction-guided two-stream convolutional neural network for skeleton-based action recognition. In the first stream, our model focuses on the directional edge-level information we define in the skeleton data (including edge and edge_motion information) to explore the spatiotemporal features of an action. In the second stream, because motion is directional, we define different skeleton edge directions and extract different motion information (including translation and rotation information) along each direction to better exploit the motion features of an action. We also propose describing human motion as a combination of translation and rotation, and explore how the two can be integrated. We conducted extensive experiments on two challenging datasets, NTU-RGB+D 60 and NTU-RGB+D 120, which verify the superiority of our proposed method over state-of-the-art methods. The experimental results demonstrate that the proposed direction-guided edge-level information and motion information complement each other for better action recognition.
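The edge-level and motion quantities named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the toy skeleton `TOY_EDGES`, the `reverse` flag standing in for "different edge directions", the midpoint-displacement definition of translation, and the angle-between-frames definition of rotation are all hypothetical choices. In a CNN pipeline, such per-frame, per-edge values would typically be stacked into a channels × frames × edges array and treated as a pseudoimage; that step is omitted here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Sequence[Vec3]  # one (x, y, z) position per joint

# Hypothetical 4-joint toy skeleton given as (parent, child) index pairs.
# The paper's real edge list (e.g. for the 25-joint NTU skeleton) is not reproduced here.
TOY_EDGES = [(0, 1), (1, 2), (1, 3)]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def edge_vectors(frame: Frame, edges, reverse: bool = False) -> List[Vec3]:
    """Edge vector per skeleton edge: its norm is the edge length and its
    orientation is the edge direction. reverse=True flips every edge
    (child -> parent), a hypothetical stand-in for choosing another direction."""
    if reverse:
        return [sub(frame[p], frame[c]) for p, c in edges]
    return [sub(frame[c], frame[p]) for p, c in edges]


def edge_motion(seq: Sequence[Frame], edges, reverse: bool = False) -> List[List[Vec3]]:
    """Change of each edge vector between consecutive frames."""
    e = [edge_vectors(f, edges, reverse) for f in seq]
    return [[sub(e[t + 1][k], e[t][k]) for k in range(len(edges))]
            for t in range(len(seq) - 1)]


def translation(seq: Sequence[Frame], edges) -> List[List[Vec3]]:
    """Assumed definition: displacement of each edge's midpoint between frames."""
    def mid(f: Frame, p: int, c: int) -> Vec3:
        return tuple((f[p][i] + f[c][i]) / 2.0 for i in range(3))
    return [[sub(mid(seq[t + 1], p, c), mid(seq[t], p, c)) for p, c in edges]
            for t in range(len(seq) - 1)]


def rotation(seq: Sequence[Frame], edges) -> List[List[float]]:
    """Assumed definition: angle (radians) each edge turns between frames."""
    out = []
    for t in range(len(seq) - 1):
        a, b = edge_vectors(seq[t], edges), edge_vectors(seq[t + 1], edges)
        row = []
        for u, v in zip(a, b):
            denom = norm(u) * norm(v)
            cos = sum(x * y for x, y in zip(u, v)) / denom if denom else 1.0
            # Clamp to [-1, 1] to guard acos against rounding error.
            row.append(math.acos(max(-1.0, min(1.0, cos))))
        out.append(row)
    return out


if __name__ == "__main__":
    # Two frames: joint 2 swings 90 degrees about joint 1; the other joints stay put.
    f0 = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (-1, 1, 0)]
    f1 = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (-1, 1, 0)]
    seq = [f0, f1]
    print(edge_motion(seq, TOY_EDGES))
    print(translation(seq, TOY_EDGES))
    print([round(math.degrees(r), 1) for r in rotation(seq, TOY_EDGES)[0]])
```

Encoding each edge as a vector keeps both its length (the norm) and its direction, which is the information a joint-only representation leaves implicit; translation and rotation then split an edge's frame-to-frame change into "where it moved" and "how far it turned".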

Availability of data

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported in part by the Leading Talent Team Project of Anhui Province and by the Anqing Normal University and Tongling University Joint Training Postgraduate Research Innovation Fund Project (tlaqsflhy2).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing Normal University, Anqing, 246133, Anhui, China
Benyue Su, Peng Zhang & Manzhen Sun
School of Computer and Information, Anqing Normal University, Anqing, 246133, Anhui, China
Peng Zhang & Manzhen Sun
School of Mathematics and Computer, Tongling University, Tongling, 244061, Anhui, China
Benyue Su
School of Mathematics and Physics, Anqing Normal University, Anqing, 246133, Anhui, China
Min Sheng

Contributions

All authors contributed to the study conception and design. Data collection and analysis were performed by Peng Zhang, Manzhen Sun and Min Sheng. The first draft of the manuscript was written by Benyue Su, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Benyue Su.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Su, B., Zhang, P., Sun, M. et al. Direction-guided two-stream convolutional neural networks for skeleton-based action recognition. Soft Comput 27, 11833–11842 (2023). https://doi.org/10.1007/s00500-023-07862-1

Accepted: 19 January 2023
Published: 09 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00500-023-07862-1

Keywords
