Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition

Chen, Minglong; Liang, Jiuzhen; Liu, Hao

doi:10.1007/s11227-024-05900-9

Published: 29 January 2024

Volume 80, pages 11614–11639, (2024)

The Journal of Supercomputing

Minglong Chen¹,
Jiuzhen Liang¹ &
Hao Liu¹

Abstract

In recent years, action recognition has become an essential branch of video understanding and an active research direction. Within this field, graph convolutional networks (GCNs) are widely used for skeleton-based action recognition and have achieved remarkable performance. In practice, however, recognizing a human action often depends on the movement of only a subset of the joints. In existing GCN-based methods, the size of the skeleton graph for a single frame is fixed and every joint of the body participates in the entire computation, so the joints that are critical to the motion cannot be selected flexibly. This paper therefore takes the adaptive graph convolutional network (AGCN) as its baseline and uses graph pooling to select the critical joints during human motion. We design two new networks, Pooling-AGCN and U-AGCN, and combine them into the multi-stream P&U AGCNs for action recognition. Extensive experiments show that the two networks are complementary and that the proposed method outperforms recent work on three large-scale public datasets (NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-Skeleton).
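The abstract does not specify which graph-pooling operator is used, so the following is only a minimal sketch of one common choice, score-based top-k pooling with a matching unpooling step, applied to a single skeleton frame. The function names `top_k_pool` and `unpool`, the fixed `projection` vector (which would be learned in a real network), and the toy five-joint chain are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def top_k_pool(features, adjacency, projection, k):
    """Keep the k highest-scoring joints of one skeleton frame.

    features:   list of N joint feature vectors, each of length C
    adjacency:  N x N adjacency matrix as nested lists
    projection: hypothetical scoring vector of length C (learnable in practice)
    """
    norm = math.sqrt(sum(p * p for p in projection))
    # One scalar score per joint: its features projected onto `projection`.
    scores = [sum(x * p for x, p in zip(row, projection)) / norm
              for row in features]
    # Indices of the k largest scores, put back into original joint order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    # Gate the kept features with sigmoid(score) so the scoring vector
    # would receive gradients in a trainable setting.
    pooled = [[x / (1.0 + math.exp(-scores[i])) for x in features[i]]
              for i in keep]
    # Restrict the adjacency matrix to the kept joints.
    pooled_adj = [[adjacency[i][j] for j in keep] for i in keep]
    return pooled, pooled_adj, keep

def unpool(pooled, keep, num_joints):
    """Place pooled features back at their original joint indices, zeros elsewhere."""
    width = len(pooled[0])
    restored = [[0.0] * width for _ in range(num_joints)]
    for row, i in zip(pooled, keep):
        restored[i] = list(row)
    return restored

# Toy frame: 5 joints in a chain (0-1-2-3-4), 2 features per joint.
features = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.5], [3.0, -1.0], [-1.0, 0.2]]
adjacency = [[0, 1, 0, 0, 0],
             [1, 0, 1, 0, 0],
             [0, 1, 0, 1, 0],
             [0, 0, 1, 0, 1],
             [0, 0, 0, 1, 0]]
projection = [1.0, 0.0]  # assumed: score equals the first feature

pooled, pooled_adj, keep = top_k_pool(features, adjacency, projection, k=2)
restored = unpool(pooled, keep, num_joints=5)
print(keep)        # [1, 3]
print(pooled_adj)  # [[0, 0], [0, 0]]
```

In this toy frame the two kept joints (1 and 3) are not directly connected, so the pooled adjacency matrix is empty. This is one reason pooling-based designs usually augment or relearn the graph after joint selection; how the paper handles this is not stated in the abstract.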

Data Availability

The data and code that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

School of Information and Engineering, Changzhou University, Changzhou, 213164, China
Minglong Chen, Jiuzhen Liang & Hao Liu

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Minglong Chen. Jiuzhen Liang and Hao Liu provided supervision. The first draft of the manuscript was written by Minglong Chen, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiuzhen Liang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, M., Liang, J. & Liu, H. Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition. J Supercomput 80, 11614–11639 (2024). https://doi.org/10.1007/s11227-024-05900-9

Accepted: 04 January 2024
Published: 29 January 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11227-024-05900-9
