Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition

The Journal of Supercomputing

Abstract

In recent years, action recognition has become an essential branch of video understanding and an active research direction. Graph convolutional networks (GCNs) are widely used for skeleton-based action recognition and have achieved remarkable performance. In practice, however, recognizing a human action often depends on the movement of only a subset of the joints. In existing GCN-based methods, the skeleton graph of each frame has a fixed size and every joint of the body participates in the entire computation, so the critical joints of a movement cannot be selected flexibly. This paper therefore takes the adaptive graph convolutional network (AGCN) as its baseline and uses graph pooling to select the critical joints during human movement. We design two new networks, Pooling-AGCN and U-AGCN, and combine them to form the multi-stream P&U AGCNs for action recognition. Extensive experiments show that the two networks are complementary and that the proposed method outperforms recent work on three large-scale public datasets (NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-Skeleton).
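
To make the idea of selecting critical joints by graph pooling concrete, here is a minimal sketch for a single skeleton frame. The abstract does not say which pooling operator Pooling-AGCN and U-AGCN use, so the sketch assumes a top-k, score-based selection in the style of the gPool layer from Graph U-Nets (Gao and Ji, 2019). The function name `topk_joint_pool`, the projection vector `p`, the toy chain-shaped adjacency, and the values of `C` and `k` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def topk_joint_pool(x, adj, p, k):
    """Keep the k highest-scoring joints of one skeleton frame.

    x:   (V, C) array of per-joint features
    adj: (V, V) adjacency matrix of the skeleton graph
    p:   (C,) projection vector (learned in a real model; fixed here)
    k:   number of joints to keep
    """
    # One scalar score per joint: projection of its features onto p.
    scores = x @ p / np.linalg.norm(p)
    # Indices of the k largest scores, restored to original joint order.
    idx = np.sort(np.argsort(scores)[-k:])
    # Sigmoid gate so the scores also scale the kept features.
    gate = 1.0 / (1.0 + np.exp(-scores[idx]))
    x_pooled = x[idx] * gate[:, None]
    # Sub-graph induced by the kept joints.
    adj_pooled = adj[np.ix_(idx, idx)]
    return x_pooled, adj_pooled, idx


rng = np.random.default_rng(0)
V, C, k = 25, 3, 10  # 25 joints as in NTU RGB+D skeletons; C and k are arbitrary

x = rng.normal(size=(V, C))  # random stand-in for joint features

# Toy chain graph with self-loops (an assumption, not the real skeleton topology).
adj = np.eye(V)
for i in range(V - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

p = rng.normal(size=C)
x_pooled, adj_pooled, idx = topk_joint_pool(x, adj, p, k)
print(x_pooled.shape, adj_pooled.shape)  # (10, 3) (10, 10)
```

In a trained network `p` would be learned, so the kept joints differ from sample to sample; this is the sense in which pooling lets a model drop joints that contribute little to an action, in contrast to a fixed-size skeleton graph.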

Data Availability

The data and code that support the findings of this study are available from the corresponding author upon reasonable request.


Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Minglong Chen. Jiuzhen Liang and Hao Liu provided supervision. The first draft of the manuscript was written by Minglong Chen, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiuzhen Liang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, M., Liang, J. & Liu, H. Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition. J Supercomput 80, 11614–11639 (2024). https://doi.org/10.1007/s11227-024-05900-9

  • DOI: https://doi.org/10.1007/s11227-024-05900-9
