Abstract
Action recognition is an essential branch of video understanding and an active research direction. Within it, graph convolutional networks (GCNs) are widely used for skeleton-based action recognition and have achieved remarkable performance. In practice, however, recognizing a human action often depends on the movement of only a subset of the joints. In existing GCN-based methods, the size of the skeleton graph in each frame is fixed and every joint of the body participates in the entire computation, so the joints that are critical to a movement cannot be selected flexibly. This paper therefore takes the adaptive graph convolutional network (AGCN) as the baseline and uses graph pooling to select the critical joints during human movement. We design two new networks, Pooling-AGCN and U-AGCN, and combine them into the multi-stream P&U AGCNs for action recognition. Extensive experiments show that the two networks are complementary and that the proposed method outperforms recent work on three large-scale public datasets (NTU-RGB+D 60, NTU-RGB+D 120, Kinetics-Skeleton).
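To make the idea of selecting critical joints by graph pooling concrete, the following is a minimal sketch of one common scheme, top-k score-based pooling, applied to a single skeleton frame. The abstract does not state which pooling operator Pooling-AGCN or U-AGCN uses, so the scoring rule, the function name `topk_joint_pool`, and the hand-set `weights` vector are all assumptions made for illustration, not the authors' implementation.

```python
import math


def topk_joint_pool(features, adjacency, weights, k):
    """Keep the k highest-scoring joints of one skeleton frame.

    features:  list of N joint feature vectors (each a list of C floats)
    adjacency: N x N list of lists describing the skeleton graph
    weights:   projection vector of length C; learned in a real model,
               fixed by hand here (an assumption of this sketch)
    """
    norm = math.sqrt(sum(w * w for w in weights))
    # One scalar score per joint: projection of its features onto `weights`.
    scores = [sum(f * w for f, w in zip(joint, weights)) / norm
              for joint in features]
    # Indices of the k best-scoring joints, restored to skeleton order.
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    # Gate the kept features by tanh(score), so that in a trainable
    # framework the scores would receive gradients.
    pooled = [[x * math.tanh(scores[i]) for x in features[i]] for i in keep]
    # Induced subgraph: only edges between kept joints survive.
    sub_adj = [[adjacency[i][j] for j in keep] for i in keep]
    return keep, pooled, sub_adj


# Toy frame: 5 joints connected in a chain, 2 feature channels per joint.
features = [[0.1, 0.0], [0.9, 0.0], [0.2, 0.0], [0.8, 0.0], [0.05, 0.0]]
adjacency = [[0, 1, 0, 0, 0],
             [1, 0, 1, 0, 0],
             [0, 1, 0, 1, 0],
             [0, 0, 1, 0, 1],
             [0, 0, 0, 1, 0]]
keep, pooled, sub_adj = topk_joint_pool(features, adjacency, [1.0, 0.0], k=3)
print(keep)     # indices of the joints that survive pooling
print(sub_adj)  # adjacency of the smaller skeleton graph
```

In this toy frame the three joints with the largest first-channel response are kept, and the pooled graph has three nodes instead of five. This illustrates how pooling lets the graph size vary rather than stay fixed at the full joint set.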
Data Availability
The data and code that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Minglong Chen. Jiuzhen Liang and Hao Liu provided supervision. The first draft of the manuscript was written by Minglong Chen, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article does not contain any studies with animals performed by any of the authors.
Consent for publication
Not applicable.
About this article
Cite this article
Chen, M., Liang, J. & Liu, H. Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition. J Supercomput 80, 11614–11639 (2024). https://doi.org/10.1007/s11227-024-05900-9
DOI: https://doi.org/10.1007/s11227-024-05900-9