
Asymmetric information-regularized learning for skeleton-based action recognition

Published in: Applied Intelligence

Abstract

Skeleton-based action recognition, typically formulated as a spatial-temporal graph classification problem, has recently achieved remarkable progress. Nevertheless, most existing approaches model the skeleton topology with a pure encoder and lack explicit guidance to strengthen its representation capability. To address this limitation, the proposed Asymmetric Information-Regularized Graph Convolutional Network (AIR-GCN) explores an effective asymmetric paradigm, grounded in information theory, that forces the encoder to learn more representative features. Furthermore, since each sample has a unique spatial-temporal topology arising from its dynamic action process, AIR-GCN introduces two novel operators to learn spatial-temporal representations beyond the inherent structural relations: Topology-regularized Spatial Routing (TrSR), which encodes instance-dependent relational graphs, and Topology-regularized Temporal Routing (TrTR), which captures action-specific motion patterns to reduce the ambiguity between highly similar actions. Extensive experiments are conducted on four widely used datasets: Northwestern-UCLA, NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton. The results demonstrate that AIR-GCN achieves notably better performance than state-of-the-art methods.
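For readers who want a concrete sense of what instance-dependent spatial topology refinement can look like in practice, the following sketch outlines one possible graph-convolution layer whose adjacency matrix is adjusted per sample from the input features. It is a minimal PyTorch sketch for illustration only: the class name TrSRLayerSketch, the embedding-based refinement, and all hyperparameters are assumptions by the editor, not the authors' released TrSR implementation or the information-theoretic regularizer described in the abstract.

# Minimal PyTorch sketch of an instance-dependent spatial topology layer.
# NOTE: illustrative assumption only, not the authors' TrSR code; the class
# and parameter names (TrSRLayerSketch, alpha, embed_channels) are hypothetical.
import torch
import torch.nn as nn

class TrSRLayerSketch(nn.Module):
    """Graph convolution whose adjacency is refined per sample from the input."""
    def __init__(self, in_channels, out_channels, num_joints, embed_channels=16):
        super().__init__()
        # Shared topology, learned starting from an identity initialization.
        self.static_A = nn.Parameter(torch.eye(num_joints))
        # Embeddings used to infer a sample-specific relational graph.
        self.theta = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # gate on the dynamic part

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        n, c, t, v = x.shape
        # Pool over time before estimating joint-to-joint relations.
        q = self.theta(x).mean(dim=2)  # (N, E, V)
        k = self.phi(x).mean(dim=2)    # (N, E, V)
        # Pairwise similarities give a per-sample relational graph.
        dynamic_A = torch.softmax(torch.einsum('nev,new->nvw', q, k), dim=-1)  # (N, V, V)
        A = self.static_A.unsqueeze(0) + self.alpha * dynamic_A                # (N, V, V)
        # Aggregate features over the per-sample graph, then project channels.
        x = torch.einsum('nctv,nvw->nctw', x, A)
        return self.proj(x)

if __name__ == "__main__":
    # Example with NTU RGB+D-style input: 3D coordinates, 64 frames, 25 joints.
    layer = TrSRLayerSketch(in_channels=3, out_channels=64, num_joints=25)
    x = torch.randn(8, 3, 64, 25)
    print(layer(x).shape)  # torch.Size([8, 64, 64, 25])

The design point this sketch tries to convey is that the shared topology is kept as a learnable base while a gated, feature-dependent term lets every sample route information along its own relational graph; how AIR-GCN actually regularizes that routing is described in the full paper.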




Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant 61876158 and the Fundamental Research Funds for the Central Universities under Grant 2682021ZTPY030.

Author information


Corresponding author

Correspondence to Kunlun Wu.

Ethics declarations

Conflict of interest statement

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the submitted manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, K., Gong, X. Asymmetric information-regularized learning for skeleton-based action recognition. Appl Intell 53, 31065–31076 (2023). https://doi.org/10.1007/s10489-023-05173-4

