Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition

Shu, Yang; Li, Wanggen; Li, Doudou; Gao, Kun; Jie, Biao

doi:10.1007/978-981-99-8429-9_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Owing to their compactness and robustness to interference, skeletal data have made skeleton-based human action recognition a mainstream research topic. However, because most methods underuse semantic information and model temporal dynamics insufficiently, they may fail to fully capture the connections between non-adjacent joints in the spatial or temporal dimensions. To address these problems, we propose a Multi-scale Dilated Attention Graph Convolutional Network (MDKA-GCN) for skeleton-based action recognition. In the spatial configuration, we explicitly introduce a channel graph composed of the high-level semantics of joints (joint type and frame index) into the network to enhance the representation ability of spatiotemporal features. MDKA-GCN uses joint-level, velocity-level and bone-level graphs to mine the hidden features of human skeletons more deeply. In the temporal configuration, we propose two lightweight multi-scale strategies that make the model more robust to temporal variation. Extensive experiments on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets show that MDKA-GCN achieves competitive performance and surpasses most lightweight state-of-the-art (SOTA) methods.
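
As a rough illustration of the joint-level, velocity-level and bone-level inputs mentioned in the abstract, the sketch below derives the three streams from raw joint coordinates with NumPy. It is not the authors' implementation: the function names build_streams and semantic_one_hot, the (frames, joints, channels) array layout, the parents list describing the skeleton tree, and the use of one-hot vectors to encode joint type and frame index are all assumptions made for this example.

```python
import numpy as np


def build_streams(joints, parents):
    """Derive joint, velocity and bone streams from one skeleton sequence.

    joints  : array of shape (T, V, C), T frames, V joints, C coordinates
              (assumed layout, not taken from the paper).
    parents : length-V sequence; parents[v] is the index of the joint that
              joint v is attached to (the root points to itself).
    """
    joints = np.asarray(joints, dtype=np.float64)
    parents = np.asarray(parents, dtype=np.intp)

    # Velocity stream: displacement of each joint between consecutive
    # frames; the first frame has no predecessor and is left at zero.
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]

    # Bone stream: vector from each joint's parent to the joint itself;
    # the root joint yields a zero vector because it is its own parent.
    bones = joints - joints[:, parents, :]

    return joints, velocity, bones


def semantic_one_hot(num_frames, num_joints):
    """One-hot codes for joint type and frame index (assumed encoding)."""
    joint_type = np.eye(num_joints)    # row v identifies joint v
    frame_index = np.eye(num_frames)   # row t identifies frame t
    return joint_type, frame_index


# Toy skeleton: 3 joints in a chain (0 -> 1 -> 2), 2 frames, 3-D coordinates.
toy_joints = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [1.0, 2.0, 1.0]],
])
toy_parents = [0, 0, 1]
j, v, b = build_streams(toy_joints, toy_parents)
jt, fi = semantic_one_hot(num_frames=2, num_joints=3)
```

Each returned stream keeps the shape of the input, so in a setup like this the three streams could be fed to parallel branches or concatenated along the channel axis; which of these the paper does is not stated in the abstract.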

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61976006).

Author information

Authors and Affiliations

Anhui Normal University, Wuhu, China
Yang Shu, Wanggen Li, Doudou Li, Kun Gao & Biao Jie

Corresponding author

Correspondence to Wanggen Li.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Shu, Y., Li, W., Li, D., Gao, K., Jie, B. (2024). Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_2

DOI: https://doi.org/10.1007/978-981-99-8429-9_2
Published: 24 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9
eBook Packages: Computer Science, Computer Science (R0)
