
A Multi-scale Convolutional Neural Network for Skeleton-Based Human Action Recognition with Insufficient Training Samples

  • Conference paper
Proceedings of the International Conference on Internet of Things, Communication and Intelligent Technology (IoTCIT 2022)

Abstract

Human action recognition (HAR) has attracted considerable research attention because of its broad application prospects and significant research value. Despite great progress, handling actions of varying duration and insufficient training samples remains a major challenge for HAR. To address these two issues, a new human action representation and a multi-scale deep neural network (DNN) model are proposed. The contributions are threefold. First, a new data structure named the joint motion matrix (JMM) is put forward for describing human actions. The JMM effectively mitigates the loss of motion information caused by spatio-temporal occlusion. Second, a data augmentation method based on the characteristics of human physiological structure and human actions is proposed to generate action samples with brand-new motion information. Third, the combined use of a spatial pyramid pooling (SPP) layer and a global average pooling (GAP) layer not only enables the DNN to accept multi-scale inputs but also provides an effective way to extract more detailed global features. The proposed method was evaluated on two small but challenging datasets, Florence 3D Actions and UTKinect-Action3D. The experiments show that the proposed data augmentation strategy is highly effective for DNN training and that the proposed method offers strong multi-scale feature learning with reduced computational cost. The approach achieves accuracies of 94.27% and 96.83% on the two datasets, the highest among the compared methods.
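The abstract does not define the JMM construction itself, so the following is only a hypothetical sketch of the general joints-versus-frames encoding that skeleton-to-image methods of this kind typically use: coordinates become channels, joints become rows, and frames become columns. The function name and per-channel normalization are illustrative assumptions, not the authors' definition.

```python
# Hypothetical sketch of encoding a skeleton sequence as an image-like
# matrix, in the spirit of the joint motion matrix (JMM) described above.
# The paper's exact JMM layout may differ.
import numpy as np

def skeleton_to_motion_matrix(seq):
    """seq: (T, J, 3) array of T frames, J joints, 3D coordinates.
    Returns a (3, J, T) matrix normalized to [0, 1] per channel."""
    m = np.transpose(seq, (2, 1, 0)).astype(np.float32)  # (3, J, T)
    lo = m.min(axis=(1, 2), keepdims=True)
    hi = m.max(axis=(1, 2), keepdims=True)
    return (m - lo) / np.maximum(hi - lo, 1e-8)

# Example: a 40-frame clip of a 15-joint skeleton (Florence 3D uses 15 joints).
clip = np.random.rand(40, 15, 3)
jmm = skeleton_to_motion_matrix(clip)
print(jmm.shape)  # (3, 15, 40): a 3-channel image a CNN can consume
```

Note that the frame dimension T varies with action duration, which is exactly why the network's pooling head must tolerate variable-size inputs.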
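To illustrate the third contribution, here is a minimal PyTorch sketch (not the authors' network) of how an SPP layer combined with GAP maps feature maps of any spatial size to a fixed-length vector, which is what lets a CNN accept multi-scale inputs. The pyramid levels and channel counts are assumptions for illustration.

```python
# Minimal sketch: SPP + GAP pooling head that accepts variable-size inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPWithGAP(nn.Module):
    """Pools a variable-size feature map into a fixed-length vector."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # pyramid bin counts per side

    def forward(self, x):  # x: (N, C, H, W); H and W may vary
        feats = []
        for k in self.levels:
            # Adaptive pooling yields a k x k grid regardless of H, W.
            feats.append(F.adaptive_max_pool2d(x, k).flatten(1))
        # GAP contributes one global average per channel.
        feats.append(F.adaptive_avg_pool2d(x, 1).flatten(1))
        return torch.cat(feats, dim=1)  # length: C * (sum(k^2) + 1)

# Usage: inputs of different spatial sizes map to the same vector length.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
head = SPPWithGAP()
for h, w in [(32, 48), (60, 20)]:
    out = head(conv(torch.randn(1, 3, h, w)))
    print(out.shape)  # torch.Size([1, 352]) = 16 * (1 + 4 + 16 + 1)
```

The fixed output length means the fully connected classifier never needs to know the input resolution, so clips of different durations can be fed to one network without cropping or resampling.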

Acknowledgement

This research was supported by the Scientific and Technological Projects of the Nanchang Science and Technology Bureau under Grants GJJ202010, GJJ202017, and GJJ212015, and by the Special Project 03 of the Jiangxi Provincial Department of Science and Technology under Grant 20212ABC03A36.

Author information

Corresponding author

Correspondence to Lei Xiong.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wei, P., Xiong, L., He, Y., Yao, L. (2023). A Multi-scale Convolutional Neural Network for Skeleton-Based Human Action Recognition with Insufficient Training Samples. In: Dong, J., Zhang, L. (eds) Proceedings of the International Conference on Internet of Things, Communication and Intelligent Technology. IoTCIT 2022. Lecture Notes in Electrical Engineering, vol 1015. Springer, Singapore. https://doi.org/10.1007/978-981-99-0416-7_53

  • DOI: https://doi.org/10.1007/978-981-99-0416-7_53

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0415-0

  • Online ISBN: 978-981-99-0416-7

  • eBook Packages: Engineering; Engineering (R0)
