Abstract
Human action recognition (HAR) has attracted considerable research attention because of its broad application prospects and significant research value. Although great progress has been made, handling actions of varying duration and insufficient training samples remain major challenges for HAR. To address these two issues, a new human action representation method and a multi-scale deep neural network (DNN) model are proposed. The contributions are threefold. First, a new data structure, called the joint motion matrix (JMM), is put forward for HAR description. The JMM effectively mitigates the motion information loss caused by spatio-temporal occlusion. Second, a data augmentation method based on the characteristics of human physiological structure and human actions is proposed to generate action samples containing brand-new motion information. Third, the combined use of a spatial pyramid pooling (SPP) layer and a global average pooling (GAP) layer not only enables the DNN to accept multi-scale inputs but also provides an effective way for it to extract more detailed global features. The proposed method was evaluated on two small but challenging datasets, Florence 3D Actions and UTKinect-Action3D. The experiments show that the proposed data augmentation strategy is highly effective for DNN training, and that the proposed method offers strong multi-scale feature learning at reduced computational cost. The approach achieves accuracies of 94.27% and 96.83% on the two datasets, the highest among the compared methods.
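The SPP/GAP combination mentioned in the abstract can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation; it only shows the general mechanism: SPP max-pools a variable-size feature map into a fixed-length vector (one bin per cell of each pyramid level), while GAP averages the whole map into a single scalar per channel. The function names and the pyramid levels `(1, 2, 4)` are illustrative assumptions.

```python
def spp_max(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling (max variant): pool a variable-size
    2D feature map into a fixed-length vector. Each level l splits
    the map into an l*l grid, so the output length is sum(l*l) for
    all levels -- independent of the input size."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for level in levels:
        for i in range(level):
            for j in range(level):
                # Bin boundaries: integer partition of rows/columns.
                r0, r1 = (i * h) // level, ((i + 1) * h) // level
                c0, c1 = (j * w) // level, ((j + 1) * w) // level
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


def gap(feature_map):
    """Global average pooling: reduce one channel's 2D map
    to a single scalar (its mean activation)."""
    h, w = len(feature_map), len(feature_map[0])
    return sum(sum(row) for row in feature_map) / (h * w)
```

With levels `(1, 2, 4)`, any input map (e.g. 7x5 or 4x9) yields a 21-dimensional vector, which is what lets a network with fully connected layers accept multi-scale inputs; GAP plays the complementary role of summarizing each channel globally.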
Acknowledgement
This research was supported by the Scientific and Technological Projects of the Nanchang Science and Technology Bureau under Grants GJJ202010, GJJ202017, and GJJ212015, and by Special Project 03 of the Jiangxi Provincial Department of Science and Technology under Grant 20212ABC03A36.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wei, P., Xiong, L., He, Y., Yao, L. (2023). A Multi-scale Convolutional Neural Network for Skeleton-Based Human Action Recognition with Insufficient Training Samples. In: Dong, J., Zhang, L. (eds) Proceedings of the International Conference on Internet of Things, Communication and Intelligent Technology. IoTCIT 2022. Lecture Notes in Electrical Engineering, vol 1015. Springer, Singapore. https://doi.org/10.1007/978-981-99-0416-7_53
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0415-0
Online ISBN: 978-981-99-0416-7