
A Multi-scale Convolutional Neural Network for Skeleton-Based Human Action Recognition with Insufficient Training Samples

  • Conference paper
Proceedings of the International Conference on Internet of Things, Communication and Intelligent Technology (IoTCIT 2022)

Abstract

Human action recognition (HAR) has attracted considerable research attention because of its broad application prospects and significant research value. Despite great progress, handling actions of varying duration and insufficient training samples remains a major challenge for HAR. To address these two issues, a new human action representation and a multi-scale deep neural network (DNN) model are proposed. The contributions are threefold. First, a new data structure named the joint motion matrix (JMM) is put forward for describing human actions. The JMM effectively mitigates the loss of motion information caused by spatio-temporal occlusion. Second, a data augmentation method based on the characteristics of human physiological structure and human actions is proposed to generate action samples with brand-new motion information. Third, the combined use of a spatial pyramid pooling (SPP) layer and a global average pooling (GAP) layer not only enables the DNN to accept multi-scale inputs but also provides an effective way to extract more detailed global features. The proposed method was evaluated on two small but challenging datasets, Florence 3D Actions and UTKinect-Action3D. The experiments show that the proposed data augmentation strategy is highly effective for DNN training and that the proposed method offers strong multi-scale feature learning with reduced computational cost. The approach achieves accuracies of 94.27% and 96.83% on the two datasets, the highest among the compared methods.
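The abstract does not define the JMM construction itself, so the following is only a hypothetical sketch of the general joints-versus-frames encoding that skeleton-to-image methods of this kind typically use: coordinates become channels, joints become rows, and frames become columns. The function name and per-channel normalization are illustrative assumptions, not the authors' definition.

```python
# Hypothetical sketch of encoding a skeleton sequence as an image-like
# matrix, in the spirit of the joint motion matrix (JMM) described above.
# The paper's exact JMM layout may differ.
import numpy as np

def skeleton_to_motion_matrix(seq):
    """seq: (T, J, 3) array of T frames, J joints, 3D coordinates.
    Returns a (3, J, T) matrix normalized to [0, 1] per channel."""
    m = np.transpose(seq, (2, 1, 0)).astype(np.float32)  # (3, J, T)
    lo = m.min(axis=(1, 2), keepdims=True)
    hi = m.max(axis=(1, 2), keepdims=True)
    return (m - lo) / np.maximum(hi - lo, 1e-8)

# Example: a 40-frame clip of a 15-joint skeleton (Florence 3D uses 15 joints).
clip = np.random.rand(40, 15, 3)
jmm = skeleton_to_motion_matrix(clip)
print(jmm.shape)  # (3, 15, 40): a 3-channel image a CNN can consume
```

Note that the frame dimension T varies with action duration, which is exactly why the network's pooling head must tolerate variable-size inputs.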
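To illustrate the third contribution, here is a minimal PyTorch sketch (not the authors' network) of how an SPP layer combined with GAP maps feature maps of any spatial size to a fixed-length vector, which is what lets a CNN accept multi-scale inputs. The pyramid levels and channel counts are assumptions for illustration.

```python
# Minimal sketch: SPP + GAP pooling head that accepts variable-size inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPWithGAP(nn.Module):
    """Pools a variable-size feature map into a fixed-length vector."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # pyramid bin counts per side

    def forward(self, x):  # x: (N, C, H, W); H and W may vary
        feats = []
        for k in self.levels:
            # Adaptive pooling yields a k x k grid regardless of H, W.
            feats.append(F.adaptive_max_pool2d(x, k).flatten(1))
        # GAP contributes one global average per channel.
        feats.append(F.adaptive_avg_pool2d(x, 1).flatten(1))
        return torch.cat(feats, dim=1)  # length: C * (sum(k^2) + 1)

# Usage: inputs of different spatial sizes map to the same vector length.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
head = SPPWithGAP()
for h, w in [(32, 48), (60, 20)]:
    out = head(conv(torch.randn(1, 3, h, w)))
    print(out.shape)  # torch.Size([1, 352]) = 16 * (1 + 4 + 16 + 1)
```

The fixed output length means the fully connected classifier never needs to know the input resolution, so clips of different durations can be fed to one network without cropping or resampling.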

Acknowledgement

This research was supported by the Scientific and Technological Projects of the Nanchang Science and Technology Bureau under Grants GJJ202010, GJJ202017, and GJJ212015, and by the Special Project 03 of the Jiangxi Provincial Department of Science and Technology under Grant 20212ABC03A36.

Author information

Corresponding author

Correspondence to Lei Xiong.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wei, P., Xiong, L., He, Y., Yao, L. (2023). A Multi-scale Convolutional Neural Network for Skeleton-Based Human Action Recognition with Insufficient Training Samples. In: Dong, J., Zhang, L. (eds) Proceedings of the International Conference on Internet of Things, Communication and Intelligent Technology. IoTCIT 2022. Lecture Notes in Electrical Engineering, vol 1015. Springer, Singapore. https://doi.org/10.1007/978-981-99-0416-7_53

  • DOI: https://doi.org/10.1007/978-981-99-0416-7_53

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0415-0

  • Online ISBN: 978-981-99-0416-7

  • eBook Packages: Engineering; Engineering (R0)
