Abstract
Most 3D skeleton feature-based human action recognition methods are sensitive to changes in viewpoint, motion scale, and human scale. In addition, acquiring depth information from real outdoor scenes yields poor precision or incurs high computational cost. To address these drawbacks, in this study we propose a new action recognition method based on RGB video and 2D skeletons, comprising a local joint trajectory volume representation and feature coding. First, a video is converted into a set of volumes, called local joint trajectory volumes. Then, hand-crafted descriptors and convolutional networks are used to compute the features of each volume-based RGB image sequence. Unlike most works, which use convolutional networks to learn global video features, this paper discusses using a convolutional network to represent local video regions. Finally, the feature set of each joint is encoded into a Fisher vector as the action feature, and the classifier is trained with a linear SVM. The experimental results show that skeleton-joint-based features yield a more compact and effective action representation than competing approaches.
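The final coding stage described above — pooling a joint's local features into a Fisher vector and classifying with a linear SVM — can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: a diagonal-covariance GMM stands in for the learned codebook, and random arrays stand in for the per-joint volume features; the function `fisher_vector` and all dimensions are hypothetical choices for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode a set of local descriptors as a Fisher vector:
    gradients of the log-likelihood of a diagonal-covariance GMM
    with respect to its means and standard deviations."""
    X = np.atleast_2d(descriptors)
    N, D = X.shape
    q = gmm.predict_proba(X)            # (N, K) soft assignments
    pi = gmm.weights_                   # (K,) mixture weights
    mu = gmm.means_                     # (K, D) component means
    sigma = np.sqrt(gmm.covariances_)   # (K, D) diagonal std devs
    parts = []
    for k in range(len(pi)):
        diff = (X - mu[k]) / sigma[k]   # whitened offsets, (N, D)
        g_mu = (q[:, k, None] * diff).sum(0) / (N * np.sqrt(pi[k]))
        g_sig = (q[:, k, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi[k]))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    # power- and L2-normalisation, standard for Fisher vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Toy usage: K = 2 GMM components over D = 8 dimensional local features,
# so the Fisher vector has 2 * K * D = 32 dimensions.
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(rng.normal(size=(200, 8)))
fv = fisher_vector(rng.normal(size=(30, 8)), gmm)
print(fv.shape)
```

The resulting fixed-length vector (one per joint, concatenated over joints in the paper's setting) can be fed directly to `sklearn.svm.LinearSVC`, matching the linear-SVM classifier the method uses.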
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the Natural Science Foundation of China (Nos. 61871196, 61673186, 61972167 and 62001176), the National Key Research and Development Program of China (No. 2019YFC1604700), the Natural Science Foundation of Fujian Province of China (Nos. 2019J01082 and 2020J01085), and the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University (ZQN-YX601, ZQN-710).
Cite this article
Zhang, YX., Zhang, HB., Du, JX. et al. RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition. SIViP 15, 1379–1386 (2021). https://doi.org/10.1007/s11760-021-01868-8