
Human activity recognition in RGB-D videos by dynamic images

Published in: Multimedia Tools and Applications

Abstract

Human activity recognition in RGB-D videos has been an active research topic during the last decade. However, few efforts have been made to recognize human activities in RGB-D videos in which several performers act simultaneously. In this paper we introduce such a challenging dataset, with several performers carrying out activities simultaneously, and present a novel method for recognizing activities performed simultaneously in the same video. The proposed method captures the motion information of the whole video by producing a dynamic image corresponding to the input video. We use two parallel ResNet-101 architectures to produce the dynamic images for the RGB video and the depth video separately. A dynamic image contains only the motion information of the whole video, which is the main cue for analyzing the motion of a performer during an action. Hence, dynamic images help in recognizing human actions by concentrating only on the motion information appearing in the frames. The two dynamic images are passed through a fully connected layer for classification of the activity. The proposed dynamic image reduces the complexity of the recognition process by extracting a sparse matrix from a video while preserving the motion information required for activity recognition, and produces results comparable to the state-of-the-art.
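The core idea of collapsing a whole video into a single dynamic image can be sketched with approximate rank pooling, the standard construction for dynamic images (Bilen et al.): each frame is weighted by a coefficient that encodes its temporal position, and the weighted frames are summed. The sketch below is illustrative, not the authors' exact pipeline; the function name `dynamic_image` and the normalization to 8-bit range are assumptions for the example.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video into one dynamic image via approximate rank pooling.

    frames: float array of shape (T, H, W, C) holding the T video frames.
    Returns a uint8 (H, W, C) image whose pixel values encode the
    temporal ordering of the frames, i.e. the motion in the video.
    """
    T = frames.shape[0]
    # Approximate rank-pooling coefficients: alpha_t = 2t - T - 1, t = 1..T.
    # Early frames get negative weights, late frames positive weights,
    # so static background largely cancels and motion is emphasized.
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    # Weighted sum over the temporal axis -> a single (H, W, C) image.
    di = np.tensordot(alphas, frames, axes=(0, 0))
    # Rescale to [0, 255] so the result can feed a CNN such as ResNet-101.
    di = di - di.min()
    if di.max() > 0:
        di = di / di.max()
    return (255 * di).astype(np.uint8)
```

In a two-stream setup along the lines described in the abstract, this function would be applied once to the RGB frames and once to the depth frames, producing the two dynamic images that are then classified jointly.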



Acknowledgements

The authors wish to acknowledge the financial support provided by the Science and Engineering Research Board (SERB), Government of India, through project grant no. ECR/2016/00652. The authors also wish to thank the NVIDIA GPU grant team for providing a graphics card to perform the experiments reported in this work.

Author information


Corresponding author

Correspondence to Snehasis Mukherjee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mukherjee, S., Anvitha, L. & Lahari, T.M. Human activity recognition in RGB-D videos by dynamic images. Multimed Tools Appl 79, 19787–19801 (2020). https://doi.org/10.1007/s11042-020-08747-3
