Abstract
This paper presents a method for human action recognition from depth video sequences based on depth spatial-temporal maps (DSTMs), which provide compact global spatial and temporal information about human motion. In our approach, the initial frame of a depth sequence is dilated to generate a 3D body mask. The mask is then applied to each depth frame to obtain a new depth sequence covering the major part of the human body. Each frame of the new sequence is projected onto three orthogonal axes to obtain three binary lists. For each projection axis, the binary lists are stitched together in temporal order over the entire depth sequence to form a DSTM. We evaluate the method on two standard databases. Experimental results show that it effectively captures the spatial and temporal information of human motion and improves the accuracy of human action recognition.
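The pipeline described in the abstract (dilate the first frame into a body mask, mask every frame, project onto three axes, stitch over time) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: depth frames are lists of integer depth values with 0 as background, the "3D" mask is modeled as a dilated 2D silhouette plus a depth interval around the first frame, the structuring element is a square, and each DSTM stores one binary list per frame. The function names and the parameters `radius`, `depth_margin`, and `depth_levels` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of integer depth values; 0 = background (assumption)


def dilate(binary: Frame, radius: int) -> Frame:
    """Binary dilation with a (2*radius+1)-wide square structuring element."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def body_mask(first: Frame, radius: int, depth_margin: int) -> Tuple[Frame, int, int]:
    """Assumed mask form: dilated silhouette of the first frame plus a depth range.

    Requires the first frame to contain at least one foreground pixel.
    """
    silhouette = [[1 if d > 0 else 0 for d in row] for row in first]
    depths = [d for row in first for d in row if d > 0]
    return dilate(silhouette, radius), min(depths) - depth_margin, max(depths) + depth_margin


def apply_mask(frame: Frame, region: Frame, d_lo: int, d_hi: int) -> Frame:
    """Keep a depth value only if it lies inside the mask region and depth range."""
    return [
        [d if region[y][x] and d_lo <= d <= d_hi else 0 for x, d in enumerate(row)]
        for y, row in enumerate(frame)
    ]


def project(frame: Frame, depth_levels: int) -> Tuple[List[int], List[int], List[int]]:
    """Project one frame onto the x, y and depth axes as three binary lists.

    Assumes every depth value is smaller than depth_levels.
    """
    h, w = len(frame), len(frame[0])
    px, py, pz = [0] * w, [0] * h, [0] * depth_levels
    for y, row in enumerate(frame):
        for x, d in enumerate(row):
            if d > 0:
                px[x] = 1  # some body pixel exists in column x
                py[y] = 1  # some body pixel exists in row y
                pz[d] = 1  # some body pixel exists at depth level d
    return px, py, pz


def build_dstms(frames: List[Frame], depth_levels: int,
                radius: int = 1, depth_margin: int = 1):
    """Stitch the per-frame binary lists in temporal order, one map per axis."""
    region, d_lo, d_hi = body_mask(frames[0], radius, depth_margin)
    dstm_x, dstm_y, dstm_z = [], [], []
    for frame in frames:
        px, py, pz = project(apply_mask(frame, region, d_lo, d_hi), depth_levels)
        dstm_x.append(px)
        dstm_y.append(py)
        dstm_z.append(pz)
    return dstm_x, dstm_y, dstm_z
```

In this sketch, each returned map has one entry per frame, so the temporal dimension runs along the outer list and the spatial dimension along the inner one. Whether the maps are stored this way or transposed is a layout choice the abstract does not specify.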
Cite this article
Li, X., Hou, Z., Liang, J. et al. Human action recognition based on 3D body mask and depth spatial-temporal maps. Multimed Tools Appl 79, 35761–35778 (2020). https://doi.org/10.1007/s11042-020-09593-z
DOI: https://doi.org/10.1007/s11042-020-09593-z