Semi-supervised Learning for Human Pose Recognition with RGB-D Light-Model

Wang, Xinbo; Zhang, Guoshan; Yu, Dahai; Liu, Dan

doi:10.1007/978-3-319-48896-7_72

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

This work targets human pose recognition from RGB-D videos. Recent RGB-D based methods are typically either map-based or skeleton-based approaches. This paper proposes a semi-supervised learning method for evaluating human posture using RGB-D data and a light-model. The light-model is generated with the dynamic-fusion strategy to represent a depth sequence. It therefore carries richer information than a single depth image, and a CNN classifier is trained on labeled light-model data to recognize human pose. Soft correlation and hard correlation are used to adjust the CNN output for unlabeled data. This paper also constructs a posture dataset consisting of RGB images and light-models. The experimental results show that our method is more accurate than the state of the art, and its efficiency is also competitive. This study implies that features extracted from 3D models are reliable for human pose recognition, especially for sitting posture.
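
The abstract does not define how the soft and hard corrections are computed, so the following is only a minimal sketch of a generic pseudo-labeling scheme, not the authors' formulation. It assumes a classifier that outputs class probabilities for an unlabeled sample and shows two common ways such an output might be adjusted: a hard one-hot target when the prediction is confident, and a soft sharpened distribution otherwise. The function names, the threshold, and the temperature are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_target(probs, threshold=0.9):
    """Hypothetical hard adjustment: a one-hot pseudo-label.

    Returns None when the top class is below the confidence threshold,
    so that uncertain unlabeled samples are skipped.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def soft_target(probs, temperature=0.5):
    """Hypothetical soft adjustment: sharpen the distribution.

    All classes keep non-zero weight, but likely classes are emphasized
    when temperature < 1.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


if __name__ == "__main__":
    # Assumed probabilities for one unlabeled sample over three posture classes.
    probs = softmax([2.0, 0.5, 0.1])
    print("probabilities:", probs)
    print("hard target:  ", hard_target(probs, threshold=0.6))
    print("soft target:  ", soft_target(probs))
```

In a scheme like this, the adjusted targets would be fed back as training labels for the unlabeled samples; how the two adjustments are combined in the paper is not stated in the abstract.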

Author information

Authors and Affiliations

Tianjin University, Tianjin, 300072, China
Xinbo Wang & Guoshan Zhang
Tianjin Optical Electrical Gaosi Communication Engineering Technology Co., Ltd., Tianjin, 300384, China
Xinbo Wang, Dahai Yu & Dan Liu
Xi’an University of Architecture and Technology, Xi’an, 710043, China
Dan Liu

Corresponding author

Correspondence to Dahai Yu.

Editor information

Editors and Affiliations

Zhengzhou University, Zhengzhou, China
Enqing Chen
Jiaotong University, Xi’an, China
Yihong Gong
Zhengzhou University, Zhengzhou, China
Yun Tie

Cite this paper

Wang, X., Zhang, G., Yu, D., Liu, D. (2016). Semi-supervised Learning for Human Pose Recognition with RGB-D Light-Model. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_72

DOI: https://doi.org/10.1007/978-3-319-48896-7_72
Published: 27 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science, Computer Science (R0)
