Three-dimensional image-based human pose recovery with hypergraph regularized autoencoders

Hong, Chaoqun; Yu, Jun; Jane, You; Yu, Zhiwen; Chen, Xuhui

doi:10.1007/s11042-016-3312-7

Published: 08 February 2016

Volume 76, pages 10919–10937, (2017)

Multimedia Tools and Applications

Chaoqun Hong¹,
Jun Yu²,³,
You Jane⁴,
Zhiwen Yu⁵ &
…
Xuhui Chen¹


Abstract

Three-dimensional image-based human pose recovery tries to retrieve 3D poses from 2D images. One of the key problems is therefore how to represent 2D images. However, current feature extractors suffer from a semantic gap, which limits recovery performance. In this paper, we propose a novel feature extractor based on a deep neural network. It builds on denoising autoencoders and improves on previous autoencoders by adopting a locality-preserving restriction. To impose this restriction, we introduce manifold regularization with hypergraph learning. The hypergraph Laplacian matrix is constructed with the patch alignment framework. In this way, an automatic feature extractor for images is achieved. Experimental results on three datasets show that the recovery error can be reduced by 10% to 20%, which demonstrates the effectiveness of the proposed method.
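
As a rough illustration of what a hypergraph-regularized autoencoder objective can look like, the sketch below builds a hypergraph Laplacian and adds a smoothness penalty on the hidden codes to a reconstruction error. It is not the method of the paper: the abstract states that the Laplacian is constructed with the patch alignment framework, whereas this sketch assumes the standard normalized hypergraph Laplacian with one k-nearest-neighbour hyperedge per sample. The function names (`knn_hyperedges`, `hypergraph_laplacian`, `regularized_loss`), the weight `lam`, and the random data are all hypothetical choices made for the example.

```python
import numpy as np


def knn_hyperedges(X, k):
    """Incidence matrix H (n_samples x n_edges), assuming one hyperedge per
    sample that groups the sample with its k nearest neighbours."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(d2[e])[: k + 1]
        H[members, e] = 1.0
        H[e, e] = 1.0  # make sure the centre sample belongs to its own edge
    return H


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian
    L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    (an assumed stand-in for the patch-alignment construction)."""
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w          # vertex degrees: total weight of incident edges
    d_e = H.sum(axis=0)  # edge degrees: number of vertices in each edge
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / d_e) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n_v) - theta


def regularized_loss(X, X_rec, Z, L, lam):
    """Mean squared reconstruction error plus lam * tr(Z^T L Z) / n, where Z
    holds the hidden codes (one row per sample)."""
    recon = np.mean(np.sum((X - X_rec) ** 2, axis=1))
    smooth = np.trace(Z.T @ L @ Z) / X.shape[0]
    return recon + lam * smooth


# Toy usage on random data (illustrative only, not the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
H = knn_hyperedges(X, k=3)
L = hypergraph_laplacian(H)
Z = rng.normal(size=(20, 2))  # placeholder for encoder outputs
loss = regularized_loss(X, X, Z, L, lam=0.1)
```

The penalty term is small when samples that share hyperedges receive similar codes, which is one way to express a locality-preserving restriction. In a denoising autoencoder the reconstruction `X_rec` would come from decoding a corrupted copy of `X`; here it is simply passed in.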

Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Xiamen University of Technology, Xiamen, China
Chaoqun Hong & Xuhui Chen
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Jun Yu
Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, Hangzhou Dianzi University, Hangzhou, People’s Republic of China
Jun Yu
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
You Jane
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zhiwen Yu

Corresponding author

Correspondence to Jun Yu.

Additional information

This work is supported by the National Natural Science Foundation of China (61202145, 61572199, 61472110), the Natural Science Foundation of Fujian Province of China (2014J01256), the Zhejiang Provincial Natural Science Foundation of China (LR15F020002), the Guangdong Natural Science Funds for Distinguished Young Scholars (S2013050014677), a grant from the Science and Technology Planning Project of Guangdong Province (2015A050502011), and the central university project (2014G0007).

About this article

Cite this article

Hong, C., Yu, J., Jane, Y. et al. Three-dimensional image-based human pose recovery with hypergraph regularized autoencoders. Multimed Tools Appl 76, 10919–10937 (2017). https://doi.org/10.1007/s11042-016-3312-7

Received: 24 October 2015
Revised: 19 January 2016
Accepted: 26 January 2016
Published: 08 February 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s11042-016-3312-7
