Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network

Li, Sijin; Liu, Zhi-Qiang; Chan, Antoni B.

doi:10.1007/s11263-014-0767-8

Published: 26 September 2014

Volume 113, pages 19–36 (2015)

International Journal of Computer Vision

Sijin Li¹, Zhi-Qiang Liu² & Antoni B. Chan³,⁴

Abstract

We propose a heterogeneous multi-task learning framework for human pose estimation from monocular images using a deep convolutional neural network. In particular, we simultaneously learn a human pose regressor and sliding-window body-part and joint-point detectors in a deep network architecture. We show that including the detection tasks helps to regularize the network, directing it to converge to a good solution. We report competitive and state-of-the-art results on several datasets. We also empirically show that the learned neurons in the middle layer of our network are tuned to localized body parts.
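The idea of training a regression task and detection tasks together can be illustrated with a combined objective. The sketch below is not the authors' implementation: the abstract does not state the loss functions or their weighting, so the squared-error regression term, the cross-entropy detection term, the `det_weight` parameter, and all function names are assumptions chosen only for illustration.

```python
import math


def squared_error(pred, target):
    """Sum of squared differences between predicted and target joint coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over independent yes/no detector outputs."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(pose_pred, pose_true, det_probs, det_labels, det_weight=1.0):
    """Hypothetical combined objective: pose regression plus weighted detection.

    det_weight is an assumed hyperparameter balancing the two kinds of task;
    setting it to 0 recovers a regression-only objective.
    """
    regression = squared_error(pose_pred, pose_true)
    detection = binary_cross_entropy(det_probs, det_labels)
    return regression + det_weight * detection


if __name__ == "__main__":
    # Toy example: two joint coordinates and two detector outputs.
    loss = multitask_loss([0.4, 0.6], [0.5, 0.5], [0.9, 0.2], [1, 0])
    print(f"combined loss: {loss:.4f}")
```

Under these assumptions, both terms would be backpropagated through the shared layers of the network, which is one way the detection tasks could act as a regularizer on the features used by the regressor.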

Notes

The full results can be viewed at http://visal.cs.cityu.edu.hk/research/hmlpe-demo/.
As pointed out in (Hara and Chellappa 2013; Pishchulin et al. 2012), the code in the Buffy toolkit does not compute PCP correctly.
Since we have different definitions of torso and head parts, we do not show the evaluation of these parts here.

References

Bo, L., & Sminchisescu, C. (2010). Twin Gaussian processes for structured prediction. International Journal of Computer Vision, 87(1–2), 28–52.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition.
Dantone, M., Gall, J., Leistner, C., & van Gool, L. (2013). Human pose estimation from still images using body parts dependent joint regressors. In: IEEE Conference on Computer Vision and Pattern Recognition.
Eichner, M., & Ferrari, V. (2009a). Better appearance models for pictorial structures. In: British Machine Vision Conference, pp. 1–11.
Eichner, M., & Ferrari, V. (2009b). Upper body detector. http://groups.inf.ed.ac.uk/calvin/calvin_upperbody_detector/
Eichner, M., & Ferrari, V. (2010). We are family: Joint pose estimation of multiple persons. In: European Conference on Computer Vision.
Eichner, M., & Ferrari, V. (2012). Human pose co-estimation and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Eichner, M., Marin-Jimenez, M., Zisserman, A., & Ferrari, V. (2012). 2D articulated human pose estimation and retrieval in (almost) unconstrained still images. International Journal of Computer Vision, 99(2), 190–214.
Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915–1929.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
Gülçehre, Ç., & Bengio, Y. (2013). Knowledge matters: Importance of prior information for optimization. In: International Conference on Learning Representations.
Hara, K., & Chellappa, R. (2013). Computationally efficient regression on a dependency graph for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Jain, A., Tompson, J., Andriluka, M., Taylor, G. W., & Bregler, C. (2014). Learning human pose estimation features with convolutional networks. In: International Conference on Learning Representations.
Johnson, S., & Everingham, M. (2011). Learning effective human pose estimation from inaccurate annotation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems.
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., & Ng, A. (2012). Building high-level features using large scale unsupervised learning. In: International Conference on Machine Learning.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning.
Pishchulin, L., Jain, A., Andriluka, M., Thormaehlen, T., & Schiele, B. (2012). Articulated people detection and pose estimation: Reshaping the future. In: IEEE Conference on Computer Vision and Pattern Recognition.
Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013). Poselet conditioned pictorial structures. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. In: J. A. Anderson & E. Rosenfeld (Eds.), Neurocomputing: Foundations of research (pp. 696–699). Cambridge, MA: MIT Press.
Sapp, B., & Taskar, B. (2013). MODEC: Multimodal decomposable models for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Sapp, B., Toshev, A., & Taskar, B. (2010). Cascaded models for articulated pose estimation. In: European Conference on Computer Vision.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In: IEEE Conference on Computer Vision and Pattern Recognition.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.
Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In: IEEE Conference on Computer Vision and Pattern Recognition.
Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition.
Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. In: International Conference on Machine Learning.
Yang, X., Kim, S., & Xing, E. P. (2009). Heterogeneous multitask learning with joint sparsity constraints. In: Neural Information Processing Systems.
Yang, Y., & Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conference on Computer Vision and Pattern Recognition.
Yang, Y., & Ramanan, D. (2013). Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878–2890.
Yu, K., Tresp, V., & Schwaighofer, A. (2005). Learning Gaussian processes from multiple tasks. In: International Conference on Machine Learning, pp. 1012–1019.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Lecture Notes in Computer Science (Vol. 8689, pp. 818–833). Springer.

Acknowledgments

A.B.C. was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 123212 and CityU 110513). This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under GRF 9041574 (CityU 118810) and GRF 9041905 (CityU 119313).

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong, China
Sijin Li
School of Creative Media (SCM), City University of Hong Kong, Hong Kong, China
Zhi-Qiang Liu
Department of Computer Science, Multimedia software Engineering Research Centre (MERC), City University of Hong Kong, Hong Kong, China
Antoni B. Chan
Multimedia software Engineering Research Centre (MERC), Shenzhen, Guangdong, China
Antoni B. Chan

Corresponding author

Correspondence to Sijin Li.

Additional information

Communicated by Marc’Aurelio Ranzato, Geoffrey E. Hinton, and Yann LeCun.

Appendix

See Figure 12.

About this article

Cite this article

Li, S., Liu, ZQ. & Chan, A.B. Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network. Int J Comput Vis 113, 19–36 (2015). https://doi.org/10.1007/s11263-014-0767-8

Received: 09 February 2014
Accepted: 10 September 2014
Published: 26 September 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11263-014-0767-8
