Human Pose Estimation by a Series of Residual Auto-Encoders

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2017)

Abstract

Pose estimation is the task of predicting the pose of an object in an image or in a sequence of images. Here, we focus on articulated human pose estimation in scenes with a single person. We employ a series of residual auto-encoders to produce multiple predictions, which are then combined to provide a heatmap prediction of body joints. In this network topology, features are processed across all scales, which captures the various spatial relationships associated with the body. Repeated bottom-up and top-down processing is applied, with intermediate supervision for each auto-encoder network. We propose several improvements to this type of regression-based network to further increase performance, namely: (a) increasing the number of parameters of the auto-encoder networks in the pipeline, (b) using stronger regularization along with heavy data augmentation, (c) using sub-pixel precision for more precise joint localization, and (d) combining the output heatmaps of all auto-encoders into a single prediction, which further increases body joint prediction accuracy. We demonstrate state-of-the-art results on the popular FLIC and LSP datasets.
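
To make points (c) and (d) concrete, the following is a minimal sketch in Python with NumPy. The abstract does not state how the per-stage heatmaps are combined or how the sub-pixel offset is computed, so both choices here are assumptions made for illustration: the heatmaps are averaged across stages, and the peak is refined with a one-dimensional parabola fit along each axis. The function names `combine_heatmaps` and `subpixel_peak` are hypothetical and do not come from the paper.

```python
import numpy as np


def combine_heatmaps(stage_heatmaps):
    """Merge per-stage heatmaps into one prediction.

    stage_heatmaps has shape (stages, joints, height, width).
    ASSUMPTION: a plain average over stages; the paper may use another rule.
    """
    return np.mean(np.asarray(stage_heatmaps, dtype=np.float64), axis=0)


def _parabola_offset(left, center, right):
    """Offset of the vertex of the parabola through three equally spaced samples."""
    denom = left - 2.0 * center + right
    if denom >= 0.0:  # flat or not a local maximum: keep the integer location
        return 0.0
    return 0.5 * (left - right) / denom


def subpixel_peak(heatmap):
    """Return (x, y) of the heatmap maximum with sub-pixel precision.

    ASSUMPTION: parabolic refinement around the integer argmax, used here
    only as one common way to obtain a sub-pixel estimate.
    """
    height, width = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    fx, fy = float(x), float(y)
    if 0 < x < width - 1:
        fx += _parabola_offset(heatmap[y, x - 1], heatmap[y, x], heatmap[y, x + 1])
    if 0 < y < height - 1:
        fy += _parabola_offset(heatmap[y - 1, x], heatmap[y, x], heatmap[y + 1, x])
    return fx, fy


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Synthetic heatmap with a Gaussian blob centered at (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


# Three synthetic "stages" predicting the same joint with different confidence.
base = gaussian_heatmap(24, 32, cx=10.3, cy=7.6)
stages = np.stack([0.8 * base, 1.0 * base, 1.2 * base])[:, None]  # (3, 1, 24, 32)

combined = combine_heatmaps(stages)   # shape (1, 24, 32)
x_hat, y_hat = subpixel_peak(combined[0])
print(f"estimated joint location: ({x_hat:.2f}, {y_hat:.2f})")
```

On this synthetic input, the integer argmax alone would place the joint on the pixel grid, while the refined estimate lands close to the true non-integer center. Averaging is the simplest way to merge stages; a weighted sum or a learned fusion layer would fit the same interface.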

Acknowledgments

This work was supported by the FCT project LARSyS (UID/EEA/50009/2013) and FCT PhD grant to author MF (SFRH/BD/79812/2011).

Author information

Corresponding author

Correspondence to M. Farrajota.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Farrajota, M., Rodrigues, J.M.F., du Buf, J.M.H. (2017). Human Pose Estimation by a Series of Residual Auto-Encoders. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_15

  • DOI: https://doi.org/10.1007/978-3-319-58838-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58837-7

  • Online ISBN: 978-3-319-58838-4

  • eBook Packages: Computer Science, Computer Science (R0)
