A Robust Estimation of 2D Human Upper-Body Poses Using Fully Convolutional Network

  • Seunghee Lee
  • Jungmo Koo
  • Hyungjin Kim
  • Kwangyik Jung
  • Hyun Myung
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 751)


We present an approach for efficiently detecting 2D human upper-body poses in RGB images. Among systems for estimating joint positions, a method that uses only an RGB camera is far more cost-effective than systems that rely on expensive sensors such as motion capture equipment. In this work, we apply semantic segmentation with a fully convolutional network to estimate the upper-body pose of each skeleton and select joint coordinates from joint heatmaps. The architecture is designed to learn joint locations and their associations through a sequential prediction process. We demonstrate the performance of the proposed method on various datasets.
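As a rough illustration of the coordinate-selection step, the sketch below picks one location per joint from a set of per-joint heatmaps by taking the highest-scoring pixel. It is a minimal example under stated assumptions, not the authors' implementation: the abstract does not specify how the coordinate is chosen, so the plain peak search, the `threshold` parameter, the joint names, and the function names `joint_from_heatmap` and `decode_pose` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One 2D score map per joint, indexed as heatmap[row][col].
Heatmap = List[List[float]]
Joint = Tuple[int, int, float]  # (x, y, score)


def joint_from_heatmap(heatmap: Heatmap, threshold: float = 0.1) -> Optional[Joint]:
    """Return (x, y, score) of the strongest response in one joint heatmap.

    Returns None when the peak score is below `threshold` (an assumed
    confidence cutoff), which is treated here as "joint not detected".
    """
    best_score = float("-inf")
    best_x, best_y = 0, 0
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_score = score
                best_x, best_y = x, y
    if best_score < threshold:
        return None
    return best_x, best_y, best_score


def decode_pose(heatmaps: Dict[str, Heatmap], threshold: float = 0.1) -> Dict[str, Optional[Joint]]:
    """Decode every joint independently; joint names are hypothetical labels."""
    return {name: joint_from_heatmap(h, threshold) for name, h in heatmaps.items()}


if __name__ == "__main__":
    # Toy 3x3 heatmaps standing in for network output.
    toy = {
        "head": [[0.0, 0.1, 0.0], [0.2, 0.9, 0.3], [0.0, 0.1, 0.0]],
        "left_wrist": [[0.0, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.0]],
    }
    for name, joint in decode_pose(toy).items():
        print(name, joint)
```

Decoding each joint independently keeps the example short. It ignores the associations between joints that the architecture is said to learn, so it should be read only as the final lookup on already-computed heatmaps.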



This work was supported by the Technology Innovation Program, 10045252, Development of robot task intelligence technology, funded by the Ministry of Trade, Industry, and Energy (MOTIE, Korea). The students are supported by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) through the U-City Master and Doctor Course Grant Program.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Seunghee Lee (1)
  • Jungmo Koo (1)
  • Hyungjin Kim (1)
  • Kwangyik Jung (1)
  • Hyun Myung (1, 2) (corresponding author)
  1. Department of Civil and Environmental Engineering, KAIST, Daejeon, Korea
  2. Robotics Program, KAIST, Daejeon, Korea
