Real-Time Upper-Body Human Pose Estimation Using a Depth Camera

  • Himanshu Prakash Jain
  • Anbumani Subramanian
  • Sukhendu Das
  • Anurag Mittal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6930)


Automatic detection and pose estimation of humans is an important task in Human-Computer Interaction (HCI), user interaction and event analysis. This paper presents a model-based approach for detecting humans and estimating their pose by fusing depth and RGB color data from a monocular view. The proposed system uses Haar-cascade-based detection and template matching to track the most reliably detectable parts, namely the head and torso. A stick figure model is used to represent the detected body parts. The fitting is then performed independently for each limb, using the weighted distance transform map. Fitting each limb independently speeds up the fitting process and makes it robust, avoiding the combinatorial complexity problems that are common with methods of this type. The output is a stick figure model consistent with the pose of the person in the given input image. The algorithm works in real time, is fully automatic, and can detect multiple non-intersecting people.
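The per-limb fitting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distance transform is weighted, so the code below assumes an unweighted city-block distance transform (computed with the classic two-pass scheme) and a brute-force search over angles. The function names `distance_transform` and `fit_limb`, and the parameters `n_angles` and `n_samples`, are hypothetical and chosen only for illustration.

```python
import math


def distance_transform(mask):
    """City-block distance of each foreground pixel to the nearest background pixel.

    `mask` is a list of rows; non-zero means foreground. Assumption: an
    unweighted two-pass transform stands in for the paper's weighted map.
    """
    h, w = len(mask), len(mask[0])
    inf = h + w  # upper bound on any city-block distance in the image
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def fit_limb(dist, joint, length, n_angles=72, n_samples=10):
    """Fit one limb segment of fixed length anchored at `joint` = (x, y).

    Tries `n_angles` evenly spaced directions and keeps the one whose sampled
    points collect the largest total distance value, i.e. the segment that
    stays closest to the medial axis of the foreground. Each limb is fitted
    on its own, so the cost grows linearly with the number of limbs.
    """
    h, w = len(dist), len(dist[0])
    jx, jy = joint
    best_angle, best_score = None, -1.0
    for k in range(n_angles):
        angle = 2 * math.pi * k / n_angles
        score = 0.0
        for s in range(1, n_samples + 1):
            t = length * s / n_samples
            x = int(round(jx + t * math.cos(angle)))
            y = int(round(jy + t * math.sin(angle)))
            if 0 <= x < w and 0 <= y < h:
                score += dist[y][x]
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score


if __name__ == "__main__":
    # Synthetic silhouette: a horizontal "arm" 3 pixels thick, starting at x = 10.
    mask = [[1 if 9 <= y <= 11 and 10 <= x <= 19 else 0 for x in range(21)]
            for y in range(21)]
    angle, score = fit_limb(distance_transform(mask), joint=(10, 10), length=8)
    print(math.degrees(angle), score)
```

In a full system the joint position would come from the detected head and torso, and the limb length from anthropometric proportions; both are simply given as inputs here.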


Template Match, Depth Camera, Distance Transform, Foreground Segmentation, Human Motion Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Himanshu Prakash Jain (1)
  • Anbumani Subramanian (2)
  • Sukhendu Das (1)
  • Anurag Mittal (1)

  1. Indian Institute of Technology Madras, India
  2. HP Labs India, Bangalore, India
