Efficient Estimation of Human Upper Body Pose in Static Depth Images

  • Brian Holt
  • Richard Bowden
Part of the Communications in Computer and Information Science book series (CCIS, volume 359)

Abstract

Automatic estimation of human pose has long been a goal of computer vision, and a solution would have a wide range of applications. In this paper we formulate the pose estimation task within a regression and Hough voting framework to predict 2D joint locations from depth data captured by a consumer depth camera. In our approach, the offset from each pixel to the location of each joint is predicted directly using random regression forests. The predictions are accumulated in Hough images, which are treated as likelihood distributions whose maxima correspond to joint location hypotheses. Our approach is evaluated on a publicly available dataset with good results.
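
The voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random regression forest is replaced here by synthetic offsets (the true offset plus Gaussian noise), and the names `accumulate_votes` and `joint_hypothesis`, the image size, and the noise level are all assumptions made for illustration. Each pixel casts a vote at its own position plus its predicted offset, the votes are summed into a Hough image for one joint, and the maximum of that image is taken as the joint location hypothesis.

```python
import numpy as np


def accumulate_votes(pixels, offsets, shape, weights=None):
    """Accumulate per-pixel votes for one joint into a Hough image.

    pixels:  (N, 2) integer array of (row, col) pixel coordinates.
    offsets: (N, 2) array of predicted (d_row, d_col) offsets to the joint.
    shape:   (height, width) of the Hough image.
    weights: optional (N,) vote weights; defaults to one per vote.
    """
    votes = np.rint(np.asarray(pixels) + np.asarray(offsets)).astype(int)
    if weights is None:
        weights = np.ones(len(votes))
    weights = np.asarray(weights, dtype=float)
    # Discard votes that land outside the image.
    inside = (
        (votes[:, 0] >= 0) & (votes[:, 0] < shape[0])
        & (votes[:, 1] >= 0) & (votes[:, 1] < shape[1])
    )
    hough = np.zeros(shape, dtype=float)
    # np.add.at sums correctly even when many votes hit the same cell.
    np.add.at(hough, (votes[inside, 0], votes[inside, 1]), weights[inside])
    return hough


def joint_hypothesis(hough):
    """Return the (row, col) of the strongest peak in the Hough image."""
    row, col = np.unravel_index(np.argmax(hough), hough.shape)
    return int(row), int(col)


# Synthetic stand-in for the forest output (assumption): every sampled
# pixel predicts the true offset to the joint, perturbed by small noise.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:24:2, 0:32:2]
pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
true_joint = np.array([12, 20])
offsets = (true_joint - pixels) + rng.normal(scale=0.3, size=pixels.shape)

hough = accumulate_votes(pixels, offsets, shape=(24, 32))
print(joint_hypothesis(hough))
```

In a full system one Hough image would be built per joint, and the offsets would come from a trained regressor rather than from the ground truth. Taking a single argmax is the simplest way to read off a hypothesis; smoothing the Hough image or keeping several local maxima are common alternatives.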

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Brian Holt (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
