Efficient Human Pose Estimation from Single Depth Images

  • J. Shotton
  • R. Girshick
  • A. Fitzgibbon
  • T. Sharp
  • M. Cook
  • M. Finocchio
  • R. Moore
  • P. Kohli
  • A. Criminisi
  • A. Kipman
  • A. Blake

Abstract

We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image, without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, and field-of-view cropping. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features, and parallelizable decision forests, both approaches can run super-realtime on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Parts of this chapter are reprinted, with permission, from Shotton et al., Proc IEEE Conf. Computer Vision and Pattern Recognition (CVPR) (2011), © 2011 IEEE.
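The abstract names two ingredients shared by both approaches: simple depth pixel comparison features and decision forests. The Python sketch below illustrates them for the first (per-pixel body-part classification) approach only. It uses the depth-normalized feature form from Shotton et al. (CVPR 2011), f(I, x) = d(x + u/d(x)) − d(x + v/d(x)), where d(·) is image depth and u, v are offsets. Several details are assumptions made for illustration. The constant `BACKGROUND_DEPTH` is a stand-in value. The helper names `probe`, `depth_feature`, `Node`, `classify_pixel` and `forest_posterior` are hypothetical. The trees are tiny and hand-built, not trained. Joint proposal and the regression variant are not shown.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed stand-in for "very far away": probes that land off the image
# read this large depth value (illustrative choice, not from the source).
BACKGROUND_DEPTH = 1e6


def probe(depth: np.ndarray, row: float, col: float) -> float:
    """Depth at the rounded pixel (row, col), or BACKGROUND_DEPTH if off-image."""
    r, c = int(round(row)), int(round(col))
    h, w = depth.shape
    if 0 <= r < h and 0 <= c < w:
        return float(depth[r, c])
    return BACKGROUND_DEPTH


def depth_feature(depth: np.ndarray, x, u, v) -> float:
    """f(I, x) = d(x + u / d(x)) - d(x + v / d(x)).

    Dividing the offsets u, v by the depth at x shrinks them for distant
    pixels, making the response roughly invariant to the person's depth.
    Assumes x is a foreground pixel with positive depth.
    """
    r, c = x
    d_x = float(depth[r, c])
    return (probe(depth, r + u[0] / d_x, c + u[1] / d_x)
            - probe(depth, r + v[0] / d_x, c + v[1] / d_x))


@dataclass
class Node:
    # Split parameters: offsets u, v and a threshold (unused at leaves).
    u: Tuple[float, float] = (0.0, 0.0)
    v: Tuple[float, float] = (0.0, 0.0)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf payload: a distribution over body-part labels.
    dist: Optional[np.ndarray] = None


def classify_pixel(tree: Node, depth: np.ndarray, x) -> np.ndarray:
    """Descend one tree, going left when the feature is below the threshold."""
    node = tree
    while node.dist is None:
        f = depth_feature(depth, x, node.u, node.v)
        node = node.left if f < node.threshold else node.right
    return node.dist


def forest_posterior(forest: Sequence[Node], depth: np.ndarray, x) -> np.ndarray:
    """Average the leaf distributions reached in every tree of the forest."""
    return np.mean([classify_pixel(t, depth, x) for t in forest], axis=0)


if __name__ == "__main__":
    depth = np.full((5, 5), 2.0)  # toy depth image, in meters
    depth[2, 4] = 3.0
    # Two hand-built (untrained) trees over two hypothetical part labels.
    tree_a = Node(u=(0.0, 4.0), v=(0.0, 8.0), threshold=0.0,
                  left=Node(dist=np.array([0.9, 0.1])),
                  right=Node(dist=np.array([0.2, 0.8])))
    tree_b = Node(threshold=0.0,
                  left=Node(dist=np.array([0.1, 0.9])),
                  right=Node(dist=np.array([0.5, 0.5])))
    print(forest_posterior([tree_a, tree_b], depth, (2, 2)))  # approx. [0.7 0.3]
```

Each feature needs only a few image reads and a subtraction, and every pixel is classified independently. This is why such a forest parallelizes well. In this sketch a leaf stores a part distribution. A regression forest would instead store regression targets at its leaves, which is not shown here.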

References

  24. Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24
  37. Bourdev L, Malik J (2009) Poselets: body part detectors trained using 3D human pose annotations. In: Proc IEEE intl conf on computer vision (ICCV)
  41. Bregler C, Malik J (1998) Tracking people with twists and exponential maps. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  50. Brubaker MA, Fleet DJ, Hertzmann A (2010) Physics-based person tracking using the anthropomorphic walker. Int J Comput Vis
  69. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5)
  78. Criminisi A, Shotton J, Robertson D, Konukoglu E (2010) Regression forests for efficient anatomy detection and localization in CT studies. In: MICCAI workshop on medical computer vision: recognition techniques and applications in medical imaging, Beijing. Springer, Berlin
  80. Criminisi A, Shotton J, Konukoglu E (2012) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3)
  107. Fergus R, Perona P, Zisserman A (2003) Object class recognition by unsupervised scale-invariant learning. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  117. Gall J, Lempitsky V (2009) Class-specific Hough forests for object detection. IEEE Trans Pattern Anal Mach Intell
  121. Ganapathi V, Plagemann C, Koller D, Thrun S (2010) Real time motion capture using a single time-of-flight camera. In: Proc IEEE conf computer vision and pattern recognition (CVPR). IEEE, New York
  131. Girshick R, Shotton J, Kohli P, Criminisi A, Fitzgibbon A (2011) Efficient regression of general-activity human poses from depth images. In: Proc IEEE intl conf on computer vision (ICCV)
  144. Grest D, Woetzel J, Koch R (2005) Nonlinear body pose estimation from depth images. In: Proc annual symposium of the German association for pattern recognition (DAGM)
  155. Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math Intell 27(2)
  186. Knoop S, Vacek S, Dillmann R (2006) Sensor fusion for 3D human body tracking with an articulated 3D body model. In: Proc IEEE intl conf on robotics and automation (ICRA)
  210. Leibe B, Leonardis A, Schiele B (2008) Robust object detection with interleaved categorization and segmentation. Int J Comput Vis 77(1–3)
  214. Lepetit V, Lagger P, Fua P (2005) Randomized trees for real-time keypoint recognition. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  247. Microsoft Corporation. Kinect for Windows and Xbox 360
  257. Müller J, Arens M (2010) Human pose estimation with implicit shape models. In: ARTEMIS
  293. Plagemann C, Ganapathi V, Koller D, Thrun S (2010) Real-time identification and localization of body parts from depth images. In: Proc IEEE intl conf on robotics and automation (ICRA)
  333. Sharp T (2008) Implementing decision trees and forests on a GPU. In: Proc European conf on computer vision (ECCV). Springer, Berlin
  338. Shotton J, Winn J, Rother C, Criminisi A (2006) TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Proc European conf on computer vision (ECCV). Springer, Berlin
  343. Shotton J, Fitzgibbon AW, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from a single depth image. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  344. Shotton J, Girshick R, Fitzgibbon A, Sharp T, Cook M, Finocchio M, Moore R, Kohli P, Criminisi A, Kipman A, Blake A (2012) Efficient human pose estimation from single depth images. IEEE Trans Pattern Anal Mach Intell
  345. Siddiqui M, Medioni G (2010) Human pose estimation from a single view point, real-time range sensor. In: CVCG at CVPR
  346. Sigal L, Bhatia S, Roth S, Black MJ, Isard M (2004) Tracking loose-limbed people. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  378. Urtasun R, Darrell T (2008) Local probabilistic regression for activity-independent human pose inference. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  392. Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1)
  396. Wang RY, Popović J (2009) Real-time hand-tracking with a color glove. In: Proc ACM SIGGRAPH
  403. Winn J, Shotton J (2006) The layout consistent random field for recognizing and segmenting partially occluded objects. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
  423. Zhu Y, Fujimura K (2007) Constrained optimization for human pose estimation from depth sequences. In: Proc Asian conf on computer vision (ACCV)

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • J. Shotton (1)
  • R. Girshick (2)
  • A. Fitzgibbon (1)
  • T. Sharp (1)
  • M. Cook (1)
  • M. Finocchio (3)
  • R. Moore (4)
  • P. Kohli (1)
  • A. Criminisi (1)
  • A. Kipman (3)
  • A. Blake (1)

  1. Microsoft Research Ltd., Cambridge, UK
  2. University of California, Berkeley, USA
  3. Microsoft Corporation, Redmond, USA
  4. ST-Ericsson, Redmond, USA