Real-Time Human Pose Recognition in Parts from Single Depth Images

  • Jamie Shotton
  • Andrew Fitzgibbon
  • Mat Cook
  • Toby Sharp
  • Mark Finocchio
  • Richard Moore
  • Alex Kipman
  • Andrew Blake
Part of the Studies in Computational Intelligence book series (SCI, volume 411)


This chapter describes a method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result into world space and finding local modes of a 3D non-parametric density. The system runs at around 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
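The final step of the pipeline (reprojecting per-pixel classification results into world space and finding local modes of a 3D density) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, the Gaussian kernel, the `bandwidth` value, the use of the raw class probability as the pixel weight, starting the search at the highest-weight pixel, and the function names themselves. Mean shift is used as one common way to find a local mode of a kernel density estimate.

```python
import numpy as np

def reproject(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) to 3D points with a pinhole model.

    The intrinsics fx, fy, cx, cy are hypothetical camera parameters.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb from `start` to a local mode of a weighted Gaussian kernel density."""
    mode = np.asarray(start, dtype=float)
    for _ in range(iters):
        d2 = np.sum((points - mode) ** 2, axis=1)
        k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))
        total = k.sum()
        if total == 0.0:
            break
        new_mode = (k[:, None] * points).sum(axis=0) / total
        shift = np.linalg.norm(new_mode - mode)
        mode = new_mode
        if shift < tol:
            break
    return mode

def propose_joint(depth, part_prob, fx, fy, cx, cy, bandwidth=0.06):
    """Return (position, confidence) for one body part, or None if no evidence.

    part_prob holds the per-pixel probability of this part (same shape as depth).
    Assumption: the confidence is the weighted kernel mass at the mode.
    """
    points = reproject(depth, fx, fy, cx, cy)
    weights = part_prob.reshape(-1).astype(float)
    valid = (depth.reshape(-1) > 0) & (weights > 0)  # drop pixels with no depth
    points, weights = points[valid], weights[valid]
    if points.shape[0] == 0:
        return None
    start = points[np.argmax(weights)]  # assumption: seed at the strongest pixel
    mode = mean_shift_mode(points, weights, start, bandwidth)
    d2 = np.sum((points - mode) ** 2, axis=1)
    confidence = float(np.sum(weights * np.exp(-d2 / (2.0 * bandwidth ** 2))))
    return mode, confidence
```

In practice one would run this once per body part and could seed the search from several high-probability pixels to recover more than one proposal per part; the sketch keeps a single seed for brevity.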


Keywords: Body Part, Training Image, Depth Image, Depth Camera, Body Joint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Berlin Heidelberg 2013

Authors and Affiliations

  • Jamie Shotton (1)
  • Andrew Fitzgibbon (1)
  • Mat Cook (1)
  • Toby Sharp (1)
  • Mark Finocchio (1)
  • Richard Moore (1)
  • Alex Kipman (1)
  • Andrew Blake (1)

  1. Microsoft Research Cambridge and Xbox Incubation, Cambridge, UK
