International Journal of Computer Vision, Volume 113, Issue 3, pp 163–175

Metric Regression Forests for Correspondence Estimation

  • Gerard Pons-Moll
  • Jonathan Taylor
  • Jamie Shotton
  • Aaron Hertzmann
  • Andrew Fitzgibbon


We present a new method for inferring dense data-to-model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels into body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training objective designed to directly minimize the entropy of distributions in a metric space. When applied to a model surface, viewed as a metric space defined by geodesic distances, MSIG aims to minimize image-to-model correspondence uncertainty. A naïve implementation of MSIG would scale quadratically with the number of training examples. As this is intractable for large datasets, we propose a method to compute MSIG in linear time. Our method is a principled generalization of the proxy classification objective, and does not require an extrinsic isometric embedding of the model surface in Euclidean space. Our experiments demonstrate that this leads to correspondences that are considerably more accurate than the state of the art, while using far fewer training images.
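The abstract does not state the MSIG formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a kernel (Parzen-style) entropy estimate over a metric space, and uses a circle with arc-length distance as a stand-in for a model surface with geodesic distances. The quadratic estimator compares every pair of samples. The binned variant assumes the space is discretized into a fixed set of centers with a precomputed kernel table, so its cost grows linearly with the number of samples. All function names and parameters (`kernel`, `naive_entropy`, `binned_entropy`, `bandwidth`, the number of centers) are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def arc_distance(a, b):
    """Geodesic (arc-length) distance between two angles on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def kernel(d, bandwidth):
    """Unnormalized Gaussian-shaped weight of a metric distance d (assumed form)."""
    return math.exp(-(d * d) / (2.0 * bandwidth * bandwidth))


def naive_entropy(points, bandwidth):
    """Quadratic-time estimate: a kernel density evaluated at every sample."""
    n = len(points)
    if n == 0:
        return 0.0
    total = 0.0
    for a in points:
        density = sum(kernel(arc_distance(a, b), bandwidth) for b in points) / n
        total -= math.log(density)
    return total / n


def make_binned_entropy(num_centers, bandwidth):
    """Build an estimator that bins samples onto fixed centers (assumed scheme)."""
    step = TWO_PI / num_centers
    centers = [step * i for i in range(num_centers)]
    # Kernel values between centers are computed once, independent of the data.
    table = [[kernel(arc_distance(ci, cj), bandwidth) for cj in centers]
             for ci in centers]

    def binned_entropy(points):
        n = len(points)
        if n == 0:
            return 0.0
        counts = [0] * num_centers
        for p in points:  # one pass over the samples
            counts[int(round((p % TWO_PI) / step)) % num_centers] += 1
        weights = [c / n for c in counts]
        total = 0.0
        for i, w in enumerate(weights):  # cost depends only on num_centers
            if w > 0.0:
                density = sum(wj * kij for wj, kij in zip(weights, table[i]))
                total -= w * math.log(density)
        return total

    return binned_entropy


def information_gain(parent, left, right, entropy):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


# Two tight clusters on opposite sides of the circle.
cluster_a = [0.0, 0.1, 0.2]
cluster_b = [3.0, 3.1, 3.2]
parent = cluster_a + cluster_b

naive = lambda pts: naive_entropy(pts, bandwidth=0.3)
binned = make_binned_entropy(num_centers=64, bandwidth=0.3)

# A split that separates the clusters versus one that mixes them.
good_gain = information_gain(parent, cluster_a, cluster_b, binned)
bad_gain = information_gain(parent, [0.0, 3.0, 0.2], [0.1, 3.1, 3.2], binned)
```

Because the kernel is left unnormalized, each entropy value is only defined up to an additive constant. That constant cancels in `information_gain`, since the child weights sum to one. In this sketch the split that separates the two clusters receives a higher gain than the one that mixes them, under either estimator.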


Keywords: Human pose estimation; Model based pose estimation; Correspondence estimation; Depth images; Metric regression forests



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Gerard Pons-Moll (1)
  • Jonathan Taylor (2)
  • Jamie Shotton (2)
  • Aaron Hertzmann (3)
  • Andrew Fitzgibbon (2)

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany
  2. Microsoft Research, Cambridge, UK
  3. Adobe Research, San Francisco, USA
