Metric Regression Forests for Correspondence Estimation

Pons-Moll, Gerard; Taylor, Jonathan; Shotton, Jamie; Hertzmann, Aaron; Fitzgibbon, Andrew

doi:10.1007/s11263-015-0818-9


Published: 11 April 2015

Volume 113, pages 163–175 (2015)

International Journal of Computer Vision

Gerard Pons-Moll¹, Jonathan Taylor², Jamie Shotton², Aaron Hertzmann³ & Andrew Fitzgibbon²


Abstract

We present a new method for inferring dense data-to-model correspondences, focusing on the application of human pose estimation from depth images. Recent work (Taylor et al. 2012) proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels into body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training objective designed to directly minimize the entropy of distributions in a metric space. When applied to a model surface, viewed as a metric space defined by geodesic distances, MSIG aims to minimize image-to-model correspondence uncertainty. A naïve implementation of MSIG would scale quadratically with the number of training examples. As this is intractable for large datasets, we propose a method to compute MSIG in linear time. Our method is a principled generalization of the proxy classification objective and does not require an extrinsic isometric embedding of the model surface in Euclidean space. Our experiments demonstrate that MSIG yields correspondences that are considerably more accurate than the state of the art, using far fewer training images.
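The abstract does not give MSIG's formula, so the following is only a rough illustration of the ideas it names, not the authors' objective or their linear-time algorithm. It assumes a generic Parzen-style resubstitution entropy estimate built from pairwise distances, an unnormalised Gaussian kernel with a hand-picked bandwidth, training pixels snapped to model vertex ids, and geodesics approximated by shortest paths along mesh edges; the toy path "mesh" and every function name are hypothetical. It shows three points from the abstract: the naïve estimate touches all O(n²) sample pairs; when samples are vertex ids, per-vertex counts give the same value in O(n + K²) time for K distinct vertices (linear in n for a fixed mesh); and under the discrete metric (zero between identical labels, infinite otherwise) the estimate reduces exactly to the Shannon entropy used by a body-part classification objective. Only distances are used, never Euclidean coordinates of the surface.

```python
import heapq
import math
from collections import Counter


def geodesic_distances(num_vertices, edges):
    """All-pairs shortest paths on an undirected mesh edge graph.

    Runs Dijkstra from every vertex. Edge-graph paths only approximate
    true surface geodesics.
    """
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    table = []
    for source in range(num_vertices):
        dist = [math.inf] * num_vertices
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry: a shorter path was already found
            for v, length in adjacency[u]:
                if d + length < dist[v]:
                    dist[v] = d + length
                    heapq.heappush(heap, (dist[v], v))
        table.append(dist)
    return table


def kernel(d, bandwidth):
    """Unnormalised Gaussian kernel on a distance (assumed kernel choice).

    The omitted normalising constant shifts every entropy by the same
    amount, so it cancels in the information gain below.
    """
    r = d / bandwidth  # r * r (not r ** 2) so an infinite distance gives weight 0.0
    return math.exp(-0.5 * r * r)


def entropy_naive(samples, dist, bandwidth):
    """Resubstitution kernel entropy estimate: O(n^2) kernel evaluations."""
    n = len(samples)
    total = 0.0
    for a in samples:
        density = sum(kernel(dist[a][b], bandwidth) for b in samples) / n
        total -= math.log(density)
    return total / n


def entropy_histogram(samples, dist, bandwidth):
    """Same estimate from per-vertex counts: O(n + K^2) for K distinct vertices."""
    n = len(samples)
    counts = Counter(samples)  # single linear pass over the samples
    total = 0.0
    for a, n_a in counts.items():
        density = sum(n_b * kernel(dist[a][b], bandwidth) for b, n_b in counts.items()) / n
        total -= n_a * math.log(density)
    return total / n


def shannon_entropy(labels):
    """Discrete entropy of part labels, as in a classification objective."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, entropy):
    """Parent entropy minus the size-weighted entropies of the two children."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)


# Hypothetical toy "mesh": four vertices on a path with unit-length edges.
dist = geodesic_distances(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
# Discrete metric: zero between identical labels, infinite otherwise.
discrete = [[0.0 if i == j else math.inf for j in range(4)] for i in range(4)]

parent = [0, 0, 1, 2, 3, 3]         # vertex id hit by each training pixel
left, right = [0, 0, 1], [2, 3, 3]  # one candidate split

print(information_gain(parent, left, right, lambda s: entropy_histogram(s, dist, 1.0)))
print(information_gain(parent, left, right, lambda s: entropy_histogram(s, discrete, 1.0)))
print(information_gain(parent, left, right, shannon_entropy))
```

The last two printed gains coincide, which is the sense in which a metric entropy can generalize the classification objective. The geodesic version additionally treats confusion between adjacent vertices as cheaper than confusion between distant ones, something the discrete version cannot express.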


Notes

Note that this is an extended version of Pons-Moll et al. (2013). Some portions of Taylor et al. (2012) have been included for clarity.
Distinct subscripts indicate whether \(p\) and \(l\) refer to vertices or spheres.

References

Baak, A., Müller, M., Bharaj, G., Seidel, H., & Theobalt, C. (2011). A data-driven approach for real-time full body pose reconstruction from a depth camera. In: IEEE international conference on computer vision, pp. 1092–1099.
Balan, A., Sigal, L., Black, M., Davis, J., & Haussecker, H. (2007). Detailed human shape and pose from images. In: IEEE conference on computer vision and pattern recognition.
Bentley, J. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.
Besl, P., & McKay, N. (1992). A method for registration of 3D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 239–256.
Black, M., & Rangarajan, A. (1996). On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. International Journal of Computer Vision, 19(1), 57–91.
Bo, L., & Sminchisescu, C. (2010). Twin Gaussian processes for structured prediction. International Journal of Computer Vision, 87, 28–52.
Bregler, C., Malik, J., & Pullen, K. (2004). Twist based acquisition and tracking of animal and human kinematics. International Journal of Computer Vision, 56(3), 179–194.
Breiman, L. (1999). Random forests (Technical Report TR567). Berkeley: UC Berkeley.
Brubaker, M., Fleet, D., & Hertzmann, A. (2010). Physics-based person tracking using the anthropomorphic walker. International Journal of Computer Vision.
Buntine, W., & Niblett, T. (1992). A further comparison of splitting rules for decision-tree induction. Machine Learning, 8(1), 75–85.
Criminisi, A., & Shotton, J. (2013). Decision forests for computer vision and medical image analysis. London: Springer.
Deutscher, J., & Reid, I. (2005). Articulated body motion capture by stochastic search. International Journal of Computer Vision, 61(2), 185–205.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.
Gall, J., Rosenhahn, B., Brox, T., & Seidel, H. P. (2010). Optimization and filtering for human motion capture. International Journal of Computer Vision, 87, 75–92.
Gall, J., Yao, A., Razavi, N., Van Gool, L., & Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2188–2202.
Ganapathi, V., Plagemann, C., Koller, D., & Thrun, S. (2012). Real-time human pose tracking from range data. In: European conference on computer vision.
Ganapathi, V., Plagemann, C., Thrun, S., & Koller, D. (2010). Real time motion capture using a time-of-flight camera. In: IEEE conference on computer vision and pattern recognition.
Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In: IEEE international conference on computer vision, pp. 415–422.
Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta Crystallographica, 32(5), 922–923.
Lee, C., & Elgammal, A. (2010). Coupled visual and kinematic manifold models for tracking. International Journal of Computer Vision, 87, 118–139.
Liu, W., & White, A. (1994). The importance of attribute selection measures in decision tree induction. Machine Learning, 15(1), 25–41.
Memisevic, R., Sigal, L., & Fleet, D. J. (2012). Shared kernel information embedding for discriminative inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 778–790.
Nowozin, S. (2012). Improved information gain estimates for decision tree induction. In: International conference on machine learning (ICML).
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065–1076.
Pons-Moll, G., Baak, A., Gall, J., Leal-Taixe, L., Mueller, M., Seidel, H., & Rosenhahn, B. (2011). Outdoor human motion capture using inverse kinematics and von Mises-Fisher sampling. In: IEEE international conference on computer vision.
Pons-Moll, G., Leal-Taixé, L., Truong, T., & Rosenhahn, B. (2011). Efficient and robust shape matching for model based human motion capture. In: DAGM.
Pons-Moll, G., & Rosenhahn, B. (2011). Model-based pose estimation. In: Visual analysis of humans (pp. 139–170). London: Springer.
Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., & Fitzgibbon, A. (2013). Metric regression forests for human pose estimation. In: British machine vision conference (BMVC).
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In: IEEE conference on computer vision and pattern recognition, pp. 1297–1304.
Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., & Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images. In: IEEE conference on computer vision and pattern recognition.
Silverman, B. (1986). Density estimation for statistics and data analysis (Vol. 26). London: CRC Press.
Sminchisescu, C., Bo, L., Ionescu, C., & Kanaujia, A. (2011). Feature-based pose estimation. In: Visual analysis of humans (pp. 225–251). London: Springer.
Stoll, C., Hasler, N., Gall, J., Seidel, H., & Theobalt, C. (2011). Fast articulated motion tracking using a sums of Gaussians body model. In: IEEE international conference on computer vision, pp. 951–958.
Taylor, J., Shotton, J., Sharp, T., & Fitzgibbon, A. (2012). The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation. In: IEEE conference on computer vision and pattern recognition.
Urtasun, R., & Darrell, T. (2008). Sparse probabilistic regression for activity-independent human pose inference. In: IEEE conference on computer vision and pattern recognition, pp. 1–8.


Author information

Authors and Affiliations

Max Planck Institute for Intelligent Systems, Tübingen, Germany
Gerard Pons-Moll
Microsoft Research, Cambridge, UK
Jonathan Taylor, Jamie Shotton & Andrew Fitzgibbon
Adobe Research, San Francisco, USA
Aaron Hertzmann


Corresponding author

Correspondence to Gerard Pons-Moll.

Additional information

Communicated by Tilo Burghardt, Majid Mirmehdi, Walterio Mayol-Cuevas, and Dima Damen.


About this article

Cite this article

Pons-Moll, G., Taylor, J., Shotton, J. et al. Metric Regression Forests for Correspondence Estimation. Int J Comput Vis 113, 163–175 (2015). https://doi.org/10.1007/s11263-015-0818-9


Received: 27 May 2014
Accepted: 17 March 2015
Published: 11 April 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11263-015-0818-9
