Twin Gaussian Processes for Structured Prediction

International Journal of Computer Vision

Abstract

We describe twin Gaussian processes (TGP), a generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate. TGP estimates outputs by minimizing the Kullback-Leibler divergence between two GPs, modeled as normal distributions over finite index sets of training and testing examples. This emphasizes the goal that similar inputs should produce similar percepts, and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. We exemplify TGP, with promising results, on the reconstruction of 3d human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3d marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, no camera calibration parameters, and no 3d body model associated with the human subjects used for training or testing.
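
As a rough illustration of the objective described in the abstract, the sketch below evaluates the Kullback-Leibler divergence between two zero-mean Gaussians whose covariances are kernel matrices, one built over the inputs and one over the outputs, each extended with a test input and a candidate output. The RBF kernel, the direction of the divergence, the jitter term, and all function and parameter names (rbf_kernel, gaussian_kl_zero_mean, tgp_objective, gamma_x, gamma_y) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def gaussian_kl_zero_mean(K_p, K_q):
    """KL( N(0, K_p) || N(0, K_q) ) for positive definite K_p, K_q.

    Closed form: 0.5 * (tr(K_q^{-1} K_p) - n + log det K_q - log det K_p).
    """
    n = K_p.shape[0]
    _, logdet_p = np.linalg.slogdet(K_p)
    _, logdet_q = np.linalg.slogdet(K_q)
    trace_term = np.trace(np.linalg.solve(K_q, K_p))
    return 0.5 * (trace_term - n + logdet_q - logdet_p)


def tgp_objective(X_train, Y_train, x_test, y_candidate,
                  gamma_x=1.0, gamma_y=1.0, jitter=1e-6):
    """Assumed TGP-style score for one candidate output (lower is better).

    Compares the Gaussian over inputs (training inputs plus the test input)
    with the Gaussian over outputs (training outputs plus the candidate).
    """
    X = np.vstack([X_train, x_test[None, :]])
    Y = np.vstack([Y_train, y_candidate[None, :]])
    n = X.shape[0]
    # Jitter keeps both kernel matrices numerically positive definite.
    K_x = rbf_kernel(X, X, gamma_x) + jitter * np.eye(n)
    K_y = rbf_kernel(Y, Y, gamma_y) + jitter * np.eye(n)
    return gaussian_kl_zero_mean(K_x, K_y)


# Toy data in which the outputs simply copy the inputs.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y_train = X_train.copy()
x_test = np.array([0.5, 0.5])

score_consistent = tgp_objective(X_train, Y_train, x_test, x_test)
score_far = tgp_objective(X_train, Y_train, x_test, x_test + 5.0)
print(score_consistent, score_far)
```

In this toy setting the candidate that preserves the input similarities scores lower than the distant one. A prediction could then be obtained by minimizing tgp_objective over y_candidate with a general-purpose optimizer; that step is left out here and is not claimed to match the authors' procedure.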

Author information

Corresponding author

Correspondence to Cristian Sminchisescu.

About this article

Cite this article

Bo, L., Sminchisescu, C. Twin Gaussian Processes for Structured Prediction. Int J Comput Vis 87, 28–52 (2010). https://doi.org/10.1007/s11263-008-0204-y

  • DOI: https://doi.org/10.1007/s11263-008-0204-y
