
Twin Gaussian Processes for Structured Prediction

  • Liefeng Bo
  • Cristian Sminchisescu

Abstract

We describe twin Gaussian processes (TGP), a generic structured prediction method that places Gaussian process (GP) priors on both covariates and responses, both multivariate. TGP estimates outputs by minimizing the Kullback-Leibler divergence between two GPs, modeled as normal distributions over finite index sets of training and testing examples. This emphasizes the goal that similar inputs should produce similar percepts, and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, on the reconstruction of 3d human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve an average error of 5 cm per 3d marker for models trained jointly on data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, no camera calibration parameters, and no 3d body model of the human subjects used for training or testing.
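To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of a Kullback-Leibler divergence between two zero-mean normal distributions whose covariances are kernel matrices, one built over the inputs and one over the outputs. The RBF kernel, the `gamma` and `ridge` values, the direction of the divergence, and the helper names `gram`, `gaussian_kl`, and `predict_by_kl` are all assumptions introduced for illustration; the abstract does not specify them. The grid search over candidate outputs is likewise a simple stand-in for whatever optimizer the method actually uses.

```python
import numpy as np


def gram(points, gamma=1.0, ridge=1e-3):
    """RBF kernel matrix over the rows of `points` (assumed kernel choice).

    A small ridge is added to the diagonal to keep the matrix well conditioned.
    """
    points = np.asarray(points, dtype=float)
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists) + ridge * np.eye(len(points))


def gaussian_kl(cov_p, cov_q):
    """KL( N(0, cov_p) || N(0, cov_q) ) for two zero-mean normals."""
    n = cov_p.shape[0]
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    return 0.5 * (trace_term - n + logdet_q - logdet_p)


def predict_by_kl(train_x, train_y, test_x, candidates, gamma=1.0):
    """Pick the candidate output whose joint output covariance is closest,
    in KL divergence, to the joint input covariance (illustrative only)."""
    cov_x = gram(np.vstack([train_x, test_x]), gamma)
    scores = [
        gaussian_kl(cov_x, gram(np.vstack([train_y, c]), gamma))
        for c in candidates
    ]
    return candidates[int(np.argmin(scores))], scores


# Toy usage: three 1-d training pairs and a grid of candidate outputs.
train_x = np.array([[0.0], [1.0], [2.0]])
train_y = np.array([[0.0], [2.0], [4.0]])
candidates = np.linspace(-1.0, 5.0, 61).reshape(-1, 1)
best, scores = predict_by_kl(train_x, train_y, np.array([1.5]), candidates)
```

Because each covariance is built over the training examples together with the test example, the divergence compares how the test input relates to the training inputs against how a candidate output relates to the training outputs, which is the sense in which correlations among both inputs and outputs enter the estimate.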

Keywords

Structured prediction; Gaussian processes; 3d human pose reconstruction; Feature extraction; Video processing


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. TTI-Chicago, Chicago, USA
  2. University of Bonn, INS, Bonn, Germany
