Learning Generative Models for Multi-Activity Body Pose Estimation

Jaeggli, Tobias; Koller-Meier, Esther; Van Gool, Luc

doi:10.1007/s11263-008-0158-0

Published: 31 July 2008

International Journal of Computer Vision, volume 83, pages 121–134 (2009)

Abstract

We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship of body pose and image appearance using a sparse kernel regressor. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model in this pose manifold. Sparse kernel regressors capture the nonlinearities of this mapping efficiently. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows for inference of the performed activity class, along with the estimation of body pose and 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that are alternating between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.
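
The postprocessing step, which picks one consistent pose per frame from the sampled posterior, can be illustrated with a small dynamic program. The following is only a minimal sketch: it assumes a Viterbi-style search over a finite set of candidate samples per frame, which the abstract does not spell out, and the names `best_trajectory`, `log_obs`, `log_trans` and the toy 1D "poses" are hypothetical, not taken from the article.

```python
import math

def best_trajectory(log_obs, log_trans):
    """Viterbi-style dynamic program over per-frame candidate samples.

    log_obs[t][i]      : log-weight of candidate i at frame t (assumed given)
    log_trans(t, i, j) : log-score of moving from candidate i at frame t-1
                         to candidate j at frame t (assumed given)
    Returns the index of the chosen candidate for every frame.
    """
    score = list(log_obs[0])  # best log-score of any path ending at each candidate
    back = []                 # back[t-1][j] = best predecessor of candidate j at frame t
    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for j, obs in enumerate(log_obs[t]):
            best_i = max(range(len(score)),
                         key=lambda i: score[i] + log_trans(t, i, j))
            new_score.append(score[best_i] + log_trans(t, best_i, j) + obs)
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best-scoring final candidate.
    path = [max(range(len(score)), key=score.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy data: two candidate 1D "poses" per frame, three frames.
poses = [[0.0, 5.0], [0.2, 5.1], [0.4, 0.3]]
log_obs = [[math.log(0.5), math.log(0.5)],
           [math.log(0.4), math.log(0.6)],
           [math.log(0.5), math.log(0.5)]]

def log_trans(t, i, j):
    # Hypothetical smoothness term: penalise large jumps between frames.
    d = poses[t][j] - poses[t - 1][i]
    return -0.5 * d * d

path = best_trajectory(log_obs, log_trans)
print(path)  # [0, 0, 1]
```

In the toy data, the second candidate has the higher weight at the middle frame when considered in isolation, yet the search selects the first one because it belongs to a smoother overall trajectory. This is the kind of sequence-level consistency the abstract attributes to the postprocessing step.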

Author information

Authors and Affiliations

ETH Zurich, Zurich, Switzerland
Tobias Jaeggli, Esther Koller-Meier & Luc Van Gool
KU Leuven, Leuven, Belgium
Luc Van Gool

Corresponding author

Correspondence to Tobias Jaeggli.

About this article

Cite this article

Jaeggli, T., Koller-Meier, E. & Van Gool, L. Learning Generative Models for Multi-Activity Body Pose Estimation. Int J Comput Vis 83, 121–134 (2009). https://doi.org/10.1007/s11263-008-0158-0

Received: 30 January 2008
Accepted: 11 July 2008
Published: 31 July 2008
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11263-008-0158-0
