Learning Generative Models for Multi-Activity Body Pose Estimation

International Journal of Computer Vision

Abstract

We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship between body pose and image appearance using a sparse kernel regressor. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding (LLE) dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model on this pose manifold. Sparse kernel regressors capture the nonlinearities of this mapping efficiently. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows the performed activity class to be inferred along with the body pose and the 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms as encodings of the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences in which subjects alternate between running and walking. Our experiments show that the dynamical model helps to track through poorly segmented, low-resolution image sequences where tracking otherwise fails, while reliably classifying the activity type.
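The postprocessing step described above, picking one pose per frame so that the whole sequence is globally consistent, can be illustrated with a Viterbi-style dynamic programme over per-frame candidate poses. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `best_trajectory`, the inputs `candidates` and `log_weights`, and the isotropic Gaussian transition score with parameter `sigma` are all hypothetical choices made for illustration. The abstract does not specify how the transition score is computed.

```python
def best_trajectory(candidates, log_weights, sigma=1.0):
    """Viterbi-style search for the best path through per-frame candidates.

    candidates[t][i]  : i-th candidate pose at frame t (tuple of floats),
                        e.g. samples kept from a sampling-based tracker
    log_weights[t][i] : log-likelihood of that candidate (assumed input)
    sigma             : std. dev. of an ASSUMED isotropic Gaussian transition
    Returns one candidate index per frame.
    """
    def log_transition(a, b):
        # Unnormalised Gaussian log-score; penalises large pose jumps.
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return -0.5 * sq_dist / (sigma ** 2)

    score = [list(log_weights[0])]  # best log-score of a path ending at (t, i)
    back = []                       # back[t-1][j]: best predecessor of (t, j)
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        frame_scores, frame_back = [], []
        for j, pose in enumerate(candidates[t]):
            # Score of reaching candidate j from every candidate at t-1.
            incoming = [score[t - 1][i] + log_transition(prev[i], pose)
                        for i in range(len(prev))]
            best_i = max(range(len(prev)), key=lambda i: incoming[i])
            frame_scores.append(incoming[best_i] + log_weights[t][j])
            frame_back.append(best_i)
        score.append(frame_scores)
        back.append(frame_back)

    # Backtrack from the best final candidate to the first frame.
    path = [max(range(len(score[-1])), key=lambda j: score[-1][j])]
    for frame_back in reversed(back):
        path.append(frame_back[path[-1]])
    path.reverse()
    return path


# Toy 1D "poses": two candidates per frame, three frames.
toy_candidates = [[(0.0,), (5.0,)], [(0.1,), (5.1,)], [(0.2,), (4.0,)]]
toy_log_weights = [[0.0, -0.1], [-0.1, 0.0], [0.0, -0.1]]
toy_path = best_trajectory(toy_candidates, toy_log_weights)
```

In the toy input, the second frame on its own slightly favours the candidate near 5.1, but the smooth path through the candidates near 0 scores higher overall, which is the kind of sequence-level consistency the abstract refers to. The cost is quadratic in the number of candidates per frame and linear in the number of frames.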

Author information

Corresponding author

Correspondence to Tobias Jaeggli.

About this article

Cite this article

Jaeggli, T., Koller-Meier, E. & Van Gool, L. Learning Generative Models for Multi-Activity Body Pose Estimation. Int J Comput Vis 83, 121–134 (2009). https://doi.org/10.1007/s11263-008-0158-0

  • DOI: https://doi.org/10.1007/s11263-008-0158-0
