3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers

  • Rui Li
  • Tai-Peng Tian
  • Stan Sclaroff
  • Ming-Hsuan Yang
Open Access
Article

Abstract

A major challenge in applying Bayesian methods to tracking 3D human body pose is the high dimensionality of the pose state space. It has been observed that 3D human body pose parameters can typically be assumed to lie on a low-dimensional manifold embedded in the high-dimensional pose space. The goal of this work is to approximate this manifold so that a low-dimensional state vector can be used for efficient and effective Bayesian tracking. To achieve this goal, a globally coordinated mixture of factor analyzers is learned from motion capture data. Each factor analyzer in the mixture is a “locally linear dimensionality reducer” that approximates a part of the manifold. The global parametrization of the manifold is obtained by aligning these locally linear pieces in a global coordinate system. To enable automatic and optimal selection of the number of factor analyzers and the dimensionality of the manifold, a variational Bayesian formulation of the globally coordinated mixture of factor analyzers is proposed. The advantages of the proposed model are demonstrated in a multiple hypothesis tracker for 3D human body pose. Quantitative comparisons on benchmark datasets show that the proposed method produces more accurate 3D pose estimates over time than two previously proposed Bayesian tracking methods.
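The abstract describes the model only at a high level. The following is a minimal sketch, not the authors' implementation, of how a point g in the shared low-dimensional coordinate system could be mapped back to a full pose vector once the locally linear pieces have been aligned. It assumes, in the spirit of the global coordination of local linear models by Roweis, Saul and Hinton, that each factor analyzer k has a local coordinate z_k ~ N(0, I), an invertible d×d alignment map L_k and offset κ_k with g = L_k z_k + κ_k, and a loading matrix Λ_k and mean μ_k with pose ≈ μ_k + Λ_k z_k. The responsibilities p(k | g) come from g | k ~ N(κ_k, L_k L_kᵀ), and the per-component reconstructions are blended by those responsibilities. The function name `global_to_pose`, all parameter names, and the blending rule are illustrative assumptions; in practice the parameters would come from the variational Bayesian learning step, which is not shown.

```python
import numpy as np


def global_to_pose(g, weights, kappas, Ls, mus, Lambdas):
    """Map a global low-dimensional coordinate g, shape (d,), to a pose vector, shape (D,).

    Hypothetical parameterization (illustrative names), for each component k:
      weights[k]  mixing proportion of factor analyzer k
      kappas[k]   (d,)   offset of component k in the global coordinate system
      Ls[k]       (d, d) invertible alignment map, g = Ls[k] @ z_k + kappas[k]
      mus[k]      (D,)   mean pose of factor analyzer k
      Lambdas[k]  (D, d) factor loading matrix of factor analyzer k
    Assumes z_k ~ N(0, I), so g | k ~ N(kappas[k], Ls[k] @ Ls[k].T).
    Returns the blended pose and the responsibilities p(k | g).
    """
    K = len(weights)
    log_resp = np.empty(K)
    for k in range(K):
        diff = g - kappas[k]
        cov = Ls[k] @ Ls[k].T
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        # log pi_k + log N(g; kappa_k, cov_k), dropping the shared (2*pi)^(-d/2) factor
        log_resp[k] = np.log(weights[k]) - 0.5 * (logdet + maha)

    # Normalize p(k | g) after subtracting the maximum, for numerical stability
    log_resp -= log_resp.max()
    resp = np.exp(log_resp)
    resp /= resp.sum()

    pose = np.zeros(len(mus[0]))
    for k in range(K):
        z_k = np.linalg.solve(Ls[k], g - kappas[k])  # local coordinate of g under component k
        pose += resp[k] * (mus[k] + Lambdas[k] @ z_k)  # blend the local linear reconstructions
    return pose, resp


if __name__ == "__main__":
    # Toy parameters (random, not learned) for a 2-D manifold in a 6-D pose space
    rng = np.random.default_rng(0)
    d, D, K = 2, 6, 3
    weights = np.full(K, 1.0 / K)
    kappas = [3.0 * rng.normal(size=d) for _ in range(K)]
    Ls = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(K)]
    mus = [rng.normal(size=D) for _ in range(K)]
    Lambdas = [rng.normal(size=(D, d)) for _ in range(K)]
    pose, resp = global_to_pose(np.zeros(d), weights, kappas, Ls, mus, Lambdas)
    print("responsibilities:", resp)
    print("pose:", pose)
```

In a multiple hypothesis (particle-based) tracker, each hypothesis could be stored and propagated as a low-dimensional g, and a mapping like this one would be applied only when evaluating the image likelihood of the corresponding 3D pose. Sampling in d dimensions rather than in the full pose space is what makes the low-dimensional state vector attractive.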

Keywords

3D human body tracking, Particle filtering, High-dimensional state space, Variational methods

Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Rui Li (1)
  • Tai-Peng Tian (1)
  • Stan Sclaroff (1)
  • Ming-Hsuan Yang (2)

  1. Computer Science Department, Boston University, Boston, USA
  2. Electrical Engineering and Computer Science, University of California, Merced, USA
