Coupled Visual and Kinematic Manifold Models for Tracking

Abstract

In this paper, we consider modeling data lying on multiple continuous manifolds. In particular, we model the shape manifold of a person performing a motion observed from different viewpoints along a view circle at a fixed camera height. We introduce a model that ties together the body configuration (kinematic) manifold and the visual (observation) manifold in a way that facilitates tracking the 3D configuration under continuous relative view variability. The model exploits the low-dimensional nature of both the body configuration manifold and the view manifold, each of which is represented separately. The resulting representation is used for tracking complex motions within a Bayesian framework, in which the model provides a low-dimensional state representation as well as a constrained dynamic model for both body configuration and view variations. Experimental results on estimating 3D body posture from a single camera are presented for the HUMANEVA dataset and other complex motion video sequences.

Keywords

Visual manifold; Human motion tracking; Kinematic manifold; Manifold learning; Bayesian tracking; Pose estimation

Supplementary material

Electronic supplementary material accompanies this article: three WMV files (802 KB, 3.24 MB, and 2.32 MB).

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Electronic Engineering, School of Electronic Engineering, Communication Engineering and Computer Science, Yeungnam University, Gyeongsan, South Korea
  2. Department of Computer Science, Rutgers University, Piscataway, USA
