International Journal of Computer Vision, Volume 79, Issue 3, pp 247–269

Human Motion Tracking with a Kinematic Parameterization of Extremal Contours

Abstract

This paper addresses the problem of human motion tracking from multiple image sequences. The human body is described by five articulated mechanical chains, and human body parts are described by volumetric primitives with curved surfaces. When such a surface is observed with a camera, an extremal contour appears in the image whenever the surface turns smoothly away from the viewer. We describe a method that recovers human motion through a kinematic parameterization of these extremal contours. The method exploits the fact that the observed image motion of these contours is a function both of the rigid displacement of the surface and of the relative position and orientation between the viewer and the curved surface. First, we describe a parameterization of the velocity of an extremal-contour point for the case of developable surfaces. Second, we use the zero-reference kinematic representation and derive an explicit formula that links extremal-contour velocities to the angular velocities associated with the kinematic model. Third, we show how the chamfer distance may be used to measure the discrepancy between predicted extremal contours and observed image contours; moreover, we show how the chamfer distance can be used as a differentiable multi-valued function and how a tracker based on this distance can be cast into a continuous non-linear optimization framework. Fourth, we describe implementation issues associated with a practical human-body tracker that may use an arbitrary number of cameras. A major methodological and practical advantage of our method is that it relies neither on model-to-image nor on image-to-image point matches. In practice, we model people with 5 kinematic chains, 19 volumetric primitives, and 54 degrees of freedom, and we observe silhouettes in images gathered with several synchronized and calibrated cameras. The tracker has been successfully applied to several complex motions captured at 30 frames/second.
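
To make the chamfer-distance idea concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes a binary edge image of observed contours and a list of predicted contour points in pixel coordinates. It builds a distance map with the classic two-pass 3-4 chamfer mask and averages that map over the predicted points, so no point-to-point matching is needed. The function names `distance_transform` and `chamfer_distance` are hypothetical, and the nearest-pixel lookup is a simplification: a differentiable variant would need to interpolate the map.

```python
def distance_transform(edges):
    """Approximate distance (in pixels) from every pixel to the nearest edge pixel.

    `edges` is a list of rows; a truthy value marks an observed contour pixel.
    Uses the two-pass 3-4 chamfer mask (3 per axial step, 4 per diagonal step).
    """
    h, w = len(edges), len(edges[0])
    inf = 10 ** 9
    d = [[0 if edges[y][x] else inf for x in range(w)] for y in range(h)]

    # Forward pass: propagate from the top-left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 3)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y - 1][x - 1] + 4)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y - 1][x + 1] + 4)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 3)

    # Backward pass: propagate from the bottom-right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 3)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y + 1][x - 1] + 4)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y + 1][x + 1] + 4)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 3)

    # Divide by the axial weight so that one axial step equals one pixel.
    return [[v / 3.0 for v in row] for row in d]


def chamfer_distance(predicted_points, dist_map):
    """Mean distance-map value over predicted contour points given as (x, y)."""
    if not predicted_points:
        raise ValueError("predicted_points must not be empty")
    h, w = len(dist_map), len(dist_map[0])
    total = 0.0
    for x, y in predicted_points:
        # Nearest-pixel lookup, clamped to the image bounds.
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        total += dist_map[yi][xi]
    return total / len(predicted_points)
```

Because the distance map depends only on the observed image, it can be computed once per frame and per camera and then reused for every candidate pose evaluated during optimization.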

Keywords

Articulated motion representation, Human-body tracking, Zero-reference kinematics, Developable surfaces, Extremal contours, Chamfer distance, Chamfer matching, Multiple-camera motion capture

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. INRIA Rhône-Alpes, Montbonnot Saint-Martin, France
