
International Journal of Computer Vision, Volume 53, Issue 3, pp 199–223

Human Body Model Acquisition and Tracking Using Voxel Data

  • Ivana Mikić
  • Mohan Trivedi
  • Edward Hunter
  • Pamela Cosman

Abstract

We present an integrated system for automatic acquisition of the human body model and motion tracking using input from multiple synchronized video streams. The video frames are segmented and the 3D voxel reconstructions of the human body shape in each frame are computed from the foreground silhouettes. These reconstructions are then used as input to the model acquisition and tracking algorithms.
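The silhouette-based reconstruction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small cubic voxel grid and hypothetical orthographic projection functions in place of calibrated cameras, and the names `carve_voxels`, `is_foreground`, `front` and `side` are introduced here for illustration only. A voxel is kept only if it projects onto a foreground pixel in every view.

```python
from typing import Callable, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]                 # (row, col)
Silhouette = Sequence[Sequence[bool]]   # True marks a foreground pixel
View = Tuple[Silhouette, Callable[[Voxel], Pixel]]


def is_foreground(silhouette: Silhouette, pixel: Pixel) -> bool:
    """Return True if the pixel lies inside the image and is foreground."""
    row, col = pixel
    inside = 0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])
    return inside and bool(silhouette[row][col])


def carve_voxels(grid_size: int, views: Sequence[View]) -> List[Voxel]:
    """Keep the voxels whose projection is foreground in every view."""
    occupied = []
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                voxel = (x, y, z)
                if all(is_foreground(sil, project(voxel)) for sil, project in views):
                    occupied.append(voxel)
    return occupied


# Toy example (assumed setup): two orthographic views of a 4x4x4 grid.
front = [[col in (1, 2) for col in range(4)] for _ in range(4)]  # sees x as column
side = [[col == 1 for col in range(4)] for _ in range(4)]        # sees y as column
views = [
    (front, lambda v: (v[2], v[0])),  # image row = z, image column = x
    (side, lambda v: (v[2], v[1])),   # image row = z, image column = y
]
occupied = carve_voxels(4, views)
```

In this toy setup the surviving voxels form a column two voxels wide in x, one voxel deep in y, and spanning the full height in z. A real system would replace the lambda projections with calibrated perspective camera models.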

The human body model consists of ellipsoids and cylinders and is described using the twists framework, which yields a non-redundant set of model parameters. Model acquisition starts with a simple body part localization procedure based on template fitting and growing, which uses prior knowledge of average body part shapes and dimensions. The initial model is then refined using a Bayesian network that imposes human body proportions onto the body part size estimates. The tracker is an extended Kalman filter that estimates model parameters from measurements made on the labeled voxel data. A voxel labeling procedure that handles large frame-to-frame displacements was designed, resulting in very robust tracking performance.
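To give a feel for the twists framework mentioned above, the following sketch applies the exponential map of a revolute-joint twist to a 3D point. It is a generic illustration of the standard formulation for a unit rotation axis, not the paper's parameterization; the helper names (`revolute_twist`, `apply_twist`, `rotate`) are assumptions made for this example.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Twist = Tuple[Vec3, Vec3]  # (v, omega)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate(omega: Vec3, theta: float, p: Vec3) -> Vec3:
    """Rodrigues' formula: rotate p by theta about the unit axis omega."""
    c, s = math.cos(theta), math.sin(theta)
    wxp = cross(omega, p)
    k = dot(omega, p) * (1.0 - c)
    return tuple(p[i] * c + wxp[i] * s + omega[i] * k for i in range(3))


def revolute_twist(omega: Vec3, q: Vec3) -> Twist:
    """Twist (v, omega) for rotation about unit axis omega through point q."""
    w = cross(omega, q)
    return ((-w[0], -w[1], -w[2]), omega)  # v = -omega x q


def apply_twist(twist: Twist, theta: float, p: Vec3) -> Vec3:
    """Apply the rigid motion exp(twist * theta) to point p (unit omega)."""
    v, omega = twist
    rp = rotate(omega, theta, p)
    wv = cross(omega, v)
    r_wv = rotate(omega, theta, wv)
    pitch = dot(omega, v) * theta
    # Translation part: (I - R)(omega x v) + omega (omega . v) theta
    return tuple(rp[i] + wv[i] - r_wv[i] + omega[i] * pitch for i in range(3))


# A joint rotating about the z axis through the point (1, 0, 0).
elbow = revolute_twist((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
moved = apply_twist(elbow, math.pi, (2.0, 0.0, 0.0))  # half turn about the joint
```

Each joint contributes a single angle `theta`, which is one way to see why a twist-based description keeps the number of model parameters small.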

Extensive evaluation shows that the system performs very reliably on sequences that include different types of motion, such as walking, sitting, dancing, running and jumping, and on people of very different body sizes, from a nine-year-old girl to a tall adult male.

human body model acquisition motion capture pose estimation 


Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Ivana Mikić (1)
  • Mohan Trivedi (2)
  • Edward Hunter (1)
  • Pamela Cosman (2)
  1. Q3DM, Inc., San Diego, USA
  2. Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, USA
