Human Body Model Acquisition and Tracking Using Voxel Data

International Journal of Computer Vision

Abstract

We present an integrated system for automatic acquisition of the human body model and motion tracking using input from multiple synchronized video streams. The video frames are segmented and the 3D voxel reconstructions of the human body shape in each frame are computed from the foreground silhouettes. These reconstructions are then used as input to the model acquisition and tracking algorithms.
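
The reconstruction step described above can be illustrated with a minimal silhouette-carving sketch. It is not the authors' implementation: the orthographic toy cameras, the grid size, and the names `carve_voxels`, `front_view`, and `side_view` are all assumptions made for illustration. A voxel is kept only if it projects onto foreground in every view.

```python
from itertools import product


def carve_voxels(silhouettes, projectors, grid_points):
    """Keep grid points whose projection is foreground in every view.

    silhouettes : list of 2D lists of 0/1 (rows index v, columns index u)
    projectors  : list of functions mapping a 3D point to a pixel (u, v)
    grid_points : iterable of 3D voxel centres
    """
    occupied = []
    for point in grid_points:
        inside_all = True
        for mask, project in zip(silhouettes, projectors):
            u, v = project(point)
            u, v = int(round(u)), int(round(v))
            in_image = 0 <= v < len(mask) and 0 <= u < len(mask[0])
            if not (in_image and mask[v][u]):
                inside_all = False  # carved away by this view
                break
        if inside_all:
            occupied.append(point)
    return occupied


# Hypothetical toy cameras: orthographic views along the z axis and the x axis.
def front_view(p):
    return p[0], p[1]  # drops z, pixel = (x, y)


def side_view(p):
    return p[2], p[1]  # drops x, pixel = (z, y)


size = 4
# Foreground silhouette: a 2x2 square in the middle of a 4x4 image.
square = [[1 if 1 <= r <= 2 and 1 <= c <= 2 else 0 for c in range(size)]
          for r in range(size)]
grid = list(product(range(size), repeat=3))
voxels = carve_voxels([square, square], [front_view, side_view], grid)
print(len(voxels))  # number of voxels consistent with both silhouettes
```

A real system would replace the toy projectors with calibrated perspective cameras and take the masks from foreground segmentation of each video frame.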

The human body model consists of ellipsoids and cylinders and is described using the twists framework, which yields a non-redundant set of model parameters. Model acquisition starts with a simple body part localization procedure based on template fitting and growing, which uses prior knowledge of average body part shapes and dimensions. The initial model is then refined using a Bayesian network that imposes human body proportions onto the body part size estimates. The tracker is an extended Kalman filter that estimates model parameters from measurements made on the labeled voxel data. The voxel labeling procedure is designed to handle large frame-to-frame displacements, which makes tracking robust.
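
To make the twists description concrete, the sketch below evaluates the standard exponential map of a twist with a unit rotation axis and applies it to a point on a limb. The abstract does not give the paper's exact parameterization, so the joint position, the axis, and the helper names (`rodrigues`, `twist_exp`, `revolute_twist`, `transform`) are assumptions for illustration only.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, x):
    return [dot(row, x) for row in m]


def rodrigues(omega, theta):
    """Rotation by angle theta about the unit axis omega (Rodrigues formula)."""
    wx, wy, wz = omega
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def twist_exp(v, omega, theta):
    """Rigid motion (R, t) generated by the twist (v, omega), with |omega| = 1."""
    R = rodrigues(omega, theta)
    w_x_v = cross(omega, v)
    R_w_x_v = mat_vec(R, w_x_v)
    along_axis = dot(omega, v) * theta  # translation along the axis
    t = [w_x_v[i] - R_w_x_v[i] + omega[i] * along_axis for i in range(3)]
    return R, t


def revolute_twist(omega, q):
    """Twist of a pure rotation about the axis omega passing through point q."""
    c = cross(omega, q)
    return [-c[0], -c[1], -c[2]], omega


def transform(R, t, p):
    Rp = mat_vec(R, p)
    return [Rp[i] + t[i] for i in range(3)]


# Hypothetical joint at (1, 0, 0) rotating about the z axis by 90 degrees.
joint = [1.0, 0.0, 0.0]
v, omega = revolute_twist([0.0, 0.0, 1.0], joint)
R, t = twist_exp(v, omega, math.pi / 2)
moved = transform(R, t, [2.0, 0.0, 0.0])
print(moved)  # approximately [1.0, 1.0, 0.0]
```

Each joint contributes only its angle once the axis and position are fixed, which is consistent with the abstract's remark that twists give a non-redundant parameter set.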

Extensive evaluation shows that the system performs reliably on sequences that include different types of motion, such as walking, sitting, dancing, running, and jumping, and on people of very different body sizes, from a nine-year-old girl to a tall adult male.

Cite this article

Mikić, I., Trivedi, M., Hunter, E. et al. Human Body Model Acquisition and Tracking Using Voxel Data. International Journal of Computer Vision 53, 199–223 (2003). https://doi.org/10.1023/A:1023012723347
