Shape-From-Silhouette Across Time Part I: Theory and Algorithms

Abstract

Shape-From-Silhouette (SFS) is a shape reconstruction method that constructs a 3D shape estimate of an object from silhouette images of the object. The output of an SFS algorithm is known as the Visual Hull (VH). Traditionally, SFS is performed either on static objects or, for videos of moving objects, separately at each time instant. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one-dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous, and we derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a temporal SFS algorithm that consists of two steps: (1) estimate the motion of the object over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) marker-less motion tracking.
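The refinement idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it uses a plain voxel grid rather than Bounding Edges, a single hypothetical orthographic camera, and a motion that is assumed to be known rather than estimated by Visual Hull Alignment. All names (`along_z`, `rotate_about_x`, `visual_hull`, the grid size `N`) are made up for illustration. The point it shows is that, once the motion is known, a silhouette from a second time instant behaves like a silhouette from an extra camera.

```python
from itertools import product

N = 5  # side length of the hypothetical voxel grid


def along_z(voxel):
    """Orthographic camera looking down the z axis: drop the z coordinate."""
    x, y, _ = voxel
    return (x, y)


def rotate_about_x(voxel):
    """Assumed known motion: a 90-degree turn about the x axis through the grid centre."""
    x, y, z = voxel
    return (x, N - 1 - z, y)


def visual_hull(views):
    """Keep every voxel whose projection lies inside all silhouettes.

    views is a list of (project, silhouette) pairs: project maps a voxel to a
    pixel, and silhouette is the set of foreground pixels for that view.
    """
    return {
        voxel
        for voxel in product(range(N), repeat=3)
        if all(project(voxel) in silhouette for project, silhouette in views)
    }


# Toy object at time 1: three voxels forming a flat L shape.
obj = {(1, 1, 1), (2, 1, 1), (1, 2, 1)}

# Silhouettes seen by the single camera at time 1 and, after the motion, at time 2.
sil_t1 = {along_z(v) for v in obj}
sil_t2 = {along_z(rotate_about_x(v)) for v in obj}

# SFS at one time instant: only the time-1 silhouette constrains the shape.
single = visual_hull([(along_z, sil_t1)])

# SFS across time: the time-2 silhouette is expressed in the time-1 object
# frame by composing the camera projection with the known motion.
refined = visual_hull([
    (along_z, sil_t1),
    (lambda v: along_z(rotate_about_x(v)), sil_t2),
])

print(len(obj), len(single), len(refined))
```

With one silhouette the hull is the silhouette extruded along the viewing direction; adding the second time instant carves it down while still containing the object. In the paper the motion is unknown, which is why the alignment step and the stereo information are needed before this kind of refinement can be applied.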


Author information

Corresponding author

Correspondence to Kong-man (German) Cheung.

About this article

Cite this article

Cheung, Km., Baker, S. & Kanade, T. Shape-From-Silhouette Across Time Part I: Theory and Algorithms. Int J Comput Vision 62, 221–247 (2005). https://doi.org/10.1007/s11263-005-4881-5

Keywords

  • 3D reconstruction
  • Shape-From-Silhouette
  • Visual Hull
  • across time
  • stereo
  • temporal alignment
  • alignment ambiguity
  • visibility