Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance

International Journal of Computer Vision

Abstract

In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene's shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene's trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region's dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method's ability to explain pixels and pixel variations in terms of their underlying causes: shape, reflectance, motion, illumination, and visibility.
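
To make the idea of a dynamic surfel concrete, the following is a minimal sketch of a record that bundles local shape, reflectance, and motion, together with a textbook Phong prediction of its brightness under a known point light. It is not the paper's parameterization, which the abstract does not specify. The class `DynamicSurfel`, its fields, the function `phong_intensity`, and the purely translational motion model are all hypothetical names and simplifications chosen for illustration.

```python
import math
from dataclasses import dataclass, replace
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


@dataclass(frozen=True)
class DynamicSurfel:
    """Hypothetical surfel record: local shape, reflectance, and motion."""
    position: Vec3     # shape: center of the small surface patch
    normal: Vec3       # shape: patch orientation
    velocity: Vec3     # motion: assumed here to be a pure translation
    k_diffuse: float   # reflectance: Phong diffuse coefficient
    k_specular: float  # reflectance: Phong specular coefficient
    shininess: float   # reflectance: Phong specular exponent

    def moved(self, dt: float) -> "DynamicSurfel":
        """Return the surfel translated by velocity * dt."""
        p, v = self.position, self.velocity
        new_pos = (p[0] + v[0] * dt, p[1] + v[1] * dt, p[2] + v[2] * dt)
        return replace(self, position=new_pos)


def phong_intensity(s: DynamicSurfel, light_pos: Vec3, camera_pos: Vec3,
                    light_intensity: float = 1.0) -> float:
    """Predict brightness with the classic Phong model (no ambient term)."""
    n = _unit(s.normal)
    l = _unit(_sub(light_pos, s.position))   # direction to the light
    v = _unit(_sub(camera_pos, s.position))  # direction to the camera
    n_dot_l = _dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0  # light is behind the patch
    # Mirror reflection of the light direction about the normal.
    r = (2 * n_dot_l * n[0] - l[0],
         2 * n_dot_l * n[1] - l[1],
         2 * n_dot_l * n[2] - l[2])
    specular = max(_dot(r, v), 0.0) ** s.shininess
    return light_intensity * (s.k_diffuse * n_dot_l + s.k_specular * specular)
```

Under these assumptions, comparing `phong_intensity` at each camera against the observed pixel values gives a residual that a fitting procedure could minimize over the surfel's parameters. Calling `moved(dt)` before predicting gives the appearance at a later time instant.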

About this article

Cite this article

Carceroni, R.L., Kutulakos, K.N. Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance. International Journal of Computer Vision 49, 175–214 (2002). https://doi.org/10.1023/A:1020145606604

  • DOI: https://doi.org/10.1023/A:1020145606604
