Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance

Carceroni, Rodrigo L.; Kutulakos, Kiriakos N.

doi:10.1023/A:1020145606604

Published: September 2002

Volume 49, pages 175–214, (2002)

International Journal of Computer Vision

Rodrigo L. Carceroni¹ &
Kiriakos N. Kutulakos²

Abstract

In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene's shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene's trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region's dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method's ability to explain pixels and pixel variations in terms of their underlying causes: shape, reflectance, motion, illumination, and visibility.
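
As a rough illustration of what a dynamic surfel might carry, the Python sketch below bundles a local shape (center and normal), a motion term, and Phong reflectance coefficients, and predicts the intensity a camera would observe under a known point light. The abstract does not give the paper's actual parameterization, so every name here (`DynamicSurfel`, `center_at`, `predict_intensity`, `k_d`, `k_s`, `shininess`) and the constant-velocity motion model are assumptions made for illustration, not the authors' formulation.

```python
from dataclasses import dataclass

import numpy as np


def _unit(v: np.ndarray) -> np.ndarray:
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


@dataclass
class DynamicSurfel:
    """Hypothetical, simplified surfel: planar patch + motion + Phong terms."""
    position: np.ndarray   # 3D center at the reference time
    normal: np.ndarray     # surface normal (need not be pre-normalized)
    velocity: np.ndarray   # assumed constant 3D velocity of the center
    k_d: float             # Phong diffuse coefficient
    k_s: float             # Phong specular coefficient
    shininess: float       # Phong specular exponent

    def center_at(self, dt: float) -> np.ndarray:
        """Center position dt time units after the reference time."""
        return self.position + dt * self.velocity

    def predict_intensity(self, light_pos, camera_pos, dt=0.0,
                          light_intensity=1.0) -> float:
        """Phong-model intensity seen from camera_pos under a point light."""
        p = self.center_at(dt)
        n = _unit(self.normal)
        l = _unit(np.asarray(light_pos, dtype=float) - p)   # toward the light
        v = _unit(np.asarray(camera_pos, dtype=float) - p)  # toward the camera
        n_dot_l = float(n @ l)
        if n_dot_l <= 0.0:
            return 0.0  # light is behind the patch: no direct illumination
        r = 2.0 * n_dot_l * n - l  # mirror reflection of the light direction
        specular = max(float(r @ v), 0.0) ** self.shininess
        return light_intensity * (self.k_d * n_dot_l + self.k_s * specular)


surfel = DynamicSurfel(
    position=np.zeros(3),
    normal=np.array([0.0, 0.0, 1.0]),
    velocity=np.array([1.0, 0.0, 0.0]),
    k_d=0.6, k_s=0.3, shininess=10.0,
)
head_on = surfel.predict_intensity(light_pos=[0, 0, 5], camera_pos=[0, 0, 5])
back_lit = surfel.predict_intensity(light_pos=[0, 0, -5], camera_pos=[0, 0, 5])
```

With the light and camera both directly above the patch, the diffuse and specular terms are each at their maximum, so `head_on` is approximately `k_d + k_s`; with the light behind the patch, `back_lit` is zero. A fitting procedure in the spirit of the abstract would compare such predictions against observed pixels across views and time, but that step is not sketched here.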

Author information

Authors and Affiliations

Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, MG, CEP 31270-010, Brazil
Rodrigo L. Carceroni
Department of Computer Science, University of Toronto, Toronto, ON, M5S3H5, Canada
Kiriakos N. Kutulakos

About this article

Cite this article

Carceroni, R.L., Kutulakos, K.N. Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance. International Journal of Computer Vision 49, 175–214 (2002). https://doi.org/10.1023/A:1020145606604

Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1020145606604
