3D Reconstruction and Video-Based Rendering of Casually Captured Videos

  • Aparna Taneja
  • Luca Ballan
  • Jens Puwein
  • Gabriel J. Brostow
  • Marc Pollefeys
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7082)


In this chapter we explore the possibility of interactively navigating a collection of casually captured videos of a performance: real-world footage captured on hand held cameras by a few members of the audience. The aim is to navigate the video collection in 3D by generating video based rendering of the performance using the offline pre-computed reconstruction of the event.

We propose two different techniques to obtain this reconstruction, considering that the video collection may have been recorded in complex, uncontrolled outdoor environments. One approach recovers the event geometry by exploring the temporal domain of each video independently, while the other explores the spatial domain of the video collection at each time instant, independently. The pros and cons of the two methods and their applicability to the addressed navigation problem, are also discussed. In the end, we propose an interactive GPU-accelerated viewing tool to navigate the video collection.


Video-Based Rendering Dynamic scene reconstruction Free Viewpoint Video 3DVideo 3D reconstruction 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: Exploring photo collections in 3d. In: SIGGRAPH Conference Proceedings, pp. 835–846 (2006)Google Scholar
  2. 2.
    Snavely, N., Garg, R., Seitz, S.M., Szeliski, R.: Finding paths through the world’s photos. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2008) 27, 11–21 (2008)Google Scholar
  3. 3.
    Goesele, M., Ackermann, J., Fuhrmann, S., Haubold, C., Klowsky, R., Steedly, D., Szeliski, R.: Ambient point coulds for view interpolation. In: SIGGRAPH (2010)Google Scholar
  4. 4.
    Kim, H., Sarim, M., Takai, T., Guillemaut, J.Y., Hilton, A.: Dynamic 3d scene reconstruction in outdoor environments. In: 3DPVT (2010)Google Scholar
  5. 5.
    Guan, L., Franco, J.S., Pollefeys, M.: Multi-object shape estimation and tracking from silhouette cues. In: CVPR (2008)Google Scholar
  6. 6.
    Franco, J.-S., Boyer, E.: Fusion of multi-view silhouette cues using a space occupancy grid. In: ICCV, pp. 1747–1753 (2005)Google Scholar
  7. 7.
    Matusik, W., Buehler, C., Raskar, R., Gortler, S.J., McMillan, L.: Image-based visual hulls. In: Proceedings of ACM SIGGRAPH, pp. 369–374 (2000)Google Scholar
  8. 8.
    Sarim, M., Hilton, A., Guillemaut, J.Y., Kim, H., Takai, T.: Multiple view wide-baseline trimap propagation for natural video matting. In: 2010 Conference on Visual Media Production (CVMP), pp. 82–91 (2010)Google Scholar
  9. 9.
    Seitz, S., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: CVPR (2006)Google Scholar
  10. 10.
    Seitz, S.M., Dyer, C.R.: Photorealistic scene reconstruction by voxel coloring. In: CVPR, p. 1067 (1997)Google Scholar
  11. 11.
    Furukawa, Y., Ponce, J.: Dense 3d motion capture for human faces. In: CVPR, pp. 1674–1681 (2009)Google Scholar
  12. 12.
    Vlasic, D., Peers, P., Baran, I., Debevec, P., Popović, J., Rusinkiewicz, S., Matusik, W.: Dynamic shape capture using multi-view photometric stereo. In: SIGGRAPH Asia (2009)Google Scholar
  13. 13.
    Ahmed, N., Theobalt, C., Dobrev, P., Seidel, H.P., Thrun, S.: Robust fusion of dynamic shape and normal capture for high-quality reconstruction of time-varying geometry. In: CVPR (2008)Google Scholar
  14. 14.
    Hernández, C., Vogiatzis, G., Cipolla, R.: Shadows in three-source photometric stereo. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 290–303. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  15. 15.
    Vedula, S., Baker, S., Seitz, S., Kanade, T.: Shape and motion carving in 6d. In: CVPR (2000)Google Scholar
  16. 16.
    Goldlucke, B., Ihrke, I., Linz, C., Magnor, M.: Weighted minimal hypersurface reconstruction. PAMI, 1194–1208 (2007)Google Scholar
  17. 17.
    Hilton, A., Starck, J.: Multiple view reconstruction of people. In: 3DPVT (2004)Google Scholar
  18. 18.
    Sinha, S.N., Pollefeys, M.: Multi-view reconstruction using photo-consistency and exact silhouette constraints: A maximum-flow formulation. In: ICCV, pp. 349–356 (2005)Google Scholar
  19. 19.
    Tung, T., Nobuhara, S., Matsuyama, T.: Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo. In: ICCV (2009)Google Scholar
  20. 20.
    Waschbüsch, M., Würmlin, S., Gross, M.H.: 3d video billboard clouds. Computer Graphics Forum (Proc. Eurographics EG 2007) 26, 561–569 (2007)CrossRefGoogle Scholar
  21. 21.
    Ballan, L., Cortelazzo, G.M.: Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes. In: 3DPVT (June 2008)Google Scholar
  22. 22.
    Carranza, J., Theobalt, C., Magnor, M.A., Peter Seidel, H.: Free-viewpoint video of human actors. ACM Transactions on Graphics, 569–577 (2003)Google Scholar
  23. 23.
    Vlasic, D., Baran, I., Matusik, W., Popović, J.: Articulated mesh animation from multi-view silhouettes. ACM Transactions on Graphics 27, 97:1–97:9 (2008)Google Scholar
  24. 24.
    de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.P., Thrun, S.: Performance capture from sparse multi-view video. ACM Trans. Graph. 27, 1–10 (2008)CrossRefGoogle Scholar
  25. 25.
    Zitnick, C.L., Kang, S.B., Uyttendaele, M., Winder, S., Szeliski, R.: High-quality video view interpolation using a layered representation. ACM Transactions on Graphics 23, 600–608 (2004)CrossRefGoogle Scholar
  26. 26.
    Kanade, T.: Carnegie mellon goes to the superbowl (2001),
  27. 27.
    Würmlin, S., Niederberger, C.: Realistic virtual replays for sports broadcasts (2010),
  28. 28.
    Guillemaut, J.-Y., Kilner, J., Hilton, A.: Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes. In: ICCV (2009)Google Scholar
  29. 29.
    Hayashi, K., Saito, H.: Synthesizing free-viewpoint images from multiple view videos in soccer stadium. In: CGIV, pp. 220–225 (2006)Google Scholar
  30. 30.
    Hasler, N., Rosenhahn, B., Thormählen, T., Wand, M., Gall, J., Seidel, H.P.: Markerless motion capture with unsynchronized moving cameras. In: CVPR, pp. 224–231 (2009)Google Scholar
  31. 31.
    Lipski, C., Linz, C., Berger, K., Sellent, A., Magnor, M.: Virtual video camera: Image-based viewpoint navigation through space and time. Computer Graphics Forum 29, 2555–2568 (2010)CrossRefGoogle Scholar
  32. 32.
    Eisemann, M., Decker, B.D., Magnor, M., Bekaert, P., de Aguiar, E., Ahmed, N., Theobalt, C., Sellent, A.: Floating Textures. Computer Graphics Forum (Proc. Eurographics EG 2008) 27, 409–418 (2008)CrossRefGoogle Scholar
  33. 33.
    Seitz, S.M., Dyer, C.R.: View morphing. In: Proceedings of ACM SIGGRAPH, pp. 21–30 (1996)Google Scholar
  34. 34.
    Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., Koch, R.: Visual modeling with a hand-held camera. IJCV 59, 207–232 (2004)CrossRefGoogle Scholar
  35. 35.
    Lhuillier, M., Quan, L.: A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. 27, 418–433 (2005)CrossRefGoogle Scholar
  36. 36.
    Ballan, L., Cortelazzo, G.M.: Multimodal 3D shape recovery from texture, silhouette and shadow information. In: 3DPVT. Chapel Hill, USA (2006)Google Scholar
  37. 37.
    Campbell, N.D., Vogiatzis, G., Hernández, C., Cipolla, R.: Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In: 18th British Machine Vision Conference, vol. 1, pp. 530–539 (2007)Google Scholar
  38. 38.
    Goesele, M., Snavely, N., Curless, B., Hoppe, H., Seitz, S.M.: Multi-view stereo for community photo collections. In: ICCV, pp. 1–8 (2007)Google Scholar
  39. 39.
    Ballan, L., Brusco, N., Cortelazzo, G.M.: 3D Content Creation by Passive Optical Methods. In: 3D Online Multimedia and Games: Processing, Visualization and Transmission. World Scientific Publishing, Singapore (2008)Google Scholar
  40. 40.
    Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: CVPR, pp. 519–528 (2006)Google Scholar
  41. 41.
    Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91–110 (2004)CrossRefGoogle Scholar
  42. 42.
    Gallup, D., Frahm, J.M., Mordohai, P., Yang, Q., Pollefeys, M.: Real-time plane-sweeping stereo with multiple sweeping directions. In: CVPR (2007)Google Scholar
  43. 43.
    Zach, C., Pock, T., Bischof, H.: A globally optimal algorithm for robust tv-l1 range image integration. In: ICCV (2007)Google Scholar
  44. 44.
    Sheffer, A., Praun, E., Rose, K.: Mesh parameterization methods and their applications. Foundations and Trends in Computer Graphics and Vision 2, 105–171 (2006)CrossRefzbMATHGoogle Scholar
  45. 45.
    Brusco, N., Ballan, L., Cortelazzo, G.M.: Passive reconstruction of high quality textured 3D models of works of art. In: 6th International Symposium on Virtual Reality, Archeology and Cultural Heritage, VAST (2005)Google Scholar
  46. 46.
    Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2000); ISBN: 0521623049 zbMATHGoogle Scholar
  47. 47.
    Arulampalam, M.S., Maskell, S., Gordon, N.: A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Processing 50, 174–188 (2002)CrossRefGoogle Scholar
  48. 48.
    Sinha, S.N., Pollefeys, M.: Synchronization and calibration of camera networks from silhouettes. In: ICPR 2004: Proceedings of the Pattern Recognition, 17th International Conference on (ICPR 2004), vol. 1, pp. 116–119 (2004)Google Scholar
  49. 49.
    Tuytelaars, T., Van Gool, L.: Synchronizing video sequences. In: CVPR, vol. 1, pp. 762–768 (2004)Google Scholar
  50. 50.
    Reinhard, E., Adhikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Computer Graphics and Applications 21, 34–41 (2001)CrossRefGoogle Scholar
  51. 51.
    Baumberg, A., Hogg, D.: An efficient method for contour tracking using active shape models. In: Motion of Non-Rigid and Articulated Objects, pp. 194–199 (1994)Google Scholar
  52. 52.
    Leibe, B., Cornelis, N., Cornelis, K., Gool, L.V.: Dynamic 3d scene analysis from a moving vehicle. In: CVPR (2007)Google Scholar
  53. 53.
    Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.S.: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE 90, 1151–1163 (2002)CrossRefGoogle Scholar
  54. 54.
    Sheikh, Y., Javed, O., Kanade, T.: Background subtraction for freely moving cameras. In: ICCV (2009)Google Scholar
  55. 55.
    Bai, X., Wang, J., Simons, D., Sapiro, G.: Video snapcut: robust video object cutout using localized classifiers. ACM Trans. Graph. 28 (2009)Google Scholar
  56. 56.
    Wang, J., Bhat, P., Colburn, R.A., Agrawala, M., Cohen, M.F.: Interactive video cutout. ACM Trans. Graph. 24, 585–594 (2005)CrossRefGoogle Scholar
  57. 57.
    Sun, J., Zhang, W., Tang, X., Shum, H.Y.: Background cut. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 628–641. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  58. 58.
    Ballan, L., Brostow, G.J., Puwein, J., Pollefeys, M.: Unstructured video-based rendering: Interactive exploration of casually captured videos. ACM Transactions on Graphics (Proceedings of SIGGRAPH), 1–11 (2010),
  59. 59.
    Taneja, A., Ballan, L., Pollefeys, M.: Modeling dynamic scenes recorded with freely moving cameras. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010, Part III. LNCS, vol. 6494, pp. 613–626. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  60. 60.
    Cheng, Y.: Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 790–799 (1995)CrossRefGoogle Scholar
  61. 61.
    Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI 26, 1124–1137 (2004)CrossRefzbMATHGoogle Scholar
  62. 62.
    Rav-Acha, A., Kohli, P., Rother, C., Fitzgibbon, A.: Unwrap mosaics: A new representation for video editing. ACM Transactions on Graphics (SIGGRAPH 2008) (2008)Google Scholar
  63. 63.
    Chuang, Y.Y., Curless, B., Salesin, D.H., Szeliski, R.: A bayesian approach to digital matting. In: Proceedings of IEEE CVPR 2001, Kauai, Hawaii, vol. 2, pp. 264–271 (2001)Google Scholar
  64. 64.
    Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI 23, 1222–1239 (2001)CrossRefGoogle Scholar
  65. 65.
    Kolmogorov, V., Zabin, R.: What energy functions can be minimized via graph cuts? PAMI 26, 147–159 (2004)CrossRefGoogle Scholar
  66. 66.
    Lorensen, W.E., Cline, H.E.: Marching cubes: A high resolution 3d surface construction algorithm. SIGGRAPH 21, 163–169 (1987)CrossRefGoogle Scholar
  67. 67.
    Buehler, C., Bosse, M., McMillan, L., Gortler, S.J., Cohen, M.F.: Unstructured lumigraph rendering. In: SIGGRAPH, pp. 425–432 (2001)Google Scholar
  68. 68.
    Grundland, M., Vohra, R., Williams, G.P., Dodgson, N.A.: Cross dissolve without cross fade: Preserving contrast, color and salience in image compositing. In: Proceedings of EUROGRAPHICS, Computer Graphics Forum, pp. 577–586 (2006)Google Scholar
  69. 69.
    Schödl, A., Szeliski, R., Salesin, D.H., Essa, I.: Video textures. In: SIGGRAPH 2000: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 489–498 (2000)Google Scholar
  70. 70.
    Rong, G., Tan, T.S.: Jump flooding in gpu with applications to voronoi diagram and distance transform. In: ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), pp. 109–116. ACM, New York (2006)Google Scholar
  71. 71.
    Wang, J., Bodenheimer, B.: Synthesis and evaluation of linear motion transitions. ACM Trans. Graph 27, 1–15 (2008)Google Scholar
  72. 72.
    Debevec, P., Borshukov, G., Yu, Y.: Efficient view-dependent image-based rendering with projective texture-mapping. In: 9th Eurographics Workshop on Rendering (1998)Google Scholar
  73. 73.
  74. 74.
    Kilner, J., Starck, J., Hilton, A.: A comparative study of free-viewpoint video techniques for sports events. In: CVMP (2006)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Aparna Taneja
    • 1
  • Luca Ballan
    • 1
  • Jens Puwein
    • 1
  • Gabriel J. Brostow
    • 2
  • Marc Pollefeys
    • 1
  1. 1.Computer Vision and Geometry GroupETH ZurichSwitzerland
  2. 2.Department of Computer ScienceUniversity College LondonUK

Personalised recommendations