Shape Reconstruction Using Volume Sweeping and Learned Photoconsistency

  • Vincent Leroy
  • Jean-Sébastien Franco
  • Edmond Boyer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


The rise of virtual and augmented reality fuels an increased need for content suitable to these new technologies, including 3D content obtained from real scenes. We consider in this paper the problem of 3D shape reconstruction from multi-view RGB images. We investigate the ability of learning-based strategies to effectively benefit the reconstruction of arbitrary shapes with improved precision and robustness. We especially target real-life performance capture, containing complex surface details that are difficult to recover with existing approaches. A key step in the multi-view reconstruction pipeline lies in the search for matching features between viewpoints in order to infer depth information. We propose to cast the matching on a 3D receptive field along viewing lines and to learn a multi-view photoconsistency measure for that purpose. The intuition is that deep networks have the ability to learn local photometric configurations in a broad way, even with respect to different orientations along various viewing lines of the same surface point. Our results demonstrate this ability, showing that a CNN, trained on a standard static dataset, can help recover surface details on dynamic scenes that are not perceived by traditional 2D feature-based methods. Our evaluation also shows that our solution compares on par with state-of-the-art reconstruction pipelines on standard evaluation datasets, while yielding significantly better results and generalization with realistic performance capture data.
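The volume-sweeping idea summarized above can be sketched as follows: walk candidate depths along a viewing line, build a small 3D receptive field around each candidate surface point, project the field's samples into every view, and keep the depth whose collected colors score highest. The sketch below is a hypothetical illustration under simplifying assumptions, not the authors' implementation: all function names and grid parameters are invented, the grid is axis-aligned rather than oriented along the viewing line, and the variance-based `photoconsistency` function is a handcrafted stand-in for the paper's learned CNN measure.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D points X (N, 3) through a pinhole camera (K, R, t);
    returns pixel coordinates of shape (N, 2)."""
    x = (K @ (R @ X.T + t)).T
    return x[:, :2] / x[:, 2:3]

def sample_receptive_field(center, half_size=1, step=0.01):
    """Build a small 3D grid (the receptive field) around a candidate
    surface point. Note: this sketch keeps the grid axis-aligned, whereas
    the paper orients the receptive field along the viewing line."""
    axis = np.arange(-half_size, half_size + 1) * step
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return center + np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

def photoconsistency(colors):
    """Placeholder score (stacked per-view samples, shape (V, N)):
    negative variance across views, averaged over the grid samples.
    A learned CNN replaces this handcrafted measure in the paper."""
    return -np.mean(np.var(colors, axis=0))

def sweep_ray(origin, direction, depths, cameras, images):
    """Sweep candidate depths along one viewing line, score each candidate
    with the photoconsistency measure, return the best-scoring depth."""
    best_depth, best_score = None, -np.inf
    for d in depths:
        field = sample_receptive_field(origin + d * direction)
        per_view = []
        for (K, R, t), img in zip(cameras, images):
            # Project grid samples into this view and fetch intensities
            # (nearest neighbor, clipped to the image bounds).
            px = np.clip(np.round(project(K, R, t, field)).astype(int),
                         0, np.array(img.shape[:2])[::-1] - 1)
            per_view.append(img[px[:, 1], px[:, 0]])
        score = photoconsistency(np.stack(per_view))
        if score > best_score:
            best_depth, best_score = d, score
    return best_depth
```

In the full pipeline this per-ray search would run for every pixel of every viewpoint, and the resulting depth maps would then be fused into a surface; the sketch only conveys the sweep-and-score structure.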


Keywords: Multi-view stereo reconstruction · Learned photoconsistency · Performance capture · Volume sweeping



Funded by France National Research grant ANR-14-CE24-0030 ACHMOV. Images 1-2-6-8 courtesy of Anja Rubik.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Vincent Leroy
  • Jean-Sébastien Franco
  • Edmond Boyer

  1. Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), LJK, Grenoble, France
