Shape Reconstruction Using Volume Sweeping and Learned Photoconsistency
Abstract
The rise of virtual and augmented reality fuels an increased need for content suitable to these new technologies, including 3D content obtained from real scenes. We consider in this paper the problem of 3D shape reconstruction from multi-view RGB images. We investigate the ability of learning-based strategies to benefit the reconstruction of arbitrary shapes with improved precision and robustness. We especially target real-life performance capture, which contains complex surface details that are difficult to recover with existing approaches. A key step in the multi-view reconstruction pipeline lies in the search for matching features between viewpoints in order to infer depth information. We propose to cast the matching over a 3D receptive field along viewing lines and to learn a multi-view photoconsistency measure for that purpose. The intuition is that deep networks have the ability to learn local photometric configurations in a broad way, even with respect to different orientations along various viewing lines of the same surface point. Our results demonstrate this ability, showing that a CNN trained on a standard static dataset can help recover surface details on dynamic scenes that are not perceived by traditional 2D-feature-based methods. Our evaluation also shows that our solution performs on par with state-of-the-art reconstruction pipelines on standard evaluation datasets, while yielding significantly better results and generalization with realistic performance-capture data.
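To make the volume-sweeping idea from the abstract concrete, below is a minimal, hypothetical sketch: for a candidate point along a viewing ray, colors are sampled on a small 3D grid (the "3D receptive field") from every input view, and a small 3D CNN scores the multi-view photoconsistency of the stacked color volumes. All names (PhotoconsistencyNet, sample_color_volume), the grid size K, the nearest-neighbor sampling, and the network layout are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: a learned multi-view photoconsistency measure
# evaluated on a local 3D grid around a point on a viewing line.
import torch
import torch.nn as nn

K = 8  # resolution of the local 3D receptive field (assumed value)

class PhotoconsistencyNet(nn.Module):
    """Tiny 3D CNN mapping stacked per-view color volumes to a score in [0, 1]."""
    def __init__(self, n_views: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3 * n_views, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.score = nn.Linear(32, 1)

    def forward(self, vol):                 # vol: (B, 3*n_views, K, K, K)
        f = self.features(vol).flatten(1)   # (B, 32)
        return torch.sigmoid(self.score(f)) # 1 = photoconsistent

def sample_color_volume(images, projections, center, step):
    """Sample each view's colors on a KxKxK grid of 3D points around `center`.

    images:      list of (3, H, W) float tensors, one per view
    projections: list of (3, 4) camera projection matrices
    center:      (3,) tensor, a 3D point on the viewing line being swept
    step:        grid spacing in scene units
    """
    r = torch.arange(K, dtype=torch.float32) - (K - 1) / 2.0
    gz, gy, gx = torch.meshgrid(r, r, r, indexing="ij")
    pts = center + step * torch.stack([gx, gy, gz], dim=-1)   # (K, K, K, 3)
    pts_h = torch.cat([pts, torch.ones(K, K, K, 1)], dim=-1)  # homogeneous
    vols = []
    for img, P in zip(images, projections):
        uvw = pts_h @ P.T                                     # project: (K, K, K, 3)
        w = uvw[..., 2].clamp(min=1e-6)                       # avoid division by zero
        u = (uvw[..., 0] / w).long().clamp(0, img.shape[2] - 1)
        v = (uvw[..., 1] / w).long().clamp(0, img.shape[1] - 1)
        vols.append(img[:, v, u])                             # nearest-neighbor: (3, K, K, K)
    return torch.cat(vols, dim=0).unsqueeze(0)                # (1, 3*n_views, K, K, K)
```

Sweeping `center` along each viewing ray, scoring every sampled volume with the network, and keeping the depth of the highest-scoring volume yields a per-ray depth hypothesis, in the spirit of the sweeping scheme the abstract describes.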
Keywords
Multi-view stereo reconstruction · Learned photoconsistency · Performance capture · Volume sweeping
Acknowledgements
Funded by the French National Research grant ANR-14-CE24-0030 ACHMOV. Images 1-2-6-8 courtesy of Anja Rubik.