Abstract
Sparse-view 3D reconstruction has attracted increasing attention with the development of neural implicit 3D representations. Existing methods usually make use of only 2D views, requiring a dense set of input views for accurate 3D reconstruction. In this paper, we show that accurate 3D reconstruction can be achieved by incorporating geometric priors into neural implicit 3D reconstruction. Our method adopts the signed distance function as the 3D representation, and learns a generalizable 3D surface reconstruction model from sparse views. Specifically, we build a more effective and sparse feature volume from the input views by using corresponding depth maps, which can be provided by depth sensors or directly predicted from the input views. We recover better geometric details by imposing both depth and surface normal constraints, in addition to the color loss, when training the neural implicit 3D representation. Experiments demonstrate that our method outperforms state-of-the-art approaches and achieves good generalizability.
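The supervision described above combines a color loss with depth and surface normal constraints. As a minimal sketch (not the authors' exact formulation: the loss weights, the L1 penalties, and the cosine-based normal term are all assumptions), the combined objective might look like:

```python
import numpy as np

def total_loss(pred_rgb, gt_rgb, pred_depth, gt_depth, pred_n, gt_n,
               w_depth=0.1, w_normal=0.05):
    """Hypothetical combined loss: color + depth + normal supervision.

    pred_rgb / gt_rgb:     (N, 3) rendered and ground-truth pixel colors
    pred_depth / gt_depth: (N,)   rendered and prior/sensor depths
    pred_n / gt_n:         (N, 3) rendered and prior surface normals
    w_depth, w_normal:     assumed weighting coefficients
    """
    # Color term: L1 photometric error
    color = np.abs(pred_rgb - gt_rgb).mean()
    # Depth term: L1 error against the depth prior
    depth = np.abs(pred_depth - gt_depth).mean()
    # Normal term: 1 - cosine similarity between normal directions
    cos = (pred_n * gt_n).sum(axis=-1) / (
        np.linalg.norm(pred_n, axis=-1) * np.linalg.norm(gt_n, axis=-1) + 1e-8)
    normal = (1.0 - cos).mean()
    return color + w_depth * depth + w_normal * normal
```

When predictions match the priors exactly, each term vanishes; in training, the depth and normal terms pull the signed distance field toward geometry consistent with the priors even where sparse views leave the color loss ambiguous.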
Acknowledgements
We thank the anonymous reviewers for their valuable comments on this paper. This work was supported by the National Natural Science Foundation of China (Grant No. 61902210).
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Tai-Jiang Mu is an assistant researcher in the Department of Computer Science and Technology at Tsinghua University. He received his bachelor's degree and Ph.D. degree in computer science and technology from Tsinghua University in 2011 and 2016, respectively. His research interests include computer graphics, visual media learning, 3D reconstruction, and 3D understanding.
Hao-Xiang Chen received his bachelor's degree in computer science from Jilin University in 2020. He is currently a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University. His research interests include 3D reconstruction and 3D computer vision.
Jun-Xiong Cai is currently a postdoctoral researcher at Tsinghua University, where he received his Ph.D. degree in computer science and technology in 2020. His research interests include computer graphics, computer vision, and 3D geometry processing.
Ning Guo is an assistant researcher at the Academy of Military Sciences. He received his bachelor's, master's, and Ph.D. degrees in information and communication engineering from the National University of Defense Technology in 2014, 2016, and 2020, respectively. His research interests include digital earth, 3D GIS, 3D reconstruction, and spatial databases.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mu, TJ., Chen, HX., Cai, JX. et al. Neural 3D reconstruction from sparse views using geometric priors. Comp. Visual Media 9, 687–697 (2023). https://doi.org/10.1007/s41095-023-0337-5