NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors

Wang, Jiepeng; Wang, Peng; Long, Xiaoxiao; Theobalt, Christian; Komura, Taku; Liu, Lingjie; Wang, Wenping

doi:10.1007/978-3-031-19824-3_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13692)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Reconstructing 3D indoor scenes from 2D images is an important task in many computer vision and graphics applications. A main challenge is that the large texture-less areas typical of indoor scenes make it hard for existing methods to produce satisfactory reconstructions. We propose a new method, named NeuRIS, for high-quality reconstruction of indoor scenes. The key idea of NeuRIS is to integrate estimated normals of indoor scenes as a prior in a neural rendering framework for reconstructing large texture-less shapes and, importantly, to do this adaptively so that irregular shapes with fine details can also be reconstructed. Specifically, we evaluate the faithfulness of the normal priors on the fly by checking the multi-view consistency of the reconstruction during optimization. Only the normal priors accepted as faithful are used for 3D reconstruction, which typically happens in regions of smooth shape, possibly with weak texture. In regions with small objects or thin structures, where the normal priors are usually unreliable, we rely only on the visual features of the input images, since such regions typically contain relatively rich visual features (e.g., shading changes and boundary contours). Extensive experiments show that NeuRIS significantly outperforms state-of-the-art methods in terms of reconstruction quality. Our project page: https://jiepengwang.github.io/NeuRIS/.
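
The adaptive use of normal priors described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the consistency measure, so this example assumes normalized cross-correlation (NCC) between a reference image patch and patches already warped from neighbouring views, and it assumes a cosine-based normal loss. The names `ncc`, `prior_accepted`, `masked_normal_loss`, and the `threshold` value are all hypothetical.

```python
import numpy as np


def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation of two equally sized grayscale patches."""
    a = np.asarray(patch_a, dtype=np.float64).ravel()
    b = np.asarray(patch_b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def prior_accepted(ref_patch, warped_patches, threshold=0.5):
    """Assumed acceptance rule: the mean NCC between the reference patch and
    the patches warped from neighbouring views must exceed `threshold`."""
    scores = [ncc(ref_patch, p) for p in warped_patches]
    return bool(np.mean(scores) > threshold)


def masked_normal_loss(rendered, prior, mask, eps=1e-8):
    """Mean (1 - cosine similarity) between rendered and prior normals,
    averaged over the pixels whose prior was accepted (mask == 1)."""
    rendered = np.asarray(rendered, dtype=np.float64)
    prior = np.asarray(prior, dtype=np.float64)
    r = rendered / (np.linalg.norm(rendered, axis=-1, keepdims=True) + eps)
    p = prior / (np.linalg.norm(prior, axis=-1, keepdims=True) + eps)
    per_pixel = 1.0 - np.sum(r * p, axis=-1)
    m = np.asarray(mask, dtype=np.float64)
    if m.sum() == 0:
        return 0.0  # no accepted priors: the normal term contributes nothing
    return float(np.sum(per_pixel * m) / m.sum())


# Toy data: a textured 3x3 patch, a brightness-shifted copy, and a flipped copy.
textured = np.arange(9.0).reshape(3, 3)
mask = [
    prior_accepted(textured, [textured, textured + 2.0]),  # views agree
    prior_accepted(textured, [textured[::-1, ::-1]]),      # views disagree
]
rendered = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
prior = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(mask, masked_normal_loss(rendered, prior, mask))
```

In this sketch the second pixel's prior disagrees with the rendered normal, but because its patches are not consistent across views the prior is rejected and does not contribute to the loss. In a full pipeline the rejected pixels would be supervised by the image (color) term alone, as the abstract describes.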

Acknowledgements

We thank Yuan Liu and Nenglun Chen for their help with the experiments. Christian Theobalt was supported by ERC Consolidator Grant 770784. Lingjie Liu was supported by a Lise Meitner Postdoctoral Fellowship. Computational resources were mainly provided by the HKU GPU Farm.

Author information

Authors and Affiliations

The University of Hong Kong, Hong Kong, China
Jiepeng Wang, Peng Wang, Xiaoxiao Long & Taku Komura
Max Planck Institute for Informatics, Saarbrücken, Germany
Christian Theobalt & Lingjie Liu
Texas A&M University, College Station, USA
Wenping Wang

Corresponding author

Correspondence to Wenping Wang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 1111 KB)

Cite this paper

Wang, J. et al. (2022). NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_9

DOI: https://doi.org/10.1007/978-3-031-19824-3_9
Published: 11 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)