
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)

Abstract

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location \((x, y, z)\) and viewing direction \((\theta, \phi)\)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
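For concreteness, the "classic volume rendering" the abstract refers to amounts to a standard emission-absorption quadrature along each camera ray. With \(N\) samples along a ray \(\mathbf{r}\), densities \(\sigma_i\), view-dependent colors \(\mathbf{c}_i\), and spacings \(\delta_i\) between adjacent samples, the rendered color is

\[
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),
\]

where \(T_i\) is the transmittance, i.e. the probability that the ray reaches sample \(i\) without being absorbed. Every term is differentiable in \(\sigma_i\) and \(\mathbf{c}_i\), so gradients from a photometric loss on rendered pixels flow back into the network producing those values, which is why posed images alone suffice as supervision.

The sketch below is a minimal NumPy illustration of this compositing step only, not the authors' implementation: the function name composite_along_ray and the toy inputs are invented for this example, and the density and color arrays simply stand in for the outputs the fully-connected network would produce at sampled 5D coordinates.

```python
import numpy as np

def composite_along_ray(sigmas, rgbs, t_vals):
    """Emission-absorption quadrature for a single camera ray.

    sigmas: (N,) non-negative volume densities at the N sampled points
    rgbs:   (N, 3) view-dependent emitted colors at those points
    t_vals: (N,) increasing sample distances along the ray
    Returns the expected RGB color observed along the ray.
    """
    deltas = np.diff(t_vals, append=1e10)        # spacing between adjacent samples
    alphas = 1.0 - np.exp(-sigmas * deltas)      # opacity contributed by each segment
    # T_i: probability the ray reaches sample i without being absorbed earlier.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1] + 1e-10)))
    weights = trans * alphas                     # per-sample compositing weights
    return (weights[:, None] * rgbs).sum(axis=0)

# Toy usage: a constant-density gray medium sampled at 64 points along one ray.
t = np.linspace(2.0, 6.0, 64)
color = composite_along_ray(np.full(64, 0.5), np.full((64, 3), 0.8), t)
```

In a full pipeline this function would be applied, batched over rays, to the network's predictions for every pixel of a training image, and the squared error between the composited colors and the observed pixels would drive the optimization described in the abstract.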

Keywords

Scene representation · View synthesis · Image-based rendering · Volume rendering · 3D deep learning

Supplementary material

Supplementary material 1: 500725_1_En_24_MOESM1_ESM.pdf (PDF, 1.8 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. UC Berkeley, Berkeley, USA
  2. Google Research, New York, USA
  3. UC San Diego, San Diego, USA
