Abstract
Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on novel-view synthesis considerably. The recent focus has been on models that overfit to a single scene, and the few attempts to learn models that can synthesize novel views of unseen scenes mostly consist of combining deep convolutional features with a NeRF-like model. We propose a different paradigm, where no deep visual features and no NeRF-like volume rendering are needed. Our method is capable of predicting the color of a target ray in a novel scene directly, just from a collection of patches sampled from the scene. We first leverage epipolar geometry to extract patches along the epipolar lines of each reference view. Each patch is linearly projected into a 1D feature vector, and a sequence of transformers processes the collection. For positional encoding, we parameterize rays as in a light field representation, with the crucial difference that the coordinates are canonicalized with respect to the target ray, which makes our method independent of the reference frame and improves generalization. We show that our approach outperforms the state-of-the-art on novel view synthesis of unseen scenes even when trained with considerably less data than prior work. Our code is available at https://mohammedsuhail.net/gen_patch_neural_rendering/.
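To make the canonicalization idea concrete, below is a minimal sketch (not the paper's exact formulation) of one way to re-express a reference ray relative to the target ray: build a coordinate frame whose origin is the target-ray origin and whose +z axis is the target-ray direction, then record each reference ray as a 6-D (origin, direction) code in that frame. The function names and the particular frame construction are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: canonicalize reference rays with respect to a target ray
# so that the resulting coordinates no longer depend on the world reference frame.
import numpy as np

def frame_from_ray(origin, direction):
    """Rotation + translation taking world coordinates into a frame aligned with
    the given (target) ray: the ray origin maps to 0, its direction to +z."""
    z = direction / np.linalg.norm(direction)
    # Pick any vector not parallel to z to complete an orthonormal basis.
    up = np.array([0.0, 1.0, 0.0]) if abs(z[1]) < 0.9 else np.array([1.0, 0.0, 0.0])
    x = np.cross(up, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])          # rows are the canonical axes
    return R, origin

def canonicalize_ray(ref_origin, ref_direction, target_origin, target_direction):
    """Re-express a reference ray relative to the target ray."""
    R, t = frame_from_ray(target_origin, target_direction)
    o = R @ (ref_origin - t)
    d = R @ ref_direction
    d /= np.linalg.norm(d)
    return np.concatenate([o, d])    # 6-D canonicalized ray code

# Example: the target ray itself maps to origin (0,0,0) and direction (0,0,1).
tgt_o, tgt_d = np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0, 1.0])
print(canonicalize_ray(tgt_o, tgt_d, tgt_o, tgt_d))
```

In a pipeline of this kind, such a reference-frame-independent ray code would serve as the positional signal attached to each linearly projected patch token before the transformer stages; the paper's actual light-field parameterization and patch-extraction details are given in the full text.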
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Suhail, M., Esteves, C., Sigal, L., Makadia, A. (2022). Generalizable Patch-Based Neural Rendering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_10
DOI: https://doi.org/10.1007/978-3-031-19824-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3