Abstract
Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on novel-view synthesis considerably. The recent focus has been on models that overfit to a single scene, and the few attempts to learn models that can synthesize novel views of unseen scenes mostly consist of combining deep convolutional features with a NeRF-like model. We propose a different paradigm, where no deep visual features and no NeRF-like volume rendering are needed. Our method is capable of predicting the color of a target ray in a novel scene directly, just from a collection of patches sampled from the scene. We first leverage epipolar geometry to extract patches along the epipolar lines of each reference view. Each patch is linearly projected into a 1D feature vector, and a sequence of transformers processes the collection. For positional encoding, we parameterize rays as in a light field representation, with the crucial difference that the coordinates are canonicalized with respect to the target ray, which makes our method independent of the reference frame and improves generalization. We show that our approach outperforms the state-of-the-art on novel view synthesis of unseen scenes even when trained with considerably less data than prior work. Our code is available at https://mohammedsuhail.net/gen_patch_neural_rendering/.
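To make the canonicalization idea concrete, below is a minimal sketch (not the paper's exact formulation) of one way to re-express a reference ray relative to the target ray: build a coordinate frame whose origin is the target-ray origin and whose +z axis is the target-ray direction, then record each reference ray as a 6-D (origin, direction) code in that frame. The function names and the particular frame construction are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: canonicalize reference rays with respect to a target ray
# so that the resulting coordinates no longer depend on the world reference frame.
import numpy as np

def frame_from_ray(origin, direction):
    """Rotation + translation taking world coordinates into a frame aligned with
    the given (target) ray: the ray origin maps to 0, its direction to +z."""
    z = direction / np.linalg.norm(direction)
    # Pick any vector not parallel to z to complete an orthonormal basis.
    up = np.array([0.0, 1.0, 0.0]) if abs(z[1]) < 0.9 else np.array([1.0, 0.0, 0.0])
    x = np.cross(up, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])          # rows are the canonical axes
    return R, origin

def canonicalize_ray(ref_origin, ref_direction, target_origin, target_direction):
    """Re-express a reference ray relative to the target ray."""
    R, t = frame_from_ray(target_origin, target_direction)
    o = R @ (ref_origin - t)
    d = R @ ref_direction
    d /= np.linalg.norm(d)
    return np.concatenate([o, d])    # 6-D canonicalized ray code

# Example: the target ray itself maps to origin (0,0,0) and direction (0,0,1).
tgt_o, tgt_d = np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0, 1.0])
print(canonicalize_ray(tgt_o, tgt_d, tgt_o, tgt_d))
```

In a pipeline of this kind, such a reference-frame-independent ray code would serve as the positional signal attached to each linearly projected patch token before the transformer stages; the paper's actual light-field parameterization and patch-extraction details are given in the full text.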
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Suhail, M., Esteves, C., Sigal, L., Makadia, A. (2022). Generalizable Patch-Based Neural Rendering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_10
DOI: https://doi.org/10.1007/978-3-031-19824-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3