Crowdsampling the Plenoptic Function

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)

Abstract

Many popular tourist landmarks are captured in a multitude of online, public photos. These photos represent a sparse and unstructured sampling of the plenoptic function for a particular scene. In this paper, we present a new approach to novel view synthesis under time-varying illumination from such data. Our approach builds on the recent multi-plane image (MPI) format for representing local light fields under fixed viewing conditions. We introduce a new DeepMPI representation, motivated by observations on the sparsity structure of the plenoptic function, that allows for real-time synthesis of photorealistic views that are continuous both in space and across changes in lighting. Our method can synthesize the same compelling parallax and view-dependent effects as previous MPI methods, while simultaneously interpolating along changes in reflectance and illumination over time. We show how to learn a model of these effects in an unsupervised way from an unstructured collection of photos without temporal registration, demonstrating significant improvements over recent work in neural rendering. More information can be found at crowdsampling.io.
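
For readers unfamiliar with the MPI format referenced above, the sketch below shows how a standard multi-plane image is rendered into a single view by back-to-front alpha ("over") compositing. This is a minimal illustration of the general MPI idea, not the paper's DeepMPI pipeline; the function name, array shapes, and placeholder inputs are assumptions made for the example.

    import numpy as np

    def composite_mpi(colors, alphas):
        # Render an MPI to one image by back-to-front "over" compositing.
        # colors: (D, H, W, 3) per-plane RGB, ordered back (far) to front (near).
        # alphas: (D, H, W, 1) per-plane opacity in [0, 1].
        out = np.zeros(colors.shape[1:], dtype=np.float32)
        for rgb, a in zip(colors, alphas):   # iterate from the farthest plane forward
            out = rgb * a + out * (1.0 - a)  # standard over operator
        return out

    # Example: a 32-plane MPI at 256x256 resolution with placeholder values.
    D, H, W = 32, 256, 256
    colors = np.random.rand(D, H, W, 3).astype(np.float32)
    alphas = np.random.rand(D, H, W, 1).astype(np.float32)
    image = composite_mpi(colors, alphas)    # (256, 256, 3) rendered view

To synthesize a novel viewpoint, each plane is first warped into the target camera via a homography before compositing; the paper's contribution is to make the plane contents additionally respond to changes in lighting.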

Notes

Acknowledgements

We thank Kai Zhang, Jin Sun, and Qianqian Wang for helpful discussions. This research was supported in part by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program.

Supplementary material

500725_1_En_11_MOESM1_ESM.zip (46.2 MB)
Supplementary material 1 (zip, 47,350 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Cornell Tech, Cornell University, Ithaca, USA