Learning to Factorize and Relight a City

Liu, Andrew; Ginosar, Shiry; Zhou, Tinghui; Efros, Alexei A.; Snavely, Noah

doi:10.1007/978-3-030-58548-8_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12349)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. Inspired by the classic intrinsic image decomposition, our learning signal builds upon two insights: 1) combining the disentangled factors should reconstruct the original image, and 2) the permanent factors should stay constant across multiple temporal samples of the same scene. To facilitate training, we assemble a city-scale dataset of outdoor timelapse imagery from Google Street View, where the same locations are captured repeatedly through time. This data represents an unprecedented scale of spatio-temporal outdoor imagery. We show that our learned disentangled factors can be used to manipulate novel images in realistic ways, such as changing lighting effects and scene geometry. Please visit http://factorize-a-city.github.io/ for animated results.
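
The two learning signals named in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes the factors combine by element-wise multiplication, as in classic intrinsic image decomposition, and the function names `reconstruction_loss` and `permanence_loss` are hypothetical. The toy data stands in for a timelapse stack of one location.

```python
import numpy as np


def reconstruction_loss(images, permanent, illumination):
    """Insight 1: recombined factors should reproduce the input images."""
    recombined = permanent * illumination  # assumed multiplicative model
    return float(np.mean(np.abs(recombined - images)))


def permanence_loss(permanent):
    """Insight 2: per-frame permanent estimates should agree across time."""
    mean_over_time = permanent.mean(axis=0, keepdims=True)
    return float(np.mean(np.abs(permanent - mean_over_time)))


# Toy timelapse stack: T frames of one H x W RGB scene under varying light.
rng = np.random.default_rng(0)
T, H, W = 4, 8, 8
true_permanent = rng.uniform(0.2, 1.0, size=(1, H, W, 3))
true_illumination = rng.uniform(0.5, 1.5, size=(T, 1, 1, 3))
images = true_permanent * true_illumination

# A correct factorization satisfies both signals.
perfect_permanent = np.broadcast_to(true_permanent, (T, H, W, 3))
perfect_illumination = np.broadcast_to(true_illumination, (T, H, W, 3))
loss_rec = reconstruction_loss(images, perfect_permanent, perfect_illumination)
loss_perm = permanence_loss(perfect_permanent)

# A degenerate factorization (permanent = image, illumination = 1)
# reconstructs perfectly but violates the permanence signal.
degenerate_rec = reconstruction_loss(images, images, np.ones_like(images))
degenerate_perm = permanence_loss(images)

print(loss_rec, loss_perm, degenerate_rec, degenerate_perm)
```

The degenerate case shows why reconstruction alone is not enough in this sketch: copying each image into the permanent factor reconstructs exactly, and only the consistency term across temporal samples penalizes it.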

“The city of Sophronia is made up of two half-cities... One of the half-cities is permanent, the other is temporary.”

—Italo Calvino, Invisible Cities

Acknowledgements

We would like to thank Richard Tucker, Richard Bowen, Ameesh Makadia, and Vincent Sitzmann for insightful discussions. We would also like to thank Angjoo Kanazawa and Tim Brooks for their help with preparing the manuscript. This work is supported, in part, by NSF grant IIS-1633310.

Author information

Authors and Affiliations

Google, Berkeley, USA
Andrew Liu & Noah Snavely
UC Berkeley, Berkeley, USA
Shiry Ginosar & Alexei A. Efros
Humen, Inc., San Francisco, USA
Tinghui Zhou

Corresponding authors

Correspondence to Andrew Liu, Shiry Ginosar, Alexei A. Efros or Noah Snavely.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 81817 KB)

About this paper

Cite this paper

Liu, A., Ginosar, S., Zhou, T., Efros, A.A., Snavely, N. (2020). Learning to Factorize and Relight a City. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_32

DOI: https://doi.org/10.1007/978-3-030-58548-8_32
Published: 29 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58547-1
Online ISBN: 978-3-030-58548-8
eBook Packages: Computer Science, Computer Science (R0)
