DeepLandscape: Adversarial Modeling of Landscape Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12368)

Abstract

We build a new model of landscape videos that can be trained on a mixture of static landscape images and landscape animations. Our architecture extends the StyleGAN model by augmenting it with components that model dynamic changes in a scene. Once trained, our model can generate realistic time-lapse landscape videos with moving objects and time-of-day changes. Furthermore, by fitting the learned model to a static landscape image, the latter can be reenacted in a realistic way. We propose simple but necessary modifications to the StyleGAN inversion procedure, which yield in-domain latent codes and allow real images to be manipulated. Quantitative comparisons and user studies suggest that our model produces more compelling animations of given photographs than previously proposed methods. The results of our approach, including comparisons with prior art, can be seen in the supplementary materials and on the project page https://saic-mdal.github.io/deep-landscape/.
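
The abstract mentions fitting the learned model to a static photograph via a modified StyleGAN inversion procedure so that the image can then be animated. As a rough illustration of what such latent fitting involves, the PyTorch sketch below optimizes a latent code so that a generator reproduces a target image; the generator interface, latent dimensionality, loss, and all names are assumptions made for illustration and do not reproduce the authors' exact method (which additionally constrains the codes to stay in-domain).

```python
# Illustrative sketch (not the authors' code): fit a latent code to a target image
# with a pre-trained StyleGAN-like generator, so the photograph can later be
# re-animated by varying only the dynamic part of the representation.
import torch
import torch.nn.functional as F


def invert_image(generator, target, num_steps=500, lr=0.05):
    """Optimize a latent code w so that generator(w) reconstructs `target`.

    generator: callable mapping a (1, 512) latent tensor to an image tensor
               with the same shape as `target`.
    target:    tensor of shape (1, 3, H, W) with values in [-1, 1].
    """
    # Start from a zero latent; real inversion pipelines typically start from the
    # average latent and also optimize per-layer noise inputs.
    w = torch.zeros(1, 512, requires_grad=True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        reconstruction = generator(w)
        # Plain pixel-wise loss; the paper's procedure also relies on perceptual
        # terms and on keeping the code close to the generator's latent distribution.
        loss = F.mse_loss(reconstruction, target)
        loss.backward()
        optimizer.step()

    return w.detach()


if __name__ == "__main__":
    # Tiny stand-in generator (one linear layer reshaped to a 16x16 image) just to
    # make the sketch runnable; a real use would plug in the trained generator.
    lin = torch.nn.Linear(512, 3 * 16 * 16)
    toy_generator = lambda w: torch.tanh(lin(w)).view(1, 3, 16, 16)
    target = torch.rand(1, 3, 16, 16) * 2 - 1
    w_fit = invert_image(toy_generator, target, num_steps=100)
    print("final reconstruction error:",
          F.mse_loss(toy_generator(w_fit), target).item())
```

Once a code is fitted, the idea described in the abstract is to keep the static part of the representation fixed while varying the dynamic part over time, which is what turns the photograph into an animation.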

Notes

  1. CSAIL-Vision: https://github.com/CSAILVision/semantic-segmentation-pytorch.

  2. https://flickr.com.

  3. https://github.com/endo-yuki-t/Animating-Landscape.

  4. https://github.com/tamarott/SinGAN.

  5. https://github.com/ryersonvisionlab/two-stream-dyntex-synth.

Author information

Corresponding author

Correspondence to Elizaveta Logacheva.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 287 KB)

Supplementary material 2 (mp4 95143 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Logacheva, E., Suvorov, R., Khomenko, O., Mashikhin, A., Lempitsky, V. (2020). DeepLandscape: Adversarial Modeling of Landscape Videos. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58592-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58591-4

  • Online ISBN: 978-3-030-58592-1

  • eBook Packages: Computer Science, Computer Science (R0)
