
DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12350)

Included in the following conference series: Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

This paper presents DTVNet, a novel end-to-end framework that generates diverse time-lapse videos from a single landscape image, conditioned on normalized motion vectors. DTVNet consists of two submodules: an Optical Flow Encoder (OFE) and a Dynamic Video Generator (DVG). The OFE maps a sequence of optical flow maps to a normalized motion vector that encodes the motion of the generated video. The DVG contains a motion stream and a content stream that learn from the motion vector and the single image, respectively, together with an encoder that learns shared content features and a decoder that constructs video frames with the corresponding motion. Specifically, the motion stream introduces multiple adaptive instance normalization (AdaIN) layers to integrate multi-level motion information processed by linear layers. At test time, videos with the same content but different motion can be generated from a single input image by varying the normalized motion vector. Experiments on the Sky Time-lapse dataset demonstrate that our approach surpasses state-of-the-art methods in generating high-quality, dynamic videos, and that it produces diverse videos with varied motion information (https://github.com/zhangzjn/DTVNet).
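To make the AdaIN-driven motion stream concrete, below is a minimal PyTorch sketch of how such a layer could be wired: a linear layer maps the normalized motion vector to per-channel scale and shift parameters that modulate instance-normalized features at one level of the motion stream. All names here (MotionAdaIN, to_scale_shift, the 128-dim motion vector) are illustrative assumptions rather than the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn

class MotionAdaIN(nn.Module):
    """Adaptive instance normalization (AdaIN) driven by a normalized motion
    vector. A hedged sketch of one level of the motion stream; names are
    illustrative, not taken from the authors' released code."""

    def __init__(self, motion_dim: int, num_channels: int):
        super().__init__()
        # A linear layer maps the motion vector to per-channel
        # scale and shift parameters for this feature level.
        self.to_scale_shift = nn.Linear(motion_dim, num_channels * 2)
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)

    def forward(self, feat: torch.Tensor, motion_vec: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) features at one level of the motion stream
        # motion_vec: (B, motion_dim) normalized motion vector from the OFE
        scale, shift = self.to_scale_shift(motion_vec).chunk(2, dim=1)
        feat = self.norm(feat)
        return feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]


# Usage: modulate 64-channel features with an assumed 128-dim motion vector.
if __name__ == "__main__":
    layer = MotionAdaIN(motion_dim=128, num_channels=64)
    feat = torch.randn(2, 64, 32, 32)
    motion_vec = torch.randn(2, 128)
    out = layer(feat, motion_vec)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```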

J. Zhang and C. Xu contributed equally to this work.



Acknowledgements

We thank the anonymous reviewers for their constructive comments. This work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61836015 and the Fundamental Research Funds for the Central Universities (2020XZA205).

Author information


Corresponding author

Correspondence to Yong Liu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (ZIP 48,172 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, J., et al. (2020). DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_18


  • DOI: https://doi.org/10.1007/978-3-030-58558-7_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58557-0

  • Online ISBN: 978-3-030-58558-7

  • eBook Packages: Computer Science (R0)
