Abstract
This paper presents DTVNet, a novel end-to-end framework for dynamic time-lapse video generation that synthesizes diversified time-lapse videos from a single landscape image, conditioned on normalized motion vectors. DTVNet consists of two submodules: an Optical Flow Encoder (OFE) and a Dynamic Video Generator (DVG). The OFE maps a sequence of optical flow maps to a normalized motion vector that encodes the motion information of the generated video. The DVG contains a motion stream and a content stream that learn from the motion vector and the single image, respectively, together with an encoder that learns shared content features and a decoder that constructs video frames with the corresponding motion. Specifically, the motion stream introduces multiple adaptive instance normalization (AdaIN) layers to integrate multi-level motion information that is processed by linear layers. At test time, videos with the same content but different motion can be generated from a single input image by varying the normalized motion vector. Experiments on the Sky Time-lapse dataset demonstrate the superiority of our approach over state-of-the-art methods in generating high-quality, dynamic videos, as well as its ability to generate videos with diverse motion (https://github.com/zhangzjn/DTVNet).
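The AdaIN-based motion modulation described above can be illustrated with a minimal sketch. The following PyTorch snippet is not the authors' implementation; the module name, feature dimensions, and the (1 + scale) parameterization are illustrative assumptions about how a linear layer might map the normalized motion vector to per-channel AdaIN scale and bias parameters.

```python
# Minimal sketch (assumptions, not the authors' code): a motion vector
# modulates instance-normalized content features via AdaIN parameters
# predicted by a linear layer.
import torch
import torch.nn as nn

class AdaINMotionBlock(nn.Module):
    """Scales and shifts instance-normalized content features using affine
    parameters predicted from a normalized motion vector."""
    def __init__(self, channels: int, motion_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Linear layer maps the motion vector to per-channel scale and bias.
        self.affine = nn.Linear(motion_dim, channels * 2)

    def forward(self, feat: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # feat: (N, C, H, W) content features; motion: (N, motion_dim)
        scale, bias = self.affine(motion).chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        bias = bias.unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * self.norm(feat) + bias

# Usage: the same motion vector can be injected at multiple feature levels.
block = AdaINMotionBlock(channels=64, motion_dim=128)
feat = torch.randn(2, 64, 32, 32)   # shared content features
motion = torch.randn(2, 128)        # normalized motion vector (e.g., from the OFE)
out = block(feat, motion)           # motion-modulated features, (2, 64, 32, 32)
```

Varying `motion` while keeping `feat` fixed yields different modulated features from the same content, which mirrors how different normalized motion vectors produce videos with the same content but different motion.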
J. Zhang and C. Xu contributed equally to this work.
Acknowledgements
We thank the anonymous reviewers for their constructive comments. This work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61836015 and the Fundamental Research Funds for the Central Universities (2020XZA205).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., et al. (2020). DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_18
DOI: https://doi.org/10.1007/978-3-030-58558-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58557-0
Online ISBN: 978-3-030-58558-7
eBook Packages: Computer Science, Computer Science (R0)