
DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)

Abstract

This paper presents a novel end-to-end dynamic time-lapse video generation framework, named DTVNet, which generates diversified time-lapse videos from a single landscape image conditioned on normalized motion vectors. DTVNet consists of two submodules: an Optical Flow Encoder (OFE) and a Dynamic Video Generator (DVG). The OFE maps a sequence of optical flow maps to a normalized motion vector that encodes the motion information of the generated video. The DVG contains motion and content streams that learn from the motion vector and the single image respectively, together with an encoder that learns shared content features and a decoder that constructs video frames with the corresponding motion. Specifically, the motion stream introduces multiple adaptive instance normalization (AdaIN) layers to integrate multi-level motion information, which is first processed by linear layers. At test time, videos with the same content but different motion can be generated from a single input image by supplying different normalized motion vectors. Experiments on the Sky Time-lapse dataset demonstrate the superiority of our approach over state-of-the-art methods in generating high-quality, dynamic videos, as well as its ability to produce videos with diverse motion (https://github.com/zhangzjn/DTVNet).
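
The following is a minimal PyTorch-style sketch of the AdaIN-based motion injection described above: a linear layer maps the normalized motion vector to per-channel affine parameters that restyle instance-normalized content features, and the same vector can be injected at multiple feature levels. All names (MotionAdaIN, motion_dim, the vector dimension of 128) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of AdaIN-style motion injection; names and dimensions are assumptions,
# not the authors' released code (see the GitHub repository for the real model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionAdaIN(nn.Module):
    """Modulates content features with scale/shift predicted from a motion vector."""

    def __init__(self, motion_dim: int, num_channels: int):
        super().__init__()
        # A linear layer maps the normalized motion vector to per-channel
        # affine parameters (gamma, beta), as the motion stream is described to do.
        self.affine = nn.Linear(motion_dim, num_channels * 2)

    def forward(self, content_feat: torch.Tensor, motion_vec: torch.Tensor) -> torch.Tensor:
        # content_feat: (B, C, H, W); motion_vec: (B, motion_dim)
        gamma, beta = self.affine(motion_vec).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)     # (B, C, 1, 1)
        # Instance-normalize the content features, then re-style them with
        # motion-conditioned statistics.
        normalized = F.instance_norm(content_feat)
        return (1.0 + gamma) * normalized + beta


if __name__ == "__main__":
    # Inject the same normalized motion vector at two feature levels of a generator.
    motion_vec = torch.randn(2, 128)                      # dimension 128 is an assumption
    feats = [torch.randn(2, c, 32, 32) for c in (64, 128)]
    layers = [MotionAdaIN(128, c) for c in (64, 128)]
    styled = [layer(f, motion_vec) for layer, f in zip(layers, feats)]
    print([s.shape for s in styled])
```

Sampling different motion vectors for the same content features, as in the test-time procedure described in the abstract, changes only the affine restyling and therefore yields videos with the same content but different motion.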

Keywords

Generative adversarial network · Optical flow encoding · Time-lapse video generation

Notes

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61836015 and the Fundamental Research Funds for the Central Universities (2020XZA205).

Supplementary material

504441_1_En_18_MOESM1_ESM.zip (47 MB)
Supplementary material 1 (zip, 48,172 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. APRIL Lab, College of Control Science and Engineering, Zhejiang University, Hangzhou, China
  2. Huzhou University, Huzhou, China
