Better Guider Predicts Future Better: Difference Guided Generative Adversarial Networks

Ying, Guohao; Zou, Yingtian; Wan, Lin; Hu, Yiming; Feng, Jiashi

doi:10.1007/978-3-030-20876-9_18

Guohao Ying¹⁸,
Yingtian Zou¹⁹,
Lin Wan¹⁸,
Yiming Hu¹⁸ &
…
Jiashi Feng¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11366))

Included in the following conference series:

Asian Conference on Computer Vision

1936 Accesses
3 Citations

Abstract

Predicting the future is a fantasy but practicality work. It is the key component to intelligent agents, such as self-driving vehicles, medical monitoring devices and robotics. In this work, we consider generating unseen future frames from previous observations, which is notoriously hard due to the uncertainty in frame dynamics. While recent works based on generative adversarial networks (GANs) made remarkable progress, there is still an obstacle for making accurate and realistic predictions. In this paper, we propose a novel GAN based on inter-frame difference to circumvent the difficulties. More specifically, our model is a multi-stage generative network, which is named the Difference Guided Generative Adversarial Network (DGGAN). The DGGAN learns to explicitly enforce future-frame predictions that is guided by synthetic inter-frame difference. Given a sequence of frames, DGGAN first uses dual paths to generate meta information. One path, called Coarse Frame Generator, predicts the coarse details about future frames, and the other path, called Difference Guide Generator, generates the difference image which include complementary fine details. Then our coarse details will then be refined via guidance of difference image under the support of GANs. With this model and novel architecture, we achieve state-of-the-art performance for future video prediction on UCF-101, KITTI.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Generative Adversarial Networks for Video Prediction with Action Control

Future Frame Prediction Using Deep Learning

Action-Conditioned Frame Prediction Without Discriminator

References

Barsoum, E., Kender, J., Liu, Z.: HP-GAN: Probabilistic 3D human motion prediction via GAN. arXiv preprint arXiv:1711.09561 (2017)
Bhattacharjee, P., Das, S.: Temporal coherency based criteria for predicting video frames using deep multi-stage generative adversarial networks. In: Advances in Neural Information Processing Systems, pp. 4271–4280 (2017)
Google Scholar
Byeon, W., Wang, Q., Srivastava, R.K., Koumoutsakos, P.: Fully context-aware video prediction. arXiv preprint arXiv:1710.08518 (2017)
Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a Laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)
Google Scholar
Finn, C., Goodfellow, I., Levine, S.: Unsupervised learning for physical interaction through video prediction. In: Advances in Neural Information Processing Systems, pp. 64–72 (2016)
Google Scholar
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)
Article Google Scholar
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Google Scholar
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5769–5779 (2017)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Huynh-Thu, Q., Ghanbari, M.: Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 44(13), 800–801 (2008)
Article Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3. 6m: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2014)
Google Scholar
Jin, X., et al.: Predicting scene parsing and motion dynamics in the future. In: Advances in Neural Information Processing Systems, pp. 6918–6927 (2017)
Google Scholar
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Chapter Google Scholar
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Liang, X., Lee, L., Dai, W., Xing, E.P.: Dual motion GAN for future-flow embedded video prediction. arXiv preprint (2017)
Google Scholar
Liu, Z., Yeh, R., Tang, X., Liu, Y., Agarwala, A.: Video frame synthesis using deep voxel flow. In: International Conference on Computer Vision (ICCV), vol. 2 (2017)
Google Scholar
Lotter, W., Kreiman, G., Cox, D.: Deep predictive coding networks for video prediction and unsupervised learning. arXiv preprint arXiv:1605.08104 (2016)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, vol. 30, p. 3 (2013)
Google Scholar
Mathieu, M., Couprie, C., LeCun, Y.: Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440 (2015)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
Ohnishi, K., Yamamoto, S., Ushiku, Y., Harada, T.: Hierarchical video generation from orthogonal information: Optical flow and texture. arXiv preprint arXiv:1711.09618 (2017)
Patraucean, V., Handa, A., Cipolla, R.: Spatio-temporal video autoencoder with differentiable memory. arXiv preprint arXiv:1511.06309 (2015)
Rezazadegan, F., Shirazi, S., Davis, L.S.: A real-time action prediction framework by encoding temporal evolution
Google Scholar
Sedaghat, N.: Next-flow: Hybrid multi-tasking with next-frame prediction to boost optical-flow estimation in the wild. arXiv preprint arXiv:1612.03777 (2016)
Soomro, K., Zamir, A.R., Shah, M.: Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
Villegas, R., Yang, J., Zou, Y., Sohn, S., Lin, X., Lee, H.: Learning to generate long-term future via hierarchical prediction. arXiv preprint arXiv:1704.05831 (2017)
Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. In: Advances in Neural Information Processing Systems, pp. 613–621 (2016)
Google Scholar
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Article Google Scholar
Xiong, W., Luo, W., Ma, L., Liu, W., Luo, J.: Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2364–2373 (2018)
Google Scholar
Zhao, L., Peng, X., Tian, Y., Kapadia, M., Metaxas, D.: Learning to forecast and refine residual motion for image-to-video generation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 403–419. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_24
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Guohao Ying, Lin Wan & Yiming Hu
National University of Singapore, Singapore, Singapore
Yingtian Zou & Jiashi Feng

Authors

Guohao Ying
View author publications
You can also search for this author in PubMed Google Scholar
Yingtian Zou
View author publications
You can also search for this author in PubMed Google Scholar
Lin Wan
View author publications
You can also search for this author in PubMed Google Scholar
Yiming Hu
View author publications
You can also search for this author in PubMed Google Scholar
Jiashi Feng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lin Wan .

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ying, G., Zou, Y., Wan, L., Hu, Y., Feng, J. (2019). Better Guider Predicts Future Better: Difference Guided Generative Adversarial Networks. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11366. Springer, Cham. https://doi.org/10.1007/978-3-030-20876-9_18

Download citation

DOI: https://doi.org/10.1007/978-3-030-20876-9_18
Published: 26 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20875-2
Online ISBN: 978-3-030-20876-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Better Guider Predicts Future Better: Difference Guided Generative Adversarial Networks

Abstract

Access this chapter

Similar content being viewed by others

Generative Adversarial Networks for Video Prediction with Action Control

Future Frame Prediction Using Deep Learning

Action-Conditioned Frame Prediction Without Discriminator

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Better Guider Predicts Future Better: Difference Guided Generative Adversarial Networks

Abstract

Access this chapter

Similar content being viewed by others

Generative Adversarial Networks for Video Prediction with Action Control

Future Frame Prediction Using Deep Learning

Action-Conditioned Frame Prediction Without Discriminator

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation