
Action-Conditioned Frame Prediction Without Discriminator

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13163)

Abstract

Predicting high-quality images conditioned on past images and external events is a challenging problem in computer vision. Prior work has addressed this problem, but the resulting architectures are complex, unstable, or difficult to train. This paper presents an action-conditioned network based on the Introspective Variational Autoencoder (IntroVAE) with a simple design that predicts high-quality samples. The proposed architecture combines features of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs): its encoding and decoding layers can self-evaluate the quality of predicted frames, so no extra discriminator network is needed in our framework. Experimental results on two data sets show that the proposed architecture can be applied to both small and large images. Our predicted samples are comparable to those of state-of-the-art GAN-based networks.
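As context for the introspective design described above, the following is a minimal NumPy sketch of IntroVAE-style loss terms, which are what allow an encoder to double as a quality judge without a separate discriminator. All function names and the `alpha` and `margin` values are illustrative assumptions; the paper's action-conditioning and network details are not reproduced here.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def encoder_loss(recon, kl_real, kl_fake, alpha=0.25, margin=110.0):
    # The encoder minimizes the usual ELBO terms on real frames while
    # pushing the KL of generated frames above a margin (hinge term):
    # this is what lets it act as its own discriminator.
    return recon + kl_real + alpha * np.maximum(0.0, margin - kl_fake)

def generator_loss(recon, kl_fake, alpha=0.25):
    # The generator/decoder minimizes the KL its samples receive from
    # the encoder, i.e. it tries to produce frames the encoder accepts.
    return recon + alpha * kl_fake
```

In the full model, the reconstruction term would come from decoding the encoded past frames together with the action input; here `recon`, `kl_real`, and `kl_fake` are scalars standing in for those quantities.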


Notes

  1. Other activation functions were also tested, specifically LeakyReLU and Tanh (for the last layer of the generator); however, the results did not improve, and the computational load increased significantly.
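To make the note's comparison concrete, here is a hypothetical plain-NumPy sketch of the activations in question. The key practical difference for a generator's last layer is that Tanh bounds outputs to (-1, 1), matching images normalized to that range, while the ReLU family leaves the upper range unbounded.

```python
import numpy as np

def relu(x):
    # Zeroes out negative inputs; outputs are unbounded above.
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.2):
    # Like ReLU, but keeps a small slope for negative inputs so
    # gradients do not vanish there.
    return np.where(x >= 0, x, negative_slope * x)

def tanh(x):
    # Saturates to (-1, 1): a common choice for a generator's last
    # layer when pixel values are normalized to [-1, 1].
    return np.tanh(x)
```

The extra cost the note mentions is model-specific; this sketch only shows the output-range behavior, not the training dynamics observed in the paper.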



Author information


Corresponding author

Correspondence to David Valencia.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Valencia, D., Williams, H., MacDonald, B., Qiao, T. (2022). Action-Conditioned Frame Prediction Without Discriminator. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol. 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_24


  • DOI: https://doi.org/10.1007/978-3-030-95467-3_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95466-6

  • Online ISBN: 978-3-030-95467-3

  • eBook Packages: Computer Science, Computer Science (R0)
