Abstract
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards the global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management. Games are often used to benchmark reinforcement learning algorithms because they provide flexible, reproducible, and easily controlled environments. Still, few games feature a state-space in which results in exploration, memory, and planning are easily observed. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work on reinforcement learning driven by generative exploration.
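To make the idea concrete, a DVAE-style loop alternates between collecting real transitions, fitting a generative model of the environment, and letting the agent explore "dreamed" rollouts sampled from that model. The sketch below is not the authors' implementation (which uses a variational autoencoder over raw observations); all names are hypothetical, and a simple frequency table stands in for the learned model so the example stays self-contained:

```python
import random

class DreamModel:
    """Stand-in for a learned generative model of environment dynamics.

    In the paper a variational autoencoder plays this role; here a
    table of observed outcomes keeps the sketch dependency-free.
    """

    def __init__(self):
        self.transitions = {}  # (state, action) -> list of (next_state, reward)

    def observe(self, s, a, s_next, r):
        self.transitions.setdefault((s, a), []).append((s_next, r))

    def dream_step(self, s, a):
        # Sample a plausible outcome from what the model has seen so far.
        outcomes = self.transitions.get((s, a))
        if not outcomes:
            return s, 0.0  # unknown (s, a): stay put, no reward
        return random.choice(outcomes)

def corridor_step(s, a):
    """Toy 1-D corridor with sparse feedback: reward only at state 3."""
    s_next = max(0, min(3, s + a))
    return s_next, (1.0 if s_next == 3 else 0.0)

# Phase 1: collect real experience and fit the model.
model = DreamModel()
for _ in range(200):
    s = random.randint(0, 3)
    a = random.choice([-1, 1])
    s_next, r = corridor_step(s, a)
    model.observe(s, a, s_next, r)

# Phase 2: a "dreamed" rollout, touching only the model, never the env.
s, total = 0, 0.0
for _ in range(10):
    a = random.choice([-1, 1])
    s, r = model.dream_step(s, a)
    total += r
```

Dreamed rollouts are cheap, so an agent can rehearse many trajectories per real environment step, which is what makes generative exploration attractive in sparse-feedback settings.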
Notes
1. The Deep Maze is open-source and publicly available at https://github.com/CAIR/deep-maze.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Andersen, PA., Goodwin, M., Granmo, OC. (2018). The Dreaming Variational Autoencoder for Reinforcement Learning Environments. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXV. SGAI 2018. Lecture Notes in Computer Science(), vol 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_11
Print ISBN: 978-3-030-04190-8
Online ISBN: 978-3-030-04191-5