Abstract
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards the global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management. Games are often used to benchmark reinforcement learning algorithms because they provide flexible, reproducible, and easily controlled environments. Still, few games feature a state-space in which results in exploration, memory, and planning are easily observed. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work on reinforcement learning driven by generative exploration.
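To make the idea concrete, a DVAE-style loop alternates between collecting real transitions, fitting a generative model of the environment, and letting the agent explore "dreamed" rollouts sampled from that model. The sketch below is not the authors' implementation (which uses a variational autoencoder over raw observations); all names are hypothetical, and a simple frequency table stands in for the learned model so the example stays self-contained:

```python
import random

class DreamModel:
    """Stand-in for a learned generative model of environment dynamics.

    In the paper a variational autoencoder plays this role; here a
    table of observed outcomes keeps the sketch dependency-free.
    """

    def __init__(self):
        self.transitions = {}  # (state, action) -> list of (next_state, reward)

    def observe(self, s, a, s_next, r):
        self.transitions.setdefault((s, a), []).append((s_next, r))

    def dream_step(self, s, a):
        # Sample a plausible outcome from what the model has seen so far.
        outcomes = self.transitions.get((s, a))
        if not outcomes:
            return s, 0.0  # unknown (s, a): stay put, no reward
        return random.choice(outcomes)

def corridor_step(s, a):
    """Toy 1-D corridor with sparse feedback: reward only at state 3."""
    s_next = max(0, min(3, s + a))
    return s_next, (1.0 if s_next == 3 else 0.0)

# Phase 1: collect real experience and fit the model.
model = DreamModel()
for _ in range(200):
    s = random.randint(0, 3)
    a = random.choice([-1, 1])
    s_next, r = corridor_step(s, a)
    model.observe(s, a, s_next, r)

# Phase 2: a "dreamed" rollout, touching only the model, never the env.
s, total = 0, 0.0
for _ in range(10):
    a = random.choice([-1, 1])
    s, r = model.dream_step(s, a)
    total += r
```

Dreamed rollouts are cheap, so an agent can rehearse many trajectories per real environment step, which is what makes generative exploration attractive in sparse-feedback settings.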
Notes
1. The Deep Maze is open-source and publicly available at https://github.com/CAIR/deep-maze.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Andersen, PA., Goodwin, M., Granmo, OC. (2018). The Dreaming Variational Autoencoder for Reinforcement Learning Environments. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXV. SGAI 2018. Lecture Notes in Computer Science(), vol 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_11
Print ISBN: 978-3-030-04190-8
Online ISBN: 978-3-030-04191-5