
The Dreaming Variational Autoencoder for Reinforcement Learning Environments

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11311)

Abstract

Reinforcement learning has shown great potential for generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards a global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms because they provide flexible, reproducible, and easily controlled environments. Nevertheless, few games feature a state-space in which results on exploration, memory, and planning are easily observed. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work on reinforcement learning driven by generative exploration.
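To make the idea concrete, the sketch below illustrates one way such a model could be structured: a conditional variational autoencoder that encodes a state-action pair into a latent code and decodes a predicted next state, letting an agent "dream" transitions without querying the real environment. This is a minimal illustration of the general technique, not the authors' implementation; the PyTorch framing, the flat state vector, the layer sizes, and the unweighted loss terms are all assumptions made for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DreamingVAE(nn.Module):
    """Illustrative conditional VAE that predicts s_{t+1} from (s_t, a_t)."""

    def __init__(self, state_dim, action_dim, latent_dim=32, hidden_dim=256):
        super().__init__()
        # Encoder q(z | s_t, a_t)
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder p(s_{t+1} | z, a_t); Sigmoid assumes states normalized to [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim), nn.Sigmoid())

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        next_state = self.decoder(torch.cat([z, action], dim=-1))
        return next_state, mu, logvar

def dvae_loss(predicted_next, true_next, mu, logvar):
    # Reconstruction error plus KL divergence to the unit Gaussian prior.
    reconstruction = F.binary_cross_entropy(predicted_next, true_next, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return reconstruction + kl

Trained on (s_t, a_t, s_{t+1}) transitions gathered from real episodes, such a model could then roll out imagined trajectories for a policy to explore, which is the setting the paper targets for environments with sparse feedback.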


Notes

  1. The Deep Maze is open-source and publicly available at https://github.com/CAIR/deep-maze.
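For orientation, the sketch below shows how an agent might interact with a maze environment of this kind, assuming a Gym-style reset/step interface. The environment id "DeepMaze-v0", the discrete action space, and the four-tuple step return are assumptions for illustration and are not taken from the repository.

import random
import gym

# Hypothetical environment id; the actual registration name in the
# deep-maze repository may differ.
env = gym.make("DeepMaze-v0")
state = env.reset()
done, episode_return = False, 0.0
while not done:
    # Uniform random exploration; a DVAE-driven agent would instead select
    # actions informed by imagined ("dreamed") rollouts.
    action = random.randrange(env.action_space.n)
    state, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)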


Author information

Corresponding author

Correspondence to Per-Arne Andersen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Andersen, P.-A., Goodwin, M., Granmo, O.-C. (2018). The Dreaming Variational Autoencoder for Reinforcement Learning Environments. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXV. SGAI 2018. Lecture Notes in Computer Science (LNAI), vol 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_11

  • DOI: https://doi.org/10.1007/978-3-030-04191-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04190-8

  • Online ISBN: 978-3-030-04191-5

  • eBook Packages: Computer Science, Computer Science (R0)
