Sequential Constant Size Compressors for Reinforcement Learning

  • Conference paper
Artificial General Intelligence (AGI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6830)

Abstract

Traditional Reinforcement Learning methods are insufficient for AGIs, which must be able to learn in Partially Observable Markov Decision Processes (POMDPs). We investigate a novel approach to this problem: standard RL techniques that take as input the hidden-layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory (RAAM), trained through standard back-propagation. Results illustrate the feasibility of this approach: the system learns to handle high-dimensional visual observations (up to 640 pixels) in partially observable environments with long time lags (up to 12 steps) between relevant sensory information and the necessary action.
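
The idea in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical rendering (not the authors' code) of a sequential RAAM used as an SCSC: at each step an encoder folds the current observation and the previous constant-size code into a new code, and a decoder is trained by standard back-propagation to reconstruct both from that code. The code size, learning rate, per-step (non-BPTT) training, and random placeholder observations are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # 640-pixel observations as in the paper; code size and learning
    # rate are assumptions for this sketch.
    OBS_DIM, CODE_DIM, LR = 640, 32, 0.01

    # Encoder maps (previous code, observation) -> code;
    # the decoder tries to invert it: code -> (previous code, observation).
    W_enc = rng.normal(0.0, 0.1, (CODE_DIM, CODE_DIM + OBS_DIM))
    W_dec = rng.normal(0.0, 0.1, (CODE_DIM + OBS_DIM, CODE_DIM))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fold(code, obs):
        """Compress one observation into the constant-size code and take
        one gradient step on the squared reconstruction error."""
        global W_enc, W_dec
        inp = np.concatenate([code, obs])    # reconstruction target
        new_code = sigmoid(W_enc @ inp)      # constant-size summary of history
        recon = sigmoid(W_dec @ new_code)    # reconstructed (code, obs)
        err = recon - inp
        # Standard back-propagation through decoder, then encoder.
        d_dec = err * recon * (1.0 - recon)
        d_enc = (W_dec.T @ d_dec) * new_code * (1.0 - new_code)
        W_dec -= LR * np.outer(d_dec, new_code)
        W_enc -= LR * np.outer(d_enc, inp)
        return new_code

    # Feed a 12-step sequence (the longest lag reported in the abstract);
    # the final code would be the state handed to a standard RL learner.
    code = np.zeros(CODE_DIM)
    for _ in range(12):
        obs = rng.random(OBS_DIM)            # placeholder pixel vector
        code = fold(code, obs)
    print(code.shape)                        # -> (32,)

Because the code has a fixed size regardless of sequence length, the downstream RL learner always sees a constant-dimensional state, even when the relevant sensory information lies many steps in the past.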

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gisslén, L., Luciw, M., Graziano, V., Schmidhuber, J. (2011). Sequential Constant Size Compressors for Reinforcement Learning. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds) Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science (LNAI), vol. 6830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22887-2_4

  • DOI: https://doi.org/10.1007/978-3-642-22887-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22886-5

  • Online ISBN: 978-3-642-22887-2

  • eBook Packages: Computer Science, Computer Science (R0)
