Sequential Constant Size Compressors for Reinforcement Learning

  • Linus Gisslén
  • Matt Luciw
  • Vincent Graziano
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6830)


Traditional Reinforcement Learning (RL) methods are insufficient for artificial general intelligences (AGIs), which must be able to learn to deal with Partially Observable Markov Decision Processes. We investigate a novel method for dealing with this problem: standard RL techniques that take as input the hidden-layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory, trained through standard back-propagation. Results illustrate the feasibility of this approach: the system learns to deal with high-dimensional visual observations (up to 640 pixels) in partially observable environments where there are long time lags (up to 12 steps) between relevant sensory information and necessary action.
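To make the idea of a constant-size compressor concrete, the following is a minimal sketch of a sequential auto-associator in plain Python. It is an illustration under assumptions, not the authors' implementation: the class name `SequentialRAAM`, the methods `encode`, `decode`, `train_step` and `compress`, the tanh hidden layer, the linear decoder, and all sizes and learning rates are hypothetical choices. The encoder maps the current observation together with the previous code to a new code of the same size, and one back-propagation step reduces the squared error of reconstructing that input from the code.

```python
import math
import random


class SequentialRAAM:
    """Toy sequential auto-associator (hypothetical names, not the paper's code)."""

    def __init__(self, obs_size, code_size, seed=0):
        rng = random.Random(seed)
        self.obs_size = obs_size
        self.code_size = code_size
        in_size = obs_size + code_size
        # Encoder maps [observation, previous code] to a new fixed-size code.
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(in_size)]
                      for _ in range(code_size)]
        # Decoder tries to reconstruct [observation, previous code] from the code.
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_size)]
                      for _ in range(in_size)]

    def encode(self, obs, prev_code):
        x = list(obs) + list(prev_code)
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w_enc]

    def decode(self, code):
        return [sum(w * v for w, v in zip(row, code)) for row in self.w_dec]

    def train_step(self, obs, prev_code, lr=0.01):
        """One back-propagation step on squared reconstruction error.

        Returns (loss before the update, code computed before the update).
        """
        x = list(obs) + list(prev_code)
        h = self.encode(obs, prev_code)
        err = [y - t for y, t in zip(self.decode(h), x)]
        loss = 0.5 * sum(e * e for e in err)
        # Gradient w.r.t. the code, computed before the decoder is changed.
        grad_h = [sum(err[i] * self.w_dec[i][j] for i in range(len(x)))
                  for j in range(self.code_size)]
        for i in range(len(x)):
            for j in range(self.code_size):
                self.w_dec[i][j] -= lr * err[i] * h[j]
        for j in range(self.code_size):
            grad_pre = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for k in range(len(x)):
                self.w_enc[j][k] -= lr * grad_pre * x[k]
        return loss, h

    def compress(self, observations):
        """Fold a whole observation history into one code of constant size."""
        code = [0.0] * self.code_size
        for obs in observations:
            code = self.encode(obs, code)
        return code


model = SequentialRAAM(obs_size=4, code_size=3)
history = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
state = model.compress(history)  # would be the input to an RL learner
```

In this sketch `state` has the same length however long `history` is, which is the property that lets a standard RL method treat the code as its state input. The RL learner itself is left out, since the abstract does not say which one is used.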


Keywords: recurrent auto-associative memory, reinforcement learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Linus Gisslén (1)
  • Matt Luciw (1)
  • Vincent Graziano (1)
  • Jürgen Schmidhuber (1)
  1. IDSIA, University of Lugano, Manno-Lugano, Switzerland
