Reinforcement Learning with Echo State Networks

  • István Szita
  • Viktor Gyenes
  • András Lőrincz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4131)


Function approximators are often used in reinforcement learning tasks with large or continuous state spaces. Artificial neural networks, among them recurrent neural networks, are popular function approximators, especially in tasks where some kind of memory is needed, as in real-world partially observable scenarios. However, convergence guarantees for such methods are rarely available. Here, we propose a method using a novel class of RNNs, echo state networks. A proof of convergence to a bounded region is provided for k-order Markov decision processes. Runs on POMDPs were performed to test and illustrate the workings of the architecture.
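The approach described above can be illustrated with a minimal sketch: an echo state network whose fixed random reservoir provides the memory trace, with only a linear readout trained by SARSA-style temporal-difference updates. All names and parameter values below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class ESNQAgent:
    """Illustrative echo-state-network value approximator for SARSA-style RL.

    The reservoir and input weights are random and fixed; only the linear
    readout (one weight row per action) is adapted by TD learning.
    Hyperparameter names and values are assumptions for this sketch.
    """

    def __init__(self, n_inputs, n_actions, n_reservoir=50,
                 spectral_radius=0.9, alpha=0.01, gamma=0.95):
        self.alpha, self.gamma = alpha, gamma
        # Fixed random input and recurrent weights (never trained).
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the spectral radius is below 1: the "echo state
        # property", which makes the reservoir state a fading memory of
        # the input history.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        # Trainable linear readout: one row of Q-value weights per action.
        self.W_out = np.zeros((n_actions, n_reservoir))
        self.x = np.zeros(n_reservoir)

    def step(self, u):
        """Advance the reservoir with observation u; return Q-values."""
        self.x = np.tanh(self.W_in @ u + self.W @ self.x)
        return self.W_out @ self.x

    def td_update(self, x_prev, action, reward, q_next):
        """SARSA-style TD update applied to the readout weights only."""
        td_err = reward + self.gamma * q_next - self.W_out[action] @ x_prev
        self.W_out[action] += self.alpha * td_err * x_prev
```

Because the recurrent part stays fixed, learning reduces to linear function approximation over the reservoir state, which is what makes bounded-region convergence results of the Gordon type applicable.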


Keywords: Reinforcement Learning, Recurrent Neural Network, Reinforcement Learning Algorithm, Echo State Network, Continuous State Space





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • István Szita
  • Viktor Gyenes
  • András Lőrincz

Eötvös Loránd University, Budapest, Hungary
