Evolving Neural Networks for Online Reinforcement Learning

  • Jan Hendrik Metzen
  • Mark Edgington
  • Yohannes Kassahun
  • Frank Kirchner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5199)


For many complex Reinforcement Learning problems with large and continuous state spaces, neuroevolution (the evolution of artificial neural networks) has achieved promising results. This is especially true when there is noise in the sensor and/or actuator signals. These results have mainly been obtained in offline learning settings, where the training and evaluation phases of the system are separated. In contrast, in online Reinforcement Learning tasks, where the system's actual performance during its learning phase matters, the results of neuroevolution are significantly impaired by its purely exploratory nature: it does not use (i.e., exploit) its knowledge of the performance of individual candidate networks in order to improve its performance during learning. In this paper we describe modifications which significantly improve the online performance of the neuroevolutionary method Evolutionary Acquisition of Neural Topologies (EANT) and discuss the results obtained on two benchmark problems.


Keywords: Reinforcement Learning · Mature Individual · Cerebellar Model Articulation Controller · Online Performance · Evolved Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jan Hendrik Metzen (1)
  • Mark Edgington (2)
  • Yohannes Kassahun (2)
  • Frank Kirchner (1, 2)
  1. Robotics Lab, German Research Center for Artificial Intelligence (DFKI), Bremen, Germany
  2. Robotics Group, University of Bremen, Bremen, Germany
