Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method

  • Martin Riedmiller

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)

Abstract

This paper introduces NFQ (Neural Fitted Q Iteration), an algorithm for the efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
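The training scheme behind this principle can be sketched concretely: all experienced transitions are stored, and in each iteration the current Q-network is used to regenerate a supervised pattern set (input: a state-action pair; target: the immediate cost plus the discounted minimal Q-value of the successor state), on which the multi-layer perceptron is then trained in batch mode. The following is a minimal Python sketch under stated assumptions: the transition-tuple format, the terminal-state handling, the `nfq` function name, the network topology, and all hyperparameters are illustrative rather than the paper's settings, and scikit-learn's L-BFGS-trained MLPRegressor merely stands in for the Rprop training used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def nfq(transitions, n_actions, gamma=0.95, iterations=20):
    """Illustrative NFQ-style loop (a sketch, not the paper's code).

    `transitions` is assumed to be a list of tuples
    (state, action, cost, next_state, terminal) collected beforehand.
    """

    def encode(state, action):
        # Q-network input: state features concatenated with a one-hot action.
        one_hot = np.zeros(n_actions)
        one_hot[action] = 1.0
        return np.concatenate([np.asarray(state, dtype=float), one_hot])

    q_net = None  # before the first fit, Q is taken to be zero everywhere

    def q_values(state):
        if q_net is None:
            return np.zeros(n_actions)
        inputs = np.stack([encode(state, a) for a in range(n_actions)])
        return q_net.predict(inputs)

    for _ in range(iterations):
        # Step 1: regenerate the supervised pattern set from ALL stored
        # transitions using the current Q estimate. NFQ is phrased in terms
        # of immediate costs, hence the minimum over successor actions.
        X, y = [], []
        for state, action, cost, next_state, terminal in transitions:
            target = cost if terminal else cost + gamma * q_values(next_state).min()
            X.append(encode(state, action))
            y.append(target)
        # Step 2: train the multi-layer perceptron on this fixed pattern set
        # in batch mode (the paper uses Rprop; L-BFGS stands in here).
        q_net = MLPRegressor(hidden_layer_sizes=(5, 5), solver="lbfgs",
                             max_iter=500)
        q_net.fit(np.asarray(X), np.asarray(y))

    # Greedy (minimum-cost) policy induced by the final Q-network.
    return lambda state: int(np.argmin(q_values(state)))
```

Because the pattern set is fixed within each iteration, the network can be trained with fast batch methods; this is what allows NFQ to extract high-quality policies from comparatively few interactions with the plant.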

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Martin Riedmiller (1)
  1. Neuroinformatics Group, University of Osnabrück, Osnabrück
