Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
This paper introduces NFQ, an algorithm for the efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems, and it is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
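The core idea described in the abstract can be sketched as a loop of two steps: (1) collect and store transition experiences, then (2) repeatedly recompute Q-learning targets over the whole stored set and refit the network to them in batch mode. The following is a minimal, illustrative sketch under stated assumptions: the toy chain task, the tiny network, and all hyperparameters are inventions for demonstration, and plain batch gradient descent stands in for the Rprop training the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy task: a deterministic chain where action 1 moves right
# towards a goal state; every transition that does not reach the goal
# incurs cost 1 (NFQ minimizes path costs).
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.95
GOAL = N_STATES - 1

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (0.0 if s2 == GOAL else 1.0)

# Tiny MLP Q(s, a): one tanh hidden layer, linear output (sizes assumed).
H = 16
W1 = rng.normal(0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)

def q(x):
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).ravel(), h

def train(X, y, epochs=300, lr=0.05):
    # Batch supervised training on a fixed pattern set (the NFQ inner
    # loop). The paper uses Rprop here; plain gradient descent on the
    # squared error keeps this sketch short.
    global W1, b1, W2, b2
    n = len(X)
    for _ in range(epochs):
        pred, h = q(X)
        err = (pred - y)[:, None]
        dh = (err @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ err) / n; b2 -= lr * err.mean(0)
        W1 -= lr * (X.T @ dh) / n; b1 -= lr * dh.mean(0)

# Collect transition experiences once; NFQ then reuses this stored set.
D = []
for s, a in zip(rng.integers(N_STATES, size=200), rng.integers(2, size=200)):
    s2, c = step(int(s), int(a))
    D.append((int(s), int(a), c, s2))

X = np.array([[s, a] for s, a, _, _ in D], float)
costs = np.array([c for _, _, c, _ in D])
done = np.array([s2 == GOAL for *_, s2 in D])

for _ in range(10):  # NFQ outer loop: recompute targets, refit the MLP
    q_next = np.stack([q(np.array([[s2, a] for *_, s2 in D], float))[0]
                       for a in ACTIONS])
    targets = costs + GAMMA * q_next.min(axis=0) * ~done  # terminal: cost only
    train(X, targets)

greedy = [int(np.argmin([q(np.array([[s, a]], float))[0][0] for a in ACTIONS]))
          for s in range(N_STATES)]
print(greedy)  # greedy (cost-minimizing) action per state
```

Because every outer iteration refits the network to targets computed over the entire stored experience set, each transition is reused many times, which is the source of the data efficiency the abstract claims.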