Abstract

We introduce a reinforcement learning algorithm called kNN-TD, which combines the classical formulation of temporal difference methods with a k-nearest-neighbors scheme serving as its memory of expected values. This memory allows the algorithm to generalize properly over continuous state spaces and to benefit from collective action selection and learning. Furthermore, adding probability traces yields the kNN-TD(λ) algorithm, which exhibits state-of-the-art performance. Finally, the proposed algorithm has been tested on a series of well-known reinforcement learning problems and at the Second Annual RL Competition, with excellent results.
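The following Python sketch illustrates the general idea described above, under our own assumptions rather than the authors' exact formulation: Q-values are stored at a fixed set of prototype states, the k nearest prototypes vote on action values with distance-based weights, and the TD error is shared among those neighbors. The class name, the choice of prototype centers, and the weighting scheme are illustrative; the probability-trace mechanism of kNN-TD(λ) is not shown.

```python
import numpy as np

class KNNTDSketch:
    """Minimal sketch of a kNN-TD-style learner (illustrative only).

    Q-values live at prototype states; a query state is answered by its
    k nearest prototypes, weighted by inverse distance, and the TD error
    is distributed over those same neighbors (collective learning).
    """

    def __init__(self, centers, n_actions, k=4, alpha=0.3, gamma=0.95):
        self.centers = np.asarray(centers, dtype=float)     # prototype states
        self.q = np.zeros((len(self.centers), n_actions))   # expectations memory
        self.k, self.alpha, self.gamma = k, alpha, gamma

    def _neighbors(self, state):
        # k nearest prototypes and normalized inverse-distance weights
        d = np.linalg.norm(self.centers - np.asarray(state, dtype=float), axis=1)
        idx = np.argsort(d)[: self.k]
        w = 1.0 / (d[idx] + 1e-6)
        return idx, w / w.sum()

    def value(self, state):
        # collective action values from the neighbors' votes
        idx, w = self._neighbors(state)
        return w @ self.q[idx]

    def act(self, state, epsilon=0.1):
        if np.random.rand() < epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.value(state)))

    def update(self, state, action, reward, next_state, done):
        idx, w = self._neighbors(state)
        target = reward if done else reward + self.gamma * np.max(self.value(next_state))
        td_error = target - float(w @ self.q[idx, action])
        # each neighbor learns in proportion to its contribution
        self.q[idx, action] += self.alpha * w * td_error
```

As a usage note, the prototype centers could simply be a uniform grid over the continuous state space (e.g. `np.stack(np.meshgrid(...), -1).reshape(-1, dim)`); finer grids trade memory for resolution.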



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • José Antonio Martín H. (1)
  • Javier de Lope (2)
  • Darío Maravall (2)

  1. Dep. Sistemas Informáticos y Computación, Universidad Complutense de Madrid, Spain
  2. Perception for Computers and Robots, Universidad Politécnica de Madrid, Spain