Heuristically Accelerated Q–Learning: A New Approach to Speed Up Reinforcement Learning

  • Reinaldo A. C. Bianchi
  • Carlos H. C. Ribeiro
  • Anna H. R. Costa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3171)

Abstract

This work presents a new algorithm, called Heuristically Accelerated Q–Learning (HAQL), that allows the use of heuristics to speed up the well-known reinforcement learning algorithm Q–learning. The HAQL algorithm is characterized by a heuristic function \(\mathcal{H}\) that influences the choice of actions. The heuristic function is strongly associated with the policy: it indicates the action that should be taken in preference to the others. This work also proposes an automatic method, called Heuristic from Exploration, for extracting the heuristic function \(\mathcal{H}\) from the learning process. Finally, experimental results show that even a very simple heuristic yields a significant performance improvement over the standard reinforcement learning algorithm.
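The core idea of the abstract can be sketched in code: the heuristic \(\mathcal{H}\) biases only the action-selection rule, while the Q-value update remains the standard Q-learning rule. The sketch below is a minimal illustration under that assumption; the weighting factor `xi` and the additive form `Q + xi * H` are one plausible formulation, not necessarily the exact one used in the paper.

```python
import random

def haql_select_action(Q, H, state, actions, xi=1.0, epsilon=0.1):
    """Epsilon-greedy selection where the heuristic H biases which
    action is picked, but never enters the Q-value update.
    Q and H are dicts keyed by (state, action).
    NOTE: the additive form Q + xi*H is an illustrative assumption."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)] + xi * H[(state, a)])

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update, unchanged by the heuristic."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
```

Because \(\mathcal{H}\) only reorders action preferences during exploration, a misleading heuristic slows learning but does not change what the Q-values eventually converge to.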

Keywords

Reinforcement Learning · Cognitive Robotics



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Reinaldo A. C. Bianchi (1, 2)
  • Carlos H. C. Ribeiro (3)
  • Anna H. R. Costa (1)
  1. Laboratório de Técnicas Inteligentes, Escola Politécnica da Universidade de São Paulo, São Paulo, Brazil
  2. Centro Universitário da FEI, São Bernardo do Campo, Brazil
  3. Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
