Machine Learning, Volume 33, Issue 1, pp 105–115

Fast Online Q(λ)

  • Marco Wiering
  • Jürgen Schmidhuber

Abstract

Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.

Keywords: reinforcement learning, Q-learning, TD(λ), online Q(λ), lazy learning
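
The postponement idea stated in the abstract can be illustrated with a minimal Python sketch. This is a rough reconstruction, not the authors' exact algorithm: it assumes a tabular Q-function, replacing traces, Peng-style Q(λ) (traces are not reset on exploratory actions), and ε-greedy exploration, and all class and variable names (LazyQLambda, phi, etc.) are hypothetical. One global log of discounted TD errors replaces the per-step decay of every eligibility trace; a Q-value absorbs its postponed updates only when it is actually read.

```python
# Sketch of "lazy" online Q(lambda): instead of decaying every
# eligibility trace at every step (O(|S||A|) work), keep one global log
# of discounted TD errors and bring a Q-value up to date only on access.
import random
from collections import defaultdict

class LazyQLambda:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, lam=0.9, eps=0.1):
        self.nA, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps
        self.gl = gamma * lam            # per-step trace decay
        self.q = defaultdict(float)      # possibly stale Q-values
        self.trace = {}                  # (s,a) -> trace value at last sync
        self.last = {}                   # (s,a) -> global step of last sync
        self.phi = [0.0]                 # phi[t] = sum_{k<=t} delta_k * gl**k
        self.t = 0                       # global step counter

    def _sync(self, sa):
        """Apply all postponed, geometrically decayed TD-error updates."""
        if sa in self.trace:
            t0 = self.last[sa]
            # sum over t0 < k <= t of alpha * delta_k * e_k equals
            # alpha * e_{t0} * (phi[t] - phi[t0]) / gl**t0.
            # (gl**t0 underflows on very long runs; the sketch assumes
            # runs short enough for float arithmetic.)
            self.q[sa] += (self.alpha * self.trace[sa]
                           * (self.phi[self.t] - self.phi[t0])
                           / (self.gl ** t0))
            self.trace[sa] *= self.gl ** (self.t - t0)
            self.last[sa] = self.t

    def value(self, s, a):
        self._sync((s, a))
        return self.q[(s, a)]

    def act(self, s):
        if random.random() < self.eps:
            return random.randrange(self.nA)
        vals = [self.value(s, a) for a in range(self.nA)]
        return vals.index(max(vals))

    def update(self, s, a, r, s_next):
        # O(|A|) per step: only the Q-values entering the TD error are synced.
        v_next = max(self.value(s_next, b) for b in range(self.nA))
        delta = r + self.gamma * v_next - self.value(s, a)
        self.t += 1
        self.phi.append(self.phi[-1] + delta * self.gl ** self.t)
        self.q[(s, a)] += self.alpha * delta   # immediate update, trace = 1
        self.trace[(s, a)] = 1.0               # replacing traces
        self.last[(s, a)] = self.t
```

Per step, only the Q-values that enter the TD error are brought up to date, so the work per step is O(|A|) plus constant-time bookkeeping, matching the bound stated in the abstract. The (γλ)^t factors underflow on long runs; the paper's additional bookkeeping addresses this, which the sketch omits.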

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Marco Wiering (1)
  • Jürgen Schmidhuber (1)

  1. IDSIA, Lugano, Switzerland
