Machine Learning

Volume 43, Issue 1–2, pp. 7–52

Relational Reinforcement Learning

  • Sašo Džeroski
  • Luc De Raedt
  • Kurt Driessens

Abstract

We present relational reinforcement learning, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Because it uses a more expressive representation language for states, actions, and Q-functions, relational reinforcement learning can potentially be applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where the effects of actions are assumed to be unknown to the agent, which must therefore learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from the specific goals pursued, and to exploit the results of previous learning phases when addressing new (more complex) situations.

Keywords: reinforcement learning, inductive logic programming, planning
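To make the learning task in the abstract concrete, the following is a minimal sketch, assuming a three-block world and the goal on(a, b). It uses plain tabular Q-learning over explicitly enumerated states — precisely the propositional baseline the paper improves on: relational reinforcement learning would replace the flat table below with a relational representation of the Q-function so the learned policy generalizes over block identities, goals, and numbers of blocks. All names and parameter values here are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

# Flat (propositional) Q-learning in a 3-block blocks world.
# NOTE: this is an illustrative baseline, not the paper's RRL algorithm,
# which represents the Q-function relationally instead of as a table.

def initial_state():
    # A state is a frozenset of stacks; each stack is a bottom-to-top tuple.
    return frozenset({("a",), ("b",), ("c",)})  # all blocks on the floor

def on(state, x, y):
    # True if block x sits directly on block y in some stack.
    return any(s[i] == y and s[i + 1] == x
               for s in state for i in range(len(s) - 1))

def actions(state):
    # move(x, y): put a clear block x on another clear block, or on the floor.
    acts = []
    stacks = list(state)
    for s in stacks:
        x = s[-1]                      # the top block of each stack is clear
        for t in stacks:
            if t is not s:
                acts.append((x, t[-1]))
        if len(s) > 1:
            acts.append((x, "floor"))  # only useful if x is not already on the floor
    return acts

def step(state, action):
    x, y = action
    stacks = [list(s) for s in state]
    next(s for s in stacks if s[-1] == x).pop()          # lift x off its stack
    if y == "floor":
        stacks.append([x])
    else:
        next(s for s in stacks if s and s[-1] == y).append(x)
    return frozenset(tuple(s) for s in stacks if s)

Q = defaultdict(float)                 # Q[(state, action)] -> value
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2      # illustrative hyperparameters

def train(episodes=2000, max_steps=20):
    # Standard Q-learning update: reward 1 on reaching the goal on(a, b), else 0.
    for _ in range(episodes):
        state = initial_state()
        for _ in range(max_steps):
            acts = actions(state)
            a = (random.choice(acts) if random.random() < EPS
                 else max(acts, key=lambda a: Q[(state, a)]))
            nxt = step(state, a)
            done = on(nxt, "a", "b")
            target = 1.0 if done else GAMMA * max(Q[(nxt, b)] for b in actions(nxt))
            Q[(state, a)] += ALPHA * (target - Q[(state, a)])
            state = nxt
            if done:
                break

def greedy_steps(max_steps=10):
    # Follow the learned greedy policy; return the number of steps to the goal.
    state = initial_state()
    for t in range(max_steps):
        if on(state, "a", "b"):
            return t
        state = step(state, max(actions(state), key=lambda a: Q[(state, a)]))
    return None

random.seed(0)
train()
print("greedy steps to goal:", greedy_steps())
```

The limitation this sketch exhibits is the one the paper addresses: the Q-table keys are fully ground states, so nothing learned here transfers to four blocks, to a renamed goal such as on(b, c), or to any state not visited during training — whereas a relational Q-function can express, e.g., "if the goal block is clear, move it onto its destination" once, for all instances.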


Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Sašo Džeroski (1)
  • Luc De Raedt (2)
  • Kurt Driessens (3)

  1. Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia
  2. Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges Köhler, Freiburg, Germany
  3. Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
