Abstract
We present relational reinforcement learning, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Owing to its more expressive representation language for states, actions and Q-functions, relational reinforcement learning can potentially be applied to a new range of learning tasks. One such task, investigated here, is planning in the blocks world, where the effects of actions are assumed unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, it allows us to employ structural representations, to abstract from the specific goals pursued, and to exploit the results of previous learning phases when addressing new (more complex) situations.
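To make the setting concrete, below is a minimal sketch of Q-learning in a relationally represented blocks world. It is illustrative only: states are sets of on/2 facts, actions are move(x, y) terms, and a plain Q-table stands in for the first-order logical decision trees that the paper uses to generalize Q-values; all identifiers (initial_state, actions, apply, GOAL) are hypothetical, not the authors' code.

```python
# Toy Q-learning over relational blocks-world states.
# A state is a frozenset of facts like ("on", "a", "floor");
# an action is a term like ("move", "a", "b").
# The paper replaces the Q-table with a learned relational
# regression tree; this sketch only illustrates the representation.
import random

BLOCKS = ("a", "b", "c")
GOAL = ("on", "a", "b")            # illustrative goal: a on b

def initial_state():
    # every block starts on the floor
    return frozenset(("on", b, "floor") for b in BLOCKS)

def clear(state, x):
    # x is clear if no block sits on it
    return all(not (f[0] == "on" and f[2] == x) for f in state)

def actions(state):
    # move(x, y): x must be clear; y is the floor or a clear block != x
    acts = []
    for x in BLOCKS:
        if not clear(state, x):
            continue
        for y in BLOCKS + ("floor",):
            if y != x and (y == "floor" or clear(state, y)):
                acts.append(("move", x, y))
    return acts

def apply(state, act):
    # replace x's single on-fact with the new destination
    _, x, y = act
    src = next(f for f in state if f[0] == "on" and f[1] == x)
    return (state - {src}) | {("on", x, y)}

def q_learn(episodes=2000, alpha=0.3, gamma=0.9, eps=0.2):
    Q = {}
    for _ in range(episodes):
        s = initial_state()
        for _ in range(20):                      # episode step limit
            acts = actions(s)
            if random.random() < eps:
                a = random.choice(acts)          # explore
            else:
                a = max(acts, key=lambda a: Q.get((s, a), 0.0))
            s2 = apply(s, a)
            r = 1.0 if GOAL in s2 else 0.0       # reward only at the goal
            best_next = max((Q.get((s2, a2), 0.0) for a2 in actions(s2)),
                            default=0.0)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
            if r > 0:
                break
    return Q

if __name__ == "__main__":
    Q = q_learn()
    s = initial_state()
    a = max(actions(s), key=lambda a: Q.get((s, a), 0.0))
    print("greedy first action:", a)   # expect ('move', 'a', 'b')
```

A tabular learner like this cannot transfer what it learns to a different goal or a different number of blocks; replacing the table with a relational generalization over the facts is precisely the step that relational reinforcement learning contributes.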
Cite this article
Džeroski, S., De Raedt, L. & Driessens, K. Relational Reinforcement Learning. Machine Learning 43, 7–52 (2001). https://doi.org/10.1023/A:1007694015589