Logical Markov Decision Programs and the Convergence of Logical TD(λ)

  • Kristian Kersting
  • Luc De Raedt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3194)


Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. However, a theory that explains why RRL works has been lacking. In this paper, we provide some initial results towards such a theory. To this end, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov decision processes (MDPs) with logic programs. Using LOMDPs one can represent complex MDPs compactly and declaratively. Within this framework we then devise a relational upgrade of TD(λ), called logical TD(λ), and prove its convergence. Experiments validate our approach.
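The classical algorithm being lifted to the relational setting is tabular TD(λ) with eligibility traces. As background, a minimal sketch of that baseline is given below; the three-state chain MDP and all parameter values are illustrative assumptions, not taken from the paper.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Estimate state values V from sampled episodes with tabular TD(lambda).

    `episodes` is a list of trajectories, each a list of
    (state, reward, next_state) transitions; a terminal transition
    is signalled by next_state = None.
    """
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states                     # eligibility traces, reset per episode
        for s, r, s_next in episode:
            v_next = 0.0 if s_next is None else V[s_next]
            delta = r + gamma * v_next - V[s]    # one-step TD error
            e[s] += 1.0                          # accumulating trace for current state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]     # propagate error along traces
                e[i] *= gamma * lam              # decay all traces
    return V

# Toy chain: 0 -> 1 -> 2 -> terminal, reward 1 only on the final step.
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
values = td_lambda([episode] * 200, n_states=3)
```

With γ = 0.9 the learned values approach the discounted returns 0.81, 0.9, and 1.0 for states 0, 1, and 2. Logical TD(λ) performs the same update, but over abstract states defined by logical atoms rather than over ground states.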


Keywords: Abstract State, Markov Decision Process, Abstraction Level, Inductive Logic Programming, Policy Iteration





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Kristian Kersting¹
  • Luc De Raedt¹
  1. Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University, Freiburg i. Brg., Germany
