Solving Relational and First-Order Logical Markov Decision Processes: A Survey

Part of the Adaptation, Learning, and Optimization book series (ALO, volume 12)

Abstract

In this chapter we survey representations and techniques for Markov decision processes, reinforcement learning, and dynamic programming in worlds explicitly modeled in terms of objects and relations. Such relational worlds appear everywhere: in planning domains, games, real-world indoor scenes, and many other settings. Relational representations allow for expressive and natural data structures that capture objects and relations explicitly, enabling generalization over objects and relations, as well as over similar problems that differ only in the number of objects. The field was recently surveyed comprehensively in (van Otterlo, 2009b); here we describe a large portion of the main approaches. We discuss model-free techniques, both value-based and policy-based, as well as model-based dynamic programming techniques. We also cover several other aspects, such as models and hierarchies, and we end with several recent efforts and future directions.
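
To make the central idea concrete, here is a minimal, illustrative Python sketch, not taken from the chapter itself: a relational state is a set of ground atoms, and one abstract Q-rule covers ground states regardless of object names or the number of objects. All names (is_var, match_atom, covers, q_rule) are hypothetical, and variable-binding details (e.g., requiring distinct bindings) are one simplifying choice among several.

# Illustrative sketch (not from the chapter): states as sets of ground
# atoms, and one abstract Q-rule that generalizes over object identities
# and over the number of objects. All names are hypothetical.

def is_var(term):
    # Prolog-style convention: variables start uppercase, constants lowercase.
    return isinstance(term, str) and term[:1].isupper()

def match_atom(pattern, atom, subst):
    # Try to extend substitution 'subst' so that 'pattern' equals 'atom'.
    if pattern[0] != atom[0] or len(pattern) != len(atom):
        return None
    subst = dict(subst)
    for p, g in zip(pattern[1:], atom[1:]):
        if is_var(p):
            if subst.get(p, g) != g:
                return None
            subst[p] = g
        elif p != g:
            return None
    return subst

def covers(abstract_atoms, ground_state):
    # Depth-first search for a substitution mapping every abstract atom
    # onto some atom of the ground state (distinct bindings required).
    def search(goals, subst):
        if not goals:
            return subst if len(set(subst.values())) == len(subst) else None
        head, *rest = goals
        for atom in ground_state:
            s = match_atom(head, atom, subst)
            if s is not None:
                found = search(rest, s)
                if found is not None:
                    return found
        return None
    return search(list(abstract_atoms), {})

# One abstract rule: "if X and Y are clear, move(X, Y) has value 0.9",
# independent of which blocks X and Y actually are.
q_rule = ([("clear", "X"), ("clear", "Y")], ("move", "X", "Y"), 0.9)

# Two ground blocks-world states differing in object names and count.
s1 = {("clear", "a"), ("clear", "b"), ("on", "a", "c")}
s2 = {("clear", "d"), ("clear", "e"), ("clear", "f"), ("on", "e", "g")}

for s in (s1, s2):
    print(covers(q_rule[0], s))  # the same rule covers both states

A propositional encoding would need separate features for every block and every state size; the single abstract rule above covers all such ground states, which is the kind of generalization exploited by the value-based, policy-based, and model-based methods the chapter surveys.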

References

  1. Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms. Machine Learning 6(1), 37–66 (1991)
  2. Alpaydin, E.: Introduction to Machine Learning. The MIT Press, Cambridge (2004)
  3. Andersen, C.C.S.: Hierarchical relational reinforcement learning. Master’s thesis, Aalborg University, Denmark (2005)
  4. Asgharbeygi, N., Stracuzzi, D.J., Langley, P.: Relational temporal difference learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 49–56 (2006)
  5. Aycenina, M.: Hierarchical relational reinforcement learning. In: Stanford Doctoral Symposium (2002) (unpublished)
  6. Baum, E.B.: Toward a model of intelligence as an economy of agents. Machine Learning 35(2), 155–185 (1999)
  7. Baum, E.B.: What is Thought? The MIT Press, Cambridge (2004)
  8. Bergadano, F., Gunetti, D.: Inductive Logic Programming: From Machine Learning to Software Engineering. The MIT Press, Cambridge (1995)
  9. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
  10. Boutilier, C., Poole, D.: Computing optimal policies for partially observable Markov decision processes using compact representations. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1168–1175 (1996)
  11. Boutilier, C., Dean, T., Hanks, S.: Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research (JAIR) 11, 1–94 (1999)
  12. Boutilier, C., Dearden, R.W., Goldszmidt, M.: Stochastic dynamic programming with factored representations. Artificial Intelligence 121(1-2), 49–107 (2000)
  13. Boutilier, C., Reiter, R., Price, B.: Symbolic dynamic programming for first-order MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 690–697 (2001)
  14. Boyan, J.A., Moore, A.W.: Generalization in reinforcement learning: Safely approximating the value function. In: Proceedings of the Neural Information Processing Conference (NIPS), pp. 369–376 (1995)
  15. Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Francisco (2004)
  16. Castilho, M.A., Kunzle, L.A., Lecheta, E., Palodeto, V., Silva, F.: An investigation on genetic algorithms for generic STRIPS planning. In: Lemaître, C., Reyes, C.A., González, J.A. (eds.) IBERAMIA 2004. LNCS (LNAI), vol. 3315, pp. 185–194. Springer, Heidelberg (2004)
  17. Chapman, D., Kaelbling, L.P.: Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 726–731 (1991)
  18. Chen, J., Muggleton, S.: Decision-theoretic logic programs. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2010)
  19. Cocora, A., Kersting, K., Plagemann, C., Burgard, W., De Raedt, L.: Learning relational navigation policies. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2006)
  20. Cole, J., Lloyd, J.W., Ng, K.S.: Symbolic learning for adaptive agents. In: Proceedings of the Annual Partner Conference, Smart Internet Technology Cooperative Research Centre (2003), http://csl.anu.edu.au/jwl/crc_paper.pdf
  21. Croonenborghs, T.: Model-assisted approaches for relational reinforcement learning. PhD thesis, Department of Computer Science, Catholic University of Leuven, Belgium (2009)
  22. Croonenborghs, T., Driessens, K., Bruynooghe, M.: Learning relational options for inductive transfer in relational reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007a)
  23. Croonenborghs, T., Ramon, J., Blockeel, H., Bruynooghe, M.: Online learning and exploiting relational models in reinforcement learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 726–731 (2007b)
  24. Dabney, W., McGovern, A.: Utile distinctions for relational reinforcement learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 738–743 (2007)
  25. de la Rosa, T., Jimenez, S., Borrajo, D.: Learning relational decision trees for guiding heuristic planning. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2008)
  26. De Raedt, L.: Logical and Relational Learning. Springer, Heidelberg (2008)
  27. Dietterich, T.G., Flann, N.S.: Explanation-based learning and reinforcement learning: A unified view. Machine Learning 28, 169–210 (1997)
  28. Diuk, C.: An object-oriented representation for efficient reinforcement learning. PhD thesis, Rutgers University, Computer Science Department (2010)
  29. Diuk, C., Cohen, A., Littman, M.L.: An object-oriented representation for efficient reinforcement learning. In: Proceedings of the International Conference on Machine Learning (ICML) (2008)
  30. Driessens, K., Blockeel, H.: Learning Digger using hierarchical reinforcement learning for concurrent goals. In: Proceedings of the European Workshop on Reinforcement Learning (EWRL) (2001)
  31. Driessens, K., Džeroski, S.: Integrating experimentation and guidance in relational reinforcement learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 115–122 (2002)
  32. Driessens, K., Džeroski, S.: Combining model-based and instance-based learning for first order regression. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 193–200 (2005)
  33. Driessens, K., Ramon, J.: Relational instance based regression for relational reinforcement learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 123–130 (2003)
  34. Driessens, K., Ramon, J., Blockeel, H.: Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 97–108. Springer, Heidelberg (2001)
  35. Džeroski, S., De Raedt, L., Blockeel, H.: Relational reinforcement learning. In: Shavlik, J. (ed.) Proceedings of the International Conference on Machine Learning (ICML), pp. 136–143 (1998)
  36. Džeroski, S., De Raedt, L., Driessens, K.: Relational reinforcement learning. Machine Learning 43, 7–52 (2001)
  37. Feng, Z., Dearden, R.W., Meuleau, N., Washington, R.: Dynamic programming for structured continuous Markov decision problems. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 154–161 (2004)
  38. Fern, A., Yoon, S.W., Givan, R.: Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research (JAIR) 25, 75–118 (2006); special issue on the International Planning Competition 2004
  39. Fern, A., Yoon, S.W., Givan, R.: Reinforcement learning in relational domains: A policy-language approach. The MIT Press, Cambridge (2007)
  40. Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(3-4), 189–208 (1971)
  41. Finney, S., Gardiol, N.H., Kaelbling, L.P., Oates, T.: The thing that we tried didn’t work very well: Deictic representations in reinforcement learning. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 154–161 (2002)
  42. Finzi, A., Lukasiewicz, T.: Game-theoretic agent programming in Golog. In: Proceedings of the European Conference on Artificial Intelligence (ECAI) (2004a)
  43. Finzi, A., Lukasiewicz, T.: Relational Markov games. In: Alferes, J.J., Leite, J. (eds.) JELIA 2004. LNCS (LNAI), vol. 3229, pp. 320–333. Springer, Heidelberg (2004b)
  44. García-Durán, R., Fernández, F., Borrajo, D.: Learning and transferring relational instance-based policies. In: Proceedings of the AAAI-2008 Workshop on Transfer Learning for Complex Tasks (2008)
  45. Gardiol, N.H., Kaelbling, L.P.: Envelope-based planning in relational MDPs. In: Proceedings of the Neural Information Processing Conference (NIPS) (2003)
  46. Gardiol, N.H., Kaelbling, L.P.: Adaptive envelope MDPs for relational equivalence-based planning. Tech. Rep. MIT-CSAIL-TR-2008-050, MIT CS & AI Lab, Cambridge, MA (2008)
  47. Gärtner, T., Driessens, K., Ramon, J.: Graph kernels and Gaussian processes for relational reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2003)
  48. Gearhart, C.: Genetic programming as policy search in Markov decision processes. In: Genetic Algorithms and Genetic Programming at Stanford, pp. 61–67 (2003)
  49. Geffner, H., Bonet, B.: High-level planning and control with incomplete information using POMDPs. In: Proceedings of the AAAI Fall Symposium on Cognitive Robotics (1998)
  50. Gil, Y.: Learning by experimentation: Incremental refinement of incomplete planning domains. In: Proceedings of the International Conference on Machine Learning (ICML) (1994)
  51. Gordon, G.J.: Stable function approximation in dynamic programming. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 261–268 (1995)
  52. Gretton, C.: Gradient-based relational reinforcement learning of temporally extended policies. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007a)
  53. Gretton, C.: Gradient-based relational reinforcement learning of temporally extended policies. In: Workshop on Artificial Intelligence Planning and Learning at the International Conference on Automated Planning Systems (2007b)
  54. Gretton, C., Thiébaux, S.: Exploiting first-order regression in inductive policy selection. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 217–225 (2004a)
  55. Gretton, C., Thiébaux, S.: Exploiting first-order regression in inductive policy selection (extended abstract). In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004b)
  56. Groote, J.F., Tveretina, O.: Binary decision diagrams for first-order predicate logic. The Journal of Logic and Algebraic Programming 57, 1–22 (2003)
  57. Grounds, M., Kudenko, D.: Combining reinforcement learning with symbolic planning. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds.) ALAMAS 2005, ALAMAS 2006, and ALAMAS 2007. LNCS (LNAI), vol. 4865, pp. 75–86. Springer, Heidelberg (2008)
  58. Guestrin, C.: Planning under uncertainty in complex structured environments. PhD thesis, Computer Science Department, Stanford University (2003)
  59. Guestrin, C., Koller, D., Gearhart, C., Kanodia, N.: Generalizing plans to new environments in relational MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1003–1010 (2003a)
  60. Guestrin, C., Koller, D., Parr, R., Venkataraman, S.: Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research (JAIR) 19, 399–468 (2003b)
  61. Halbritter, F., Geibel, P.: Learning models of relational MDPs using graph kernels. In: Gelbukh, A., Kuri Morales, Á.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 409–419. Springer, Heidelberg (2007)
  62. Hanks, S., McDermott, D.V.: Modeling a dynamic and uncertain world I: Symbolic and probabilistic reasoning about change. Artificial Intelligence 66(1), 1–55 (1994)
  63. Guerra-Hernández, A., Fallah-Seghrouchni, A.E., Soldano, H.: Learning in BDI multi-agent systems. In: Dix, J., Leite, J. (eds.) CLIMA 2004. LNCS (LNAI), vol. 3259, pp. 218–233. Springer, Heidelberg (2004)
  64. Hernández, J., Morales, E.F.: Relational reinforcement learning with continuous actions by combining behavioral cloning and locally weighted regression. Journal of Intelligent Systems and Applications 2, 69–79 (2010)
  65. Häming, K., Peters, G.: Relational reinforcement learning applied to appearance-based object recognition. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds.) EANN 2009. Communications in Computer and Information Science, vol. 43, pp. 301–312. Springer, Heidelberg (2009)
  66. Hölldobler, S., Skvortsova, O.: A logic-based approach to dynamic programming. In: Proceedings of the AAAI Workshop on Learning and Planning in Markov Processes - Advances and Challenges (2004)
  67. Itoh, H., Nakamura, K.: Towards learning to learn and plan by relational reinforcement learning. In: Proceedings of the ICML Workshop on Relational Reinforcement Learning (2004)
  68. Joshi, S.: First-order decision diagrams for decision-theoretic planning. PhD thesis, Tufts University, Computer Science Department (2010)
  69. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99–134 (1998)
  70. Kaelbling, L.P., Oates, T., Gardiol, N.H., Finney, S.: Learning in worlds with objects. In: The AAAI Spring Symposium (2001)
  71. Karabaev, E., Skvortsova, O.: A heuristic search algorithm for solving first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2005)
  72. Karabaev, E., Rammé, G., Skvortsova, O.: Efficient symbolic reasoning for first-order MDPs. In: ECAI Workshop on Planning, Learning and Monitoring with Uncertainty and Dynamic Worlds (2006)
  73. Katz, D., Pyuro, Y., Brock, O.: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Proceedings of Robotics: Science and Systems IV (2008)
  74. Kersting, K., De Raedt, L.: Logical Markov decision programs and the convergence of TD(λ). In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2004)
  75. Kersting, K., Driessens, K.: Non-parametric policy gradients: A unified treatment of propositional and relational domains. In: Proceedings of the International Conference on Machine Learning (ICML) (2008)
  76. Kersting, K., van Otterlo, M., De Raedt, L.: Bellman goes relational. In: Proceedings of the International Conference on Machine Learning (ICML) (2004)
  77. Khardon, R.: Learning to take actions. Machine Learning 35(1), 57–90 (1999)
  78. Kochenderfer, M.J.: Evolving hierarchical and recursive teleo-reactive programs through genetic programming. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E.P.K., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 83–92. Springer, Heidelberg (2003)
  79. Lane, T., Wilson, A.: Toward a topological theory of relational reinforcement learning for navigation tasks. In: Proceedings of the International Florida Artificial Intelligence Research Society Conference (FLAIRS) (2005)
  80. Lang, T., Toussaint, M.: Approximate inference for planning in stochastic relational worlds. In: Proceedings of the International Conference on Machine Learning (ICML) (2009)
  81. Lang, T., Toussaint, M.: Probabilistic backward and forward reasoning in stochastic relational worlds. In: Proceedings of the International Conference on Machine Learning (ICML) (2010)
  82. Langley, P.: Cognitive architectures and general intelligent systems. AI Magazine 27, 33–44 (2006)
  83. Lanzi, P.L.: Learning classifier systems from a reinforcement learning perspective. Soft Computing 6, 162–170 (2002)
  84. Lecoeuche, R.: Learning optimal dialogue management rules by using reinforcement learning and inductive logic programming. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) (2001)
  85. Letia, I., Precup, D.: Developing collaborative Golog agents by reinforcement learning. In: Proceedings of the 13th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2001). IEEE Computer Society (2001)
  86. Levine, J., Humphreys, D.: Learning action strategies for planning domains using genetic programming. In: Raidl, G.R., Cagnoni, S., Cardalda, J.J.R., Corne, D.W., Gottlieb, J., Guillot, A., Hart, E., Johnson, C.G., Marchiori, E., Meyer, J.-A., Middendorf, M. (eds.) EvoIASP 2003, EvoWorkshops 2003, EvoSTIM 2003, EvoROB/EvoRobot 2003, EvoCOP 2003, EvoBIO 2003, and EvoMUSART 2003. LNCS, vol. 2611, pp. 684–695. Springer, Heidelberg (2003)
  87. Lison, P.: Towards relational POMDPs for adaptive dialogue management. In: Proceedings of the ACL 2010 Student Research Workshop, pp. 7–12. Association for Computational Linguistics, Morristown (2010)
  88. Littman, M.L., Sutton, R.S., Singh, S.: Predictive representations of state. In: Proceedings of the Neural Information Processing Conference (NIPS) (2001)
  89. Lloyd, J.W.: Logic for Learning: Learning Comprehensible Theories from Structured Data. Springer, Heidelberg (2003)
  90. Martin, M., Geffner, H.: Learning generalized policies in planning using concept languages. In: Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR) (2000)
  91. Mausam, Weld, D.S.: Solving relational MDPs with first-order machine learning. In: Workshop on Planning under Uncertainty and Incomplete Information at ICAPS 2003 (2003)
  92. McCallum, R.A.: Instance-based utile distinctions for reinforcement learning with hidden state. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 387–395 (1995)
  93. Mellor, D.: A learning classifier system approach to relational reinforcement learning. In: Bacardit, J., Bernadó-Mansilla, E., Butz, M.V., Kovacs, T., Llorà, X., Takadama, K. (eds.) IWLCS 2006 and IWLCS 2007. LNCS (LNAI), vol. 4998, pp. 169–188. Springer, Heidelberg (2008)
  94. Minker, J.: Logic-Based Artificial Intelligence. Kluwer Academic Publishers Group, Dordrecht (2000)
  95. Minton, S., Carbonell, J., Knoblock, C.A., Kuokka, D.R., Etzioni, O., Gil, Y.: Explanation-based learning: A problem solving perspective. Artificial Intelligence 40(1-3), 63–118 (1989)
  96. Mooney, R.J., Califf, M.E.: Induction of first-order decision lists: Results on learning the past tense of English verbs. Journal of Artificial Intelligence Research (JAIR) 3, 1–24 (1995)
  97. Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1), 103–130 (1993)
  98. Morales, E.F.: Scaling up reinforcement learning with a relational representation. In: Proceedings of the Workshop on Adaptability in Multi-Agent Systems at AORC 2003, Sydney (2003)
  99. Morales, E.F.: Learning to fly by combining reinforcement learning with behavioral cloning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 598–605 (2004)
  100. Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research (JAIR) 11, 241–276 (1999)
  101. Mourão, K., Petrick, R.P.A., Steedman, M.: Using kernel perceptrons to learn action effects for planning. In: Proceedings of the International Conference on Cognitive Systems (CogSys), pp. 45–50 (2008)
  102. Muller, T.J., van Otterlo, M.: Evolutionary reinforcement learning in relational domains. In: Proceedings of the 7th European Workshop on Reinforcement Learning (EWRL) (2005)
  103. Nason, S., Laird, J.E.: Soar-RL: Integrating reinforcement learning with Soar. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004)
  104. Nath, A., Domingos, P.: A language for relational decision theory. In: Proceedings of the International Workshop on Statistical Relational Learning (SRL) (2009)
  105. Neruda, R., Slusny, S.: Performance comparison of two reinforcement learning algorithms for small mobile robots. International Journal of Control and Automation 2(1), 59–68 (2009)
  106. Oates, T., Cohen, P.R.: Learning planning operators with conditional and probabilistic effects. In: Planning with Incomplete Information for Robot Problems: Papers from the 1996 AAAI Spring Symposium, pp. 86–94 (1996)
  107. Pasula, H.M., Zettlemoyer, L.S., Kaelbling, L.P.: Learning probabilistic planning rules. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2004)
  108. Poole, D.: The independent choice logic for modeling multiple agents under uncertainty. Artificial Intelligence 94, 7–56 (1997)
  109. Ramon, J., Driessens, K., Croonenborghs, T.: Transfer learning in reinforcement learning problems through partial policy recycling. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 699–707. Springer, Heidelberg (2007)
  110. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, Cambridge (2001)
  111. Rodrigues, C., Gerard, P., Rouveirol, C.: On and off-policy relational reinforcement learning. In: Late-Breaking Papers of the International Conference on Inductive Logic Programming (ILP) (2008)
  112. Rodrigues, C., Gérard, P., Rouveirol, C.: Incremental learning of relational action models in noisy environments. In: Frasconi, P., Lisi, F.A. (eds.) ILP 2010. LNCS, vol. 6489, pp. 206–213. Springer, Heidelberg (2011)
  113. Roncagliolo, S., Tadepalli, P.: Function approximation in hierarchical relational reinforcement learning. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML (2004)
  114. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, New Jersey (2003)
  115. Ryan, M.R.K.: Using abstract models of behaviors to automatically generate reinforcement learning hierarchies. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 522–529 (2002)
  116. Saad, E.: A logical framework to reinforcement learning using hybrid probabilistic logic programs. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 341–355. Springer, Heidelberg (2008)
  117. Safaei, J., Ghassem-Sani, G.: Incremental learning of planning operators in stochastic domains. In: Proceedings of the International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pp. 644–655 (2007)
  118. Sanner, S.: Simultaneous learning of structure and value in relational reinforcement learning. In: Driessens, K., Fern, A., van Otterlo, M. (eds.) Proceedings of the ICML-2005 Workshop on Rich Representations for Reinforcement Learning (2005)
  119. Sanner, S.: Online feature discovery in relational reinforcement learning. In: Proceedings of the ICML-2006 Workshop on Open Problems in Statistical Relational Learning (2006)
  120. Sanner, S., Boutilier, C.: Approximate linear programming for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2005)
  121. Sanner, S., Boutilier, C.: Practical linear value-approximation techniques for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2006)
  122. Sanner, S., Boutilier, C.: Approximate solution techniques for factored first-order MDPs. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007)
  123. Sanner, S., Kersting, K.: Symbolic dynamic programming for first-order POMDPs. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)
  124. Schmid, U.: Inductive synthesis of functional programs: Learning domain-specific control rules and abstraction schemes. Habilitationsschrift, Fakultät IV, Elektrotechnik und Informatik, Technische Universität Berlin, Germany (2001)
  125. Schuurmans, D., Patrascu, R.: Direct value approximation for factored MDPs. In: Proceedings of the Neural Information Processing Conference (NIPS) (2001)
  126. Shapiro, D., Langley, P.: Separating skills from preference. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 570–577 (2002)
  127. Simpkins, C., Bhat, S., Isbell, C.L., Mateas, M.: Adaptive programming: Integrating reinforcement learning into a programming language. In: Proceedings of the Twenty-Third ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) (2008)
  128. Slaney, J., Thiébaux, S.: Blocks world revisited. Artificial Intelligence 125, 119–153 (2001)
  129. Song, Z.W., Chen, X.P.: States evolution in Θ(λ)-learning based on logical MDPs with negation. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 1624–1629 (2007)
  130. Song, Z.W., Chen, X.P.: Agent learning in relational domains based on logical MDPs with negation. Journal of Computers 3(9), 29–38 (2008)
  131. Stone, P.: Learning and multiagent reasoning for autonomous agents. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Computers and Thought Award Paper (2007)
  132. Stracuzzi, D.J., Asgharbeygi, N.: Transfer of knowledge structures with relational temporal difference learning. In: Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)
  133. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)
  134. Sutton, R.S., McAllester, D.A., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of the Neural Information Processing Conference (NIPS), pp. 1057–1063 (2000)
  135. Thielscher, M.: Introduction to the fluent calculus. Electronic Transactions on Artificial Intelligence 2(3-4), 179–192 (1998)
  136. Thon, I., Guttman, B., van Otterlo, M., Landwehr, N., De Raedt, L.: From non-deterministic to probabilistic planning with the help of statistical relational learning. In: Workshop on Planning and Learning at ICAPS (2009)
  137. Torrey, L.: Relational transfer in reinforcement learning. PhD thesis, University of Wisconsin-Madison, Computer Science Department (2009)
  138. Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Relational macros for transfer in reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007)
  139. Torrey, L., Shavlik, J., Natarajan, S., Kuppili, P., Walker, T.: Transfer in reinforcement learning via Markov logic networks. In: Proceedings of the AAAI-2008 Workshop on Transfer Learning for Complex Tasks (2008)
  140. Toussaint, M.: Probabilistic inference as a model of planned behavior. Künstliche Intelligenz (German Artificial Intelligence Journal) 3 (2009)
  141. Toussaint, M., Plath, N., Lang, T., Jetchev, N.: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2010)
  142. Van den Broeck, G., Thon, I., van Otterlo, M., De Raedt, L.: DTProbLog: A decision-theoretic probabilistic Prolog. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)
  143. van Otterlo, M.: Efficient reinforcement learning using relational aggregation. In: Proceedings of the Sixth European Workshop on Reinforcement Learning (EWRL-6), Nancy, France (2003)
  144. van Otterlo, M.: Reinforcement learning for relational MDPs. In: Nowé, A., Lenaerts, T., Steenhaut, K. (eds.) Machine Learning Conference of Belgium and the Netherlands (BeNeLearn 2004), pp. 138–145 (2004)
  145. van Otterlo, M.: Intensional dynamic programming: A Rosetta stone for structured dynamic programming. Journal of Algorithms 64, 169–191 (2009a)
  146. van Otterlo, M.: The Logic of Adaptive Behavior: Knowledge Representation and Algorithms for Adaptive Sequential Decision Making under Uncertainty in First-Order and Relational Domains. IOS Press, Amsterdam (2009b)
  147. van Otterlo, M., De Vuyst, T.: Evolving and transferring probabilistic policies for relational reinforcement learning. In: Proceedings of the Belgium-Netherlands Artificial Intelligence Conference (BNAIC), pp. 201–208 (2009)
  148. van Otterlo, M., Wiering, M.A., Dastani, M., Meyer, J.J.: A characterization of sapient agents. In: Mayorga, R.V., Perlovsky, L.I. (eds.) Toward Computational Sapience: Principles and Systems, ch. 9. Springer, Heidelberg (2007)
  149. Vargas, B., Morales, E.: Solving navigation tasks with learned teleo-reactive programs. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4185–4185 (2008), doi:10.1109/IROS.2008.4651240
  150. Vargas-Govea, B., Morales, E.: Learning relational grammars from sequences of actions. In: Bayro-Corrochano, E., Eklundh, J.-O. (eds.) CIARP 2009. LNCS, vol. 5856, pp. 892–900. Springer, Heidelberg (2009)
  151. Vere, S.A.: Induction of relational productions in the presence of background information. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 349–355 (1977)
  152. Walker, T., Shavlik, J., Maclin, R.: Relational reinforcement learning via sampling the space of first-order conjunctive features. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004)
  153. Walker, T., Torrey, L., Shavlik, J., Maclin, R.: Building relational world models for reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007)
  154. Walsh, T.J.: Efficient learning of relational models for sequential decision making. PhD thesis, Rutgers University, Computer Science Department (2010)
  155. Walsh, T.J., Littman, M.L.: Efficient learning of action schemas and web-service descriptions. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2008)
  156. Walsh, T.J., Li, L., Littman, M.L.: Transferring state abstractions between MDPs. In: ICML-2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)
  157. Wang, C.: First-order Markov decision processes. PhD thesis, Department of Computer Science, Tufts University, USA (2007)
  158. Wang, C., Khardon, R.: Policy iteration for relational MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2007)
  159. Wang, C., Khardon, R.: Relational partially observable MDPs. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)
  160. Wang, C., Schmolze, J.: Planning with POMDPs using a compact, logic-based representation. In: Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI) (2005)
  161. Wang, C., Joshi, S., Khardon, R.: First order decision diagrams for relational MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2007)
  162. Wang, C., Joshi, S., Khardon, R.: First order decision diagrams for relational MDPs. Journal of Artificial Intelligence Research (JAIR) 31, 431–472 (2008a)
  163. Wang, W., Gao, Y., Chen, X., Ge, S.: Reinforcement learning with Markov logic networks. In: Gelbukh, A., Morales, E.F. (eds.) MICAI 2008. LNCS (LNAI), vol. 5317, pp. 230–242. Springer, Heidelberg (2008b)
  164. Wang, X.: Learning by observation and practice: An incremental approach for planning operator acquisition. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 549–557 (1995)
  165. Wingate, D., Soni, V., Wolfe, B., Singh, S.: Relational knowledge with predictive state representations. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2007)
  166. Wooldridge, M.: An Introduction to MultiAgent Systems. John Wiley & Sons Ltd., West Sussex (2002)
  167. Wu, J.H., Givan, R.: Discovering relational domain features for probabilistic planning. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007)
  168. Wu, K., Yang, Q., Jiang, Y.: ARMS: Action-relation modelling system for learning action models. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2005)
  169. Xu, J.Z., Laird, J.E.: Instance-based online learning of deterministic relational action models. In: Proceedings of the International Conference on Machine Learning (ICML) (2010)
  170. Yoon, S.W., Fern, A., Givan, R.: Inductive policy selection for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2002)
  171. Zettlemoyer, L.S., Pasula, H.M., Kaelbling, L.P.: Learning planning rules in noisy stochastic worlds. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2005)
  172. Zhao, H., Doshi, P.: Haley: A hierarchical framework for logical composition of web services. In: Proceedings of the International Conference on Web Services (ICWS), pp. 312–319 (2007)
  173. Zhuo, H., Li, L., Bian, R., Wan, H.: Requirement specification based on action model learning. In: Huang, D.-S., Heutte, L., Loog, M. (eds.) ICIC 2007. LNCS, vol. 4681, pp. 565–574. Springer, Heidelberg (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

Radboud University, Nijmegen, The Netherlands
