Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Symbolic Dynamic Programming

  • Scott Sanner
  • Kristian Kersting
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_806



Symbolic dynamic programming (SDP) is a generalization of the dynamic programming technique for solving Markov decision processes (MDPs) that exploits the symbolic structure in the solution of relational and first-order logical MDPs through a lifted version of dynamic programming.

Motivation and Background

Decision-theoretic planning aims at constructing a policy for acting in an uncertain environment that maximizes an agent's expected utility along a sequence of steps. For this task, Markov decision processes (MDPs) have become the standard model. However, classical dynamic programming algorithms for solving MDPs require explicit state and action enumeration, which is often impractical: the number of states and actions grows very quickly with the number of domain objects and relations. In contrast, SDP algorithms seek to avoid explicit state and action enumeration by operating directly on a symbolic, first-order representation of the MDP.
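To make the contrast concrete, the following sketch shows classical (tabular) value iteration, the kind of explicitly enumerated dynamic programming that SDP generalizes. The two-state MDP below is hypothetical and chosen only for illustration; a relational domain would blow up this enumeration combinatorially.

```python
# Tabular value iteration on a toy MDP: every state and action is
# enumerated explicitly -- exactly what SDP seeks to avoid at scale.
# (Illustrative example only; this MDP is not from the entry.)

states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9  # discount factor

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    "s0": {"stay": [("s0", 1.0)], "move": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "move": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "move": 1.0},
    "s1": {"stay": 2.0, "move": 0.0},
}

def q(V, s, a):
    """One Bellman backup term: expected discounted value of action a in s."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

V = {s: 0.0 for s in states}
for _ in range(200):  # iterate Bellman backups to (near) convergence
    V = {s: max(q(V, s, a) for a in actions) for s in states}

# Greedy policy with respect to the converged value function.
policy = {s: max(actions, key=lambda a: q(V, s, a)) for s in states}
print(V)
print(policy)
```

Note that the loops range over every concrete state and action; SDP instead performs these backups once at the level of logical state descriptions, so the cost is independent of the number of ground states.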


Recommended Reading

  1. Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
  2. Boutilier, C., Reiter, R., & Price, B. (2001). Symbolic dynamic programming for first-order MDPs. In IJCAI-01 (pp. 690–697). Seattle.
  3. Fikes, R. E., & Nilsson, N. J. (1971). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2, 189–208.
  4. Fern, A., Yoon, S., & Givan, R. (2003). Approximate policy iteration with a policy language bias. In NIPS-2003. Vancouver.
  5. Gretton, C., & Thiébaux, S. (2004). Exploiting first-order regression in inductive policy selection. In UAI-04 (pp. 217–225). Banff, Canada.
  6. Guestrin, C., Koller, D., Gearhart, C., & Kanodia, N. (2003). Generalizing plans to new environments in relational MDPs. In IJCAI-03. Acapulco, Mexico.
  7. Hölldobler, S., & Skvortsova, O. (2004). A logic-based approach to dynamic programming. In AAAI-04 Workshop on Learning and Planning in MDPs (pp. 31–36). Menlo Park, CA.
  8. Karabaev, E., & Skvortsova, O. (2005). A heuristic search algorithm for solving first-order MDPs. In UAI-2005 (pp. 292–299). Edinburgh, Scotland.
  9. Kersting, K., van Otterlo, M., & De Raedt, L. (2004). Bellman goes relational. In ICML-04. New York: ACM Press.
  10. Kushmerick, N., Hanks, S., & Weld, D. (1995). An algorithm for probabilistic planning. Artificial Intelligence, 76, 239–286.
  11. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
  12. Sanner, S., & Boutilier, C. (2005). Approximate linear programming for first-order MDPs. In UAI-2005. Edinburgh, Scotland.
  13. Sanner, S., & Boutilier, C. (2006). Practical linear evaluation techniques for first-order MDPs. In UAI-2006. Boston.
  14. Sanner, S., & Boutilier, C. (2007). Approximate solution techniques for factored first-order MDPs. In ICAPS-07 (pp. 288–295). Providence, RI.
  15. Wang, C., Joshi, S., & Khardon, R. (2007). First order decision diagrams for relational MDPs. In IJCAI-07. Hyderabad, India.
  16. Wang, C., & Khardon, R. (2007). Policy iteration for relational MDPs. In UAI-07. Vancouver, Canada.

Copyright information

© Springer Science+Business Media, LLC 2011
