Learning to Act Optimally in Partially Observable Markov Decision Processes Using Hybrid Probabilistic Logic Programs

  • Emad Saad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6929)


We present a probabilistic logic programming framework for reinforcement learning in POMDP environments that integrates reinforcement learning with normal hybrid probabilistic logic programs under the probabilistic answer set semantics, which are capable of representing domain-specific knowledge. We formally prove the correctness of our approach, and we show that the complexity of finding a policy for a reinforcement learning problem in our approach is NP-complete. In addition, we show that any reinforcement learning problem can be encoded as a classical logic program with answer set semantics. We also show that a reinforcement learning problem can be encoded as a SAT problem. We present a new high-level action description language that allows a factored representation of POMDPs. Moreover, we modify the original POMDP model so that it is able to distinguish between knowledge-producing actions and actions that change the environment.
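To make the POMDP setting concrete, the following is a minimal sketch of the standard belief-state update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s), which underlies acting under partial observability. This illustrates the classical model only, not the paper's hybrid logic-program encoding; the `sense` action, the two-state door domain, and the sensor accuracy 0.8 are illustrative assumptions.

```python
def belief_update(belief, action, observation, T, O, states):
    """Bayes-filter update: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    new_belief = {}
    for s2 in states:
        new_belief[s2] = O[(observation, s2, action)] * sum(
            T[(s2, s, action)] * belief[s] for s in states
        )
    total = sum(new_belief.values())
    if total == 0:
        raise ValueError("observation impossible under current belief")
    # Normalize so the updated belief is again a probability distribution.
    return {s: p / total for s, p in new_belief.items()}

# Two-state example: a noisy sensor for whether a door is open.
states = ["open", "closed"]
# "sense" is a knowledge-producing action: it leaves the state unchanged.
T = {(s2, s, "sense"): 1.0 if s2 == s else 0.0 for s in states for s2 in states}
# The sensor reports the true state with probability 0.8.
O = {(o, s, "sense"): 0.8 if o == s else 0.2 for o in states for s in states}

b = belief_update({"open": 0.5, "closed": 0.5}, "sense", "open", T, O, states)
# Sensing "open" shifts the belief to b["open"] = 0.8, b["closed"] = 0.2.
```

A knowledge-producing action such as `sense` changes only the belief state, whereas an environment-changing action would use a non-identity transition function T; the distinction between the two is exactly what the modified POMDP model in the paper is designed to capture.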


Keywords: Logic Program · Optimal Policy · Markov Decision Process · Belief State · Partially Observable Markov Decision Process





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Emad Saad
  1. Department of Computer Science, Gulf University for Science and Technology, Mishref, Kuwait
