Solving Hybrid Markov Decision Processes

  • Alberto Reyes
  • L. Enrique Sucar
  • Eduardo F. Morales
  • Pablo H. Ibargüengoytia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


Abstract

Markov decision processes (MDPs) have become a standard for representing uncertainty in decision-theoretic planning. However, MDPs require an explicit representation of the state space and of the probabilistic transition model, which are not always easy to define in continuous or hybrid continuous-discrete domains. Even when such a representation is available, the size of the state space and the number of state variables in the transition function may make the resulting MDP intractable for traditional solution techniques. This paper presents a reward-based abstraction for solving hybrid MDPs. In the proposed method, we gather information about the rewards and the dynamics of the system by exploring the environment. This information is used to build a decision tree (C4.5) representing a small set of abstract states with equivalent rewards, and then to learn a probabilistic transition function over those abstract states with a Bayesian network learning algorithm (K2). The output is a problem specification ready to be solved with traditional dynamic programming algorithms. We have tested our abstract MDP approximation on real-world problem domains. We present results, in terms of the models learned and their solutions for different configurations, showing that our approach produces fast solutions with satisfactory policies.
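The pipeline described in the abstract can be sketched in a few dozen lines. This is a minimal toy illustration, not the authors' implementation: a fixed reward-threshold bucketing stands in for the learned C4.5 tree, a count-based maximum-likelihood estimate stands in for the K2 Bayesian-network learner, and the one-dimensional environment (`env_step`), the number of abstract states, and all parameter values are assumptions made for the example.

```python
import random
from collections import defaultdict

def explore(env_step, actions, n_samples, rng):
    """Gather (state, action, next_state, reward) samples by exploring."""
    samples = []
    for _ in range(n_samples):
        s = rng.uniform(0.0, 1.0)          # continuous state, sampled at random
        a = rng.choice(actions)
        s2, r = env_step(s, a)
        samples.append((s, a, s2, r))
    return samples

def abstract_state(s):
    # Stand-in for the learned decision tree: partition the continuous
    # state into a small set of qualitative states with equivalent rewards.
    return min(int(s * 4), 3)

def learn_transitions(samples, n_abs, actions):
    """Count-based transition model over abstract states (stand-in for K2)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s2, _ in samples:
        counts[(abstract_state(s), a)][abstract_state(s2)] += 1
    T = {}
    for q in range(n_abs):
        for a in actions:
            c = counts[(q, a)]
            total = sum(c.values()) or 1
            T[(q, a)] = {q2: c.get(q2, 0) / total for q2 in range(n_abs)}
    return T

def value_iteration(T, R, n_abs, actions, gamma=0.9, eps=1e-6):
    """Solve the resulting abstract MDP with standard dynamic programming."""
    V = [0.0] * n_abs
    while True:
        V2 = [max(R[q] + gamma * sum(p * V[q2] for q2, p in T[(q, a)].items())
                  for a in actions) for q in range(n_abs)]
        if max(abs(x - y) for x, y in zip(V, V2)) < eps:
            return V2
        V = V2

# Toy hybrid domain: reward is earned near the top of the [0, 1] interval.
rng = random.Random(0)
def env_step(s, a):
    s2 = min(max(s + (0.3 if a == "up" else -0.3) + rng.gauss(0, 0.05), 0.0), 1.0)
    return s2, (1.0 if s2 > 0.75 else 0.0)

actions = ["up", "down"]
samples = explore(env_step, actions, 5000, rng)
R = [0.0, 0.0, 0.0, 1.0]   # abstract reward per qualitative state (assumed)
T = learn_transitions(samples, 4, actions)
V = value_iteration(T, R, 4, actions)
```

Here `V` assigns higher value to abstract states closer to the rewarded region, and a greedy policy over `T` and `V` recovers the intuitive "move up" behaviour; the point is only to show how a small abstract model, once learned, can be handed to an off-the-shelf dynamic programming solver.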


Keywords: Markov Decision Process · Reward Function · Dynamic Bayesian Network · Qualitative State · Motion Planning Problem




References

  1. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
  2. Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: structural assumptions and computational leverage. Journal of AI Research 11, 1–94 (1999)
  3. Boutilier, C., Goldszmidt, M., Sabata, B.: Continuous value function approximation for sequential bidding policies. In: Laskey, K., Prade, H. (eds.) Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI 1999). Morgan Kaufmann Publishers, San Francisco (1999)
  4. Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Machine Learning (1992)
  5. Darwiche, A., Goldszmidt, M.: Action networks: A framework for reasoning about actions and change under uncertainty. In: Proceedings of the Tenth Conf. on Uncertainty in AI (UAI 1994), Seattle, WA, USA, pp. 136–144 (1994)
  6. Dean, T., Givan, R.: Model minimization in Markov decision processes. In: Proc. of the 14th National Conf. on AI, pp. 106–111. AAAI Press, Menlo Park (1997)
  7. Dean, T., Kanazawa, K.: A model for reasoning about persistence and causation. Computational Intelligence 5, 142–150 (1989)
  8. Dearden, R., Boutilier, C.: Abstraction and approximate decision-theoretic planning. Artificial Intelligence 89, 219–283 (1997)
  9. Feng, Z., Dearden, R., Meuleau, N., Washington, R.: Dynamic programming for structured continuous Markov decision problems. In: Proc. of the 20th Conf. on Uncertainty in AI (UAI 2004), Banff, Canada (2004)
  10. Kearns, M., Koller, D.: Efficient reinforcement learning in factored MDPs. In: Proc. of the 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), Stockholm, Sweden (1999)
  11. Li, L., Littman, M.L.: Lazy approximation for solving continuous finite-horizon MDPs. In: AAAI 2005, Pittsburgh, PA, pp. 1175–1180 (2005)
  12. Munos, R., Moore, A.: Variable resolution discretization for high-accuracy solutions of optimal control problems. In: Dean, T. (ed.) Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), pp. 1348–1355. Morgan Kaufmann Publishers, San Francisco (1999)
  13. Pineau, J., Gordon, G., Thrun, S.: Policy-contingent abstraction for robust control. In: Proc. of the 19th Conf. on Uncertainty in AI (UAI 2003), pp. 477–484 (2003)
  14. Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
  15. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  16. Reyes, A., Sucar, L.E., Morales, E., Ibargüengoytia, P.H.: Abstract MDPs using qualitative change predicates: An application in power generation. In: Planning under Uncertainty in Real-World Problems Workshop, Neural Information Processing Systems (NIPS 2003), Vancouver, Canada (2003)
  17. Sallans, B., Hinton, G.E.: Reinforcement learning with factored states and actions. Journal of Machine Learning Research, 1063–1088 (2004)
  18. Šuc, D., Bratko, I.: Qualitative reverse engineering. In: Proc. of the 19th International Conf. on Machine Learning (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alberto Reyes (1)
  • L. Enrique Sucar (2)
  • Eduardo F. Morales (2)
  • Pablo H. Ibargüengoytia (1)
  1. Instituto de Investigaciones Eléctricas, Cuernavaca, Mor., México
  2. INAOE, Tonantzintla, Pue., México
