
Solving Uncertain Markov Decision Problems: An Interval-Based Method

  • Shulin Cui
  • Jigui Sun
  • Minghao Yin
  • Shuai Lu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4222)

Abstract

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be solved efficiently with algorithms such as value iteration (VI), policy iteration (PI), RTDP, and LAO*. In many practical problems, however, the estimates of the transition probabilities are far from accurate. In this paper, we represent uncertain transition probabilities as closed real intervals. We also describe a general algorithm, called gLAO*, that solves uncertain MDPs efficiently. We show that Buffet and Aberdeen’s approach, which searches for the best policy under the worst model, is a special case of our approach. Experiments show that gLAO* inherits the excellent performance of LAO* when solving uncertain MDPs.
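
The interval representation and the pessimistic ("worst model") backup that the abstract refers to can be sketched briefly. The Python fragment below is a minimal illustration, not the authors' gLAO* implementation: it performs one robust Bellman backup for an SSP whose transition probabilities are only known to lie in closed intervals, greedily shifting probability mass toward the successors with the highest cost-to-go, which corresponds to the worst-model special case attributed to Buffet and Aberdeen. All names here (robust_backup, worst_case_distribution, V, cost, succ, p_lo, p_hi) are illustrative assumptions, not identifiers from the paper.

    def worst_case_distribution(succs, p_lo, p_hi, V):
        """Pick the distribution inside the intervals [p_lo, p_hi] that
        maximises expected cost-to-go: start every successor at its lower
        bound, then hand the remaining mass to the successors with the
        highest value V first (the pessimistic / worst model)."""
        probs = {s: p_lo[s] for s in succs}
        slack = 1.0 - sum(probs.values())
        for s in sorted(succs, key=lambda t: V[t], reverse=True):
            give = min(p_hi[s] - p_lo[s], slack)
            probs[s] += give
            slack -= give
            if slack <= 1e-12:
                break
        return probs

    def robust_backup(state, actions, cost, succ, p_lo, p_hi, V):
        """One pessimistic Bellman backup: minimum over actions of the
        worst-case expected cost under the interval transition model."""
        best = float("inf")
        for a in actions(state):
            succs = succ(state, a)
            dist = worst_case_distribution(succs, p_lo[(state, a)],
                                           p_hi[(state, a)], V)
            q = cost(state, a) + sum(dist[s] * V[s] for s in succs)
            best = min(best, q)
        return best

The symmetric optimistic backup is obtained by sorting successors in increasing order of V, so the same machinery yields both the lower and upper value bounds induced by the intervals.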

Keywords

Optimal Policy · Dynamic Programming Algorithm · State Transition Probability · Policy Iteration · Solution Graph


References

  1. Hansen, E.A., Zilberstein, S.: LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence 129, 35–62 (2001)
  2. Bagnell, J.A., Ng, A.Y., Schneider, J.: Solving Uncertain Markov Decision Problems. Technical Report CMU-RI-TR-01-25, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA (August 2001)
  3. Kalyanasundaram, S., Chong, E.K.P., Shroff, N.B.: Markov Decision Processes with Uncertain Transition Rates: Sensitivity and Robust Control. In: Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, Nevada, USA (December 2002)
  4. Buffet, O., Aberdeen, D.: Robust planning with (L)RTDP. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005) (2005)
  5. Givan, R., Leach, S., Dean, T.: Bounded parameter Markov decision processes. Artificial Intelligence 122(1–2), 71–109 (2000)
  6. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
  7. Bertsekas, D.: Dynamic Programming and Optimal Control. Athena Scientific, Belmont (1995)
  8. Martelli, A., Montanari, U.: Optimizing decision trees through heuristically guided search. Comm. ACM 21(12), 1025–1039 (1978)
  9. Barto, A.G., Bradtke, S., Singh, S.: Learning to act using real-time dynamic programming. Artificial Intelligence 72 (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shulin Cui (1, 3)
  • Jigui Sun (2, 3)
  • Minghao Yin (2, 3)
  • Shuai Lu (2, 3)
  1. College of Software, Jilin University, Changchun, China
  2. College of Computer Science and Technology, Jilin University, Changchun, China
  3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
