Solving Uncertain Markov Decision Problems: An Interval-Based Method
Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be solved efficiently with value iteration (VI), policy iteration (PI), RTDP, LAO*, and related algorithms. In many practical problems, however, the transition probabilities can only be estimated imprecisely. In this paper, we represent uncertain transition probabilities as closed real intervals and describe a general algorithm, called gLAO*, that solves the resulting uncertain MDPs efficiently. We show that Buffet and Aberdeen's approach, which searches for the best policy under the worst model, is a special case of our approach. Experiments show that gLAO* inherits the excellent performance of LAO* when solving uncertain MDPs.
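To make the idea of planning "under the worst model" concrete, the following is a minimal Python sketch of a pessimistic Bellman backup over interval transition probabilities. It is not gLAO* itself and is not taken from the paper: the function names, the dictionary-based data layout, and the greedy mass-allocation rule are illustrative assumptions. The sketch assumes a cost-minimising SSP, so the worst model is the one that shifts as much probability as the intervals allow toward the costliest successors.

```python
def worst_case_distribution(intervals, values):
    """Pick the distribution inside the intervals that maximises expected cost.

    intervals: dict successor -> (lo, hi), closed probability intervals (assumed layout)
    values:    dict successor -> current cost-to-go estimate (assumed layout)
    """
    # Start every successor at its lower bound.
    probs = {s: lo for s, (lo, hi) in intervals.items()}
    slack = 1.0 - sum(probs.values())
    if slack < -1e-9:
        raise ValueError("lower bounds sum to more than 1")
    slack = max(slack, 0.0)

    # Hand out the remaining mass to the costliest successors first,
    # never exceeding a successor's upper bound.
    for s in sorted(intervals, key=lambda s: values[s], reverse=True):
        lo, hi = intervals[s]
        extra = min(hi - lo, slack)
        probs[s] += extra
        slack -= extra

    if slack > 1e-9:
        raise ValueError("upper bounds sum to less than 1")
    return probs


def pessimistic_backup(cost, intervals, values):
    """One-step cost of an action under the worst model within the intervals."""
    probs = worst_case_distribution(intervals, values)
    return cost + sum(p * values[s] for s, p in probs.items())


# Hypothetical two-successor example: a cheap goal state and a costly trap state.
example_intervals = {"goal": (0.5, 0.9), "trap": (0.1, 0.5)}
example_values = {"goal": 0.0, "trap": 10.0}
example_q = pessimistic_backup(1.0, example_intervals, example_values)
```

In this hypothetical example the trap state receives its upper-bound probability, since it is the costlier successor. Sorting in ascending rather than descending order of value would give the corresponding optimistic backup.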
Keywords: Optimal Policy, Dynamic Programming Algorithm, State Transition Probability, Policy Iteration, Solution Graph
- 2. Bagnell, J.A., Ng, A.Y., Schneider, J.: Solving uncertain Markov decision problems. Technical Report CMU-RI-TR-01-25, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA (August 2001)
- 3. Daram, U.K., Chong, E.K.P., Shroff, N.B.: Markov Decision Processes with Uncertain Transition Rates: Sensitivity and Robust Control. In: Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, Nevada, USA (December 2002)
- 4. Buffet, O., Aberdeen, D.: Robust planning with (L)RTDP. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005) (2005)
- 6. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
- 9. Barto, A.G., Bradtke, S., Singh, S.: Learning to act using real-time dynamic programming. Artificial Intelligence 72 (1995)