Suboptimal policy determination for large-scale Markov decision processes, part 2: Implementation and numerical evaluation
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure, which is based on state and action space aggregation, relative to a procedure for determining an optimal policy. Results of a preliminary numerical study indicate that the suboptimal policies produced by the 3MDP (multimodule MDP) approach are of promising quality.
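The full 3MDP procedure and its bounds are developed in Part 1 (Ref. 1); as a rough intuition for the aggregation idea the abstract mentions, the sketch below builds a small aggregated MDP by grouping states into clusters, solves the aggregated problem by value iteration, and lifts the aggregate policy back to the original state space. This is a generic state-aggregation sketch under assumed uniform within-cluster weights, not the authors' algorithm; all names (`value_iteration`, `phi`, `W`, `D`) are illustrative.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve an MDP with transitions P[a, s, t] and rewards R[a, s]."""
    n = P.shape[1]
    V = np.zeros(n)
    while True:
        # Q-values for every (action, state) pair
        Q = R + gamma * np.einsum('ast,t->as', P, V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=0), V_new
        V = V_new

# A small random MDP: 6 states, 2 actions (illustrative data only)
rng = np.random.default_rng(0)
n, m = 6, 2
P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)   # normalize rows to probabilities
R = rng.random((m, n))

# Aggregation map: states {0,1,2} -> cluster 0, {3,4,5} -> cluster 1
phi = np.array([0, 0, 0, 1, 1, 1])
k = 2

# Aggregation weights W (uniform within each cluster) and
# disaggregation matrix D (each state copies its cluster's value)
W = np.zeros((k, n))
W[phi, np.arange(n)] = 1.0
W /= W.sum(axis=1, keepdims=True)
D = np.zeros((n, k))
D[np.arange(n), phi] = 1.0

# Aggregated model: k x k transitions and k-state rewards per action
P_agg = np.einsum('ks,ast,tj->akj', W, P, D)
R_agg = np.einsum('ks,as->ak', W, R)

# Solve the small aggregated MDP, then lift the policy:
# every original state uses the action chosen for its cluster
pi_agg, V_agg = value_iteration(P_agg, R_agg)
pi = pi_agg[phi]
```

The computational saving comes from running dynamic programming on the `k`-state aggregated model rather than the original `n`-state model; the price, as the numerical study in this paper probes, is that the lifted policy is in general only suboptimal.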
Key Words: Markov decision processes, suboptimal design
- 1. White, C. C., and Popyack, J. L., Suboptimal Policy Determination for Large-Scale Markov Decision Processes, Part 1: Description and Bounds, Journal of Optimization Theory and Applications, Vol. 46, pp. 319–341, 1985.
- 2. Popyack, J. L., Approximating Markov Decision Processes with Multimodule Markov Decision Processes, University of Virginia, Department of Applied Mathematics and Computer Science, PhD Dissertation, 1982.
- 3. Bertsekas, D. P., Dynamic Programming and Stochastic Control, Academic Press, New York, New York, 1976.
- 4. Mendelssohn, R. A., An Iterative Aggregation Procedure for Markov Decision Processes, Operations Research, Vol. 30, pp. 62–73, 1982.
- 5. Platzman, L. K., White, C. C., and Popyack, J. L., Optimally Damped Successive Approximation Algorithms for Markov Decision Programming (to appear).
- 6. Howard, R. A., Dynamic Programming and Markov Processes, John Wiley and Sons, New York, New York, 1960.