Abstract
This paper is the first of two papers that present and evaluate an approach for determining suboptimal policies for large-scale Markov decision processes (MDPs). Part 1 is devoted to deriving bounds that motivate the development of the suboptimal design approach and indicate its quality; Part 2 is concerned with the implementation and evaluation of the approach. The specific MDP considered is the infinite-horizon, expected total discounted cost MDP with finite state and action spaces. The approach can be described as follows. First, the original MDP is approximated by a specially structured MDP. This special structure suggests how to construct associated smaller, more computationally tractable MDPs. The suboptimal policy for the original MDP is then constructed from the solutions of these smaller MDPs. The key feature of this approach is that the cardinalities of the state and action spaces of the smaller MDPs are exponentially smaller than those of the original MDP.
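To make the state- and action-space reduction concrete, the following Python sketch illustrates the simplest structured case. It rests on assumptions that are ours, not the paper's. The structured MDP is taken to be a product of independent modules, each joint transition probability is the product of module transition probabilities, and the one-stage cost is the sum of module costs. Under these assumptions the discounted-cost optimality equation decouples, so the joint optimal cost equals the sum of the module optimal costs and only the small module MDPs need to be solved. The helper names `value_iteration` and `product_mdp`, the two run/repair modules, and all numbers are hypothetical. The paper's approach applies to MDPs that are only approximated by a structured MDP, and this sketch reproduces neither that approximation nor its bounds.

```python
from itertools import product


def value_iteration(states, actions, P, c, beta, tol=1e-10):
    """Minimize expected total discounted cost for a finite MDP.

    P[s][a] is a dict {next_state: probability}, c[s][a] is the one-stage
    cost, and beta in (0, 1) is the discount factor. Returns the approximate
    optimal cost function and a greedy policy, both as dicts keyed by state.
    """
    v = {s: 0.0 for s in states}
    while True:
        # One Bellman backup: cost now plus discounted expected cost-to-go.
        q = {s: {a: c[s][a] + beta * sum(p * v[t] for t, p in P[s][a].items())
                 for a in actions} for s in states}
        new_v = {s: min(q[s].values()) for s in states}
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            policy = {s: min(q[s], key=q[s].get) for s in states}
            return new_v, policy
        v = new_v


def product_mdp(modules):
    """Build the joint MDP of independent modules with additive costs.

    Each module is a tuple (states, actions, P, c); joint states and joint
    actions are tuples with one component per module (hypothetical helper).
    """
    states = list(product(*(m[0] for m in modules)))
    actions = list(product(*(m[1] for m in modules)))
    P, c = {}, {}
    for s in states:
        P[s], c[s] = {}, {}
        for a in actions:
            # Additive cost across modules.
            c[s][a] = sum(m[3][si][ai] for m, si, ai in zip(modules, s, a))
            # Joint transition law is the product of module transition laws.
            dist = {(): 1.0}
            for m, si, ai in zip(modules, s, a):
                dist = {t + (ti,): p * pi for t, p in dist.items()
                        for ti, pi in m[2][si][ai].items()}
            P[s][a] = dist
    return states, actions, P, c


# Two hypothetical machine modules: state 0 = good, 1 = worn.
P1 = {0: {"run": {0: 0.9, 1: 0.1}, "repair": {0: 1.0}},
      1: {"run": {1: 1.0}, "repair": {0: 1.0}}}
c1 = {0: {"run": 0.0, "repair": 2.0}, 1: {"run": 3.0, "repair": 2.0}}
P2 = {0: {"run": {0: 0.7, 1: 0.3}, "repair": {0: 1.0}},
      1: {"run": {1: 1.0}, "repair": {0: 1.0}}}
c2 = {0: {"run": 0.0, "repair": 1.0}, 1: {"run": 4.0, "repair": 1.0}}
m1 = ([0, 1], ["run", "repair"], P1, c1)
m2 = ([0, 1], ["run", "repair"], P2, c2)

beta = 0.9
v1, _ = value_iteration(*m1, beta)  # small module problem 1
v2, _ = value_iteration(*m2, beta)  # small module problem 2
S, A, P, c = product_mdp([m1, m2])
v, policy = value_iteration(S, A, P, c, beta)  # full joint problem

# The joint optimal cost is the sum of the module optimal costs.
for s in S:
    assert abs(v[s] - (v1[s[0]] + v2[s[1]])) < 1e-6
print(len(S), len(A))  # 4 4
```

With N modules of n states and m actions each, the joint MDP built by `product_mdp` has n^N states and m^N actions, whereas each module MDP keeps only n states and m actions. In this idealized setting, that gap is the exponential reduction in cardinality referred to above.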
Additional information
Communicated by R. A. Howard
This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.
Cite this article
White, C.C., Popyack, J.L. Suboptimal policy determination for large-scale Markov decision processes, Part 1: Description and bounds. J Optim Theory Appl 46, 319–341 (1985). https://doi.org/10.1007/BF00939287