Suboptimal policy determination for large-scale Markov decision processes, Part 1: Description and bounds

White, C. C.; Popyack, J. L.

doi:10.1007/BF00939287

Contributed Papers
Published: July 1985

Journal of Optimization Theory and Applications, Volume 46, pages 319–341 (1985)

C. C. White III¹ &
J. L. Popyack²

Abstract

This paper is the first of two papers that present and evaluate an approach for determining suboptimal policies for large-scale Markov decision processes (MDP). Part 1 is devoted to the determination of bounds that motivate the development and indicate the quality of the suboptimal design approach; Part 2 is concerned with the implementation and evaluation of the suboptimal design approach. The specific MDP considered is the infinite-horizon, expected total discounted cost MDP with finite state and action spaces. The approach can be described as follows. First, the original MDP is approximated by a specially structured MDP. The special structure suggests how to construct associated smaller, more computationally tractable MDP's. The suboptimal policy for the original MDP is then constructed from the solutions of these smaller MDP's. The key feature of this approach is that the state and action space cardinalities of the smaller MDP's are exponential reductions of the state and action space cardinalities of the original MDP.
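
For orientation, the model class named in the abstract (finite state and action spaces, infinite horizon, expected total discounted cost) can be solved exactly by standard value iteration when the state space is small. The sketch below is that textbook method, not the authors' suboptimal design approach; the function name, the two-state repair example, and every number in it are assumptions made for illustration.

```python
def value_iteration(P, c, beta, tol=1e-10, max_iter=100_000):
    """Standard value iteration for a finite discounted-cost MDP.

    P[a][s][t] : probability of moving from state s to state t under action a
    c[s][a]    : one-stage cost of taking action a in state s
    beta       : discount factor, 0 < beta < 1
    Returns the approximate optimal cost vector and a greedy policy.
    """
    n_states = len(c)
    n_actions = len(c[0])
    v = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # q[s][a] = immediate cost plus discounted expected future cost
        q = [[c[s][a] + beta * sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        v_new = [min(row) for row in q]
        diff = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if diff < tol:
            break
    # Greedy policy with respect to the last computed q-values
    policy = [min(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return v, policy


# Hypothetical example: state 0 = working, state 1 = failed;
# action 0 = continue, action 1 = repair (returns the system to state 0).
P = [
    [[0.9, 0.1], [0.0, 1.0]],  # action 0: continue
    [[1.0, 0.0], [1.0, 0.0]],  # action 1: repair
]
c = [
    [0.0, 5.0],   # costs in state 0 for actions 0 and 1
    [10.0, 5.0],  # costs in state 1 for actions 0 and 1
]
v, policy = value_iteration(P, c, beta=0.9)
print(policy, v)
```

With dense transition data, each sweep of this method costs on the order of |S|² |A| operations, which is why exact solution becomes impractical for the large-scale problems the paper targets.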

Author information

J. L. Popyack (Graduate Student)
Present address: Department of Mathematical Science, Drexel University, Philadelphia, Pennsylvania

Authors and Affiliations

Department of Systems Engineering, Thornton Hall, University of Virginia, Charlottesville, Virginia
C. C. White III (Professor)
Department of Applied Mathematics and Computer Science, University of Virginia, Charlottesville, Virginia
J. L. Popyack (Graduate Student)

Additional information

Communicated by R. A. Howard

This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

About this article

Cite this article

White, C.C., Popyack, J.L. Suboptimal policy determination for large-scale Markov decision processes, Part 1: Description and bounds. J Optim Theory Appl 46, 319–341 (1985). https://doi.org/10.1007/BF00939287

Issue Date: July 1985
DOI: https://doi.org/10.1007/BF00939287
