Abstract
This paper presents a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that every communicating MDP has an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to verify that a problem is communicating than that it is unichain, we recommend using this algorithm in place of unichain policy iteration.
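To illustrate the kind of computation involved, the following is a minimal sketch of average-reward policy iteration on a small hypothetical communicating MDP. The transition matrices `P` and rewards `r` are invented for illustration and are not from the paper; every stationary policy in this toy example is irreducible (hence unichain), so the standard greedy improvement step is used here in place of the paper's modified step, which would additionally restrict the choice to unichain policies.

```python
import numpy as np

# Hypothetical 3-state MDP: P[a][s] is the transition row for action a
# in state s; r[a][s] is the one-step reward.  (Illustrative data only.)
P = {0: np.array([[0.5, 0.5, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]]),
     1: np.array([[0.2, 0.4, 0.4],
                  [0.4, 0.2, 0.4],
                  [0.4, 0.4, 0.2]])}
r = {0: np.array([1.0, 2.0, 3.0]),
     1: np.array([2.0, 1.0, 2.5])}
n, actions = 3, (0, 1)

def evaluate(policy):
    """Solve g + h(s) = r(s) + sum_j P(j|s) h(j), normalizing h(0) = 0."""
    Ppi = np.array([P[policy[s]][s] for s in range(n)])
    rpi = np.array([r[policy[s]][s] for s in range(n)])
    A = np.eye(n) - Ppi      # coefficients of the bias vector h
    A[:, 0] = 1.0            # h(0) = 0, so column 0 carries the gain g
    x = np.linalg.solve(A, rpi)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain g, bias h

def policy_iteration():
    policy = [0] * n
    while True:
        g, h = evaluate(policy)
        # Improvement step: switch an action only on a strict improvement,
        # so the iteration cannot cycle among equally good actions.
        new = list(policy)
        for s in range(n):
            best = r[policy[s]][s] + P[policy[s]][s] @ h
            for a in actions:
                val = r[a][s] + P[a][s] @ h
                if val > best + 1e-9:
                    best, new[s] = val, a
        if new == policy:
            return policy, g
        policy = new

pol, gain = policy_iteration()
```

The evaluation step solves the unichain evaluation equations directly as one linear system, which is exactly what the communicating structure makes possible; in a general multichain MDP the gain would be state-dependent and Howard's nested optimality equations would be required instead.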
References
J. Bather, Optimal decision procedures for finite Markov chains, Part II: communicating systems, Adv. Appl. Prob. 5 (1973) 521–540.
D. Blackwell, Discrete dynamic programming, Ann. Math. Statist. 33 (1962) 719–726.
C. Derman, Denumerable state Markov decision processes — average cost criteria, Ann. Math. Statist. 37 (1966) 1545–1553.
J.A. Filar and T.A. Schultz, Communicating MDPs: Equivalence and LP properties, Oper. Res. Lett. 7 (1988) 303–307.
B.L. Fox and D.M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, Commun. ACM 11 (1968) 619–621.
A. Hordijk and M.L. Puterman, On the convergence of policy iteration in undiscounted finite state Markov decision processes: The unichain case, Math. Oper. Res. 12 (1987) 163–176.
R. Howard, Dynamic Programming and Markov Processes (The MIT Press, Cambridge, MA, 1960).
K. Ohno and K. Ichiki, Computing optimal policies for controlled tandem queueing systems, Oper. Res. 35 (1987) 121–126.
M.L. Puterman, Markov decision processes, in: Handbook of Operations Research, vol. 2, Stochastic Models, D.P. Heyman and M.J. Sobel (eds.) (North-Holland, 1990) pp. 331–434.
K.W. Ross and R. Varadarajan, Multichain Markov decision processes with a sample-path constraint: a decomposition approach, Math. Oper. Res. (1991), to appear.
H. Tijms, Stochastic Modelling and Analysis (Wiley, New York, 1986).
J. van der Wal, Stochastic Dynamic Programs, Tract 139 (The Mathematical Centre, Amsterdam, 1981).
Additional information
This research has been partially supported by NSERC Grant A-5527.
Haviv, M., Puterman, M.L. An improved algorithm for solving communicating average reward Markov decision processes. Ann Oper Res 28, 229–242 (1991). https://doi.org/10.1007/BF02055583