Computation of optimal policies in discounted semi-Markov decision chains

Cantaluppi, L.

doi:10.1007/BF01719612

Theoretical Papers
Published: 01 September 1984

Volume 6, pages 147–160, (1984)

Operations-Research-Spektrum

L. Cantaluppi¹

Summary

The control of a finite-state semi-Markov process is investigated. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to the other states. These rates depend on the holding time in the state and the actions can be changed at any point in time — not just at transition times. The goal is to find a policy that maximizes the expected discounted reward. A method of successive approximations is shown to converge to the optimal present value of the process and determines an ε-optimal policy. When the reward and transition rates are piecewise-constant in the holding time, it is shown that an ε-optimal policy can be computed exactly in finite time. The above method is shown to be much faster than a more standard algorithm operating by discretization of the holding times.
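As a rough illustration of successive approximations on the embedded decision chain, the sketch below treats a deliberately simplified case and is not the algorithm of the paper: actions are chosen only at transition times, holding times are exponential, and rates do not depend on the holding time. The function name `smdp_value_iteration`, its arguments, and the stopping tolerance (a standard contraction-mapping rule) are assumptions made for this example.

```python
def smdp_value_iteration(reward_rate, jump_rate, trans_prob, alpha,
                         eps=1e-8, max_iter=100_000):
    """Value iteration for a simplified discounted semi-Markov decision chain.

    Assumptions (not from the paper): exponential holding time with rate
    jump_rate[s][a], constant reward rate reward_rate[s][a] while holding,
    next-state distribution trans_prob[s][a], discount rate alpha > 0.
    """
    n = len(reward_rate)
    # Expected discounted reward earned until the next transition.
    gain = [[r / (alpha + lam) for r, lam in zip(reward_rate[s], jump_rate[s])]
            for s in range(n)]
    # Expected discount factor E[exp(-alpha * T)] for an exponential T.
    disc = [[lam / (alpha + lam) for lam in jump_rate[s]] for s in range(n)]
    beta = max(max(row) for row in disc)  # contraction modulus, < 1
    tol = eps * (1.0 - beta) / (2.0 * beta) if beta > 0 else eps

    def backup(v):
        # One-step lookahead value of every (state, action) pair.
        return [[gain[s][a] + disc[s][a]
                 * sum(p * vj for p, vj in zip(trans_prob[s][a], v))
                 for a in range(len(gain[s]))] for s in range(n)]

    v = [0.0] * n
    for _ in range(max_iter):
        v_new = [max(row) for row in backup(v)]
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta <= tol:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [row.index(max(row)) for row in backup(v)]
    return v, policy


# One state, one action, reward rate 1: the value approaches 1 / alpha.
values, policy = smdp_value_iteration([[1.0]], [[1.0]], [[[1.0]]], alpha=0.1)
print(values, policy)
```

Allowing actions to change during the holding time, as the paper does, would need a richer one-step operator than the `backup` function used here.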

Summary (translated from the German Zusammenfassung)

This paper deals with semi-Markov decision processes with a finite state space. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to the other states. In the present setting, however, actions may be changed at arbitrary points in time and not only at transitions. The aim of the paper is to determine the optimal policy that maximizes the expected reward. A value-iteration algorithm is defined on the embedded Markov decision chain. If the reward and transition rates are piecewise-constant functions, an ε-optimal policy can be found with a finite number of arithmetic operations. The algorithm described is much more efficient than the ordinary value-iteration algorithm applied to the problem after discretization of the holding times.
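For orientation, the textbook form of the discounted optimality equation on the embedded chain can be written as follows. This is the simpler setting in which actions are fixed between transitions, not the holding-time-dependent formulation of the paper, and the symbols $Q^{a}_{ij}$ (semi-Markov kernel), $\rho(i,a)$ (expected discounted reward until the next transition), $\alpha$ (discount rate), and $\beta$ are notation assumed here.

```latex
\begin{align*}
  v(i) &= \max_{a \in A(i)} \Bigl\{ \rho(i,a)
          + \sum_{j} \int_{0}^{\infty} e^{-\alpha t}\, v(j)\, dQ^{a}_{ij}(t) \Bigr\}
          \;=\; (Tv)(i), \\
  \beta &= \max_{i,a} \sum_{j} \int_{0}^{\infty} e^{-\alpha t}\, dQ^{a}_{ij}(t) < 1, \\
  \lVert v_{n} - v^{*} \rVert_{\infty}
        &\le \frac{\beta^{n}}{1-\beta}\, \lVert v_{1} - v_{0} \rVert_{\infty},
          \qquad v_{n+1} = T v_{n}.
\end{align*}
```

Under these assumptions $T$ is a contraction with modulus $\beta$, so the iterates converge geometrically, and stopping once successive iterates are close enough yields an ε-optimal policy.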

Author information

Authors and Affiliations

Institut für Operations Research, ETH-Zentrum, CH-8092, Zürich, Switzerland
L. Cantaluppi

About this article

Cite this article

Cantaluppi, L. Computation of optimal policies in discounted semi-Markov decision chains. OR Spektrum 6, 147–160 (1984). https://doi.org/10.1007/BF01719612

Received: 30 May 1983
Accepted: 30 March 1984
Published: 01 September 1984
Issue Date: September 1984
DOI: https://doi.org/10.1007/BF01719612
