
Computation of optimal policies in discounted semi-Markov decision chains

  • Theoretical Papers
Operations-Research-Spektrum

Summary

The control of a finite-state semi-Markov process is investigated. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to the other states. These rates depend on the holding time in the current state, and actions may be changed at any point in time, not only at transition times. The goal is to find a policy that maximizes the expected discounted reward. A method of successive approximations is shown to converge to the optimal present value of the process and to determine an ε-optimal policy. When the reward and transition rates are piecewise-constant functions of the holding time, an ε-optimal policy can be computed exactly in a finite number of operations. This method is shown to be much faster than a more standard algorithm that discretizes the holding times.
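As a minimal sketch of the successive-approximation idea, consider only the special case in which the reward and transition rates do not depend on the holding time. The sojourn is then memoryless, and (a standard fact for continuous-time Markov decision chains) one action per state suffices, so each iteration sets V(i) = max_a (r_ia + Σ_j q_ija V(j)) / (α + Λ_ia), where Λ_ia is the total exit rate and α > 0 the discount rate. The data layout, the function names and the stopping rule (the usual bound for a contraction with modulus β = max Λ/(α + Λ)) are illustrative assumptions; this is not the paper's algorithm, which handles holding-time-dependent rates.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (not from the paper): actions[i] lists the actions available
# in state i as (reward_rate, {j: transition_rate_to_j}) pairs; rates are constant.
Action = Tuple[float, Dict[int, float]]


def action_value(action: Action, v: List[float], alpha: float) -> float:
    """Present value of entering a state and using `action` until the next transition."""
    r, q = action
    total_rate = sum(q.values())
    # Reward accrues at rate r; a jump to j occurs at rate q[j] and is worth v[j].
    return (r + sum(rate * v[j] for j, rate in q.items())) / (alpha + total_rate)


def value_iteration_constant_rates(
    actions: List[List[Action]], alpha: float, eps: float = 1e-6, max_iter: int = 100_000
) -> Tuple[List[float], List[int]]:
    # Contraction modulus: largest one-sojourn discount factor Lambda / (alpha + Lambda).
    beta = max(
        sum(q.values()) / (alpha + sum(q.values()))
        for acts in actions
        for _, q in acts
    )
    v = [0.0] * len(actions)
    for _ in range(max_iter):
        new_v = [max(action_value(a, v, alpha) for a in acts) for acts in actions]
        diff = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        # Standard stopping rule for a beta-contraction: the policy greedy for v is eps-optimal.
        if beta == 0.0 or diff < eps * (1.0 - beta) / (2.0 * beta):
            policy = [
                max(range(len(acts)), key=lambda k, acts=acts: action_value(acts[k], v, alpha))
                for acts in actions
            ]
            return v, policy
    raise RuntimeError("value iteration did not converge within max_iter")


# Two hypothetical states: state 0 offers a slow, low-reward action and a fast,
# high-reward one; state 1 has a single action that returns to state 0.
actions = [
    [(1.0, {1: 1.0}), (2.0, {1: 3.0})],
    [(0.0, {0: 1.0})],
]
v, policy = value_iteration_constant_rates(actions, alpha=0.1)
print(policy)  # [1, 0]
```

With α = 0.1 the fast action wins in state 0, and the values approach 2.2/0.41 ≈ 5.366 and 2/0.41 ≈ 4.878, which can be checked by solving the two linear equations of that policy directly.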

Zusammenfassung (German summary)

The present paper deals with semi-Markov decision processes with a finite state space. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to other states. In the present case, however, the actions may be changed at arbitrary points in time and not only at transitions. The aim of the paper is to determine the optimal policy that maximizes the expected reward. A value-iteration algorithm is defined on the embedded Markov decision chain. When the reward and transition rates are piecewise-constant functions, an ε-optimal policy can be found with a finite number of computational operations. The algorithm described is much more efficient than the ordinary value-iteration algorithm applied to the problem after discretization of the holding time.
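For piecewise-constant rates, a hedged way to see why a finite computation is possible: if the action is additionally held fixed on each piece of the holding time (a simplification; the paper allows switches anywhere within a piece), the value at the start of a piece of length L under action a is c_a (1 − e^{−k_a L}) / k_a + e^{−k_a L} W_next, with k_a = α + Λ_a and c_a = r_a + Σ_j q_ja V(j). Because this is increasing in W_next, choosing the best action piece by piece, working backward from the final unbounded piece, is optimal among such piecewise-fixed policies, and wrapping that in successive approximations over the entry values V gives the sketch below. The names, the data layout and the tolerance-based stopping rule are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: states[i] lists the pieces (length, actions) of the holding time in
# state i, in order; the last piece has length math.inf. Each action is
# (reward_rate, {j: transition_rate_to_j}), constant within its piece.
Action = Tuple[float, Dict[int, float]]
Piece = Tuple[float, List[Action]]


def piece_value(action: Action, v: List[float], alpha: float, length: float, w_next: float) -> float:
    """Value at the start of a piece if `action` is used throughout it.

    Integrates exp(-(alpha + Lambda) t) * (r + sum_j q_j v_j) over the piece, then adds
    the discounted, survival-weighted value w_next of reaching the next piece.
    """
    r, q = action
    c = r + sum(rate * v[j] for j, rate in q.items())
    k = alpha + sum(q.values())  # alpha > 0 keeps k > 0
    if math.isinf(length):
        return c / k
    decay = math.exp(-k * length)
    return c * (1.0 - decay) / k + decay * w_next


def entry_value(pieces: List[Piece], v: List[float], alpha: float) -> Tuple[float, List[int]]:
    """Backward recursion over one state's pieces: (value at entry, action index per piece)."""
    w = 0.0  # ignored by the final, infinite piece
    choice: List[int] = []
    for length, acts in reversed(pieces):
        vals = [piece_value(a, v, alpha, length, w) for a in acts]
        best = max(range(len(acts)), key=vals.__getitem__)
        w = vals[best]
        choice.append(best)
    choice.reverse()
    return w, choice


def value_iteration_piecewise(
    states: List[List[Piece]], alpha: float, tol: float = 1e-9, max_iter: int = 100_000
) -> Tuple[List[float], List[List[int]]]:
    v = [0.0] * len(states)
    for _ in range(max_iter):
        new_v = [entry_value(p, v, alpha)[0] for p in states]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v, [entry_value(p, new_v, alpha)[1] for p in states]
        v = new_v
    raise RuntimeError("value iteration did not converge within max_iter")


# Example: in state 0, reward accrues at rate 1 for one time unit with no exits; afterwards
# the process moves at rate 1 to state 1, which is absorbing and earns nothing.
states = [
    [(1.0, [(1.0, {})]), (math.inf, [(0.0, {1: 1.0})])],
    [(math.inf, [(0.0, {})])],
]
v, policy = value_iteration_piecewise(states, alpha=0.1)
```

In this example the entry value of state 0 is (1 − e^{−0.1}) / 0.1 ≈ 0.9516, the discounted reward of the first piece alone. With a single unbounded piece per state, the sketch reduces to the constant-rate case.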

About this article

Cite this article

Cantaluppi, L. Computation of optimal policies in discounted semi-Markov decision chains. OR Spektrum 6, 147–160 (1984). https://doi.org/10.1007/BF01719612
