Annals of Operations Research, Volume 28, Issue 1, pp. 261–271

Value iteration in countable state average cost Markov decision processes with unbounded costs

  • Linn I. Sennott
Research Contributions

Abstract

We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policy f, with expected average cost g, when can f and g be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related to ng − v_n(i), where v_n(i) is the minimum expected n-stage cost when the process starts in state i. The theory is applied to a queueing system with variable service rates and to a queueing system with a variable arrival parameter.
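To make the recursion concrete, the sketch below runs undiscounted value iteration on a controlled single-server queue in the spirit of the paper's first application. It is only an illustration, not the paper's construction: the countable state space is truncated at N (the paper's theory requires no truncation), and the arrival probability LAM, the menu of service rates RATES, and the costs SERVICE_COST are assumed parameters invented for the example.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper).
N = 200                         # truncate the countable state space at N
LAM = 0.3                       # per-slot arrival probability
RATES = [0.0, 0.5, 0.9]         # selectable service-completion probabilities
SERVICE_COST = [0.0, 1.0, 4.0]  # per-slot cost of running at each rate

def transitions(i, a):
    """Successor distribution for a discrete-time single-server queue:
    one arrival w.p. LAM, one departure w.p. RATES[a] when i > 0."""
    mu = RATES[a] if i > 0 else 0.0
    dist = {}
    for arr, pa in ((1, LAM), (0, 1.0 - LAM)):
        for dep, pd in ((1, mu), (0, 1.0 - mu)):
            j = min(max(i + arr - dep, 0), N)
            dist[j] = dist.get(j, 0.0) + pa * pd
    return dist.items()

v = np.zeros(N + 1)             # v_0(i) = 0
for n in range(1, 5001):
    v_new = np.empty_like(v)
    policy = np.empty(N + 1, dtype=int)
    for i in range(N + 1):
        # v_{n+1}(i) = min_a [ c(i,a) + sum_j p(j|i,a) v_n(j) ],
        # with one-step cost c(i,a) = i + SERVICE_COST[a] (holding + service)
        q = [i + SERVICE_COST[a] + sum(p * v[j] for j, p in transitions(i, a))
             for a in range(len(RATES))]
        policy[i] = int(np.argmin(q))
        v_new[i] = min(q)
    diff = v_new - v            # v_{n+1}(i) - v_n(i) should approach g
    if diff.max() - diff.min() < 1e-8:
        break
    v = v_new

g = 0.5 * (diff.max() + diff.min())  # bracketed estimate of the average cost g
print(f"stopped at n = {n}, g ~ {g:.6f}, actions on states 0..9: {policy[:10]}")
```

The stopping rule mirrors the abstract's theme: when the differences v_{n+1}(i) − v_n(i) settle to a constant g (equivalently, when v_n(i) − ng stabilizes), the minimizing actions yield a stationary policy f whose expected average cost is g.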

Keywords

Decision process · Service rate · Markov decision process · Average cost · Countable state



Copyright information

© J.C. Baltzer A.G. Scientific Publishing Company 1991

Authors and Affiliations

  • Linn I. Sennott
    1. Illinois State University, Normal, USA
