
Operations-Research-Spektrum, Volume 7, Issue 1, pp 27–37

Finite state approximation algorithms for average cost denumerable state Markov decision processes

  • L. C. Thomas
  • D. Stengos
Theoretical Papers

Summary

In this paper we study three finite-state value and policy iteration algorithms for denumerable state space Markov decision processes under the average cost criterion. The convergence of these algorithms is guaranteed under a scrambling-type recurrence condition and various "tail" conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as we can. The policy space algorithm consists of a value determination scheme associated with a policy and a policy improvement step in which a "better" policy is determined. In this way a sequence of improved policies is constructed, which is shown to converge to the optimal average cost policy.

Keywords

Markov Decision Process · Average Cost · Policy Space · Policy Iteration

Zusammenfassung

For Markov decision processes with a countable state space we study, under the average cost criterion, three finite value iteration and policy iteration algorithms. Convergence of the algorithms is ensured by scrambling-type recurrence conditions and various "tail" conditions on the transition probabilities. With the value iteration methods we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as possible. The policy iteration algorithm consists of a value determination step for a given policy and a policy improvement step. In this way a sequence of improved policies is constructed, and convergence to the optimal policy is shown.
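As a concrete illustration of the policy space method described in the abstract, the following is a minimal average-cost policy iteration (value determination plus policy improvement, in the style of Howard's method) on a finite model; in the paper's setting such a finite model would arise by truncating the denumerable chain to a finite set of "important" states. The three-state machine-deterioration example, the unichain assumption, and all function names below are illustrative, not taken from the paper.

```python
import numpy as np

def evaluate_policy(P, c, policy):
    """Value determination: solve h[s] + g = c[s, a] + sum_t P[a][s, t] * h[t]
    for the gain g and bias h, normalising h[0] = 0.
    Assumes the Markov chain induced by `policy` is unichain."""
    n = P.shape[1]
    A = np.zeros((n, n))          # unknowns: x = (g, h[1], ..., h[n-1])
    b = np.zeros(n)
    for s in range(n):
        a = policy[s]
        A[s, 0] = 1.0             # coefficient of the gain g
        if s > 0:
            A[s, s] += 1.0        # coefficient of h[s]
        A[s, 1:] -= P[a, s, 1:]   # -P[a][s, t] * h[t] for t >= 1
        b[s] = c[s, a]
    x = np.linalg.solve(A, b)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain, bias vector

def policy_iteration(P, c, max_iter=50):
    """Alternate value determination and greedy policy improvement
    until the policy no longer changes."""
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate_policy(P, c, policy)
        q = c + np.einsum('ast,t->sa', P, h)      # q[s, a] = c[s, a] + sum_t P h
        new_policy = np.argmin(q, axis=1)
        # Keep the current action wherever it is still minimising,
        # so that the iteration terminates at a stable policy.
        keep = np.isclose(q[np.arange(n), policy], q.min(axis=1))
        new_policy[keep] = policy[keep]
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return g, h, policy
```

On a three-state deterioration model (states ordered by wear, action 0 = operate, action 1 = repair back to state 0), the loop settles on the expected threshold policy of repairing in the worn states while operating in the good one.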



Copyright information

© Springer-Verlag 1985

Authors and Affiliations

  • L. C. Thomas, Department of Decision Theory, University of Manchester, UK
  • D. Stengos, Department of Statistics, University of Athens, Greece
