
Operations-Research-Spektrum, Volume 6, Issue 4, pp 223–227

Isotone policies for the value iteration method for Markov decision processes

  • D. J. White
Theoretical Papers

Summary

This paper considers the value iteration process for countable state discounted Markov decision processes and shows that, under certain conditions, there will exist an N-isotone sequence of optimal decision rules and value functions. N-isotonicity of a sequence of decision rules {δn}, n ∈ {1, 2, ...} = N, requires that, for a specified partial order ≲ over K = ∪K(i) (K(i) being the feasible action space for state i), δn−1(i) ≲ δn(i) for all n ⩾ 2 and all i ∈ I, with a similar definition of N-isotonicity for the value functions {vn}, n ⩾ 1.
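As a concrete illustration of the property the paper studies, the sketch below runs value iteration on a small hypothetical two-state, two-action discounted MDP (not the paper's construction) in which the greedy decision rules {δn} happen to be N-isotone under the usual order on actions, and the value functions {vn} are N-isotone as well:

```python
# Value iteration on a tiny 2-state, 2-action discounted MDP, checking that
# the sequence of greedy decision rules {delta_n} is N-isotone: for the usual
# order 0 < 1 on actions, delta_{n-1}(i) <= delta_n(i) for all i and n >= 2.
# The MDP below is a hypothetical illustration, not the paper's construction.

BETA = 0.9            # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# r[i][a]: immediate reward. Action 1 forgoes reward in state 0 but moves
# toward the high-reward state 1, so it becomes optimal once v_n has grown.
r = {0: {0: 1.0, 1: 0.0},
     1: {0: 2.0, 1: 3.0}}
# Deterministic transitions: from either state, action a leads to state a.
succ = {0: {0: 0, 1: 1},
        1: {0: 0, 1: 1}}

def value_iteration(n_steps):
    v = {i: 0.0 for i in STATES}          # v_0 = 0
    rules, values = [], []
    for _ in range(n_steps):
        # Q-values for the current stage: r(i,a) + beta * v_{n-1}(succ(i,a))
        q = {i: {a: r[i][a] + BETA * v[succ[i][a]] for a in ACTIONS}
             for i in STATES}
        delta = {i: max(ACTIONS, key=lambda a: q[i][a]) for i in STATES}
        v = {i: q[i][delta[i]] for i in STATES}
        rules.append(delta)
        values.append(dict(v))
    return rules, values

rules, values = value_iteration(20)

# N-isotonicity of the decision rules and of the value functions:
assert all(rules[n - 1][i] <= rules[n][i]
           for n in range(1, len(rules)) for i in STATES)
assert all(values[n - 1][i] <= values[n][i]
           for n in range(1, len(values)) for i in STATES)
print([d[0] for d in rules[:4]])   # state-0 actions over n = 1..4: [0, 1, 1, 1]
```

In state 0 the myopic rule δ1 picks the safe action 0, but once vn(1) is large enough the maximizer switches to action 1 and never switches back, so each sequence {δn(i)} is nondecreasing in n.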

Keywords

Decision process · Decision rule · Partial order · Iteration method · Action space

Zusammenfassung

We consider value iteration for discounted Markov decision processes with countable state space. We show that, under certain conditions, an N-isotone sequence of optimal decision rules and value functions exists. A sequence of decision rules {δn}, n ∈ {1, 2, ...} = N, is called N-isotone if, for a partial order ≲ over K = ∪K(i), δn−1(i) ≲ δn(i) holds for all n ⩾ 2 and all i ∈ I (K(i) being the set of feasible actions in state i). An analogous definition of N-isotonicity applies to the value functions {vn}, n ⩾ 1.



Copyright information

© Springer-Verlag 1984

Authors and Affiliations

  • D. J. White, Department of Decision Theory, University of Manchester, Manchester, England
