
Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes

Summary

This paper deals with discounted Markov decision processes with a finite state space I, where for each state i ∈ I and each decision epoch t there is a finite action space K(i, t). The paper is concerned with problems formulated in terms of the discounted rewards themselves, in several ways. To ensure that optimal, or near-optimal, policies are obtained, the state space I is extended to augmented state spaces A(n), or A(∞), which include the accumulated discounted rewards. Specimen problems are formulated and some computational aspects are examined.

Zusammenfassung

Discounted Markov decision processes with a finite state space I are treated, where the sets of admissible decisions K(i, t) may depend on the state i ∈ I and on the decision epoch t. Various objective functions are considered, each formulated as a function of the total discounted reward (not of its expectation). To obtain optimal or near-optimal policies without recording the entire history, the state space is extended by the accumulated discounted payoffs. A selection of such problems is discussed by way of example, including some aspects of their numerical treatment.
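The augmentation idea described above can be sketched in a few lines: when the objective is a utility of the total discounted reward rather than its expectation, pairing the state i with the accumulated discounted reward w restores the Markov property, and backward induction applies to the augmented state (i, w). The toy MDP data, horizon, and utility function below are illustrative assumptions, not taken from the paper.

```python
# Toy finite MDP (assumed data): states {0, 1}, actions per state,
# one-step rewards, and transition probabilities.
BETA = 0.9   # discount factor
T = 3        # decision epochs 0, 1, 2

ACTIONS = {0: ['a', 'b'], 1: ['a']}                      # K(i, t), here time-invariant
REWARD = {(0, 'a'): 1.0, (0, 'b'): 0.0, (1, 'a'): 2.0}
TRANS = {(0, 'a'): {0: 0.5, 1: 0.5},
         (0, 'b'): {0: 1.0},
         (1, 'a'): {0: 0.2, 1: 0.8}}

def utility(w):
    """A concave utility of the total discounted reward (illustrative choice)."""
    return w - 0.1 * w * w

def value(t, i, w):
    """Optimal expected utility from epoch t in the augmented state (i, w),
    where w is the discounted reward accumulated over epochs 0..t-1."""
    if t == T:
        return utility(w)          # terminal value: utility of the total
    best = float('-inf')
    for k in ACTIONS[i]:
        w_next = w + (BETA ** t) * REWARD[(i, k)]        # accumulate discounted reward
        ev = sum(p * value(t + 1, j, w_next)
                 for j, p in TRANS[(i, k)].items())
        best = max(best, ev)
    return best

print(value(0, 0, 0.0))
```

For this tiny horizon a plain recursion over (t, i, w) suffices; for longer horizons the set of reachable w values grows, which is where the computational aspects discussed in the paper (e.g. discretising the accumulated-reward coordinate) come in.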





About this article

Cite this article

White, D.J. Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes. OR Spektrum 9, 13–22 (1987). https://doi.org/10.1007/BF01720793


Keywords

  • Decision Process
  • Action Space
  • Markov Decision Process
  • Computational Aspect
  • Probabilistic Constraint