Summary
This paper deals with discounted Markov decision processes with a finite state space I, where for each i ∈ I and each decision epoch t there is a finite action space K(i, t). The paper is concerned with problems formulated in terms of the discounted rewards in several ways. To ensure that optimal, or near-optimal, policies are obtained, the state space I is extended to augmented state spaces A(n), or A(∞), which include the accumulated discounted rewards. Specimen problems are formulated and some computational aspects examined.
Zusammenfassung
Discounted Markov decision processes with a finite state space I are treated, where the sets of admissible decisions K(i, t) may depend on the state i ∈ I and the epoch t. Several objective functions are considered, each formulated as a function of the total discounted reward (not of its expectation). To obtain optimal or near-optimal policies without recording the entire history, the state space is augmented by the accumulated discounted rewards. A selection of such problems is discussed by way of example, including some aspects of their numerical treatment.
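The state-augmentation idea described in the abstracts can be sketched in a few lines: since the objective is a function of the total discounted reward rather than its expectation, backward induction is run over augmented states (i, w), where w is the accumulated discounted reward so far, discretised to keep the augmented space finite. The following toy example is a minimal illustration only; the MDP data, the horizon, the quadratic utility, and the grid resolution are all invented for demonstration and are not taken from the paper.

```python
# Hypothetical 2-state, 2-action discounted MDP; all numbers are illustrative.
states = [0, 1]
actions = {0: [0, 1], 1: [0, 1]}                 # K(i, t), taken time-independent here
P = {                                            # P[i][a][j] = transition probability i -> j
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.7, 0.3]},
}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 2.0}}   # one-step rewards r(i, a)
beta = 0.9                                        # discount factor
T = 4                                             # finite horizon
utility = lambda w: -(w - 5.0) ** 2               # utility of total discounted reward

# Discretise the accumulated discounted reward w so the augmented space A(n) stays finite.
grid = [round(0.1 * k, 1) for k in range(0, 101)]
snap = lambda w: min(grid, key=lambda g: abs(g - w))

# Backward induction on augmented states (i, w): terminal value is the utility of w.
V = {(i, w): utility(w) for i in states for w in grid}
for t in reversed(range(T)):
    V_new = {}
    for i in states:
        for w in grid:
            best = float("-inf")
            for a in actions[i]:
                w2 = snap(w + beta ** t * r[i][a])          # accumulate discounted reward
                ev = sum(P[i][a][j] * V[(j, w2)] for j in states)
                best = max(best, ev)
            V_new[(i, w)] = best
    V = V_new

print(V[(0, 0.0)])   # value of starting in state 0 with nothing yet accumulated
```

Because the policy may condition on w, the recursion can handle criteria such as expected utility or probabilistic constraints on the discounted reward, which an ordinary state-only value iteration cannot express.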
Cite this article
White, D.J. Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes. OR Spektrum 9, 13–22 (1987). https://doi.org/10.1007/BF01720793