Markov Decision Processes

  • Survey Article
  • Published in Jahresbericht der Deutschen Mathematiker-Vereinigung

Abstract

The theory of Markov Decision Processes is the theory of controlled Markov chains. Its origins can be traced back to R. Bellman and L. Shapley in the 1950s. In the following decades the theory grew dramatically and found applications in areas as diverse as computer science, engineering, operations research, biology and economics. In this article we give a short introduction to parts of this theory. We treat Markov Decision Processes with finite and infinite time horizons, restricting the presentation to the so-called (generalized) negative case. Solution algorithms such as Howard's policy improvement and linear programming are also explained. Various examples illustrate the application of the theory: we treat stochastic linear-quadratic control problems, bandit problems and dividend pay-out problems.
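To give a concrete flavour of the solution algorithms the article discusses, the following is a minimal sketch of Howard's policy improvement (policy iteration) for a finite state and action space. It is written for the standard discounted reward criterion rather than the (generalized) negative case treated in the article; the arrays `P` and `r`, the discount factor `beta`, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, r, beta):
    """Sketch of Howard's policy improvement for a finite discounted MDP.

    Hypothetical inputs (not from the article):
      P[a, s, t] -- probability of moving from state s to state t under action a
      r[s, a]    -- one-stage reward for choosing action a in state s
      beta       -- discount factor in (0, 1)
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)        # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi for the value of pi.
        P_pi = P[policy, np.arange(n_states), :]  # row s: transitions under policy[s]
        r_pi = r[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to the current value v.
        q = r.T + beta * (P @ v)                  # q[a, s] = r(s, a) + beta * E[v(next)]
        improved = q.argmax(axis=0)
        if np.array_equal(improved, policy):      # no strict improvement: optimal
            return policy, v
        policy = improved
```

Each pass evaluates the current stationary policy exactly by solving a linear system and then improves it greedily; since a finite model admits only finitely many stationary policies and every non-terminal pass yields a strict improvement, the loop terminates with an optimal policy.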

References

  1. Altman, E.: Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton (1999)

  2. Bank, P., Föllmer, H.: American options, multi-armed bandits, and optimal consumption plans: a unifying view. In: Paris-Princeton Lectures on Mathematical Finance, 2002, pp. 1–42. Springer, Berlin (2003)

  3. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Heidelberg (2011, to appear)

  4. Bellman, R.: The theory of dynamic programming. Bull. Am. Math. Soc. 60, 503–515 (1954)

  5. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  6. Berry, D.A., Fristedt, B.: Bandit Problems. Chapman & Hall, London (1985)

  7. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. II, 2nd edn. Athena Scientific, Belmont (2001)

  8. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. I, 3rd edn. Athena Scientific, Belmont (2005)

  9. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control. Academic Press, New York (1978)

  10. Bielecki, T., Hernández-Hernández, D., Pliska, S.R.: Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management. Math. Methods Oper. Res. 50, 167–188 (1999)

  11. Blackwell, D.: Discounted dynamic programming. Ann. Math. Stat. 36, 226–235 (1965)

  12. Borkar, V., Meyn, S.: Risk-sensitive optimal control for Markov decision processes with monotone cost. Math. Oper. Res. 27, 192–209 (2002)

  13. Dubins, L.E., Savage, L.J.: How to Gamble if You Must. Inequalities for Stochastic Processes. McGraw-Hill, New York (1965)

  14. Dynkin, E.B., Yushkevich, A.A.: Controlled Markov Processes. Springer, Berlin (1979)

  15. Enders, J., Powell, W., Egan, D.: A dynamic model for the failure replacement of aging high-voltage transformers. Energy Syst. J. 1, 31–59 (2010)

  16. Feinberg, E.A., Shwartz, A. (eds.): Handbook of Markov Decision Processes. Kluwer Academic, Boston (2002)

  17. de Finetti, B.: Su un'impostazione alternativa della teoria collettiva del rischio. In: Transactions of the XVth International Congress of Actuaries, vol. 2, pp. 433–443 (1957)

  18. Gittins, J.C.: Multi-armed Bandit Allocation Indices. Wiley, Chichester (1989)

  19. Goto, J., Lewis, M., Puterman, M.: Coffee, tea or …? A Markov decision process model for airline meal provisioning. Transp. Sci. 38, 107–118 (2004)

  20. Guo, X., Hernández-Lerma, O.: Continuous-time Markov Decision Processes. Springer, New York (2009)

  21. He, M., Zhao, L., Powell, W.: Optimal control of dosage decisions in controlled ovarian hyperstimulation. Ann. Oper. Res. 223–245 (2010)

  22. Hernández-Lerma, O., Lasserre, J.B.: Discrete-time Markov Control Processes. Springer, New York (1996)

  23. Hernández-Lerma, O., Lasserre, J.B.: The linear programming approach. In: Handbook of Markov Decision Processes, pp. 377–408. Kluwer Academic, Boston (2002)

  24. Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Springer, Berlin (1970)

  25. Howard, R.A.: Dynamic Programming and Markov Processes. The Technology Press of MIT, Cambridge (1960)

  26. Kushner, H.J., Dupuis, P.: Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, New York (2001)

  27. Martin-Löf, A.: Lectures on the use of control theory in insurance. Scand. Actuar. J. 1, 1–25 (1994)

  28. Meyn, S.: Control Techniques for Complex Networks. Cambridge University Press, Cambridge (2008)

  29. Miyasawa, K.: An economic survival game. J. Oper. Res. Soc. Jpn. 4, 95–113 (1962)

  30. Peskir, G., Shiryaev, A.: Optimal Stopping and Free-boundary Problems. Birkhäuser, Basel (2006)

  31. Powell, W.: Approximate Dynamic Programming. Wiley-Interscience, Hoboken (2007)

  32. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

  33. Ross, S.: Introduction to Stochastic Dynamic Programming. Academic Press, New York (1983)

  34. Schäl, M.: Markoffsche Entscheidungsprozesse. Teubner, Stuttgart (1990)

  35. Schäl, M.: On discrete-time dynamic programming in insurance: exponential utility and minimizing the ruin probability. Scand. Actuar. J. 189–210 (2004)

  36. Schmidli, H.: Stochastic Control in Insurance. Springer, London (2008)

  37. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095–1100 (1953)

  38. Shiryaev, A.N.: Some new results in the theory of controlled random processes. In: Trans. Fourth Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, Prague, pp. 131–203. Academia, Prague (1965)

  39. Stokey, N.L., Lucas, R.E. Jr.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)

  40. Tijms, H.: A First Course in Stochastic Models. Wiley, Chichester (2003)

Author information

Corresponding author

Correspondence to Nicole Bäuerle.

Additional information

We dedicate this paper to Karl Hinderer who passed away on April 17th, 2010. He established the theory of Markov Decision Processes in Germany 40 years ago.

About this article

Cite this article

Bäuerle, N., Rieder, U. Markov Decision Processes. Jahresber. Dtsch. Math. Ver. 112, 217–243 (2010). https://doi.org/10.1365/s13291-010-0007-2

