Markov Decision Processes

  • Survey Article
  • Published in Jahresbericht der Deutschen Mathematiker-Vereinigung

Abstract

The theory of Markov Decision Processes is the theory of controlled Markov chains. Its origins can be traced back to R. Bellman and L. Shapley in the 1950s. In the following decades the theory grew dramatically and found applications in areas as diverse as computer science, engineering, operations research, biology and economics. In this article we give a short introduction to parts of this theory. We treat Markov Decision Processes with finite and infinite time horizons, restricting the presentation to the so-called (generalized) negative case. Solution algorithms such as Howard's policy improvement and linear programming are also explained. Various examples illustrate the application of the theory: we treat stochastic linear-quadratic control problems, bandit problems and dividend pay-out problems.
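To give a concrete flavour of the solution algorithms the article discusses, the following is a minimal sketch of Howard's policy improvement (policy iteration) for a finite state and action space. It is written for the standard discounted reward criterion rather than the (generalized) negative case treated in the article; the arrays `P` and `r`, the discount factor `beta`, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, r, beta):
    """Sketch of Howard's policy improvement for a finite discounted MDP.

    Hypothetical inputs (not from the article):
      P[a, s, t] -- probability of moving from state s to state t under action a
      r[s, a]    -- one-stage reward for choosing action a in state s
      beta       -- discount factor in (0, 1)
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)        # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi for the value of pi.
        P_pi = P[policy, np.arange(n_states), :]  # row s: transitions under policy[s]
        r_pi = r[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to the current value v.
        q = r.T + beta * (P @ v)                  # q[a, s] = r(s, a) + beta * E[v(next)]
        improved = q.argmax(axis=0)
        if np.array_equal(improved, policy):      # no strict improvement: optimal
            return policy, v
        policy = improved
```

Each pass evaluates the current stationary policy exactly by solving a linear system and then improves it greedily; since a finite model admits only finitely many stationary policies and every non-terminal pass yields a strict improvement, the loop terminates with an optimal policy.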

References

  1. Altman, E.: Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton (1999)

  2. Bank, P., Föllmer, H.: American options, multi-armed bandits, and optimal consumption plans: a unifying view. In: Paris-Princeton Lectures on Mathematical Finance, 2002, pp. 1–42. Springer, Berlin (2003)

  3. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Heidelberg (2011, to appear)

  4. Bellman, R.: The theory of dynamic programming. Bull. Am. Math. Soc. 60, 503–515 (1954)

  5. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  6. Berry, D.A., Fristedt, B.: Bandit Problems. Chapman & Hall, London (1985)

  7. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. II, 2nd edn. Athena Scientific, Belmont (2001)

  8. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. I, 3rd edn. Athena Scientific, Belmont (2005)

  9. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control. Academic Press, New York (1978)

  10. Bielecki, T., Hernández-Hernández, D., Pliska, S.R.: Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management. Math. Methods Oper. Res. 50, 167–188 (1999)

  11. Blackwell, D.: Discounted dynamic programming. Ann. Math. Stat. 36, 226–235 (1965)

  12. Borkar, V., Meyn, S.: Risk-sensitive optimal control for Markov decision processes with monotone cost. Math. Oper. Res. 27, 192–209 (2002)

  13. Dubins, L.E., Savage, L.J.: How to Gamble if You Must. Inequalities for Stochastic Processes. McGraw-Hill, New York (1965)

  14. Dynkin, E.B., Yushkevich, A.A.: Controlled Markov Processes. Springer, Berlin (1979)

  15. Enders, J., Powell, W., Egan, D.: A dynamic model for the failure replacement of aging high-voltage transformers. Energy Syst. J. 1, 31–59 (2010)

  16. Feinberg, E.A., Shwartz, A. (eds.): Handbook of Markov Decision Processes. Kluwer Academic, Boston (2002)

  17. de Finetti, B.: Su un'impostazione alternativa della teoria collettiva del rischio. In: Transactions of the XVth International Congress of Actuaries, vol. 2, pp. 433–443 (1957)

  18. Gittins, J.C.: Multi-armed Bandit Allocation Indices. Wiley, Chichester (1989)

  19. Goto, J., Lewis, M., Puterman, M.: Coffee, tea or …? A Markov decision process model for airline meal provisioning. Transp. Sci. 38, 107–118 (2004)

  20. Guo, X., Hernández-Lerma, O.: Continuous-time Markov Decision Processes. Springer, New York (2009)

  21. He, M., Zhao, L., Powell, W.: Optimal control of dosage decisions in controlled ovarian hyperstimulation. Ann. Oper. Res. 223–245 (2010)

  22. Hernández-Lerma, O., Lasserre, J.B.: Discrete-time Markov Control Processes. Springer, New York (1996)

  23. Hernández-Lerma, O., Lasserre, J.B.: The linear programming approach. In: Handbook of Markov Decision Processes, pp. 377–408. Kluwer Academic, Boston (2002)

  24. Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Springer, Berlin (1970)

  25. Howard, R.A.: Dynamic Programming and Markov Processes. The Technology Press of MIT, Cambridge (1960)

  26. Kushner, H.J., Dupuis, P.: Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, New York (2001)

  27. Martin-Löf, A.: Lectures on the use of control theory in insurance. Scand. Actuar. J. 1, 1–25 (1994)

  28. Meyn, S.: Control Techniques for Complex Networks. Cambridge University Press, Cambridge (2008)

  29. Miyasawa, K.: An economic survival game. J. Oper. Res. Soc. Jpn. 4, 95–113 (1962)

  30. Peskir, G., Shiryaev, A.: Optimal Stopping and Free-boundary Problems. Birkhäuser, Basel (2006)

  31. Powell, W.: Approximate Dynamic Programming. Wiley-Interscience, Hoboken (2007)

  32. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

  33. Ross, S.: Introduction to Stochastic Dynamic Programming. Academic Press, New York (1983)

  34. Schäl, M.: Markoffsche Entscheidungsprozesse. Teubner, Stuttgart (1990)

  35. Schäl, M.: On discrete-time dynamic programming in insurance: exponential utility and minimizing the ruin probability. Scand. Actuar. J. 189–210 (2004)

  36. Schmidli, H.: Stochastic Control in Insurance. Springer, London (2008)

  37. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095–1100 (1953)

  38. Shiryaev, A.N.: Some new results in the theory of controlled random processes. In: Trans. Fourth Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, Prague, pp. 131–203. Academia, Prague (1965)

  39. Stokey, N.L., Lucas, R.E. Jr.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)

  40. Tijms, H.: A First Course in Stochastic Models. Wiley, Chichester (2003)

Author information

Corresponding author

Correspondence to Nicole Bäuerle.

Additional information

We dedicate this paper to Karl Hinderer who passed away on April 17th, 2010. He established the theory of Markov Decision Processes in Germany 40 years ago.

About this article

Cite this article

Bäuerle, N., Rieder, U. Markov Decision Processes. Jahresber. Dtsch. Math. Ver. 112, 217–243 (2010). https://doi.org/10.1365/s13291-010-0007-2

