Abstract
The theory of Markov Decision Processes is the theory of controlled Markov chains. Its origins can be traced back to R. Bellman and L. Shapley in the 1950s. Over the following decades the theory has grown dramatically and has found applications in diverse areas such as computer science, engineering, operations research, biology and economics. In this article we give a short introduction to parts of this theory. We treat Markov Decision Processes with finite and infinite time horizon, restricting the presentation to the so-called (generalized) negative case. Solution algorithms such as Howard's policy improvement and linear programming are also explained. Various examples illustrate the application of the theory: we treat stochastic linear-quadratic control problems, bandit problems and dividend pay-out problems.
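Howard's policy improvement, mentioned above, can be sketched for a finite, discounted MDP as follows. The two-state, two-action data below (transition matrices, rewards, discount factor) are illustrative assumptions, not taken from the article; the algorithm alternates exact policy evaluation with greedy improvement until the policy is stable.

```python
import numpy as np

# Illustrative two-state, two-action MDP (assumed data, not from the article).
P = np.array([                     # P[a, s, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],      # under action 0
    [[0.5, 0.5], [0.6, 0.4]],      # under action 1
])
r = np.array([                     # r[a, s]: one-stage reward
    [1.0, 0.0],
    [0.5, 2.0],
])
beta = 0.9                         # discount factor
n_states = 2

policy = np.zeros(n_states, dtype=int)   # start with an arbitrary policy
while True:
    # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
    P_pi = P[policy, np.arange(n_states)]
    r_pi = r[policy, np.arange(n_states)]
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    # Policy improvement: act greedily with respect to v.
    q = r + beta * P @ v                 # q[a, s]
    new_policy = q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                            # improvement stops: policy is optimal
    policy = new_policy

print(policy, v)
```

Because the state and action sets are finite and each iteration produces a strictly better policy until convergence, the loop terminates after finitely many steps.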
We dedicate this paper to Karl Hinderer who passed away on April 17th, 2010. He established the theory of Markov Decision Processes in Germany 40 years ago.
Bäuerle, N., Rieder, U. Markov Decision Processes. Jahresber. Dtsch. Math. Ver. 112, 217–243 (2010). https://doi.org/10.1365/s13291-010-0007-2