Markov Decision Processes

  • J. Frédéric Bonnans
Part of the Universitext book series (UTX)


This chapter considers the problem of minimizing the expected reward of a controlled Markov chain, either over a finite horizon, or over an infinite horizon with discounted reward, including the cases of exit times and stopping decisions. The value iteration and policy (Howard) iteration algorithms are compared. These results are extended to problems with expectation constraints, to problems with partial observation, and to the ergodic case, which is in some sense the limit of large-horizon problems with undiscounted cost.
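The two algorithms compared in the chapter can be sketched on a toy example. The following is a minimal illustration, not the chapter's own material: it uses a hypothetical 2-state, 2-action discounted MDP (transition tensor `P`, cost matrix `c`, and discount factor `gamma` are all invented for illustration) and runs value iteration, which iterates the Bellman operator, against Howard's policy iteration, which alternates exact policy evaluation with greedy improvement.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented for illustration only.
# P[a, s, s'] = transition probability, c[a, s] = immediate cost; we
# minimize expected discounted cost with discount factor gamma.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],   # transitions under action 1
])
c = np.array([
    [1.0, 2.0],                 # costs under action 0
    [1.5, 0.5],                 # costs under action 1
])
gamma = 0.9

def value_iteration(tol=1e-10):
    """Iterate the Bellman operator (Tv)(s) = min_a [c(a,s) + gamma * sum_s' P(a,s,s') v(s')]."""
    v = np.zeros(2)
    while True:
        q = c + gamma * (P @ v)        # q[a, s]: cost-to-go of playing a in s
        v_new = q.min(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=0)
        v = v_new

def policy_iteration():
    """Howard's algorithm: exact policy evaluation, then greedy improvement."""
    policy = np.zeros(2, dtype=int)
    while True:
        # Evaluation: solve the linear system (I - gamma * P_pi) v = c_pi.
        P_pi = P[policy, np.arange(2)]
        c_pi = c[policy, np.arange(2)]
        v = np.linalg.solve(np.eye(2) - gamma * P_pi, c_pi)
        # Improvement: act greedily with respect to v.
        new_policy = (c + gamma * (P @ v)).argmin(axis=0)
        if np.all(new_policy == policy):
            return v, policy           # policy is stable, hence optimal
        policy = new_policy
```

Both methods converge to the same value function; policy iteration typically needs far fewer (but more expensive) iterations, since each step solves the evaluation equation exactly rather than applying one Bellman update.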

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Inria-Saclay and Centre de Mathématiques Appliquées, École Polytechnique, Palaiseau, France