This chapter studies a discrete-time Markov decision process with the total reward criterion, where the state space is countable, the action sets are measurable, the reward function is extended real-valued, and the discount factor β ∈ (−∞, +∞) may be any real number, whereas β ∈ [0, 1] was previously required in the literature. Two conditions are presented that are necessary for studying MDPs and are weaker than those presented in the literature. By eliminating some worst actions, the state space S can be partitioned into subsets S+∞, S−∞, and S0, on which the optimal value function equals +∞, equals −∞, or is finite, respectively. Furthermore, the optimality equation is shown to be valid whenever its right-hand side is well defined, in particular when it is restricted to the subset S0; the reward function r(i, a) becomes finite and bounded above in a for each i ∈ S0. The optimal value function is then characterized as a solution of the optimality equation in S0, and the structure of optimal policies is studied. Moreover, successive approximation is studied. Finally, some sufficient conditions for the necessary conditions are presented. The method used here is elementary; in fact, only basic concepts from MDPs and discrete-time Markov chains are needed.
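The successive approximation mentioned in the abstract can be illustrated by the classical value-iteration scheme for the total-reward optimality equation V(i) = max_a [ r(i, a) + β Σ_j p(j | i, a) V(j) ]. The sketch below is not the chapter's construction; it is a minimal illustration on an assumed two-state, finite-action MDP with all rewards and transition probabilities invented for the example, and with β inside (0, 1) so the iteration converges.

```python
# Minimal sketch of successive approximation (value iteration) for the
# total-reward optimality equation on a small finite MDP.
# All numeric data below are illustrative assumptions, not from the chapter.

beta = 0.9  # discount factor; the chapter allows any real beta, here beta in (0, 1)

# r[i][a]: reward for action a in state i; p[i][a][j]: transition probability to j
r = {0: {"stay": 1.0, "go": 0.0}, 1: {"stay": 2.0}}
p = {0: {"stay": {0: 1.0}, "go": {1: 1.0}}, 1: {"stay": {1: 1.0}}}

def value_iteration(r, p, beta, tol=1e-10, max_iter=10_000):
    """Iterate V <- T V, where T is the optimality (Bellman) operator,
    until successive iterates differ by less than tol in sup-norm."""
    V = {i: 0.0 for i in r}
    for _ in range(max_iter):
        V_new = {
            i: max(
                r[i][a] + beta * sum(p[i][a][j] * V[j] for j in p[i][a])
                for a in r[i]
            )
            for i in r
        }
        if max(abs(V_new[i] - V[i]) for i in V) < tol:
            return V_new
        V = V_new
    return V

V = value_iteration(r, p, beta)
# Here the fixed point is V(1) = 2 / (1 - 0.9) = 20 and
# V(0) = max(1 + 0.9 * V(0), 0.9 * V(1)) = max(10, 18) = 18.
```

In this illustrative instance the optimal policy chooses "go" in state 0, since 0.9 · V(1) = 18 exceeds the value 10 of staying forever.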
Copyright information
© 2008 Springer Science+Business Media, LLC
Cite this chapter
(2008). Discrete Time Markov Decision Processes: Total Reward. In: Markov Decision Processes With Their Applications. Advances in Mechanics and Mathematics, vol 14. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-36951-8_2
DOI: https://doi.org/10.1007/978-0-387-36951-8_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36950-1
Online ISBN: 978-0-387-36951-8
eBook Packages: Mathematics and Statistics (R0)