Abstract.
This note concerns Markov decision processes on a discrete state space. It is supposed that the reward function is nonnegative and that the decision maker has a nonnull constant risk-sensitivity, which leads to grading random rewards via the expectation of an exponential utility function. The performance index is the risk-sensitive expected-total-reward criterion, and the existence of approximately optimal stationary policies, in the absolute and relative senses, is studied. The main results, derived under mild conditions, extend classical theorems in risk-neutral positive dynamic programming and can be summarized as follows: Assuming that the optimal value function is finite, it is proved that (i) ε-optimal stationary policies exist when the state and action spaces are both finite, and (ii) this conclusion extends to the denumerable state space case whenever (a) the decision maker is risk-averse, and (b) the optimal value function is bounded. This latter result is a (weak) risk-sensitive version of a classical theorem formulated by Ornstein (1969).
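The exponential-utility grading of random rewards mentioned in the abstract can be illustrated through the associated certainty equivalent. The sketch below is not from the paper; it assumes the standard formulation in which a risk-sensitivity coefficient λ ≠ 0 grades a random reward X by (1/λ) log E[exp(λX)], with λ < 0 corresponding to risk aversion. All names are hypothetical.

```python
import math

def certainty_equivalent(values, probs, lam):
    """Certainty equivalent of a discrete random reward under
    exponential utility: (1/lam) * log E[exp(lam * X)], lam != 0.
    lam < 0 models a risk-averse decision maker."""
    if lam == 0:
        raise ValueError("risk-sensitivity coefficient must be nonnull")
    expected_utility = sum(p * math.exp(lam * x) for x, p in zip(values, probs))
    return math.log(expected_utility) / lam

# A fair gamble paying 0 or 10 with equal probability:
values, probs = [0.0, 10.0], [0.5, 0.5]
mean = sum(p * x for x, p in zip(values, probs))        # risk-neutral value: 5.0
ce_averse = certainty_equivalent(values, probs, -0.5)   # below the mean (risk aversion)
ce_seeking = certainty_equivalent(values, probs, 0.5)   # above the mean (risk seeking)
print(mean, ce_averse, ce_seeking)
```

As λ → 0 the certainty equivalent approaches the ordinary expectation, recovering the risk-neutral criterion of classical positive dynamic programming.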
Manuscript received: October 1999/Final version received: April 2000
Cite this article
Cavazos-Cadena, R., Montes-de-Oca, R. Nearly optimal policies in risk-sensitive positive dynamic programming on discrete spaces. Mathematical Methods of OR 52, 133–167 (2000). https://doi.org/10.1007/s001860000068