Abstract
We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize the total expected utility determined by an unbounded return function. We establish the existence of optimal strategies under general conditions that allow the reward function to be unbounded both from above and from below, and the action sets available to the decision maker at each step to be not necessarily compact. To deal with unbounded reward functions, we derive a new characterization of the weak convergence of probability measures. Our results are illustrated by examples.
References
Balder, E.J.: On compactness of the space of policies in stochastic dynamic programming. Stoch. Process. Appl. 32(1), 141–150 (1989)
Balder, E.J.: Existence without explicit compactness in stochastic dynamic programming. Math. Oper. Res. 17(3), 572–580 (1992)
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete Time Case. Mathematics in Science and Engineering, vol. 139. Academic Press, New York (1978)
Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)
Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research and Mathematical Systems, vol. 33. Springer, Berlin (1970)
Jaśkiewicz, A., Matkowski, J., Nowak, A.S.: Generalised discounting in dynamic programming with unbounded returns. Oper. Res. Lett. 42(3), 231–233 (2014)
Jaśkiewicz, A., Nowak, A.S.: Discounted dynamic programming with unbounded returns: application to economic models. J. Math. Anal. Appl. 378(2), 450–462 (2011)
Jaśkiewicz, A., Nowak, A.S.: Stochastic games with unbounded payoffs: applications to robust control in economics. Dyn. Games Appl. 1(2), 253–279 (2011)
Kertz, R.P., Nachman, D.C.: Persistently optimal plans for nonstationary dynamic programming: the topology of weak convergence case. Ann. Probab. 7(5), 811–826 (1979)
Matkowski, J., Nowak, A.S.: On discounted dynamic programming with unbounded returns. Econ. Theory 46(3), 455–474 (2011)
Nowak, A.S.: On the weak topology on a space of probability measures induced by policies. Bull. Pol. Acad. Sci. Math. 36(3–4), 181–186 (1988)
Schäl, M.: Conditions for optimality in dynamic programming and for the limit of \(n\)-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie Verw. Geb. 32(3), 179–196 (1975)
Schäl, M.: On dynamic programming: compactness of the space of policies. Stoch. Process. Appl. 3(4), 345–364 (1975)
Schäl, M.: On dynamic programming and statistical decision theory. Ann. Stat. 7(2), 432–445 (1979)
Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2), 326–335 (1977)
Zapała, A.M.: Unbounded mappings and weak convergence of measures. Stat. Probab. Lett. 78(6), 698–706 (2008)
Appendix A: Balder’s Lemma
We detail here the proof of [1, Lemma 2.4], for the sake of clarity.
Lemma A.1
Let \((\mu _{n})\) be a sequence of probability measures on a metric space Y converging weakly to a probability measure \(\mu \). Consider a function \(u:Y\rightarrow [-\,\infty ,+\,\infty [\), bounded from above, satisfying the following conditions: for any \(\epsilon >0\), there exists a closed subset \(Y_{\epsilon }\) of Y such that
$$\sup _{n\in \mathbb {N}}\mu _{n}(Y\setminus Y_{\epsilon })\le \epsilon $$
and the restriction of u to \(Y_{\epsilon }\) is upper semicontinuous. Then,
$$\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \int _{Y}u\,d\mu .$$
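The uniform-mass condition on the closed sets \(Y_{\epsilon }\) cannot be dropped. Here is a minimal numerical sketch (our own illustration, not taken from the paper): with \(\mu _{n}=\delta _{1/n}\rightarrow \delta _{0}\) and \(u(0)=-1\), \(u(x)=0\) for \(x\ne 0\), any closed set carrying all but mass \(\epsilon <1\) of every \(\mu _{n}\) must contain the points 1/n for n large, hence 0 as well, and u restricted to it fails to be upper semicontinuous; accordingly, the conclusion of the lemma fails.

```python
# Illustration (not from the paper): mu_n = delta_{1/n} converges weakly to
# mu = delta_0, with u(0) = -1 and u(x) = 0 otherwise (bounded, but not upper
# semicontinuous at 0). No closed set Y_eps satisfies both hypotheses of
# Lemma A.1 for eps < 1, and the lemma's inequality indeed fails.

def u(x):
    return -1.0 if x == 0 else 0.0

# The integral of u against mu_n = delta_{1/n} is just u(1/n) = 0.
integrals_mu_n = [u(1.0 / n) for n in range(1, 101)]
limsup_estimate = max(integrals_mu_n[-10:])  # tail maximum; the sequence is constantly 0
integral_mu = u(0.0)                         # integral of u against delta_0

# limsup_n = 0 > -1 = integral of u d(mu): the conclusion of Lemma A.1 fails.
print(limsup_estimate, integral_mu)
```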
Proof
Let us first assume that u is also bounded from below. Let \(\Vert u\Vert =\sup _{Y}|u|\) and define, for any \(\epsilon >0\), the function
$$u_{\epsilon }(x)=u(x)\text { if }x\in Y_{\epsilon },\qquad u_{\epsilon }(x)=-\Vert u\Vert \text { if }x\in Y\setminus Y_{\epsilon }.$$
Let us show that \(u_\epsilon \) is upper semicontinuous on Y. For \(\beta \in \mathbb {R}\), consider the level set
$$A_{\beta }=\{x\in Y: u_{\epsilon }(x)<\beta \}.$$
Our aim is to show that \(A_\beta \) is open. If \(\beta \le -\Vert u\Vert \), we clearly have \(A_\beta =\emptyset \), which is open. Otherwise, we can write
$$A_{\beta }=\{x\in Y_{\epsilon }: u_{\epsilon }(x)<\beta \}\cup (Y\setminus Y_{\epsilon }),$$
since \(u_{\epsilon }=-\Vert u\Vert <\beta \) on \(Y\setminus Y_{\epsilon }\).
Since u is upper semicontinuous on \(Y_\epsilon \), the level set \(\{x\in Y_\epsilon : u_\epsilon (x) < \beta \}\) is open in \(Y_\epsilon \), and so there exists an open set O of Y such that
$$\{x\in Y_{\epsilon }: u_{\epsilon }(x)<\beta \}=Y_{\epsilon }\cap O.$$
Thus \(A_\beta =(Y_\epsilon \cap O)\cup (Y\setminus Y_\epsilon )\). Let \(x\in A_\beta \). If \(x\in Y\setminus Y_\epsilon \), by the closedness of \(Y_\epsilon \), we can find \(\eta >0\) such that \(B(x,\eta )\subset Y\setminus Y_\epsilon \subset A_\beta \). Otherwise, \(x\in Y_\epsilon \cap O\). In this case, since \(x\in O\) which is an open set, we can find \(\eta '>0\) such that \(B(x,\eta ')\subset O\). Then
$$B(x,\eta ')=(B(x,\eta ')\cap Y_{\epsilon })\cup (B(x,\eta ')\setminus Y_{\epsilon })\subset (Y_{\epsilon }\cap O)\cup (Y\setminus Y_{\epsilon })=A_{\beta }.$$
Thus \(B(x,\eta ')\subset A_\beta \) showing that \(A_\beta \) is open. This implies that \(u_\epsilon \) is upper semicontinuous on Y.
Remark that, since \(u_{\epsilon }=u\) on \(Y_{\epsilon }\) and \(u-u_{\epsilon }\le 2\Vert u\Vert \) on \(Y\setminus Y_{\epsilon }\), for every \(n\in \mathbb {N}\),
$$\int _{Y}u\,d\mu _{n}\le \int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \mu _{n}(Y\setminus Y_{\epsilon })\le \int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \epsilon .$$
Now, using the fact that \(u_\epsilon \) is upper semicontinuous and bounded on the whole space Y, together with \(u_{\epsilon }\le u\),
$$\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \limsup _{n\rightarrow \infty }\int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \epsilon \le \int _{Y}u_{\epsilon }\,d\mu +2\Vert u\Vert \epsilon \le \int _{Y}u\,d\mu +2\Vert u\Vert \epsilon .$$
Letting \(\epsilon \rightarrow 0\) shows the result.
In the case where u is no longer bounded from below, we introduce, for \(m\in \mathbb {N}\), the truncation \(u_m=u\vee (-m)\), to which the previous step applies. Since \(u\le u_{m}\), this yields \(\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \int _{Y}u_{m}\,d\mu \) for every m. As \(m\rightarrow \infty \), \(u_{m}\) decreases to u, so the monotone convergence theorem gives the result. \(\square \)
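The lemma can be checked numerically on a toy example with a return function unbounded from below (our own sketch, not from the paper): take \(\mu _{n}=(1-1/n)\delta _{0}+(1/n)\delta _{-n}\rightarrow \delta _{0}\) and the continuous function \(u(x)=x\wedge 0\). With \(Y_{\epsilon }=[-M,+\infty [\) for \(M\ge 1/\epsilon \), the hypotheses hold, and the lemma's inequality is strict because mass 1/n escapes to \(-n\), where u is very negative.

```python
# Toy check (our own sketch, not from the paper) of Lemma A.1 with
# u(x) = min(x, 0), unbounded from below, and
# mu_n = (1 - 1/n) delta_0 + (1/n) delta_{-n}, converging weakly to delta_0.
# With Y_eps = [-M, +inf) for M >= 1/eps, sup_n mu_n(Y \ Y_eps) <= eps,
# so the hypotheses hold even though u is unbounded below.

def u(x):
    return min(x, 0.0)

def integral_mu_n(n):
    # integral of u against mu_n = (1 - 1/n) delta_0 + (1/n) delta_{-n}
    return (1 - 1 / n) * u(0.0) + u(-float(n)) / n

values = [integral_mu_n(n) for n in range(1, 1001)]
limsup_estimate = max(values[-100:])  # the sequence is constantly -1
integral_mu = u(0.0)                  # integral against the weak limit delta_0

# Lemma A.1's conclusion holds, with strict inequality: -1 <= 0.
print(limsup_estimate, integral_mu)
```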
Cite this article
Dufour, F., Genadot, A. On the Expected Total Reward with Unbounded Returns for Markov Decision Processes. Appl Math Optim 82, 433–450 (2020). https://doi.org/10.1007/s00245-018-9533-6