On the Expected Total Reward with Unbounded Returns for Markov Decision Processes

Applied Mathematics & Optimization

Abstract

We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize a total expected utility determined by an unbounded return function. The existence of optimal strategies is shown under general conditions that allow the reward function to be unbounded both from above and from below, and the action sets available to the decision maker at each step to be not necessarily compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.

References

  1. Balder, E.J.: On compactness of the space of policies in stochastic dynamic programming. Stoch. Process. Appl. 32(1), 141–150 (1989)

  2. Balder, E.J.: Existence without explicit compactness in stochastic dynamic programming. Math. Oper. Res. 17(3), 572–580 (1992)

  3. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete Time Case. Mathematics in Science and Engineering, vol. 139. Academic Press, New York (1978)

  4. Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)

  5. Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research and Mathematical Systems, vol. 33. Springer, Berlin (1970)

  6. Jaśkiewicz, A., Matkowski, J., Nowak, A.S.: Generalised discounting in dynamic programming with unbounded returns. Oper. Res. Lett. 42(3), 231–233 (2014)

  7. Jaśkiewicz, A., Nowak, A.S.: Discounted dynamic programming with unbounded returns: application to economic models. J. Math. Anal. Appl. 378(2), 450–462 (2011)

  8. Jaśkiewicz, A., Nowak, A.S.: Stochastic games with unbounded payoffs: applications to robust control in economics. Dyn. Games Appl. 1(2), 253–279 (2011)

  9. Kertz, R.P., Nachman, D.C.: Persistently optimal plans for nonstationary dynamic programming: the topology of weak convergence case. Ann. Probab. 7(5), 811–826 (1979)

  10. Matkowski, J., Nowak, A.S.: On discounted dynamic programming with unbounded returns. Econ. Theory 46(3), 455–474 (2011)

  11. Nowak, A.S.: On the weak topology on a space of probability measures induced by policies. Bull. Pol. Acad. Sci. Math. 36(3–4), 181–186 (1989). 1988

  12. Schäl, M.: Conditions for optimality in dynamic programming and for the limit of \(n\)-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie Verw. Geb. 32(3), 179–196 (1975)

  13. Schäl, M.: On dynamic programming: compactness of the space of policies. Stoch. Process. Appl. 3(4), 345–364 (1975)

  14. Schäl, M.: On dynamic programming and statistical decision theory. Ann. Stat. 7(2), 432–445 (1979)

  15. Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2), 326–335 (1977)

  16. Zapała, A.M.: Unbounded mappings and weak convergence of measures. Stat. Probab. Lett. 78(6), 698–706 (2008)

Corresponding author

Correspondence to A. Genadot.

Appendix A: Balder’s Lemma

We detail here the proof of [1, Lemma 2.4], for the sake of clarity.

Lemma A.1

Let \((\mu _{n})\) be a sequence of probability measures on a metric space Y converging weakly to a probability measure \(\mu \). Consider a function \(u:Y\rightarrow [-\,\infty ,+\,\infty [\), bounded from above, satisfying the following conditions: for any \(\epsilon >0\), there exists a closed subset \(Y_{\epsilon }\) of Y such that

$$\begin{aligned} \sup _{n\in \mathbb {N}}\mu _{n}(Y\setminus Y_{\epsilon }) <\epsilon \end{aligned} \tag{A.1}$$

and the restriction of u to \(Y_{\epsilon }\) is upper semicontinuous. Then,

$$\begin{aligned} \mathop {\overline{\lim }}_{n\rightarrow \infty } \int _{Y} u \, d\mu _{n} \le \int _{Y} u \, d\mu . \end{aligned} \tag{A.2}$$
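As a sanity check on the hypotheses (an illustration added here, not taken from [1]), take \(Y=\mathbb {R}\), \(\mu _n=\delta _{1/n}\) for \(n\ge 1\) and \(\mu =\delta _0\), so that \((\mu _n)\) converges weakly to \(\mu \). For the upper semicontinuous function \(u=I_{\{0\}}\) one may take \(Y_\epsilon =Y\) and the conclusion of the lemma holds, whereas for \(u=I_{]0,+\infty [}\) it fails:

$$\begin{aligned} u=I_{\{0\}}:&\quad \int _Y u \, d\mu _n = 0 \le 1 = \int _Y u \, d\mu ,\\ u=I_{]0,+\infty [}:&\quad \int _Y u \, d\mu _n = 1 > 0 = \int _Y u \, d\mu . \end{aligned}$$

The second case does not contradict the lemma: for \(\epsilon \le 1\), any closed set \(Y_\epsilon \) with \(\sup _{n}\mu _n(Y\setminus Y_\epsilon )<\epsilon \) must contain every point 1/n, hence also 0, and the restriction of u to such a set is not upper semicontinuous at 0.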

Proof

Let us assume in a first step that u is also bounded from below. Let \(\Vert u\Vert =\sup _{Y}|u|\) and define, for any \(\epsilon >0\), the function

$$\begin{aligned} u_\epsilon =u I_{Y_\epsilon } - \Vert u\Vert I_{Y\setminus Y_\epsilon }. \end{aligned}$$

Let us show that \(u_\epsilon \) is upper semicontinuous on Y. For \(\beta \in \mathbb {R}\), consider the level set

$$\begin{aligned} A_\beta =\{ x\in Y : u_\epsilon (x)< \beta \}=\{x\in Y_\epsilon : u(x)< \beta \}\cup \{x\in Y\setminus Y_\epsilon : -\Vert u\Vert < \beta \}. \end{aligned}$$

Our aim is to show that \(A_\beta \) is open. If \(\beta \le -\Vert u\Vert \), then \(A_\beta =\emptyset \), which is open, because \(u_\epsilon \ge -\Vert u\Vert \) everywhere on Y. Otherwise, we can write

$$\begin{aligned} A_\beta =\{x\in Y_\epsilon : u_\epsilon (x)<\beta \}\cup (Y\setminus Y_\epsilon ). \end{aligned}$$

Since u is upper semicontinuous on \(Y_\epsilon \), the level set \(\{x\in Y_\epsilon : u_\epsilon (x) < \beta \}\) is open in \(Y_\epsilon \), and so there exists an open set O of Y such that

$$\begin{aligned} \{x\in Y_\epsilon : u_\epsilon (x)<\beta \}=Y_\epsilon \cap O. \end{aligned}$$

Thus \(A_\beta =(Y_\epsilon \cap O)\cup (Y\setminus Y_\epsilon )\). Let \(x\in A_\beta \). If \(x\in Y\setminus Y_\epsilon \), by the closedness of \(Y_\epsilon \), we can find \(\eta >0\) such that \(B(x,\eta )\subset Y\setminus Y_\epsilon \subset A_\beta \). Otherwise, \(x\in Y_\epsilon \cap O\). In this case, since \(x\in O\) which is an open set, we can find \(\eta '>0\) such that \(B(x,\eta ')\subset O\). Then

$$\begin{aligned} B(x,\eta ')\cap A_\beta &= [B(x,\eta ')\cap Y_\epsilon \cap O] \cup [ B(x,\eta ')\cap (Y\setminus Y_\epsilon )]\\ &= [B(x,\eta ')\cap Y_\epsilon ] \cup [ B(x,\eta ')\cap (Y\setminus Y_\epsilon )]=B(x,\eta '). \end{aligned}$$

Thus \(B(x,\eta ')\subset A_\beta \) showing that \(A_\beta \) is open. This implies that \(u_\epsilon \) is upper semicontinuous on Y.

Note that

$$\begin{aligned} \sup _{n\in \mathbb {N}}\left| \int _Y u d\mu _n -\int _Y u_\epsilon d\mu _n\right| \le 2\epsilon \Vert u\Vert . \end{aligned}$$
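This bound can be verified as follows (a short check added here, using only (A.1) and the definition of \(u_\epsilon \)). The difference \(u-u_\epsilon \) vanishes on \(Y_\epsilon \) and equals \(u+\Vert u\Vert \in [0,2\Vert u\Vert ]\) on \(Y\setminus Y_\epsilon \), so that for every n

$$\begin{aligned} 0\le \int _Y u \, d\mu _n-\int _Y u_\epsilon \, d\mu _n = \int _{Y\setminus Y_\epsilon }(u+\Vert u\Vert ) \, d\mu _n \le 2\Vert u\Vert \, \mu _n(Y\setminus Y_\epsilon )\le 2\epsilon \Vert u\Vert . \end{aligned}$$

In particular, \(u_\epsilon \le u\) on Y.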

Now, using the fact that \(u_\epsilon \) is upper semicontinuous and bounded on the whole space Y,

$$\begin{aligned} \mathop {\overline{\lim }}_{n\rightarrow \infty } \int _{Y} u d\mu _{n} \le \mathop {\overline{\lim }}_{n\rightarrow \infty } \int _{Y} u_\epsilon d\mu _{n}+2\Vert u\Vert \epsilon \le \int _{Y} u_\epsilon d\mu +2\Vert u\Vert \epsilon \le \int _{Y} u d\mu +2\Vert u\Vert \epsilon , \end{aligned}$$

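where the second inequality follows from the weak convergence of \((\mu _n)\) to \(\mu \) and the last one from \(u_\epsilon \le u\). Since \(\epsilon >0\) is arbitrary, this shows (A.2).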

In the case where u is no longer bounded from below, we introduce \(u_m=u\vee (-m)\) for which the previous step holds. Then, we apply the monotone convergence theorem to obtain the result. \(\square \)
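The limiting argument can be spelled out as follows (a sketch with the intermediate steps filled in here). For each integer \(m\ge 1\), the function \(u_m\) is bounded, its restriction to \(Y_\epsilon \) is upper semicontinuous as the maximum of an upper semicontinuous function and a constant, and \(u\le u_m\). Applying the first step to \(u_m\) gives

$$\begin{aligned} \mathop {\overline{\lim }}_{n\rightarrow \infty } \int _{Y} u \, d\mu _{n} \le \mathop {\overline{\lim }}_{n\rightarrow \infty } \int _{Y} u_m \, d\mu _{n} \le \int _{Y} u_m \, d\mu . \end{aligned}$$

Since \(u_m\) decreases pointwise to u as \(m\rightarrow \infty \) and \(u_1\) is bounded, the monotone convergence theorem for decreasing sequences yields

$$\begin{aligned} \lim _{m\rightarrow \infty } \int _{Y} u_m \, d\mu = \int _{Y} u \, d\mu , \end{aligned}$$

where the limit may equal \(-\infty \), and (A.2) follows.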

Cite this article

Dufour, F., Genadot, A. On the Expected Total Reward with Unbounded Returns for Markov Decision Processes. Appl Math Optim 82, 433–450 (2020). https://doi.org/10.1007/s00245-018-9533-6

  • DOI: https://doi.org/10.1007/s00245-018-9533-6
