Abstract
We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize the total expected utility determined by an unbounded return function. We establish the existence of optimal strategies under general conditions that allow the reward function to be unbounded both from above and from below, and the action sets available to the decision maker at each step to be not necessarily compact. To deal with unbounded reward functions, we derive a new characterization of the weak convergence of probability measures. Our results are illustrated by examples.
References
Balder, E.J.: On compactness of the space of policies in stochastic dynamic programming. Stoch. Process. Appl. 32(1), 141–150 (1989)
Balder, E.J.: Existence without explicit compactness in stochastic dynamic programming. Math. Oper. Res. 17(3), 572–580 (1992)
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete Time Case. Mathematics in Science and Engineering, vol. 139. Academic Press, New York (1978)
Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)
Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research and Mathematical Systems, vol. 33. Springer, Berlin (1970)
Jaśkiewicz, A., Matkowski, J., Nowak, A.S.: Generalised discounting in dynamic programming with unbounded returns. Oper. Res. Lett. 42(3), 231–233 (2014)
Jaśkiewicz, A., Nowak, A.S.: Discounted dynamic programming with unbounded returns: application to economic models. J. Math. Anal. Appl. 378(2), 450–462 (2011)
Jaśkiewicz, A., Nowak, A.S.: Stochastic games with unbounded payoffs: applications to robust control in economics. Dyn. Games Appl. 1(2), 253–279 (2011)
Kertz, R.P., Nachman, D.C.: Persistently optimal plans for nonstationary dynamic programming: the topology of weak convergence case. Ann. Probab. 7(5), 811–826 (1979)
Matkowski, J., Nowak, A.S.: On discounted dynamic programming with unbounded returns. Econ. Theory 46(3), 455–474 (2011)
Nowak, A.S.: On the weak topology on a space of probability measures induced by policies. Bull. Pol. Acad. Sci. Math. 36(3–4), 181–186 (1988)
Schäl, M.: Conditions for optimality in dynamic programming and for the limit of \(n\)-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie Verw. Geb. 32(3), 179–196 (1975)
Schäl, M.: On dynamic programming: compactness of the space of policies. Stoch. Process. Appl. 3(4), 345–364 (1975)
Schäl, M.: On dynamic programming and statistical decision theory. Ann. Stat. 7(2), 432–445 (1979)
Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2), 326–335 (1977)
Zapała, A.M.: Unbounded mappings and weak convergence of measures. Stat. Probab. Lett. 78(6), 698–706 (2008)
Appendix A: Balder’s Lemma
We detail here the proof of [1, Lemma 2.4], for the sake of clarity.
Lemma A.1
Let \((\mu _{n})\) be a sequence of probability measures on a metric space Y converging weakly to a probability measure \(\mu \). Consider a function \(u:Y\rightarrow [-\,\infty ,+\,\infty [\), bounded from above, satisfying the following conditions: for any \(\epsilon >0\), there exists a closed subset \(Y_{\epsilon }\) of Y such that
$$\sup _{n\in \mathbb {N}}\mu _{n}(Y\setminus Y_{\epsilon })\le \epsilon $$
and the restriction of u to \(Y_{\epsilon }\) is upper semicontinuous. Then,
$$\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \int _{Y}u\,d\mu .$$
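The uniform-mass condition on the closed sets \(Y_{\epsilon }\) cannot be dropped. Here is a minimal numerical sketch (our own illustration, not taken from the paper): with \(\mu _{n}=\delta _{1/n}\rightarrow \delta _{0}\) and \(u(0)=-1\), \(u(x)=0\) for \(x\ne 0\), any closed set carrying all but mass \(\epsilon <1\) of every \(\mu _{n}\) must contain the points 1/n for n large, hence 0 as well, and u restricted to it fails to be upper semicontinuous; accordingly, the conclusion of the lemma fails.

```python
# Illustration (not from the paper): mu_n = delta_{1/n} converges weakly to
# mu = delta_0, with u(0) = -1 and u(x) = 0 otherwise (bounded, but not upper
# semicontinuous at 0). No closed set Y_eps satisfies both hypotheses of
# Lemma A.1 for eps < 1, and the lemma's inequality indeed fails.

def u(x):
    return -1.0 if x == 0 else 0.0

# The integral of u against mu_n = delta_{1/n} is just u(1/n) = 0.
integrals_mu_n = [u(1.0 / n) for n in range(1, 101)]
limsup_estimate = max(integrals_mu_n[-10:])  # tail maximum; the sequence is constantly 0
integral_mu = u(0.0)                         # integral of u against delta_0

# limsup_n = 0 > -1 = integral of u d(mu): the conclusion of Lemma A.1 fails.
print(limsup_estimate, integral_mu)
```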
Proof
Let us first assume that u is also bounded from below. Let \(\Vert u\Vert =\sup _{Y}|u|\) and define, for any \(\epsilon >0\), the function
$$u_{\epsilon }(x)=u(x)\text { if }x\in Y_{\epsilon },\qquad u_{\epsilon }(x)=-\Vert u\Vert \text { if }x\in Y\setminus Y_{\epsilon }.$$
Let us show that \(u_\epsilon \) is upper semicontinuous on Y. For \(\beta \in \mathbb {R}\), consider the level set
$$A_{\beta }=\{x\in Y: u_{\epsilon }(x)<\beta \}.$$
Our aim is to show that \(A_\beta \) is open. If \(\beta \le -\Vert u\Vert \), we clearly have \(A_\beta =\emptyset \), which is open. Otherwise, we can write
$$A_{\beta }=\{x\in Y_{\epsilon }: u_{\epsilon }(x)<\beta \}\cup (Y\setminus Y_{\epsilon }),$$
since \(u_{\epsilon }=-\Vert u\Vert <\beta \) on \(Y\setminus Y_{\epsilon }\).
Since u is upper semicontinuous on \(Y_\epsilon \), the level set \(\{x\in Y_\epsilon : u_\epsilon (x) < \beta \}\) is open in \(Y_\epsilon \), and so there exists an open set O of Y such that
$$\{x\in Y_{\epsilon }: u_{\epsilon }(x)<\beta \}=Y_{\epsilon }\cap O.$$
Thus \(A_\beta =(Y_\epsilon \cap O)\cup (Y\setminus Y_\epsilon )\). Let \(x\in A_\beta \). If \(x\in Y\setminus Y_\epsilon \), by the closedness of \(Y_\epsilon \), we can find \(\eta >0\) such that \(B(x,\eta )\subset Y\setminus Y_\epsilon \subset A_\beta \). Otherwise, \(x\in Y_\epsilon \cap O\). In this case, since \(x\in O\) which is an open set, we can find \(\eta '>0\) such that \(B(x,\eta ')\subset O\). Then
$$B(x,\eta ')=(B(x,\eta ')\cap Y_{\epsilon })\cup (B(x,\eta ')\setminus Y_{\epsilon })\subset (Y_{\epsilon }\cap O)\cup (Y\setminus Y_{\epsilon })=A_{\beta }.$$
Thus \(B(x,\eta ')\subset A_\beta \) showing that \(A_\beta \) is open. This implies that \(u_\epsilon \) is upper semicontinuous on Y.
Remark that, since \(u_{\epsilon }=u\) on \(Y_{\epsilon }\) and \(u-u_{\epsilon }\le 2\Vert u\Vert \) on \(Y\setminus Y_{\epsilon }\), for every \(n\in \mathbb {N}\),
$$\int _{Y}u\,d\mu _{n}\le \int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \mu _{n}(Y\setminus Y_{\epsilon })\le \int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \epsilon .$$
Now, using the fact that \(u_\epsilon \) is upper semicontinuous and bounded on the whole space Y, together with \(u_{\epsilon }\le u\),
$$\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \limsup _{n\rightarrow \infty }\int _{Y}u_{\epsilon }\,d\mu _{n}+2\Vert u\Vert \epsilon \le \int _{Y}u_{\epsilon }\,d\mu +2\Vert u\Vert \epsilon \le \int _{Y}u\,d\mu +2\Vert u\Vert \epsilon .$$
Letting \(\epsilon \rightarrow 0\) shows the result.
In the case where u is no longer bounded from below, we introduce, for \(m\in \mathbb {N}\), the truncation \(u_m=u\vee (-m)\), to which the previous step applies. Since \(u\le u_{m}\), this yields \(\limsup _{n\rightarrow \infty }\int _{Y}u\,d\mu _{n}\le \int _{Y}u_{m}\,d\mu \) for every m. As \(m\rightarrow \infty \), \(u_{m}\) decreases to u, so the monotone convergence theorem gives the result. \(\square \)
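The lemma can be checked numerically on a toy example with a return function unbounded from below (our own sketch, not from the paper): take \(\mu _{n}=(1-1/n)\delta _{0}+(1/n)\delta _{-n}\rightarrow \delta _{0}\) and the continuous function \(u(x)=x\wedge 0\). With \(Y_{\epsilon }=[-M,+\infty [\) for \(M\ge 1/\epsilon \), the hypotheses hold, and the lemma's inequality is strict because mass 1/n escapes to \(-n\), where u is very negative.

```python
# Toy check (our own sketch, not from the paper) of Lemma A.1 with
# u(x) = min(x, 0), unbounded from below, and
# mu_n = (1 - 1/n) delta_0 + (1/n) delta_{-n}, converging weakly to delta_0.
# With Y_eps = [-M, +inf) for M >= 1/eps, sup_n mu_n(Y \ Y_eps) <= eps,
# so the hypotheses hold even though u is unbounded below.

def u(x):
    return min(x, 0.0)

def integral_mu_n(n):
    # integral of u against mu_n = (1 - 1/n) delta_0 + (1/n) delta_{-n}
    return (1 - 1 / n) * u(0.0) + u(-float(n)) / n

values = [integral_mu_n(n) for n in range(1, 1001)]
limsup_estimate = max(values[-100:])  # the sequence is constantly -1
integral_mu = u(0.0)                  # integral against the weak limit delta_0

# Lemma A.1's conclusion holds, with strict inequality: -1 <= 0.
print(limsup_estimate, integral_mu)
```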
Cite this article
Dufour, F., Genadot, A. On the Expected Total Reward with Unbounded Returns for Markov Decision Processes. Appl Math Optim 82, 433–450 (2020). https://doi.org/10.1007/s00245-018-9533-6