Abstract
This note is concerned with semi-Markov decision chains evolving on a denumerable state space. The system is directed by a risk-averse controller with constant risk-sensitivity, and the performance of a decision policy is measured by a long-run average criterion associated with bounded holding cost rates and one-step cost function. Under mild conditions on the sojourn times and the transition law, restrictions on the cost structure are given to ensure that the optimal average cost can be characterized via a bounded solution of the optimality equation. This result is then used to establish a general characterization of the optimal average cost in terms of an optimality inequality from which an optimal stationary policy can be derived.
References
Alanís-Durán A, Cavazos-Cadena R (2012) An optimality system for finite average Markov decision chains under risk-aversion. Kybernetika 48:83–104
Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criteria: A survey. SIAM J Contr Optim 31:282–334
Bäuerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39(1):105–120
Bäuerle N, Rieder U (2011) Markov Decision Processes with Applications to Finance. Springer, New York
Baykal-Gürsoy M (2010) Semi-Markov Decision Processes. Wiley Encyclopedia of Operations Research and Management Sciences
Bhabak A, Saha S (2022) Risk-sensitive semi-Markov decision problems with discounted cost and general utilities. Stat Probab Lett 184. https://doi.org/10.1016/j.spl.2022.109408
Biswas A, Pradhan S (2022) Ergodic risk-sensitive control of Markov processes on countable state space revisited. ESAIM Control Optim Calc Var 28:26
Borkar VS, Meyn SP (2002) Risk-sensitive optimal control for Markov decision process with monotone cost. Math Oper Res 27(1):192–209
Camilo-Garay C, Cavazos-Cadena R, Cruz-Suárez H (2022) Contractive Approximations in Risk-Sensitive Average Semi-Markov Decision Chains on a Finite State Space. J Optim Theory Appl 192:271–291
Cavazos-Cadena R (2009) Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria. Math Methods Oper Res 70:541–566
Cavazos-Cadena R (2016) A Poisson equation for the risk-sensitive average cost in semi-Markov chains. Discrete Event Dyn Syst 26:633–656
Cavazos-Cadena R (2018) Characterization of the Optimal Risk-Sensitive Average Cost in Denumerable Markov Decision Chains. Math Oper Res 43(3):1025–1050. https://doi.org/10.1287/moor.2017.0893
Cavazos-Cadena R, Fernández-Gaucherand E (1999) Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions. Math Method Optim Res 43(1999):121–139
Chávez-Rodríguez S, Cavazos-Cadena R, Cruz-Suárez H (2016) Controlled Semi-Markov Chains with Risk-Sensitive Average Cost Criterion. J Optim Theory Appl 170:670–686
Di Masi GB, Stettner L (1999) Risk-Sensitive Control of Discrete-Time Markov Processes with Infinite horizon. SIAM J Control Optim 38(1):61–78
Di Masi GB, Stettner L (2000) Infinite horizon risk sensitive control of discrete time Markov processes with small risk. Syst Control Lett 40(1):305–321
Di Masi GB, Stettner L (2007) Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46(1):231–252
Ghosh MK, Saha S (2014) Risk-sensitive control of continuous time Markov chains. Stochast Int J Probab Stochast Process 86(4):655–675
Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Manage Sci 18:356–369
Hu Q, Yue W (2003) Optimal replacement of a system according to a semi-Markov decision process in a semi-Markov environment. Optim Methods Softw 18:181–196
Huang Y, Lian Z, Guo X (2018) Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria. Adv Appl Prob 50:783–804. https://doi.org/10.1017/apr.2018.36
Huo H, Wen X (2022) The exponential cost optimality for finite horizon semi-Markov decision processes. Int J Inst Inf Theory Autom 58(3):301–319
Jaśkiewicz A (2007) Average optimality for risk sensitive control with general state space. Ann Appl Probab 17(2):654–675
Luque-Vásquez F, Hernández-Lerma O (1999) Semi-Markov control models with average costs. Applicationes Math 26(3):315–331
Meyer CD (2000) Matrix Analysis and Applied Linear Algebra. SIAM
Munkres J (2014) Topology. Second Edition, Pearson
Pinedo M (2008) Scheduling: Theory, Algorithms, and Systems. Prentice Hall, Englewood Cliffs
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley
Royden HL (1988) Real Analysis. Macmillan, London
Sennott L (1999) Stochastic dynamic programming and the control of queueing systems. Wiley-Interscience
Sladký K (2008) Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44(2):205–226
Sladký K (2018) Risk-Sensitive Average Optimality in Markov Decision Processes. Kybernetika 54(6):1218–1230
Stidham S Jr, Weber RR (1993) A survey of Markov decision models for control of networks of queues. Queueing Syst 13:291–314
Tijms HC (2003) A first course in stochastic models. Wiley, New York
Wei Q, Chen X (2016) Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper Res Lett 44(4):457–462
Acknowledgements
The authors are deeply grateful to the reviewers and the Associate Editor for their careful reading of the original manuscript and their helpful suggestions to improve the paper.
Appendix
Proof of Lemma 5.2
Let \((x, a)\in \mathbb {K}\) be arbitrary and notice that
by Definition 5.1
(i) Now, let \(y\in S\) and \(\varepsilon > 0\) be arbitrary but fixed, and pick \(b\in (0, \infty )\) such that
$$\begin{aligned} F_{x, a, y}(\cdot )\ \text {is continuous at}\ b\ \text {and}\ F_{x, a, y}(b) \ge 1-\varepsilon . \end{aligned}$$ (6.13)
Combining Assumption 2.1(ii) with the tube lemma in Munkres (2014, p. 168), there exists a neighborhood of the action \(a\in A(x)\), say \(V(a)\subset A(x)\), such that \(|\rho _{x,\tilde{a}, y}(t) -\rho _{x,a, y}(t)|\le \varepsilon \) if \(t\in [0, b]\) and \(\tilde{a} \in V( a )\), and in this case, if \(s\in [0, b]\) then
$$\begin{aligned} \left| \left( \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt -\tilde{g} s\right) - \left( \int _0^s \rho _{x,a, y}(t)\, dt- g s\right) \right| \le (\varepsilon + |\tilde{g} - g|)s \le (\varepsilon + |\tilde{g} - g|)b, \end{aligned}$$
for every \(g,\tilde{g}\in \mathbb {R}\). Combining this last display with the inequality \(|e^u - 1|\le |u|e^{|u|}\), valid for every real \(u\), it follows that
$$\begin{aligned} & \left| e^{\lambda \left[ \left( \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt - \tilde{g} s\right) - \left( \int _0^s \rho _{x,a, y}(t)\, dt- g s\right) \right] } - 1 \right| \\ & \qquad \qquad \le \lambda b (\varepsilon + |g - \tilde{g}|) e^{\lambda b (\varepsilon + |g - \tilde{g}|)},\quad s\in [0, b], \quad \tilde{a} \in V(a). \end{aligned}$$
Using that \(a_k\rightarrow a\), select a positive integer \(N^* \) such that \(a_k\in V(a)\) for \(k\ge N^*\), so that the above display leads to
$$\begin{aligned} & \left| \int _0^b e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^b e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| \\ & \qquad = \left| \int _0^b\left[ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }- e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }\right] F_{x, a_k, y}(ds)\right| \\ & \qquad \le \lambda b^2 (\varepsilon + |g _k- g|) e^{\lambda b (\varepsilon + |g_k - g|)} ,\quad k\ge N^*. \end{aligned}$$
On the other hand, recalling that \(g, g_k\ge 0\), observe that the inequalities
$$\begin{aligned} e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }, \ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} \le e^{\lambda B_\rho } \end{aligned}$$ (6.14)
always hold, by (2.4), so that
$$\begin{aligned} & \left| \int _b^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _ b^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\ & \qquad \qquad \le 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b)). \end{aligned}$$
Recalling that \(\{F_{x, a_k, y}(\cdot )\}_{k\in \mathbb {N}}\) converges weakly to \(F_{x, a, y}(\cdot )\) and that this function is continuous at \(b\), via (5.31) the two last displays together yield that
$$\begin{aligned} & \limsup _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\ & \qquad \qquad \le \limsup _{k\rightarrow \infty } \left[ \lambda b^2 (\varepsilon + |g_k - g|) e^{\lambda b (\varepsilon + |g_k - g|)} + 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b))\right] \\ & \qquad \qquad = \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } (1-F_{x, a, y}(b))\le \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } \varepsilon , \end{aligned}$$
where (6.13) was used to set the last inequality. Since \(\varepsilon > 0\) is arbitrary, it follows that
$$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| = 0. \end{aligned}$$
Now, observing that the continuous mapping \(s\mapsto e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }\) is bounded on \(s\in [0,\infty )\), by (2.4) and the nonnegativity of \(g\), Assumption 2.1(iv) implies that
$$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds)\right| = 0, \end{aligned}$$
and these two last displays together yield that
$$\begin{aligned} \lim _{k\rightarrow \infty } \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) = \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds). \end{aligned}$$ (6.15)
Combining this convergence with (5.31) and the continuity property in Assumption 2.1(ii), after taking the inferior limit as \(k\) goes to \(\infty \) on both sides of (6.12), an application of Fatou’s lemma yields that
$$\begin{aligned} & \liminf _{k \rightarrow \infty } E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g_k S_1 + h_k(X_1)}\right| X_0 = x, A_0 = a_k\right] \\ & \qquad \qquad = \liminf _{k\rightarrow \infty } \sum _{y\in S} p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\ & \qquad \qquad \ge \sum _{y\in S} \liminf _{k\rightarrow \infty } p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\ & \qquad \qquad = \sum _{y\in S} p_{x, y}(a)\left( e^{\lambda C(x, a, y) + h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds)\right) \\ & \qquad \qquad = E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g S_1 + h(X_1)}\right| X_0 = x, A_0 = a\right] , \end{aligned}$$
where Definition 5.1(i) was used to set the last equality.
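The elementary exponential bound invoked in part (i) can be checked directly from the power series. The following is a short sketch, written with a real variable \(u\) (notation introduced here to avoid a clash with the state \(x\)) and assuming \(\lambda > 0\), as in the risk-averse setting:
$$\begin{aligned} |e^u - 1| = \left| \sum _{n=1}^\infty \frac{u^n}{n!}\right| \le |u| \sum _{n=0}^\infty \frac{|u|^n}{(n+1)!} \le |u| \sum _{n=0}^\infty \frac{|u|^n}{n!} = |u| e^{|u|}, \quad u\in \mathbb {R}. \end{aligned}$$
Since \(v\mapsto v e^{v}\) is increasing on \([0,\infty )\), any \(u\) with \(|u|\le \lambda b(\varepsilon + |\tilde{g} - g|)\) satisfies \(|e^u-1|\le \lambda b(\varepsilon +|\tilde{g}-g|)\, e^{\lambda b(\varepsilon +|\tilde{g}-g|)}\), which is the form in which the bound is applied on \([0, b]\).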
(ii) Note that, by Definition 5.1(i), the assertion is equivalent to part (i).
(iii) Using (6.14) it follows that
$$\begin{aligned} e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds) \le e^{\lambda (\Vert C\Vert + \Vert h_k\Vert + B_\rho )} \end{aligned}$$
always holds, whereas Assumption 2.1(ii), (5.31) and (6.15) together imply that
$$\begin{aligned} & \lim _{k\rightarrow \infty } e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\\ & \qquad \qquad \qquad \qquad = e^{\lambda C(x, a, y) + \lambda h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds). \end{aligned}$$
Since \(\lim _{k\rightarrow \infty } p_{x, y}(a_k) = p_{x, y}(a)\), by Assumption 2.1(ii), if \(\sup _k \Vert h_k\Vert < \infty \) then the two last displays allow us to use Proposition 18 in Royden (1988, p. 270) to obtain (5.33).
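The final step of part (iii) can be unpacked as follows. This is only a sketch under our reading of the cited proposition as a dominated convergence theorem for varying measures; the symbols \(f_k\), \(f\), \(M\), \(\mu _k\) and \(\mu \) are notation introduced here and do not appear in the original. For \(y\in S\) set
$$\begin{aligned} f_k(y)&:= e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds), \\ f(y)&:= e^{\lambda C(x, a, y) + \lambda h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds), \\ M&:= e^{\lambda (\Vert C\Vert + \sup _k \Vert h_k\Vert + B_\rho )}, \qquad \mu _k(\{y\}) := p_{x, y}(a_k), \qquad \mu (\{y\}) := p_{x, y}(a). \end{aligned}$$
The two displays in part (iii) give \(0\le f_k(y)\le M\) and \(f_k(y)\rightarrow f(y)\) for each \(y\in S\). Moreover, \(\mu _k\) and \(\mu \) are probability measures on the denumerable set \(S\) with \(\mu _k(\{y\})\rightarrow \mu (\{y\})\) for every \(y\), so that \(\mu _k\) converges to \(\mu \) setwise by Scheffé's lemma. Taking the constant \(M\) as the dominating function, \(\int M\, d\mu _k = M = \int M\, d\mu \) for every \(k\), and the convergence theorem yields
$$\begin{aligned} \lim _{k\rightarrow \infty } \sum _{y\in S} p_{x, y}(a_k) f_k(y) = \sum _{y\in S} p_{x, y}(a) f(y), \end{aligned}$$
which we take to be the convergence asserted in (5.33).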
About this article
Cite this article
Cavazos-Cadena, R., Cruz-Suárez, H. & Montes-De-Oca, R. Average criteria in denumerable semi-Markov decision chains under risk-aversion. Discrete Event Dyn Syst 33, 221–256 (2023). https://doi.org/10.1007/s10626-023-00376-w
DOI: https://doi.org/10.1007/s10626-023-00376-w
Keywords
- Exponential utility function
- Certainty equivalent
- Total relative cost
- Verification theorem
- Cost structure with bounded support