Average criteria in denumerable semi-Markov decision chains under risk-aversion

Cavazos-Cadena, Rolando; Cruz-Suárez, Hugo; Montes-De-Oca, Raúl

doi:10.1007/s10626-023-00376-w

Published: 21 August 2023

Volume 33, pages 221–256, (2023)

Discrete Event Dynamic Systems

Rolando Cavazos-Cadena ORCID: orcid.org/0000-0002-0973-9296¹,
Hugo Cruz-Suárez² &
Raúl Montes-De-Oca³

Abstract

This note is concerned with semi-Markov decision chains evolving on a denumerable state space. The system is directed by a risk-averse controller with constant risk-sensitivity, and the performance of a decision policy is measured by a long-run average criterion associated with bounded holding cost rates and a bounded one-step cost function. Under mild conditions on the sojourn times and the transition law, restrictions on the cost structure are given to ensure that the optimal average cost can be characterized via a bounded solution of the optimality equation. This result is then used to establish a general characterization of the optimal average cost in terms of an optimality inequality from which an optimal stationary policy can be derived.

References

Alanís-Durán A, Cavazos-Cadena R (2012) An optimality system for finite average Markov decision chains under risk-aversion. Kybernetika 48:83–104
Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criteria: A survey. SIAM J Contr Optim 31:282–334
Bäuerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39(1):105–120
Bäuerle N, Rieder U (2011) Markov Decision Processes with Applications to Finance. Springer, New York
Baykal-Gürsoy M (2010) Semi-Markov Decision Processes. Wiley Encyclopedia of Operations Research and Management Sciences
Bhabak A, Saha S (2022) Risk-sensitive semi-Markov decision problems with discounted cost and general utilities. Stat Probab Lett 184. https://doi.org/10.1016/j.spl.2022.109408
Biswas A, Pradhan S (2022) Ergodic risk-sensitive control of Markov processes on countable state space revisited. ESAIM Control Optim Calc Var 28:26
Borkar VS, Meyn SP (2002) Risk-sensitive optimal control for Markov decision process with monotone cost. Math Oper Res 27(1):192–209
Camilo-Garay C, Cavazos-Cadena R, Cruz-Suárez H (2022) Contractive Approximations in Risk-Sensitive Average Semi-Markov Decision Chains on a Finite State Space. J Optim Theory Appl 192:271–291
Cavazos-Cadena R (2009) Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria. Math Methods Oper Res 70:541–566
Cavazos-Cadena R (2016) A Poisson equation for the risk-sensitive average cost in semi-Markov chains. Discrete Event Dyn Syst 26:633–656
Cavazos-Cadena R (2018) Characterization of the Optimal Risk-Sensitive Average Cost in Denumerable Markov Decision Chains. Math Oper Res 43(3):1025–1050. https://doi.org/10.1287/moor.2017.0893
Cavazos-Cadena R, Fernández-Gaucherand E (1999) Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions. Math Method Optim Res 43(1999):121–139
Chávez-Rodríguez S, Cavazos-Cadena R, Cruz-Suárez H (2016) Controlled Semi-Markov Chains with Risk-Sensitive Average Cost Criterion. J Optim Theory Appl 170:670–686
Di Masi GB, Stettner L (1999) Risk-Sensitive Control of Discrete-Time Markov Processes with Infinite horizon. SIAM J Control Optim 38(1):61–78
Di Masi GB, Stettner L (2000) Infinite horizon risk sensitive control of discrete time Markov processes with small risk. Syst Control Lett 40(1):305–321
Di Masi GB, Stettner L (2007) Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46(1):231–252
Ghosh MK, Saha S (2014) Risk-sensitive control of continuous time Markov chains. Stochast Int J Probab Stochast Process 86(4):655–675
Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Manage Sci 18:356–369
Hu Q, Yue W (2003) Optimal replacement of a system according to a semi-Markov decision process in a semi-Markov environment. Optim Methods Softw 18:181–196
Huang Y, Lian Z, Guo X (2018) Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria. Adv Appl Prob 50:783–804. https://doi.org/10.1017/apr.2018.36
Huo H, Wen X (2022) The exponential cost optimality for finite horizon semi-Markov decision processes. Int J Inst Inf Theory Autom 58(3):301–319
Jaśkiewicz A (2007) Average optimality for risk sensitive control with general state space. Ann Appl Probab 17(2):654–675
Luque-Vásquez F, Hernández-Lerma O (1999) Semi-Markov control models with average costs. Applicationes Math 26(3):315–331
Meyer CD (2000) Matrix Analysis and Applied Linear Algebra. SIAM
Munkres J (2014) Topology. Second Edition, Pearson
Pinedo M (2008) Scheduling: Theory, Algorithms, and Systems. Prentice Hall, Englewood Cliffs
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley
Royden HL (1988) Real Analysis. Macmillan, London
Sennott L (1999) Stochastic dynamic programming and the control of queueing systems. Wiley-Interscience
Sladký K (2008) Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44(2):205–226
Sladký K (2018) Risk-Sensitive Average Optimality in Markov Decision Processes. Kybernetika 54(6):1218–1230
Stidham S Jr, Weber RR (1993) A survey of Markov decision models for control of networks of queues. Queueing Syst 13:291–314
Tijms HC (2003) A first course in stochastic models. Wiley, New York
Wei Q, Chen X (2016) Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper Res Lett 44(4):457–462

Acknowledgements

The authors are deeply grateful to the reviewers and the Associate Editor for their careful reading of the original manuscript and their helpful suggestions to improve the paper.

Author information

Authors and Affiliations

Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Boulevard Antonio Narro 1923, Buenavista, Saltillo, COAH, 25315, México
Rolando Cavazos-Cadena
Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Ave. San Claudio y Río Verde, Col. San Manuel CU, Puebla, PUE, 72570, México
Hugo Cruz-Suárez
Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa, Av. Ferrocarril San Rafael Atlixco 186, Col. Leyes de Reforma Primera Sección, Alcaldía Iztapalapa, CDMX, 09310, México
Raúl Montes-De-Oca

Corresponding authors

Correspondence to Rolando Cavazos-Cadena or Hugo Cruz-Suárez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 5.2

Let $(x, a)\in \mathbb {K}$ be arbitrary and notice that

$$\begin{aligned} & E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g_k S_1 + \lambda h_k(X_1)}\right| X_0 = x, A_0 = a_k\right] \\ & \qquad \qquad \qquad = \sum _{y\in S} p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \end{aligned}$$

(6.12)

by Definition 5.1.

(i)
Now, let $y\in S$ and $\varepsilon > 0$ be arbitrary but fixed, and pick $b\in (0, \infty )$ such that
$$\begin{aligned} F_{x, a, y}(\cdot )\ \text {is continuous at}\ b\ \text {and}\ F_{x, a, y}(b) \ge 1-\varepsilon . \end{aligned}$$
(6.13)
Combining Assumption 2.1(ii) with the tube lemma in Munkres (2014, p. 168), there exists a neighborhood of the action $a\in A(x)$, say $V(a)\subset A(x)$, such that $|\rho _{x,\tilde{a}, y}(t) -\rho _{x,a, y}(t)|\le \varepsilon $ if $t\in [0, b]$ and $\tilde{a} \in V( a )$; in this case, if $s\in [0, b]$ then
$$\begin{aligned} \left| \left[ \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt -\tilde{g} s\right] - \left[ \int _0^s \rho _{x,a, y}(t)\, dt- g s\right] \right| \le (\varepsilon + |\tilde{g} - g|)s \le (\varepsilon + |\tilde{g} - g|)b, \end{aligned}$$
for every $g,\tilde{g}\in \mathbb {R}$. Combining this last display with the inequality $|e^x - 1|\le |x|e^{|x|}$, it follows that
$$\begin{aligned} & \left| e^{\lambda \left[ \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt - \tilde{g} s - \int _0^s \rho _{x,a, y}(t)\, dt + g s\right] } - 1 \right| \\ & \qquad \qquad \le \lambda b (\varepsilon + |g - \tilde{g}|) e^{\lambda b (\varepsilon + |g - \tilde{g}|)},\quad s\in [0, b], \quad \tilde{a} \in V(a). \end{aligned}$$
Using that $a_k\rightarrow a$, select a positive integer $N^*$ such that $a_k\in V(a)$ for $k\ge N^*$, so that the above display leads to
$$\begin{aligned} & \left| \int _0^b e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^b e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| \\ & \qquad = \left| \int _0^b\left[ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }- e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }\right] F_{x, a_k, y}(ds)\right| \\ & \qquad \le \lambda b^2 (\varepsilon + |g _k- g|) e^{\lambda b (\varepsilon + |g_k - g|)} ,\quad k\ge N^*. \end{aligned}$$
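The elementary inequality $|e^x - 1|\le |x|e^{|x|}$ invoked in this argument is standard. For completeness, a short derivation from the exponential series, valid for every $x\in \mathbb {R}$, can be sketched as follows:
$$\begin{aligned} |e^x - 1| = \left| \sum _{n=1}^\infty \frac{x^n}{n!}\right| \le |x| \sum _{n=1}^\infty \frac{|x|^{n-1}}{n!} \le |x| \sum _{n=1}^\infty \frac{|x|^{n-1}}{(n-1)!} = |x| e^{|x|}. \end{aligned}$$
In the proof it is applied with $x$ equal to $\lambda $ times the difference of the two exponents, assuming $\lambda > 0$ as is usual in the risk-averse case, so that $|x|\le \lambda b(\varepsilon + |\tilde{g} - g|)$ whenever $s\in [0, b]$ and $\tilde{a}\in V(a)$.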
On the other hand, recalling that $g, g_k\ge 0$, observe that the inequalities
$$\begin{aligned} e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }, \ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} \le e^{\lambda B_\rho } \end{aligned}$$
(6.14)
always hold, by (2.4), so that
$$\begin{aligned} & \left| \int _b^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _ b^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\ & \qquad \qquad \le 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b)). \end{aligned}$$
Recalling that $\{F_{x, a_k, y}(\cdot )\}_{k\in \mathbb {N}}$ converges weakly to $F_{x, a, y}(\cdot )$ and that this function is continuous at $b$, via (5.31) the estimates over $[0, b]$ and $(b, \infty )$ obtained above together yield that
$$\begin{aligned} & \limsup _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\ & \qquad \qquad \le \limsup _{k\rightarrow \infty } \left[ \lambda b^2 (\varepsilon + |g_k - g|) e^{\lambda b (\varepsilon + |g_k - g|)} + 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b))\right] \\ & \qquad \qquad = \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } (1-F_{x, a, y}(b))\le \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } \varepsilon , \end{aligned}$$
where (6.13) was used to obtain the last inequality. Since $\varepsilon > 0$ is arbitrary, it follows that
$$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| = 0. \end{aligned}$$
Now, observing that the continuous mapping $s\mapsto e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }$ is bounded on $[0,\infty )$, by (2.4) and the nonnegativity of $g$, Assumption 2.1(iv) implies that
$$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda gs} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds)\right| = 0, \end{aligned}$$
and these two last displays together yield that
$$\begin{aligned} \lim _{k\rightarrow \infty } \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) = \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds). \end{aligned}$$
(6.15)
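The passage from the two preceding limits to (6.15) is a triangle-inequality argument. As a sketch, write $\varphi _k$ and $\varphi $ for the two integrands; this shorthand is introduced here only for illustration and is not the paper's notation:
$$\begin{aligned} & \varphi _k(s) := e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s}, \qquad \varphi (s) := e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s}, \\ & \left| \int _0^\infty \varphi _k(s) F_{x, a_k, y}(ds) - \int _0^\infty \varphi (s) F_{x, a, y}(ds)\right| \\ & \qquad \le \left| \int _0^\infty \varphi _k(s) F_{x, a_k, y}(ds) - \int _0^\infty \varphi (s) F_{x, a_k, y}(ds)\right| + \left| \int _0^\infty \varphi (s) F_{x, a_k, y}(ds) - \int _0^\infty \varphi (s) F_{x, a, y}(ds)\right| . \end{aligned}$$
The first term on the right-hand side tends to zero by the truncation estimate, and the second one by the weak convergence of $F_{x, a_k, y}(\cdot )$ to $F_{x, a, y}(\cdot )$, so that the left-hand side vanishes as $k\rightarrow \infty $, which is the convergence in (6.15).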
Combining this convergence with (5.31) and the continuity property in Assumption 2.1(ii), after taking the inferior limit as $k$ goes to $\infty $ on both sides of (6.12), an application of Fatou’s lemma yields that
$$\begin{aligned} & \liminf _{k \rightarrow \infty } E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g_k S_1 + \lambda h_k(X_1)}\right| X_0 = x, A_0 = a_k\right] \\ & \qquad \qquad = \liminf _{k\rightarrow \infty } \sum _{y\in S} p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\ & \qquad \qquad \ge \sum _{y\in S} \liminf _{k\rightarrow \infty } p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\ & \qquad \qquad = \sum _{y\in S} p_{x, y}(a)\left( e^{\lambda C(x, a, y) + \lambda h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds)\right) \\ & \qquad \qquad = E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g S_1 + \lambda h(X_1)}\right| X_0 = x, A_0 = a\right] , \end{aligned}$$
where Definition 5.1(i) was used to obtain the last equality.
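The interchange of the inferior limit and the sum over $y\in S$ in the display above is Fatou’s lemma for the counting measure on the denumerable set $S$. In a generic form, with $u_k(y)$ a placeholder symbol introduced here only for illustration, it reads:
$$\begin{aligned} \liminf _{k\rightarrow \infty } \sum _{y\in S} u_k(y) \ge \sum _{y\in S} \liminf _{k\rightarrow \infty } u_k(y) \quad \text {whenever}\ u_k(y)\ge 0\ \text {for all}\ y\in S\ \text {and}\ k\in \mathbb {N}. \end{aligned}$$
In the proof, $u_k(y)$ stands for the product of $p_{x, y}(a_k)$, the exponential factor and the integral with respect to $F_{x, a_k, y}$. Each such term is nonnegative, which is all that Fatou’s lemma requires, so no uniform bound on the functions $h_k$ is needed at this stage. Since every factor converges as $k\rightarrow \infty $, the inferior limit of each summand coincides with the product of the limits.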
(ii)
Note that, by Definition 5.1(i), the assertion is equivalent to part (i).
(iii)
Using (6.14) it follows that
$$\begin{aligned} e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds) \le e^{\lambda (\Vert C\Vert + \Vert h_k\Vert + B_\rho )} \end{aligned}$$
always holds, whereas Assumption 2.1(ii), (5.31) and (6.15) together imply that
$$\begin{aligned}{} & {} \lim _{k\rightarrow \infty } e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\\{} & {} \qquad \qquad \qquad \qquad = e^{\lambda C(x, a, y) + \lambda h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds). \end{aligned}$$
Since $\lim _{k\rightarrow \infty } p_{x, y}(a_k) = p_{x, y}(a)$, by Assumption 2.1(ii), if $\sup _k \Vert h_k\Vert < \infty $ then the two last displays allow the use of Proposition 18 in Royden (1988, p. 270) to obtain (5.33).
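As a hedged illustration of the kind of convergence statement used in this last step, the following simplified special case is written for sums over a denumerable set; it is not a quotation of the cited proposition, and the symbols $p_k$, $p$, $u_k$, $u$ and $M$ are placeholders introduced here. Suppose that $p_k(\cdot )$ and $p(\cdot )$ are probability distributions on $S$ with $p_k(y)\rightarrow p(y)$ for every $y\in S$, and that $u_k(y)\rightarrow u(y)$ with $|u_k(y)|\le M < \infty $ for all $y$ and $k$. Then
$$\begin{aligned} \left| \sum _{y\in S} p_k(y) u_k(y) - \sum _{y\in S} p(y) u(y)\right| \le M \sum _{y\in S} |p_k(y) - p(y)| + \sum _{y\in S} p(y) |u_k(y) - u(y)| \rightarrow 0 \quad \text {as}\ k\rightarrow \infty , \end{aligned}$$
where the first sum vanishes by Scheffé’s lemma and the second one by dominated convergence. In the setting of part (iii), one may take $p_k(y) = p_{x, y}(a_k)$, let $u_k(y)$ be the product of the exponential factor and the integral, and set $M = e^{\lambda (\Vert C\Vert + \sup _k \Vert h_k\Vert + B_\rho )}$, which is finite precisely under the condition $\sup _k \Vert h_k\Vert < \infty $.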

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cavazos-Cadena, R., Cruz-Suárez, H. & Montes-De-Oca, R. Average criteria in denumerable semi-Markov decision chains under risk-aversion. Discrete Event Dyn Syst 33, 221–256 (2023). https://doi.org/10.1007/s10626-023-00376-w

Received: 26 November 2022
Accepted: 08 April 2023
Published: 21 August 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10626-023-00376-w

Keywords

AMS Subject Classifications:
