Average criteria in denumerable semi-Markov decision chains under risk-aversion

Discrete Event Dynamic Systems

Abstract

This note is concerned with semi-Markov decision chains evolving on a denumerable state space. The system is directed by a risk-averse controller with constant risk-sensitivity, and the performance of a decision policy is measured by a long-run average criterion associated with bounded holding cost rates and a one-step cost function. Under mild conditions on the sojourn times and the transition law, restrictions on the cost structure are given to ensure that the optimal average cost can be characterized via a bounded solution of the optimality equation. This result is then used to establish a general characterization of the optimal average cost in terms of an optimality inequality from which an optimal stationary policy can be derived.

References

  • Alanís-Durán A, Cavazos-Cadena R (2012) An optimality system for finite average Markov decision chains under risk-aversion. Kybernetika 48:83–104

  • Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criteria: A survey. SIAM J Contr Optim 31:282–334

  • Bäuerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39(1):105–120

  • Bäuerle N, Rieder U (2011) Markov Decision Processes with Applications to Finance. Springer, New York

  • Baykal-Gürsoy M (2010) Semi-Markov Decision Processes. Wiley Encyclopedia of Operations Research and Management Sciences

  • Bhabak A, Saha S (2022) Risk-sensitive semi-Markov decision problems with discounted cost and general utilities. Stat Probab Lett 184. https://doi.org/10.1016/j.spl.2022.109408

  • Biswas A, Pradhan S (2022) Ergodic risk-sensitive control of Markov processes on countable state space revisited. ESAIM Control Optim Calc Var 28:26

  • Borkar VS, Meyn SP (2002) Risk-sensitive optimal control for Markov decision process with monotone cost. Math Oper Res 27(1):192–209

  • Camilo-Garay C, Cavazos-Cadena R, Cruz-Suárez H (2022) Contractive Approximations in Risk-Sensitive Average Semi-Markov Decision Chains on a Finite State Space. J Optim Theory Appl 192:271–291

  • Cavazos-Cadena R (2009) Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria. Math Methods Oper Res 70:541–566

  • Cavazos-Cadena R (2016) A Poisson equation for the risk-sensitive average cost in semi-Markov chains. Discrete Event Dyn Syst 26:633–656

  • Cavazos-Cadena R (2018) Characterization of the Optimal Risk-Sensitive Average Cost in Denumerable Markov Decision Chains. Math Oper Res 43(3):1025–1050. https://doi.org/10.1287/moor.2017.0893

  • Cavazos-Cadena R, Fernández-Gaucherand E (1999) Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions. Math Method Optim Res 43(1999):121–139

  • Chávez-Rodríguez S, Cavazos-Cadena R, Cruz-Suárez H (2016) Controlled Semi-Markov Chains with Risk-Sensitive Average Cost Criterion. J Optim Theory Appl 170:670–686

  • Di Masi GB, Stettner L (1999) Risk-Sensitive Control of Discrete-Time Markov Processes with Infinite horizon. SIAM J Control Optim 38(1):61–78

  • Di Masi GB, Stettner L (2000) Infinite horizon risk sensitive control of discrete time Markov processes with small risk. Syst Control Lett 40(1):305–321

  • Di Masi GB, Stettner L (2007) Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46(1):231–252

  • Ghosh MK, Saha S (2014) Risk-sensitive control of continuous time Markov chains. Stochast Int J Probab Stochast Process 86(4):655–675

  • Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Manage Sci 18:356–369

  • Hu Q, Yue W (2003) Optimal replacement of a system according to a semi-Markov decision process in a semi-Markov environment. Optim Methods Softw 18:181–196

  • Huang Y, Lian Z, Guo X (2018) Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria. Adv Appl Prob 50:783–804. https://doi.org/10.1017/apr.2018.36

  • Huo H, Wen X (2022) The exponential cost optimality for finite horizon semi-Markov decision processes. Int J Inst Inf Theory Autom 58(3):301–319

  • Jaśkiewicz A (2007) Average optimality for risk sensitive control with general state space. Ann Appl Probab 17(2):654–675

  • Luque-Vásquez F, Hernández-Lerma O (1999) Semi-Markov control models with average costs. Applicationes Math 26(3):315–331

  • Meyer CD (2000) Matrix Analysis and Applied Linear Algebra. SIAM

  • Munkres J (2014) Topology. Second Edition, Pearson

  • Pinedo M (2008) Scheduling: Theory, Algorithms, and Systems. Prentice Hall, Englewood Cliffs

  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley

  • Royden HL (1988) Real Analysis. Macmillan, London

  • Sennott L (1999) Stochastic dynamic programming and the control of queueing systems. Wiley-Interscience

  • Sladký K (2008) Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44(2):205–226

  • Sladký K (2018) Risk-Sensitive Average Optimality in Markov Decision Processes. Kybernetika 54(6):1218–1230

  • Stidham S Jr, Weber RR (1993) A survey of Markov decision models for control of networks of queues. Queueing Syst 13:291–314

  • Tijms HC (2003) A first course in stochastic models. Wiley, New York

  • Wei Q, Chen X (2016) Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper Res Lett 44(4):457–462

Acknowledgements

The authors are deeply grateful to the reviewers and the Associate Editor for their careful reading of the original manuscript and their helpful suggestions to improve the paper.

Author information

Corresponding authors

Correspondence to Rolando Cavazos-Cadena or Hugo Cruz-Suárez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 5.2

Let \((x, a)\in \mathbb {K}\) be arbitrary and notice that

$$\begin{aligned}{} & {} E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g_k S_1 + h_k(X_1)}\right| X_0 = x, A_0 = a_k\right] \nonumber \\{} & {} \qquad \qquad \qquad = \sum _{y\in S} p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \end{aligned}$$
(6.12)

by Definition 5.1(i).

  1. (i)

    Now, let \(y\in S\) and \(\varepsilon > 0\) be arbitrary but fixed, and pick \(b\in (0, \infty )\) such that

    $$\begin{aligned} F_{x, a, y}(\cdot )\ \text {is continuous at}\ b\ \text {and}\ F_{x, a, y}(b) \ge 1-\varepsilon . \end{aligned}$$
    (6.13)

    Combining Assumption 2.1(ii) with the tube lemma in Munkres (2014, p. 168), it follows that there exists a neighborhood of the action \(a\in A(x)\), say \(V(a)\subset A(x)\), such that \(|\rho _{x,\tilde{a}, y}(t) -\rho _{x,a, y}(t)|\le \varepsilon \) if \(t\in [0, b]\) and \(\tilde{a} \in V( a )\); in this case, if \(s\in [0, b]\) then

    $$\begin{aligned} \left| \left( \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt -\tilde{g} s\right) - \left( \int _0^s \rho _{x,a, y}(t)\, dt- g s\right) \right| \le (\varepsilon + |\tilde{g} - g|)s \le (\varepsilon + |\tilde{g} - g|)b, \end{aligned}$$

    for every \(g,\tilde{g}\in \mathbb {R}\). Combining this last display with the inequality \(|e^x - 1|\le |x|e^{|x|}\), it follows that

    $$\begin{aligned}{} & {} \left| e^{\lambda \left[ \left( \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt - \tilde{g} s\right) - \left( \int _0^s \rho _{x,a, y}(t)\, dt- g s\right) \right] } - 1 \right| \\{} & {} \qquad \qquad \le \lambda b (\varepsilon + |g - \tilde{g}|) e^{\lambda b (\varepsilon + |g - \tilde{g}|)},\quad s\in [0, b], \quad \tilde{a} \in V(a). \end{aligned}$$
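
    The two estimates above can be checked in two short steps. The shorthand \(\Delta (s)\) and the real variable \(u\) are introduced here only for illustration, and \(\lambda > 0\) is assumed, as the form of the bounds suggests. First, for \(s\in [0, b]\) and \(\tilde{a}\in V(a)\), the triangle inequality gives

    $$\begin{aligned} \Delta (s)&:= \left( \int _0^s \rho _{x,\tilde{a}, y}(t)\, dt -\tilde{g} s\right) - \left( \int _0^s \rho _{x,a, y}(t)\, dt - g s\right) , \\ |\Delta (s)|&\le \int _0^s |\rho _{x,\tilde{a}, y}(t) - \rho _{x,a, y}(t)|\, dt + |\tilde{g} - g|\, s \le (\varepsilon + |\tilde{g} - g|)\, s. \end{aligned}$$

    Second, for every real \(u\), the power series of the exponential together with \(n! \ge (n-1)!\) yields

    $$\begin{aligned} |e^u - 1| = \left| \sum _{n=1}^\infty \frac{u^n}{n!}\right| \le |u| \sum _{n=1}^\infty \frac{|u|^{n-1}}{(n-1)!} = |u|\, e^{|u|}. \end{aligned}$$

    Since \(t\mapsto t e^{t}\) is increasing on \([0,\infty )\), taking \(u = \lambda \Delta (s)\), for which \(|u|\le \lambda b(\varepsilon + |\tilde{g} - g|)\), leads to a bound of the form \(\lambda b (\varepsilon + |g - \tilde{g}|) e^{\lambda b (\varepsilon + |g - \tilde{g}|)}\).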

    Using that \(a_k\rightarrow a\), select a positive integer \(N^* \) such that \(a_k\in V(a)\) for \(k\ge N^*\), so that the above display leads to

    $$\begin{aligned}{} & {} \left| \int _0^b e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^ be^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| \\{} & {} \qquad = \left| \int _0^b\left[ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }- e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }\right] F_{x, a_k, y}(ds)\right| \\{} & {} \qquad \le \lambda b^2 (\varepsilon + |g _k- g|) e^{\lambda b (\varepsilon + |g_k - g|)} ,\quad k\ge N^*. \end{aligned}$$

    On the other hand, recalling that \(g, g_k\ge 0\), observe that the inequalities

    $$\begin{aligned} e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }, \ e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} \le e^{\lambda B_\rho } \end{aligned}$$
    (6.14)

    always hold, by (2.4), so that

    $$\begin{aligned}{} & {} \left| \int _b^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _ b^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\{} & {} \qquad \qquad \le 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b)). \end{aligned}$$

    Recalling that \(\{F_{x, a_k, y}(\cdot )\}_{k\in \mathbb {N}}\) converges weakly to \(F_{x, a, y}(\cdot )\) and that this function is continuous at b, via (5.31) the last two displays together yield that

    $$\begin{aligned}{} & {} \limsup _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a_k, y}(ds)\right| \\{} & {} \qquad \qquad \le \limsup _{k\rightarrow \infty } \left[ \lambda b^2 (\varepsilon + |g_k - g|) e^{\lambda b (\varepsilon + |g_k - g|)} + 2 e^{\lambda B_\rho } (1-F_{x, a_k, y}(b))\right] \\{} & {} \qquad \qquad = \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } (1-F_{x, a, y}(b))\le \lambda b^2 \varepsilon e^{\lambda b \varepsilon } + 2 e^{\lambda B_\rho } \varepsilon , \end{aligned}$$

    where (6.13) was used to obtain the last inequality. Since \(\varepsilon > 0\) is arbitrary, it follows that

    $$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s } F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s } F_{x, a_k, y}(ds)\right| = 0. \end{aligned}$$

    Now, observing that the continuous mapping \(s\mapsto e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }\) is bounded on \(s\in [0,\infty )\), by (2.4) and the nonnegativity of g, Assumption 2.1(iv) implies that

    $$\begin{aligned} \lim _{k\rightarrow \infty } \left| \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda gs} F_{x, a_k, y}(ds) - \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds)\right| = 0, \end{aligned}$$

    and the last two displays together yield that

    $$\begin{aligned} \lim _{k\rightarrow \infty } \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s} F_{x, a_k, y}(ds) = \int _0^ \infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s} F_{x, a, y}(ds). \end{aligned}$$
    (6.15)
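
    The way the two preceding limits combine into (6.15) is a triangle-inequality splitting. Writing, only for this remark, \(f_k(s) = e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s}\) and \(f(s) = e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s}\), one has

    $$\begin{aligned} \left| \int _0^\infty f_k\, dF_{x, a_k, y} - \int _0^\infty f\, dF_{x, a, y}\right| \le \left| \int _0^\infty f_k\, dF_{x, a_k, y} - \int _0^\infty f\, dF_{x, a_k, y}\right| + \left| \int _0^\infty f\, dF_{x, a_k, y} - \int _0^\infty f\, dF_{x, a, y}\right| . \end{aligned}$$

    The first term on the right compares two integrands against the same distribution \(F_{x, a_k, y}\) and is handled by the estimates on \([0, b]\) and \((b, \infty )\), whereas the second term compares two distributions for the fixed bounded continuous integrand \(f\) and is handled by Assumption 2.1(iv).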

    Combining this convergence with (5.31) and the continuity property in Assumption 2.1(ii), and taking the limit inferior as k goes to \(\infty \) on both sides of (6.12), an application of Fatou’s lemma yields that

    $$\begin{aligned}{} & {} \liminf _{k \rightarrow \infty } E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g_k S_1 + h_k(X_1)}\right| X_0 = x, A_0 = a_k\right] \\{} & {} \qquad \qquad = \liminf _{k\rightarrow \infty } \sum _{y\in S} p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\{} & {} \qquad \qquad \ge \sum _{y\in S} \liminf _{k\rightarrow \infty } p_{x, y}(a_k)\left( e^{\lambda C(x, a_k, y) + h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\right) \\{} & {} \qquad \qquad = \sum _{y\in S} p_{x, y}(a)\left( e^{\lambda C(x, a, y) + h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds)\right) \\{} & {} \qquad \qquad = E\left. \left[ e^{\lambda D(X_0, A_0, X_1, S_1) - \lambda g S_1 + h(X_1)}\right| X_0 = x, A_0 = a\right] , \end{aligned}$$

    where Definition 5.1(i) was used to obtain the last equality.
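
    The Fatou step in the argument above is the version of Fatou’s lemma for series, that is, for the counting measure on S. As a sketch, let \(u_k(y)\ge 0\) denote (in notation introduced only here) the term of the series in (6.12) corresponding to the state \(y\in S\). Then

    $$\begin{aligned} \sum _{y\in S} \liminf _{k\rightarrow \infty } u_k(y) \le \liminf _{k\rightarrow \infty } \sum _{y\in S} u_k(y). \end{aligned}$$

    No dominating sequence is required for this inequality, since every term \(u_k(y)\) is nonnegative; this is why only an inequality, and not an equality, is obtained at this stage.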

  2. (ii)

    Note that, by Definition 5.1(i), the assertion is equivalent to part (i).

  3. (iii)

    Using (6.14) it follows that

    $$\begin{aligned} e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds) \le e^{\lambda (\Vert C\Vert + \Vert h_k\Vert + B_\rho )} \end{aligned}$$

    always holds, whereas Assumption 2.1(ii), (5.31) and (6.15) together imply that

    $$\begin{aligned}{} & {} \lim _{k\rightarrow \infty } e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds)\\{} & {} \qquad \qquad \qquad \qquad = e^{\lambda C(x, a, y) + \lambda h(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a, y}(t)\, dt - \lambda g s }F_{x, a, y}(ds). \end{aligned}$$

    Since \(\lim _{k\rightarrow \infty } p_{x, y}(a_k) = p_{x, y}(a)\), by Assumption 2.1(ii), if \(\sup _k \Vert h_k\Vert < \infty \) then the last two displays allow the use of Proposition 18 in Royden (1988, p. 270) to obtain (5.33).
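
    As a sketch of how the hypotheses of the cited result can be matched, assuming it is the dominated convergence theorem for a sequence of measures converging setwise, set (in notation introduced only for this remark)

    $$\begin{aligned} \mu _k(\{y\})&:= p_{x, y}(a_k), \qquad \mu (\{y\}) := p_{x, y}(a), \qquad M := e^{\lambda (\Vert C\Vert + \sup _k \Vert h_k\Vert + B_\rho )}, \\ f_k(y)&:= e^{\lambda C(x, a_k, y) + \lambda h_k(y)} \int _0^\infty e^{\lambda \int _0 ^s \rho _{x, a_k, y}(t)\, dt - \lambda g_k s }F_{x, a_k, y}(ds), \end{aligned}$$

    and let \(f(y)\) be the corresponding expression with \(a\), \(g\) and \(h\) instead of \(a_k\), \(g_k\) and \(h_k\). Then \(0\le f_k\le M\) and \(f_k(y)\rightarrow f(y)\) for every \(y\in S\), while the constant dominating function satisfies \(\int M\, d\mu _k = M = \int M\, d\mu \), because \(\mu _k\) and \(\mu \) are probability measures. Moreover, pointwise convergence of probability mass functions on the denumerable set S to a probability mass function implies setwise convergence (Scheffé’s lemma), so that

    $$\begin{aligned} \lim _{k\rightarrow \infty } \sum _{y\in S} p_{x, y}(a_k) f_k(y) = \sum _{y\in S} p_{x, y}(a) f(y). \end{aligned}$$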

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cavazos-Cadena, R., Cruz-Suárez, H. & Montes-De-Oca, R. Average criteria in denumerable semi-Markov decision chains under risk-aversion. Discrete Event Dyn Syst 33, 221–256 (2023). https://doi.org/10.1007/s10626-023-00376-w

  • DOI: https://doi.org/10.1007/s10626-023-00376-w
