Zero-sum semi-Markov games with state-action-dependent discount factors


Abstract

The semi-Markov model is one of the most general models for stochastic dynamic systems. This paper deals with a two-person zero-sum game for semi-Markov processes. We focus on the expected discounted criterion with state-action-dependent discount factors. The state and action spaces are both Polish spaces, and the reward rate function is ω-bounded. We first construct a fairly general model of semi-Markov games under a given semi-Markov kernel and a pair of strategies. Next, based on the standard regularity condition and the continuity-compactness condition for semi-Markov games, we derive a “drift condition” on the semi-Markov kernel and assume that the discount factors have a positive lower bound, under which we prove the existence of the value function and of an optimal pair of stationary strategies for our semi-Markov game by means of a general Shapley equation. Moreover, for the case of finite state and action spaces, we develop a value iteration-type algorithm for approximating the value function and an optimal pair of stationary strategies, and we prove its convergence and establish an error bound. Finally, we present numerical examples to demonstrate the main results.
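To give a concrete flavor of the value iteration scheme, the following minimal sketch implements a Shapley-type iteration for a finite zero-sum semi-Markov game. It assumes the one-step data have already been precomputed from the semi-Markov kernel and the discount factors; the array names (rhat, D) and the linear-programming solver for each auxiliary matrix game are illustrative assumptions, not the paper's notation or its numerical examples.

```python
# A minimal sketch of value iteration for a finite zero-sum semi-Markov game,
# assuming precomputed one-step data (hypothetical stand-ins, not the paper's):
#   rhat[x, a, b] : expected discounted reward accrued up to the first jump
#   D[x, a, b, y] : discounted embedded kernel E[e^{-alpha(x,a,b) T1} 1{X1 = y}]
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(G):
    """Value of the zero-sum matrix game G (row player maximizes)."""
    m, n = G.shape
    shift = G.min() - 1.0          # shift payoffs to be >= 1 so the LP value is positive
    Gp = G - shift
    # min 1'x  s.t.  Gp^T x >= 1, x >= 0;  value(Gp) = 1 / sum(x)
    res = linprog(np.ones(m), A_ub=-Gp.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    return 1.0 / res.x.sum() + shift

def shapley_value_iteration(rhat, D, tol=1e-8, max_iter=10_000):
    """Iterate u <- Tu, where Tu(x) is the value of the auxiliary matrix game at x."""
    u = np.zeros(rhat.shape[0])
    for _ in range(max_iter):
        u_new = np.array([matrix_game_value(rhat[x] + D[x] @ u)
                          for x in range(rhat.shape[0])])
        if np.abs(u_new - u).max() < tol:   # sup-norm stopping criterion
            break
        u = u_new
    return u_new
```

Because the underlying operator is a contraction with modulus ηγ < 1 (Lemma 3 in the appendix), the iterates converge geometrically, which is what underlies the algorithm's error bound.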


References

  • Ash RB, Doléans-Dade CA (2000) Probability and measure theory. Academic Press

  • Barron EN (2013) Game theory: an introduction, 2nd edn. Wiley, New York

  • Chen F, Guo X, Liao ZW (2021) Discounted semi-Markov games with incomplete information on one side. arXiv:2107.07499

  • Fan K (1953) Minimax theorems. Proc Natl Acad Sci USA 39(1):42–47

  • Feinberg EA, Shwartz A (1994) Markov decision models with weighted discounted criteria. Math Oper Res 19(1):152–168

  • Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer Science & Business Media, Berlin

  • González-Hernández J, López-Martínez RR, Minjárez-Sosa JA (2009) Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45(5):737–754

  • González-Sánchez D, Luque-Vásquez F, Minjárez-Sosa JA (2019) Zero-sum Markov games with random state-actions-dependent discount factors: existence of optimal strategies. Dyn Games Appl 9(1):103–121

  • Hernández-Lerma O, Lasserre JB (2012a) Discrete-time Markov control processes: basic optimality criteria, vol 30. Springer Science & Business Media, Berlin

  • Hernández-Lerma O, Lasserre JB (2012b) Further topics on discrete-time Markov control processes, vol 42. Springer Science & Business Media

  • Hoffman AJ, Karp RM (1966) On nonterminating stochastic games. Manag Sci 12(5):359–370

  • Huang Y, Guo X (2011) Finite horizon semi-Markov decision processes with application to maintenance systems. Eur J Oper Res 212(1):131–140

  • Huang YH, Guo XP (2010) Discounted semi-Markov decision processes with nonnegative costs. Acta Math Sinica (Chinese Series) 53:503–514

  • Kirman AP, Sobel MJ (1974) Dynamic oligopoly with inventories. Econometrica 42(2):279–287

  • Lal AK, Sinha S (1992) Zero-sum two-person semi-Markov games. J Appl Probab 29(1):56–72

  • Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on Machine Learning, pp 157–163

  • Luque-Vásquez F (2002a) Zero-sum semi-Markov game in Borel spaces with discounted payoff. Morfismos 6(1):15–29

  • Luque-Vásquez F (2002b) Zero-sum semi-Markov games in Borel spaces: discounted and average payoff. Bol Soc Mat Mexicana 8:227–241

  • Minjárez-Sosa JA (2015) Markov control models with unknown random state-action-dependent discount factors. Top 23(3):743–772

  • Minjárez-Sosa JA, Luque-Vásquez F (2008) Two person zero-sum semi-Markov games with unknown holding times distribution on one side: a discounted payoff criterion. Appl Math Optim 57(3):289–305

  • Mondal P, Sinha S, Neogy SK, Das AK (2016) On discounted AR-AT semi-Markov games and its complementarity formulations. Int J Game Theory 45(3):567–583

  • Mondal P, Neogy S, Gupta A, Ghorui D (2020) A policy improvement algorithm for solving a mixture class of perfect information and AR-AT semi-Markov games. Int Game Theory Rev 22(2):2040008

  • Nowak AS (1984) On zero-sum stochastic games with general state space I. Probab Math Stat 4(1):13–32

  • Pollatschek M, Avi-Itzhak B (1969) Algorithms for stochastic games with geometrical interpretation. Manag Sci 15(7):399–415

  • Raut LK (1990) Two-sided altruism, Lindahl equilibrium and Pareto efficiency in overlapping generations models. Department of Economics, University of California

  • Ross SM (1970) Average cost semi-Markov decision processes. J Appl Probab 7(3):649–656

  • Schäl M (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32(3):179–196

  • Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39(10):1095–1100

  • Tanaka K, Wakuta K (1976) On semi-Markov games. Science Reports of Niigata University, Series A, Mathematics 13:55–64

  • Vega-Amaya Ó, Luque-Vásquez F, Castro-Enríquez M (2022) Zero-sum average cost semi-Markov games with weakly continuous transition probabilities and a minimax semi-Markov inventory problem. Acta Appl Math 177(1):1–27

  • Wei Q, Guo X (2011) Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett 39(5):369–374

  • Wu X, Zhang J (2016) Finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. Discrete Event Dyn Syst 26(4):669–683

  • Ye L, Guo X (2012) Continuous-time Markov decision processes with state-dependent discount factors. Acta Appl Math 121(1):5–27


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (62073346, 11931018, U1811462), the Guangdong Basic and Applied Basic Research Foundation (2021A1515011984), and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032).

Author information


Corresponding author

Correspondence to Li Xia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs of Lemmas 1–5

Proof

(Lemma 1) For each fixed (x,a,b) ∈ K, integrating by parts, we have

$$ \begin{array}{@{}rcl@{}} {\int}_{0}^{\infty} e^{-\alpha(x,a,b) t} H(d t | x, a, b) &=&\alpha(x,a,b) {\int}_{0}^{\infty} e^{-\alpha(x,a,b) t} H(t | x, a, b) d t \\ &=&\alpha(x,a,b)\Big[{\int}_{0}^{\theta} e^{-\alpha(x,a,b) t} H(t | x, a, b) d t+{\int}_{\theta}^{\infty} e^{-\alpha(x,a,b) t} H(t |x, a, b) d t\Big] \\ &\leq& \alpha(x,a,b)\Big[(1-\delta){\int}_{0}^{\theta} e^{-\alpha(x,a,b) t} d t+{\int}_{\theta}^{\infty} e^{-\alpha(x,a,b) t} dt\Big] \\ &=&1-\delta \left( 1-e^{-\alpha(x,a,b) \theta}\right)\\ &\leqslant& 1-\delta+\delta e^{-\alpha_{0}\theta}<1. \end{array} $$

Let \(\gamma =1-\delta +\delta e^{-\alpha _{0}\theta }\), which yields (4). □
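As a quick numerical sanity check of this bound, the following sketch assumes an exponential sojourn-time distribution (a special case chosen only for illustration; the lemma itself requires no such structure):

```python
# Checking the bound of Lemma 1 for an assumed exponential sojourn distribution
# H(t) = 1 - exp(-rate * t); choosing delta = exp(-rate * theta) gives
# H(theta) = 1 - delta, so the regularity condition holds with equality.
import numpy as np

alpha0, rate, theta = 0.5, 2.0, 0.1      # hypothetical parameters
delta = np.exp(-rate * theta)
lhs = rate / (alpha0 + rate)             # closed form of int_0^inf e^{-alpha0 t} H(dt)
gamma = 1 - delta + delta * np.exp(-alpha0 * theta)
assert lhs <= gamma < 1
print(f"integral = {lhs:.4f} <= gamma = {gamma:.4f} < 1")
```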

Proof

(Lemma 2)

By Assumption 2 and inequality (6), for each given function u ∈ Bω(X) and (x,a,b) ∈ K, we can easily get

$$ |G(u, x, a, b)|\leq\frac{M}{\alpha_{0}}\cdot\omega(x)+\eta\gamma\|u\|_{\omega} \cdot \omega(x). $$

The above inequality yields \(\|G(u, \cdot , a, b)\|_{\omega }\leq \frac {M}{\alpha _{0}}+\eta \gamma \|u\|_{\omega }\), which implies that G(u,x,a,b) is in Bω(X), and hence Tu ∈ Bω(X).

On the one hand, by Assumption 4, the function G(u,x,⋅,b) is upper semi-continuous on A(x); then, for each fixed \(\lambda \in \mathbb {B}(x)\), by Fatou's theorem, the function

$$ a \longmapsto {\int}_{B(x)} G(u, x, a, b) \lambda(d b) $$

is also upper semi-continuous on A(x). Moreover, since the set of probability measures on \({\mathscr{B}}(X)\) is endowed with the topology of weak convergence, by Theorem 2.8.1 in Ash et al. (2000), the function G(u,x,⋅,λ) is upper semi-continuous on \(\mathbb {A}(x)\). Similarly, G(u,x,μ,⋅) is lower semi-continuous on \(\mathbb {B}(x)\). Thus, by Theorem A.2.3 in Ash et al. (2000), the supremum and the infimum are indeed attained in (3), which means

$$ Tu(x)=\underset{\mu \in \mathbb{A}(x) }{\max}~ \underset{\lambda \in \mathbb{B}(x) }{\min} G(u, x, \mu, \lambda). $$

Then, by Fan's minimax theorem (Fan 1953), we obtain (7).

On the other hand, since \(\mathbb {A}(x)\) and \(\mathbb {B}(x)\) are compact, by the well-known measurable selection theorem for minimax problems (Nowak 1984), there exists a pair of stationary strategies (φ1,φ2) ∈Φ1 ×Φ2 that satisfies (8). □

Proof

(Lemma 3)

First, it is easy to verify that the operator T(φ1,φ2) is monotonically increasing. Let u, v ∈ Bω(X); by the definition of the ω-norm, u(⋅) ≤ v(⋅) + ∥u − v∥ω ω(⋅). It follows that for each fixed x ∈ X, we have

$$ \begin{array}{@{}rcl@{}} T(\varphi_{1},\varphi_{2}) u(x) & \leqslant& T(\varphi_{1},\varphi_{2})(v+\omega \|u-v\|_{\omega})(x) \\ &=&T(\varphi_{1},\varphi_{2})v(x)\\ &&+ \|u-v\|_{\omega} {\int}_{A(x)} {\int}_{B(x)}\Big[{\int}_{0}^{\infty} e^{-\alpha(x,a,b) t} {\int}_{X}\omega (y)Q(d t, d y| x, a, b)\Big] \varphi_{1}(da|x) \varphi_{2}(db|x)\\ &\leqslant& T(\varphi_{1},\varphi_{2})v(x)+\eta\gamma \|u-v\|_{\omega}\omega (x), \end{array} $$
(13)

where the last inequality follows from inequality (6). Furthermore, taking the maximum over φ1 ∈ Φ1 and the minimum over φ2 ∈ Φ2 on both sides of inequality (13), we have

$$ \underset{\varphi_{1} \in {\Phi}_{1}}{\max} \underset{\varphi_{2}\in {\Phi}_{2}}{\min}T(\varphi_{1},\varphi_{2})u(x)\leq \underset{\varphi_{1} \in {\Phi}_{1}}{\max} \underset{\varphi_{2}\in {\Phi}_{2}}{\min}T(\varphi_{1},\varphi_{2})v(x)+\eta\gamma \|u-v\|_{\omega}\omega (x), $$

i.e.,

$$ Tu(x)\leq Tv(x)+\eta\gamma \|u-v\|_{\omega}\omega (x). $$

Similarly, interchanging u and v, we obtain

$$ Tv(x)\leq Tu(x)+\eta\gamma \|v-u\|_{\omega}\omega (x). $$

Combining the two inequalities above, we have

$$ |Tu(x)-Tv(x)|\leq \eta\gamma \|u-v\|_{\omega}\omega (x) ,\quad \forall x\in X, $$

i.e.,

$$ \|Tu-Tv\|_{\omega} \leq \eta\gamma \|u-v\|_{\omega}, $$

which implies that T is a contraction operator with modulus ηγ < 1. By the same arguments, T(φ1,φ2) is also a contraction operator with modulus ηγ < 1. □
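For completeness, the contraction property just established yields, via Banach's fixed-point theorem, the standard a priori error estimate for the iterates \(u_{n+1}:=Tu_{n}\):

$$ \|u_{n}-u^{*}\|_{\omega} \leq \frac{(\eta\gamma)^{n}}{1-\eta\gamma}\|u_{1}-u_{0}\|_{\omega}, $$

where u∗ denotes the unique fixed point of T in Bω(X). This is the estimate behind the error bound of the value iteration-type algorithm in the finite case.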

Proof

(Lemma 4)

$$ \begin{array}{@{}rcl@{}} {V(x, \pi^{1}, \pi^{2})}&=& \mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[{\int}_{0}^{\infty}e^{-{{\int}_{0}^{t}}\alpha(X(s),A(s),B(s))ds}r(X(t), A(t), B(t))dt\Big]\\ &=& \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[{\int}_{0}^{T_{1}} e^{-{{\int}_{0}^{t}}\alpha(X(s),A(s),B(s))ds} r(X(t), A(t), B(t))dt\Big]\\ &&+ \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[{\int}_{T_{1}}^{\infty} e^{-{{\int}_{0}^{t}}\alpha(X(s),A(s),B(s))ds} r(X(t), A(t), B(t))dt\Big]\\ &=&\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[{\int}_{0}^{\infty}\mathbbm{1}_{\{T_{1}>t\}} e^{-\alpha(x,A_{0},B_{0})t} r(x, A_{0},B_{0})dt\Big]\\ &&+ \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\bigg[ \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[{\int}_{T_{1}}^{\infty}e^{-\alpha(x,A_{0},B_{0})T_{1}} e^{-{\int}_{T_{1}}^{t}\alpha(X(s),A(s),B(s))ds} r(X(t), A(t), B(t))dt|H_{1}\Big]\bigg]\\ &=& {\int}_{A(x)}{\int}_{B(x)}\Big[{\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}\big[1-H(t| x, a, b)\big]r(x,a,b)dt\Big] {\pi_{0}^{1}}(da|x) {\pi_{0}^{2}}(db|x)\\ &&+\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\bigg[e^{-\alpha(x,A_{0},B_{0})T_{1}} \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[{\int}_{T_{1}}^{\infty} e^{-{\int}_{T_{1}}^{t}\alpha(X(s),A(s),B(s))ds} r(X(t), A(t), B(t))dt|H_{1}\Big]\bigg]\\ &=& {\int}_{A(x)}{\int}_{B(x)}\Big[{\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}\big[1-H(t| x, a, b)\big]r(x,a,b)dt\Big] {\pi_{0}^{1}}(da|x) {\pi_{0}^{2}}(db|x)\\ &&+\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-\alpha(x,A_{0},B_{0})T_{1}}V(X_{1},^{(1)}\!\pi^{1},^{(1)}\!\pi^{2})\Big]\\ &=& {\int}_{A(x)}{\int}_{B(x)}r(x,a,b)\Big[{\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}\big[1-H(t|x, a, b)\big]dt\Big] {\pi_{0}^{1}}(da|x) {\pi_{0}^{2}}(db|x)\\ &&+{\int}_{A(x)}{\int}_{B(x)}\Big [{\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}{\int}_{X} V(y,^{(1)}\!\pi^{1},^{(1)}\!\pi^{2})Q(d t,dy| x, a, b){\Big]\pi_{0}^{1}}(da|x) {\pi_{0}^{2}}(db|x)\\ &=& {\int}_{A(x)}{\int}_{B(x)}\left\{r(x,a,b)\Big[{\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}\big[1-H(t|x, a, b)\big]dt\Big]+\right.\\ &&\left. {\int}_{0}^{\infty}e^{-\alpha(x,a,b)t}\Big [{\int}_{X} V(y,^{(1)}\!\pi^{1},^{(1)}\!\pi^{2})Q(d t,dy| x, a, b)\Big] \right\}{\pi_{0}^{1}}(da|x) {\pi_{0}^{2}}(db|x), \end{array} $$

where the third and fourth equalities are ensured by the properties of conditional expectation, and the fifth equality follows from the strong Markov property. Hence,

$$ V(x,\pi^{1}, \pi^{2})=T(\pi_{0}^{1\infty}, \pi_{0}^{2\infty})V(x,^{(1)}\!\pi^{1},^{(1)}\!\pi^{2}), $$

which is required. □

Proof

(Lemma 5)

For all n ≥ 1 and x ∈ X, we have

$$ \begin{array}{@{}rcl@{}} &&\bigg|\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-{\int}_{0}^{T_{n}}\alpha(X(s),A(s),B(s))ds}\omega(X_{n})\Big]\bigg|\\ &=& \bigg|\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\big[e^{-{\int}_{0}^{T_{n}}\alpha(X(s),A(s),B(s))ds}\omega(X_{n})|H_{n-1},A_{n-1},B_{n-1}\big]\Big]\bigg|\\ &= &\bigg|\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-{\int}_{0}^{T_{n-1}}\alpha(X(s),A(s),B(s))ds}\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\big[e^{-{\int}_{T_{n-1}}^{T_{n}}\alpha(X(s),A(s),B(s))ds}\omega(X_{n})|H_{n-1},A_{n-1},B_{n-1}\big]\Big]\bigg|\\ &=&\bigg|\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-{\int}_{0}^{T_{n-1}}\alpha(X(s),A(s),B(s))ds}\big[{\int}_{0}^{\infty}e^{-\alpha(X_{n-1},A_{n-1},B_{n-1})t}{\int}_{X}\omega(y)Q(d t,dy| X_{n-1}, A_{n-1},B_{n-1})\big]\Big]\bigg|\\ &\leq& \eta\gamma\bigg|\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-{\int}_{0}^{T_{n-1}}\alpha(X(s),A(s),B(s))ds}\omega(X_{n-1})\Big]\bigg|, \end{array} $$

where the first and second equalities are ensured by the properties of conditional expectation, and the last inequality follows from inequality (6). Iterating, we obtain

$$ \begin{array}{@{}rcl@{}} \bigg|\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-{\int}_{0}^{T_{n}}\alpha(X(s),A(s),B(s))ds}u(X_{n})\Big]\bigg|&\leq& \|u\|_{\omega}\bigg|\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-{\int}_{0}^{T_{n}}\alpha(X(s),A(s),B(s))ds}\omega(X_{n})\Big]\bigg|\\ &\leq& (\eta\gamma)^{n}\|u\|_{\omega}\omega(x), \end{array} $$

which yields Lemma 5. □

Appendix B: Proofs of Propositions 1–2

Proof

(Proposition 1)

By the property of conditional expectation and Lemma 1, we have

$$ \begin{array}{@{}rcl@{}} \mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-T_{n}}\Big] &=& \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\big[e^{-T_{n}}|H_{n-1},A_{n-1},B_{n-1}\big]\Big]\\ &=& \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-T_{n-1}}\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\big[e^{-(T_{n}-T_{n-1})}|H_{n-1},A_{n-1},B_{n-1}\big]\Big]\\ &=& \mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-T_{n-1}}\big[{\int}_{0}^{\infty}e^{-t}H(d t| X_{n-1}, A_{n-1},B_{n-1})\big]\Big]\\ &\leq& (1-\delta+\delta e^{-\theta})\mathbb{E}_{x}^{\pi^{1},\pi^{2}}\Big[e^{-T_{n-1}}\Big], \end{array} $$

where the last inequality follows directly from the proof of Lemma 1 by taking α0 := 1. Iterating, we obtain

$$ \mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-T_{n}}\Big]\leq (1-\delta+\delta e^{-\theta})^{n}. $$

For any given t > 0, by Chebyshev's inequality, we obtain

$$ \mathbb{P}_{x}^{\pi^{1}, \pi^{2}}(T_{n}\leq t)=\mathbb{P}_{x}^{\pi^{1}, \pi^{2}}(e^{-T_{n}}\ge e^{-t}) \leq e^{t}\mathbb{E}_{x}^{\pi^{1}, \pi^{2}}\Big[e^{-T_{n}}\Big]\leq e^{t}(1-\delta+\delta e^{-\theta})^{n}, $$

and noting that \(1-\delta +\delta e^{-\theta }<1\), we have

$$ \mathbb{P}_{x}^{\pi^{1}, \pi^{2}}(\lim\limits_{n\rightarrow\infty}T_{n}\leq t)=\lim\limits_{n\rightarrow\infty}\mathbb{P}_{x}^{\pi^{1}, \pi^{2}}(T_{n}\leq t)=0, $$

since t is arbitrary, Proposition 1 holds. □
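The tail bound used above is easy to probe numerically. The following sketch assumes i.i.d. exponential sojourn times, an illustrative special case only:

```python
# Monte Carlo check of P(T_n <= t) <= e^t (1 - delta + delta e^{-theta})^n,
# assuming i.i.d. exponential sojourn times (illustrative special case only).
import numpy as np

rng = np.random.default_rng(0)
rate, theta, n, t = 2.0, 0.1, 100, 5.0
delta = np.exp(-rate * theta)                  # then H(theta) = 1 - delta exactly
T_n = rng.exponential(1.0 / rate, size=(200_000, n)).sum(axis=1)
empirical = (T_n <= t).mean()
bound = np.exp(t) * (1.0 - delta + delta * np.exp(-theta)) ** n
assert empirical <= bound
print(f"empirical P(T_n <= t) = {empirical:.2e} <= bound = {bound:.2e}")
```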

Proof

(Proposition 2)

By the marginal distribution formula, the corresponding sojourn-time distribution is given as follows:

$$ H(t|x,a,b)=Q(t,+\infty|x,a,b)=1-e^{-\beta (x,a,b)t}. $$

It is easy to show that Assumption 2 holds by choosing \(\alpha _{0}=e^{\underline {a}+\underline {b}}\), \(\omega (x)=x^{2}+1\) and \(M=3\max \limits \{|\bar {a}|,|\underline {a}|,|\bar {b}|,|\underline {b}|,1\}\). Now, let \(\theta =\frac {1}{\alpha _{0}}\ln \left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )\) and \(\delta =\left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )^{-\bar {\beta }/\alpha _{0}}\); then, for each (x,a,b) ∈ K,

$$ \begin{array}{@{}rcl@{}} H(\theta|x,a,b)=1-e^{-\beta(x,a,b)\theta}\leq 1-e^{-\bar{\beta}\theta}=1-\delta, \end{array} $$

thus, Assumption 1 holds.

Next, we verify Assumption 3. By Lemma 1 and its proof, we can choose \(\gamma =1-\delta +\delta e^{-\alpha _{0} \theta }\) and further take \(\eta =\frac {2}{1+\gamma }\); then

$$ \begin{array}{@{}rcl@{}} {\int}_{X} \omega(y)Q(t,dy|x,a,b)&=&{\int}_{X} (y^{2}+1){\Phi}\bigg(\frac{1+t}{2+t}y\bigg)F(t)d\bigg(\frac{1+t}{2+t}y\bigg)\\ &=&\Big[\left( \frac{2+t}{1+t}\right)^{2} (\mu^{2}(x,a,b)+\sigma^{2}(x,a,b))+1\Big]F(t)\\ &\leq& \bigg[4\left( \frac{1}{4}x^{2}+\frac{e^{\underline{a}+\underline{b}-1}}{8(\bar{\beta}+e^{\underline{a}+\underline{b}})}\right)+1\bigg] F(t)\\ & \leq& \Big[x^{2}+\frac{\delta\alpha_{0}}{2(\bar{\beta}+\alpha_{0})}+1\Big] F(t)\\ & <& \Big[x^{2}+\frac{1-\gamma}{1+\gamma}+1\Big] F(t)\\ & \leq &\eta \omega(x) H(t|x,a,b), \end{array} $$

which implies that Assumption 3 holds. Finally, since the reward rate r(x,a,b), the discount factor α(x,a,b) and the semi-Markov kernel Q(t,y|x,a,b) are continuous on K, Assumption 4 holds. Hence, the SMG of Example 1 has an optimal pair of stationary strategies. □
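The constants in this proof are fully explicit and easy to evaluate. The following sketch uses hypothetical values of α0 and \(\bar{\beta}\) (assumptions chosen for illustration, not the data of Example 1):

```python
# Evaluating the explicit constants from the proof of Proposition 2 for
# hypothetical parameters alpha0 and beta_bar (not Example 1's actual data).
import numpy as np

alpha0, beta_bar = 0.5, 2.0
theta = np.log(1.0 + alpha0 / beta_bar) / alpha0
delta = (1.0 + alpha0 / beta_bar) ** (-beta_bar / alpha0)
assert np.isclose(np.exp(-beta_bar * theta), delta)   # H(theta) = 1 - delta
gamma = 1.0 - delta + delta * np.exp(-alpha0 * theta)
eta = 2.0 / (1.0 + gamma)
# eta * gamma = 2*gamma/(1+gamma) < 1 whenever gamma < 1, so T is a contraction
print(f"theta={theta:.4f} delta={delta:.4f} gamma={gamma:.4f} eta*gamma={eta*gamma:.4f}")
```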

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Yu, Z., Guo, X. & Xia, L. Zero-sum semi-Markov games with state-action-dependent discount factors. Discrete Event Dyn Syst 32, 545–571 (2022). https://doi.org/10.1007/s10626-022-00366-4

