Abstract
The semi-Markov model is one of the most general models for stochastic dynamic systems. This paper deals with a two-person zero-sum game for semi-Markov processes. We focus on the expected discounted criterion with state-action-dependent discount factors. The state and action spaces are both Polish spaces, and the reward rate function is ω-bounded. We first construct a fairly general model of semi-Markov games under a given semi-Markov kernel and a pair of strategies. Next, based on the standard regularity condition and the continuity-compactness condition for semi-Markov games, we derive a “drift condition” on the semi-Markov kernel and suppose that the discount factors have a positive lower bound, under which the existence of the value function and of an optimal pair of stationary strategies of our semi-Markov game is proved by using a general Shapley equation. Moreover, in the scenario of finite state and action spaces, a value iteration-type algorithm for approximating the value function and an optimal pair of stationary strategies is developed; its convergence and error bound are also proved. Finally, we present numerical examples to demonstrate the main results.
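The abstract mentions a value iteration-type algorithm for the finite state-action case. The paper's algorithm itself is not reproduced here; the following is a minimal Python sketch of a Shapley-type value iteration for a finite zero-sum discounted game with state-action-dependent discount factors, in which all names and data are hypothetical. At each state, one value-iteration step solves the one-shot matrix game whose payoffs combine the immediate reward with the discounted continuation value.

```python
# Illustrative sketch only (hypothetical names and data), not the paper's
# exact algorithm: Shapley-type value iteration for a finite zero-sum game
# with a state-action-dependent discount factor gamma(x, a, b) in (0, 1).

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (maximizer chooses rows).
    Handles pure saddle points and, failing that, 2x2 mixed equilibria."""
    rows, cols = len(M), len(M[0])
    # A pure saddle point is an entry that is both a row-min and a column-max.
    for i in range(rows):
        for j in range(cols):
            v = M[i][j]
            if v == min(M[i]) and v == max(M[k][j] for k in range(rows)):
                return v
    # Fall back to the closed-form value of a 2x2 game with no saddle point:
    # val = (ad - bc) / (a + d - b - c).
    assert rows == cols == 2, "this sketch only solves 2x2 mixed games"
    a, b = M[0]
    c, d = M[1]
    return (a * d - b * c) / (a + d - b - c)

def shapley_iteration(states, A, B, r, gamma, P, tol=1e-10, max_iter=10_000):
    """Iterate u <- T u, where (T u)(x) is the value of the matrix game with
    payoffs r(x,a,b) + gamma(x,a,b) * sum_y P(y|x,a,b) * u(y)."""
    u = {x: 0.0 for x in states}
    for _ in range(max_iter):
        u_new = {}
        for x in states:
            M = [[r[x][a][b] + gamma[x][a][b] *
                  sum(P[x][a][b][y] * u[y] for y in states)
                  for b in range(B)] for a in range(A)]
            u_new[x] = matrix_game_value(M)
        if max(abs(u_new[x] - u[x]) for x in states) < tol:
            return u_new
        u = u_new
    return u
```

For a single state with a constant discount factor γ, the fixed point satisfies u = val(r) + γu, so e.g. matching-pennies rewards with γ = 0.5 give u = 0; this provides a simple sanity check for the sketch. Since sup γ < 1, the operator is contractive and the iteration converges geometrically, mirroring the contraction argument in Lemma 3.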
References
Ash RB, Doléans-Dade CA (2000) Probability and measure theory, 2nd edn. Academic Press
Barron EN (2013) Game Theory: An Introduction, 2nd Edn. Wiley, New York
Chen F, Guo X, Liao ZW (2021) Discounted semi-Markov games with incomplete information on one side. arXiv:2107.07499
Fan K (1953) Minimax theorems. Proc Natl Acad Sci U S A 39 (1):42–47
Feinberg E A, Shwartz A (1994) Markov decision models with weighted discounted criteria. Math Oper Res 19(1):152–168
Filar J, Vrieze K (2012) Competitive Markov Decision Processes. Springer Science & Business Media, Berlin
González-Hernández J, López-Martínez RR, Minjárez-Sosa JA (2009) Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45(5):737–754
González-Sánchez D, Luque-Vásquez F, Minjárez-Sosa J A (2019) Zero-Sum Markov games with random State-Actions-Dependent discount factors: Existence of optimal strategies. Dyn Games Appl 9(1):103–121
Hernández-Lerma O, Lasserre JB (2012a) Discrete-time Markov control processes: basic optimality criteria, vol 30. Springer Science & Business Media, Berlin
Hernández-Lerma O, Lasserre JB (2012b) Further topics on discrete-time Markov control processes, vol 42. Springer Science & Business Media
Hoffman A J, Karp R M (1966) On nonterminating stochastic games. Manag Sci 12(5):359–370
Huang Y, Guo X (2011) Finite horizon semi-Markov decision processes with application to maintenance systems. Eur J Oper Res 212(1):131–140
Huang Y H, Guo X P (2010) Discounted semi-Markov decision processes with nonnegative costs. Acta Math Sinica (Chinese Series) 53:503–514
Kirman A P, Sobel M J (1974) Dynamic oligopoly with inventories. Econometrica: Journal of the Econometric Society, 279–287
Lal A K, Sinha S (1992) Zero-sum two-person semi-Markov games. J Appl Probab 29(1):56–72
Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann, pp 157–163
Luque-Vásquez F (2002a) Zero-sum semi-Markov game in Borel spaces with discounted payoff. Morfismos 6(1):15–29
Luque-Vásquez F (2002b) Zero-sum semi-Markov games in Borel spaces: discounted and average payoff. Bol Soc Mat Mexicana 8:227–241
Minjárez-Sosa J A (2015) Markov control models with unknown random state–action-dependent discount factors. Top 23(3):743–772
Minjárez-Sosa J A, Luque-Vásquez F (2008) Two person zero-sum semi-Markov games with unknown holding times distribution on one side: a discounted payoff criterion. Appl Math Optim 57(3):289–305
Mondal P, Sinha S, Neogy S K, Das A K (2016) On discounted AR-AT semi-Markov games and its complementarity formulations. Int J Game Theory 45(3):567–583
Mondal P, Neogy S, Gupta A, Ghorui D (2020) A policy improvement algorithm for solving a mixture class of perfect information and AR-AT semi-Markov games. Int Game Theory Rev 22(02):2040008
Nowak A S (1984) On zero-sum stochastic games with general state space I. Probab Math Stat 4(1):13–32
Pollatschek M, Avi-Itzhak B (1969) Algorithms for stochastic games with geometrical interpretation. Manag Sci 15(7):399–415
Raut LK (1990) Two-sided altruism, Lindahl equilibrium and Pareto efficiency in overlapping generations models. Department of Economics, University of California
Ross SM (1970) Average cost semi-Markov decision processes. J Appl Probability 7(3):649–656
Schäl M (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32(3):179–196
Shapley L S (1953) Stochastic games. Proc Natl Acad Sci 39 (10):1095–1100
Tanaka K, Wakuta K (1976) On semi-Markov games. Science Reports of Niigata University Series A, Mathematics 13:55–64
Vega-Amaya Ó, Luque-Vásquez F, Castro-Enríquez M (2022) Zero-sum average cost semi-Markov games with weakly continuous transition probabilities and a minimax semi-Markov inventory problem. Acta Appl Math 177(1):1–27
Wei Q, Guo X (2011) Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett 39(5):369–374
Wu X, Zhang J (2016) Finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. Discrete Event Dyn Syst 26(4):669–683
Ye L, Guo X (2012) Continuous-time Markov decision processes with state-dependent discount factors. Acta Applicandae Math 121(1):5–27
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62073346, 11931018, U1811462), the Guangdong Basic and Applied Basic Research Foundation (2021A1515011984), and the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (2020B1212060032).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Proofs of Lemmas 1–5
Proof
(Lemma 1) For each fixed (x,a,b) ∈ K, integrating by parts, we have
Let \(\gamma =1-\delta +\delta e^{-\alpha _{0}\theta }\), which yields (4). □
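The displayed computation is omitted in this version. A hedged reconstruction of the standard integration-by-parts argument, assuming the regularity condition \(Q(\theta \mid x,a,b)\le 1-\delta\) and the lower bound \(\alpha(x,a,b)\ge \alpha_{0}\), reads:

```latex
% Hedged reconstruction; the exact display in the paper may differ.
\begin{aligned}
\int_{0}^{\infty} e^{-\alpha(x,a,b)t}\,Q(dt\mid x,a,b)
  &\le \int_{0}^{\infty} e^{-\alpha_{0}t}\,Q(dt\mid x,a,b)
   =   \alpha_{0}\int_{0}^{\infty} e^{-\alpha_{0}t}\,Q(t\mid x,a,b)\,dt \\
  &\le \alpha_{0}\Bigl[(1-\delta)\int_{0}^{\theta} e^{-\alpha_{0}t}\,dt
        + \int_{\theta}^{\infty} e^{-\alpha_{0}t}\,dt\Bigr] \\
  &=   (1-\delta)\bigl(1-e^{-\alpha_{0}\theta}\bigr) + e^{-\alpha_{0}\theta}
   =   1-\delta+\delta e^{-\alpha_{0}\theta}.
\end{aligned}
```

The middle step bounds \(Q(t\mid x,a,b)\) by \(1-\delta\) on \([0,\theta]\) (monotonicity of \(Q\)) and by \(1\) on \([\theta,\infty)\); the final expression is exactly the constant \(\gamma\) defined in the proof.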
Proof
(Lemma 2)
By Assumption 2 and formula (6), for each given function u ∈ Bω(X) and (x,a,b) ∈ K, we have
The above inequality yields \(\|G(u, \cdot , a, b)\|_{\omega }\leq \frac {M}{\alpha _{0}}+\eta \gamma \|u\|_{\omega }\), which implies that G(u,x,a,b) is in Bω(X), and hence Tu ∈ Bω(X).
On the one hand, by Assumption 4, G(u,x,⋅,b) is upper semi-continuous on A(x). Then, for each fixed \(\lambda \in \mathbb {B}(x)\), by Fatou's lemma, the function
is also upper semi-continuous on A(x). Moreover, since the set of probability measures on \({\mathscr{B}}(X)\) is endowed with the topology of weak convergence, by Theorem 2.8.1 in Ash et al. (2000), the function G(u,x,⋅,λ) is upper semi-continuous on \(\mathbb {A}(x)\). Similarly, G(u,x,μ,⋅) is lower semi-continuous on \(\mathbb {B}(x)\). Thus, by Theorem A.2.3 in Ash et al. (2000), the supremum and the infimum in (3) are indeed attained, which means
Then, by Fan's minimax theorem (Fan 1953), we obtain (7).
On the other hand, since \(\mathbb {A}(x)\) and \(\mathbb {B}(x)\) are compact, by the well-known measurable selection theorem for minimax problems (Nowak 1984), there exists a pair of stationary strategies (φ1,φ2) ∈Φ1 ×Φ2 that satisfies (8). □
Proof
(Lemma 3)
First, it is easy to verify that the operator T(φ1,φ2) is monotonically increasing. Let u,v ∈ Bω(X). By the definition of the ω-norm, u(⋅) ≤ v(⋅) + ∥u − v∥ωω(⋅); it follows that, for each fixed x ∈ X,
where the last inequality follows from formula (6). Furthermore, taking the maximum over φ1 ∈Φ1 and the minimum over φ2 ∈Φ2 on both sides of inequality (13), we have
i.e.,
Similarly, interchanging u and v, we obtain
Combining the two inequalities above, we have
i.e.,
which implies T is a contraction operator with modulus ηγ < 1. Using the same arguments, we can prove that T(φ1,φ2) is also a contraction operator with modulus ηγ < 1. □
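The chain of omitted displays in this proof can be summarized by the following hedged sketch, in which inequality (13) is assumed to be the pointwise bound below:

```latex
% Hedged sketch of the contraction step; labels assumed.
T_{(\varphi_{1},\varphi_{2})}u(x)
  \;\le\; T_{(\varphi_{1},\varphi_{2})}v(x)
          + \eta\gamma\,\|u-v\|_{\omega}\,\omega(x),
  \qquad x\in X.
```

Interchanging \(u\) and \(v\) gives the symmetric bound, whence \(|Tu(x)-Tv(x)|\le \eta\gamma\,\|u-v\|_{\omega}\,\omega(x)\) for all \(x\), i.e. \(\|Tu-Tv\|_{\omega}\le \eta\gamma\,\|u-v\|_{\omega}\) with modulus \(\eta\gamma<1\).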
Proof
(Lemma 4)
where the third and fourth equalities are ensured by the property of conditional expectation, and the fifth equality follows from the strong Markov property. Hence,
as required. □
Proof
(Lemma 5)
For each n ≥ 1 and x ∈ X, we have
where the first and second equalities are ensured by the property of conditional expectation, and the last inequality follows from formula (6). Iterating, we have
which yields Lemma 5. □
Appendix B: Proofs of Propositions 1–2
Proof
(Proposition 1)
By the property of conditional expectation and Lemma 1, we have
where the last inequality follows directly from the proof of Lemma 1 by taking α0 := 1. Iterating, we have
For any given t > 0, by Chebyshev's inequality, we obtain
Noticing that \(1-\delta +\delta e^{-\theta }<1\), we have
Since t is arbitrary, Proposition 1 holds. □
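The Chebyshev (Markov-inequality) step in this proof can be summarized by the following hedged sketch, assuming \(T_{n}\) denotes the n-th decision epoch and \(P_{x}\), \(E_{x}\) the law and expectation under the given pair of strategies:

```latex
% Hedged sketch; notation assumed.
P_{x}\bigl(T_{n}\le t\bigr)
  = P_{x}\bigl(e^{-T_{n}}\ge e^{-t}\bigr)
  \le e^{t}\,E_{x}\bigl[e^{-T_{n}}\bigr]
  \le e^{t}\bigl(1-\delta+\delta e^{-\theta}\bigr)^{n}
  \xrightarrow{\;n\to\infty\;} 0.
```

Since this holds for every t > 0, the jump times \(T_{n}\) increase to infinity almost surely, i.e. the process is non-explosive.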
Proof
(Proposition 2)
Using the marginal distribution formula, the corresponding distribution of the sojourn time is given as follows:
It is easy to show that Assumption 2 holds by choosing \(\alpha _{0}=e^{\underline {a}+\underline {b}}\), ω(x) = x2 + 1 and \(M=3\max \limits \{|\bar {a}|,|\underline {a}|,|\bar {b}|,|\underline {b}|,1\}\). Now, let \(\theta =\frac {1}{\alpha _{0}}\ln \left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )\) and \(\delta =\left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )^{-\bar {\beta }/\alpha _{0}}\); then, for each (x,a,b) ∈ K,
thus, Assumption 1 holds.
Next, we verify Assumption 3. According to Lemma 1 and its proof, we can choose \(\gamma =1-\delta +\delta e^{-\alpha _{0} \theta }\) and further take \(\eta =\frac {2}{1+\gamma }\); then
which implies that Assumption 3 holds. Finally, since the reward rate r(x,a,b), the discount factor α(x,a,b), and the semi-Markov kernel Q(t,y|x,a,b) are continuous on K, Assumption 4 holds. Hence, the SMG of Example 1 has an optimal pair of stationary strategies. □
About this article
Cite this article
Yu, Z., Guo, X. & Xia, L. Zero-sum semi-Markov games with state-action-dependent discount factors. Discrete Event Dyn Syst 32, 545–571 (2022). https://doi.org/10.1007/s10626-022-00366-4