Abstract
The semi-Markov model is one of the most general models for stochastic dynamic systems. This paper deals with a two-person zero-sum game for semi-Markov processes. We focus on the expected discounted criterion with state-action-dependent discount factors. The state and action spaces are both Polish spaces, and the reward rate function is ω-bounded. We first construct a fairly general model of semi-Markov games under a given semi-Markov kernel and a pair of strategies. Next, based on the standard regularity condition and the continuity-compactness condition for semi-Markov games, we derive a “drift condition” on the semi-Markov kernel and suppose that the discount factors have a positive lower bound, under which the existence of the value function and of an optimal pair of stationary strategies of our semi-Markov game is proved by using a general Shapley equation. Moreover, in the scenario of finite state and action spaces, a value iteration-type algorithm for approximating the value function and an optimal pair of stationary strategies is developed; its convergence and error bound are also proved. Finally, we present numerical examples to demonstrate the main results.
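The abstract mentions a value iteration-type algorithm for the finite state-action case. The paper's algorithm itself is not reproduced here; the following is a minimal Python sketch of a Shapley-type value iteration for a finite zero-sum discounted game with state-action-dependent discount factors, in which all names and data are hypothetical. At each state, one value-iteration step solves the one-shot matrix game whose payoffs combine the immediate reward with the discounted continuation value.

```python
# Illustrative sketch only (hypothetical names and data), not the paper's
# exact algorithm: Shapley-type value iteration for a finite zero-sum game
# with a state-action-dependent discount factor gamma(x, a, b) in (0, 1).

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (maximizer chooses rows).
    Handles pure saddle points and, failing that, 2x2 mixed equilibria."""
    rows, cols = len(M), len(M[0])
    # A pure saddle point is an entry that is both a row-min and a column-max.
    for i in range(rows):
        for j in range(cols):
            v = M[i][j]
            if v == min(M[i]) and v == max(M[k][j] for k in range(rows)):
                return v
    # Fall back to the closed-form value of a 2x2 game with no saddle point:
    # val = (ad - bc) / (a + d - b - c).
    assert rows == cols == 2, "this sketch only solves 2x2 mixed games"
    a, b = M[0]
    c, d = M[1]
    return (a * d - b * c) / (a + d - b - c)

def shapley_iteration(states, A, B, r, gamma, P, tol=1e-10, max_iter=10_000):
    """Iterate u <- T u, where (T u)(x) is the value of the matrix game with
    payoffs r(x,a,b) + gamma(x,a,b) * sum_y P(y|x,a,b) * u(y)."""
    u = {x: 0.0 for x in states}
    for _ in range(max_iter):
        u_new = {}
        for x in states:
            M = [[r[x][a][b] + gamma[x][a][b] *
                  sum(P[x][a][b][y] * u[y] for y in states)
                  for b in range(B)] for a in range(A)]
            u_new[x] = matrix_game_value(M)
        if max(abs(u_new[x] - u[x]) for x in states) < tol:
            return u_new
        u = u_new
    return u
```

For a single state with a constant discount factor γ, the fixed point satisfies u = val(r) + γu, so e.g. matching-pennies rewards with γ = 0.5 give u = 0; this provides a simple sanity check for the sketch. Since sup γ < 1, the operator is contractive and the iteration converges geometrically, mirroring the contraction argument in Lemma 3.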
References
Ash RB, Doléans-Dade CA (2000) Probability and measure theory, 2nd edn. Academic Press
Barron EN (2013) Game Theory: An Introduction, 2nd Edn. Wiley, New York
Chen F, Guo X, Liao ZW (2021) Discounted semi-Markov games with incomplete information on one side. arXiv:2107.07499
Fan K (1953) Minimax theorems. Proc Natl Acad Sci U S A 39 (1):42–47
Feinberg E A, Shwartz A (1994) Markov decision models with weighted discounted criteria. Math Oper Res 19(1):152–168
Filar J, Vrieze K (2012) Competitive Markov Decision Processes. Springer Science & Business Media, Berlin
González-Hernández J, López-Martínez RR, Minjárez-Sosa JA (2009) Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45(5):737–754
González-Sánchez D, Luque-Vásquez F, Minjárez-Sosa J A (2019) Zero-Sum Markov games with random State-Actions-Dependent discount factors: Existence of optimal strategies. Dyn Games Appl 9(1):103–121
Hernández-Lerma O, Lasserre JB (2012a) Discrete-time Markov control processes: basic optimality criteria, vol 30. Springer Science & Business Media, Berlin
Hernández-Lerma O, Lasserre JB (2012b) Further topics on discrete-time Markov control processes, vol 42. Springer Science & Business Media
Hoffman A J, Karp R M (1966) On nonterminating stochastic games. Manag Sci 12(5):359–370
Huang Y, Guo X (2011) Finite horizon semi-Markov decision processes with application to maintenance systems. Eur J Oper Res 212(1):131–140
Huang Y H, Guo X P (2010) Discounted semi-Markov decision processes with nonnegative costs. Acta Math Sinica (Chinese Series) 53:503–514
Kirman A P, Sobel M J (1974) Dynamic oligopoly with inventories. Econometrica: Journal of the Econometric Society, 279–287
Lal A K, Sinha S (1992) Zero-sum two-person semi-Markov games. J Appl Probab 29(1):56–72
Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann, pp 157–163
Luque-Vásquez F (2002a) Zero-sum semi-Markov game in Borel spaces with discounted payoff. Morfismos 6(1):15–29
Luque-Vásquez F (2002b) Zero-sum semi-Markov games in Borel spaces: discounted and average payoff. Bol Soc Mat Mexicana 8:227–241
Minjárez-Sosa J A (2015) Markov control models with unknown random state–action-dependent discount factors. Top 23(3):743–772
Minjárez-Sosa J A, Luque-Vásquez F (2008) Two person zero-sum semi-Markov games with unknown holding times distribution on one side: a discounted payoff criterion. Appl Math Optim 57(3):289–305
Mondal P, Sinha S, Neogy S K, Das A K (2016) On discounted AR-AT semi-Markov games and its complementarity formulations. Int J Game Theory 45(3):567–583
Mondal P, Neogy S, Gupta A, Ghorui D (2020) A policy improvement algorithm for solving a mixture class of perfect information and AR-AT semi-Markov games. Int Game Theory Rev 22(02):2040008
Nowak A S (1984) On zero-sum stochastic games with general state space I. Probab Math Stat 4(1):13–32
Pollatschek M, Avi-Itzhak B (1969) Algorithms for stochastic games with geometrical interpretation. Manag Sci 15(7):399–415
Raut LK (1990) Two-sided altruism, Lindahl equilibrium and Pareto efficiency in overlapping generations models. Department of Economics, University of California
Ross SM (1970) Average cost semi-Markov decision processes. J Appl Probability 7(3):649–656
Schäl M (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32(3):179–196
Shapley L S (1953) Stochastic games. Proc Natl Acad Sci 39 (10):1095–1100
Tanaka K, Wakuta K (1976) On semi-Markov games. Science Reports of Niigata University Series A, Mathematics 13:55–64
Vega-Amaya Ó, Luque-Vásquez F, Castro-Enríquez M (2022) Zero-sum average cost semi-Markov games with weakly continuous transition probabilities and a minimax semi-Markov inventory problem. Acta Appl Math 177(1):1–27
Wei Q, Guo X (2011) Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett 39(5):369–374
Wu X, Zhang J (2016) Finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. Discrete Event Dyn Syst 26(4):669–683
Ye L, Guo X (2012) Continuous-time Markov decision processes with state-dependent discount factors. Acta Applicandae Math 121(1):5–27
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62073346, 11931018, U1811462), the Guangdong Basic and Applied Basic Research Foundation (2021A1515011984), and the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (2020B1212060032).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Proofs of Lemmas 1–5
Proof
(Lemma 1) For each fixed (x,a,b) ∈ K, integrating by parts, we have
Let \(\gamma =1-\delta +\delta e^{-\alpha _{0}\theta }\), which yields (4). □
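The displayed computation is omitted in this version. A hedged reconstruction of the standard integration-by-parts argument, assuming the regularity condition \(Q(\theta \mid x,a,b)\le 1-\delta\) and the lower bound \(\alpha(x,a,b)\ge \alpha_{0}\), reads:

```latex
% Hedged reconstruction; the exact display in the paper may differ.
\begin{aligned}
\int_{0}^{\infty} e^{-\alpha(x,a,b)t}\,Q(dt\mid x,a,b)
  &\le \int_{0}^{\infty} e^{-\alpha_{0}t}\,Q(dt\mid x,a,b)
   =   \alpha_{0}\int_{0}^{\infty} e^{-\alpha_{0}t}\,Q(t\mid x,a,b)\,dt \\
  &\le \alpha_{0}\Bigl[(1-\delta)\int_{0}^{\theta} e^{-\alpha_{0}t}\,dt
        + \int_{\theta}^{\infty} e^{-\alpha_{0}t}\,dt\Bigr] \\
  &=   (1-\delta)\bigl(1-e^{-\alpha_{0}\theta}\bigr) + e^{-\alpha_{0}\theta}
   =   1-\delta+\delta e^{-\alpha_{0}\theta}.
\end{aligned}
```

The middle step bounds \(Q(t\mid x,a,b)\) by \(1-\delta\) on \([0,\theta]\) (monotonicity of \(Q\)) and by \(1\) on \([\theta,\infty)\); the final expression is exactly the constant \(\gamma\) defined in the proof.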
Proof
(Lemma 2)
By Assumption 2 and formula (6), for each given function u ∈ Bω(X) and (x,a,b) ∈ K, we have
The above inequality yields \(\|G(u, \cdot , a, b)\|_{\omega }\leq \frac {M}{\alpha _{0}}+\eta \gamma \|u\|_{\omega }\), which implies that G(u,x,a,b) is in Bω(X), and hence Tu ∈ Bω(X).
On the one hand, by Assumption 4, G(u,x,⋅,b) is upper semi-continuous on A(x). Then, for each fixed \(\lambda \in \mathbb {B}(x)\), by Fatou's lemma, the function
is also upper semi-continuous on A(x). Moreover, since the set of probability measures on \({\mathscr{B}}(X)\) is endowed with the topology of weak convergence, by Theorem 2.8.1 in Ash et al. (2000), the function G(u,x,⋅,λ) is upper semi-continuous on \(\mathbb {A}(x)\). Similarly, G(u,x,μ,⋅) is lower semi-continuous on \(\mathbb {B}(x)\). Thus, by Theorem A.2.3 in Ash et al. (2000), the supremum and the infimum in (3) are indeed attained, which means
Then, by Fan's minimax theorem (Fan 1953), we obtain (7).
On the other hand, since \(\mathbb {A}(x)\) and \(\mathbb {B}(x)\) are compact, by the well-known measurable selection theorem for minimax problems (Nowak 1984), there exists a pair of stationary strategies (φ1,φ2) ∈Φ1 ×Φ2 that satisfies (8). □
Proof
(Lemma 3)
First, it is easy to verify that the operator T(φ1,φ2) is monotonically increasing. Let u,v ∈ Bω(X). By the definition of the ω-norm, u(⋅) ≤ v(⋅) + ∥u − v∥ωω(⋅); it follows that, for each fixed x ∈ X,
where the last inequality follows from formula (6). Furthermore, taking the maximum over φ1 ∈Φ1 and the minimum over φ2 ∈Φ2 on both sides of inequality (13), we have
i.e.,
Similarly, interchanging u and v, we obtain
Combining the two inequalities above, we have
i.e.,
which implies T is a contraction operator with modulus ηγ < 1. Using the same arguments, we can prove that T(φ1,φ2) is also a contraction operator with modulus ηγ < 1. □
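The chain of omitted displays in this proof can be summarized by the following hedged sketch, in which inequality (13) is assumed to be the pointwise bound below:

```latex
% Hedged sketch of the contraction step; labels assumed.
T_{(\varphi_{1},\varphi_{2})}u(x)
  \;\le\; T_{(\varphi_{1},\varphi_{2})}v(x)
          + \eta\gamma\,\|u-v\|_{\omega}\,\omega(x),
  \qquad x\in X.
```

Interchanging \(u\) and \(v\) gives the symmetric bound, whence \(|Tu(x)-Tv(x)|\le \eta\gamma\,\|u-v\|_{\omega}\,\omega(x)\) for all \(x\), i.e. \(\|Tu-Tv\|_{\omega}\le \eta\gamma\,\|u-v\|_{\omega}\) with modulus \(\eta\gamma<1\).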
Proof
(Lemma 4)
where the third and fourth equalities are ensured by the property of conditional expectation, and the fifth equality follows from the strong Markov property. Hence,
as required. □
Proof
(Lemma 5)
For each n ≥ 1 and x ∈ X, we have
where the first and second equalities are ensured by the property of conditional expectation, and the last inequality follows from formula (6). Iterating, we have
which yields Lemma 5. □
Appendix B: Proofs of Propositions 1–2
Proof
(Proposition 1)
By the property of conditional expectation and Lemma 1, we have
where the last inequality follows directly from the proof of Lemma 1 by taking α0 := 1. Iterating, we have
For any given t > 0, by Chebyshev's inequality, we obtain
Noticing that \(1-\delta +\delta e^{-\theta }<1\), we have
Since t is arbitrary, Proposition 1 holds. □
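The Chebyshev (Markov-inequality) step in this proof can be summarized by the following hedged sketch, assuming \(T_{n}\) denotes the n-th decision epoch and \(P_{x}\), \(E_{x}\) the law and expectation under the given pair of strategies:

```latex
% Hedged sketch; notation assumed.
P_{x}\bigl(T_{n}\le t\bigr)
  = P_{x}\bigl(e^{-T_{n}}\ge e^{-t}\bigr)
  \le e^{t}\,E_{x}\bigl[e^{-T_{n}}\bigr]
  \le e^{t}\bigl(1-\delta+\delta e^{-\theta}\bigr)^{n}
  \xrightarrow{\;n\to\infty\;} 0.
```

Since this holds for every t > 0, the jump times \(T_{n}\) increase to infinity almost surely, i.e. the process is non-explosive.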
Proof
(Proposition 2)
Using the marginal distribution formula, the corresponding distribution of the sojourn time is given as follows:
It is easy to show that Assumption 2 holds by choosing \(\alpha _{0}=e^{\underline {a}+\underline {b}}\), ω(x) = x2 + 1 and \(M=3\max \limits \{|\bar {a}|,|\underline {a}|,|\bar {b}|,|\underline {b}|,1\}\). Now, let \(\theta =\frac {1}{\alpha _{0}}\ln \left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )\) and \(\delta =\left (1+\frac {\alpha _{0}}{\bar {\beta }}\right )^{-\bar {\beta }/\alpha _{0}}\); then, for each (x,a,b) ∈ K,
thus, Assumption 1 holds.
Next, we verify Assumption 3. According to Lemma 1 and its proof, we can choose \(\gamma =1-\delta +\delta e^{-\alpha _{0} \theta }\) and further take \(\eta =\frac {2}{1+\gamma }\); then
which implies that Assumption 3 holds. Finally, since the reward rate r(x,a,b), the discount factor α(x,a,b), and the semi-Markov kernel Q(t,y|x,a,b) are continuous on K, Assumption 4 holds. Hence, the SMG of Example 1 has an optimal pair of stationary strategies. □
About this article
Cite this article
Yu, Z., Guo, X. & Xia, L. Zero-sum semi-Markov games with state-action-dependent discount factors. Discrete Event Dyn Syst 32, 545–571 (2022). https://doi.org/10.1007/s10626-022-00366-4