Abstract
We deal with a two-person zero-sum continuous-time Markov game \(\mathcal {G}\) with denumerable state space, general action spaces, and unbounded payoff and transition rates. We consider noncooperative equilibria for the discounted payoff criterion. We are interested in approximating numerically the value and the optimal strategies of \(\mathcal {G}\). To this end, we propose a definition of a sequence of game models \(\mathcal {G}_n\) converging to \(\mathcal {G}\), which ensures that the value and the optimal strategies of \(\mathcal {G}_n\) converge to those of \(\mathcal {G}\). For numerical purposes, we construct finite state and action game models \(\mathcal {G}_n\) that can be explicitly solved, and we study the convergence rate of the value of the games. A game model based on a population system illustrates our results.
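The finite approximating games \(\mathcal {G}_n\) mentioned in the abstract can be solved explicitly by standard dynamic-programming methods. As a hedged illustration (not the authors' construction), the sketch below applies Shapley's value iteration to a generic finite-state, finite-action *discrete-time* discounted zero-sum stochastic game; a continuous-time model would first be reduced to such a form, e.g. by uniformization. The function names `matrix_game_value` and `shapley_iteration`, and the data layout of `r` and `P`, are illustrative choices, not notation from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game with payoff matrix A
    (row player maximizes), computed via a small linear program."""
    m, n = A.shape
    # Variables: x_1..x_m (row player's mixed strategy) and v (game value).
    # Minimize -v subject to: v - sum_i A[i,j] x_i <= 0 for each column j,
    # sum_i x_i = 1, x_i >= 0.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=bounds)
    return -res.fun

def shapley_iteration(r, P, alpha, tol=1e-8):
    """Value iteration for a finite discounted zero-sum stochastic game.

    r[s]  : payoff matrix at state s (player 1's actions x player 2's actions)
    P[s]  : transition tensor with P[s][a, b, s'] = prob. of jumping to s'
    alpha : discount factor in (0, 1)
    """
    S = len(r)
    V = np.zeros(S)
    while True:
        # One Shapley update: solve the matrix game at each state.
        V_new = np.array([matrix_game_value(r[s] + alpha * P[s] @ V)
                          for s in range(S)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

For instance, a single-state game with constant payoff 1 and discount factor \(\alpha = 0.5\) has value \(1/(1-\alpha) = 2\), which the iteration recovers as the fixed point of \(V \mapsto 1 + \alpha V\).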
Notes
This is not the usual statement of the Portmanteau theorem. Observe, however, that the function constructed in (Billingsley 1968, Theorem 1.2) is bounded and Lipschitz continuous, and then proceed as in the proof of (Billingsley 1968, Theorem 2.1). Another reference for this result is (Bogachev 2007, Remark 8.3.1).
References
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Bogachev VI (2007) Measure theory, vol II. Springer, New York
Bolley F (2008) Separability and completeness for the Wasserstein distance. In: Séminaire de probabilités XLI. Lecture Notes in Math. 1934, Springer, Berlin, pp 371–377
Chang HS, Hu JQ, Fu MC, Marcus SI (2010) Adaptive adversarial multi-armed bandit approach to two-person zero-sum Markov games. IEEE Trans Automat Control 55:463–468
Frenk JBG, Kassay G, Kolumbán J (2004) On equivalent results in minimax theory. Eur J Oper Res 157:46–58
Guo XP, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. J Appl Probab 40:327–345
Guo XP, Hernández-Lerma O (2003) Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion. IEEE Trans Automat Control 48:236–244
Guo XP, Hernández-Lerma O (2005) Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates. Bernoulli 11:1009–1029
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes: theory and applications. Springer, New York
Guo XP, Zhang WZ (2014) Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Eur J Oper Res 238:486–496
Jaśkiewicz A, Nowak AS (2006) Approximation of noncooperative semi-Markov games. J Optim Theory Appl 131:115–134
Nowak AS, Altman E (2002) \(\epsilon \)-equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J Control Optim 40:1821–1839
Prieto-Rumeau T, Hernández-Lerma O (2012) Discounted continuous-time controlled Markov chains: convergence of control models. J Appl Probab 49:1072–1090
Prieto-Rumeau T, Hernández-Lerma O (2012) Selected topics on continuous-time controlled Markov chains and Markov games. Imperial College Press, London
Prieto-Rumeau T, Lorenzo JM (2010) Approximating ergodic average reward continuous-time controlled Markov chains. IEEE Trans Automat Control 55:201–207
Acknowledgments
Research supported by Grant MTM2012-31393 from the Spanish Ministerio de Economía y Competitividad.
Cite this article
Prieto-Rumeau, T., Lorenzo, J.M. Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23, 799–836 (2015). https://doi.org/10.1007/s11750-014-0354-8