Abstract
In this paper, we consider two-person zero-sum discounted Markov games with finite state and action spaces. We show that the Newton-Raphson or policy iteration method as presented by Pollatschek and Avi-Itzhak does not necessarily converge, contradicting a proof of Rao, Chandrasekaran, and Nair. Moreover, a set of successive approximation algorithms is presented, of which Shapley's method and a total-expected-rewards version of Hoffman and Karp's method are the extreme elements.
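For orientation, Shapley's method named in the abstract can be written as a successive approximation scheme. The following is only a sketch in assumed notation: the symbols $S$, $A$, $B$, $r$, $p$, $\beta$, and $\operatorname{val}$ are not taken from the abstract and may differ from those used in the paper.

```latex
% Assumed notation (not from the abstract):
%   S        finite state space
%   A, B     finite action sets of players I and II
%   r(s,a,b) immediate reward paid by player II to player I
%   p(t | s,a,b) transition probability from state s to state t
%   \beta    discount factor, 0 <= \beta < 1
%   val[M]   value of the zero-sum matrix game with payoff matrix M
v_{n+1}(s) \;=\; \operatorname{val}_{a \in A,\; b \in B}
  \Bigl[\, r(s,a,b) + \beta \sum_{t \in S} p(t \mid s,a,b)\, v_n(t) \Bigr],
  \qquad s \in S .

% The operator on the right is a \beta-contraction in the sup norm, so
\lVert v_{n+1} - v^{*} \rVert_{\infty} \;\le\; \beta\, \lVert v_{n} - v^{*} \rVert_{\infty},
% where v^{*} denotes the value of the discounted game.
```

Under these assumptions each step solves one matrix game per state, and the iterates converge geometrically to the value of the game from any starting vector $v_0$.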
References
Shapley, L. S., Stochastic Games, Proceedings of the National Academy of Sciences USA, Vol. 39, pp. 1095–1100, 1953.
Zachrisson, L. E., Markov Games, Advances in Game Theory, Edited by M. Dresher, L. S. Shapley, and A. W. Tucker, Princeton University Press, Princeton, New Jersey, pp. 211–253, 1964.
Charnes, A., and Schroeder, R. G., On Some Stochastic Tactical Anti-Submarine Games, Naval Research Logistics Quarterly, Vol. 14, pp. 291–311, 1967.
MacQueen, J., A Modified Dynamic Programming Method for Markovian Decision Problems, Journal of Mathematical Analysis and Applications, Vol. 14, pp. 38–43, 1966.
Porteus, E. L., Some Bounds for Discounted Sequential Decision Processes, Management Science, Vol. 18, pp. 7–11, 1971.
Van der Wal, J., Discounted Markov Games; Successive Approximations and Stopping Times, International Journal of Game Theory, Vol. 6, pp. 11–22, 1977.
Howard, R. A., Dynamic Programming and Markov Processes, MIT Press, Cambridge, Massachusetts, 1960.
Hoffman, A. J., and Karp, R. M., On Nonterminating Stochastic Games, Management Science, Vol. 12, pp. 359–370, 1966.
Pollatschek, M. A., and Avi-Itzhak, B., Algorithms for Stochastic Games with Geometrical Interpretation, Management Science, Vol. 15, pp. 399–415, 1969.
Van der Wal, J., and Wessels, J., On Markov Games, Statistica Neerlandica, Vol. 30, pp. 51–71, 1976.
Rao, S. S., Chandrasekaran, R., and Nair, K. P. K., Algorithms for Discounted Stochastic Games, Journal of Optimization Theory and Applications, Vol. 11, pp. 627–637, 1973.
Van Nunen, J. A. E. E., A Set of Successive Approximation Methods for Discounted Markov Decision Processes, Zeitschrift für Operations Research, Vol. 30, pp. 203–208, 1976.
Van Nunen, J. A. E. E., Contracting Markov Decision Processes, Mathematical Centre Tract 71, Mathematisch Centrum, Amsterdam, Holland, 1976.
Wessels, J., Stopping Times and Markov Programming, Transactions of the Seventh Prague Conference and 1974 EMS, Academia, Prague, Czechoslovakia, pp. 575–585, 1977.
Communicated by R. A. Howard
Van der Wal, J. Discounted Markov games: Generalized policy iteration method. J Optim Theory Appl 25, 125–138 (1978). https://doi.org/10.1007/BF00933260
DOI: https://doi.org/10.1007/BF00933260