Discounted Markov games: Generalized policy iteration method

Van der Wal, J.

doi:10.1007/BF00933260

Contributed Papers
Published: May 1978

Volume 25, pages 125–138 (1978)

Journal of Optimization Theory and Applications

Abstract

In this paper, we consider two-person zero-sum discounted Markov games with finite state and action spaces. We show that the Newton-Raphson or policy iteration method as presented by Pollatschek and Avi-Itzhak does not necessarily converge, contradicting a proof of Rao, Chandrasekaran, and Nair. Moreover, a set of successive approximation algorithms is presented of which Shapley's method and a total-expected-rewards version of Hoffman and Karp's method are the extreme elements.
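The abstract names Shapley's method as one extreme of the family of successive approximation algorithms, but gives no algorithmic detail. As a rough illustration only, the sketch below shows the textbook form of Shapley's iteration, in which each state's value is replaced by the value of a matrix game built from immediate rewards plus discounted continuation values. The function names `value_2x2` and `shapley_iteration`, the nested-list data layout, and the restriction to two actions per player (so that the matrix game has a closed-form value) are assumptions made for this example and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def value_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:               # saddle point in pure strategies
        return maximin
    # No saddle point: both players mix, and the classical closed form applies.
    return (a * d - b * c) / (a + d - b - c)


def shapley_iteration(reward, trans, beta, tol=1e-10, max_iter=10_000):
    """Successive approximation for a discounted zero-sum Markov game.

    reward[s][a][b]   : payoff to player I in state s under actions (a, b)
    trans[s][a][b][t] : probability of moving from s to t under (a, b)
    beta              : discount factor, 0 <= beta < 1
    (This layout is a hypothetical choice for the sketch.)
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            # Matrix game at state s: immediate reward plus discounted future value.
            game = [
                [
                    reward[s][a][b]
                    + beta * sum(trans[s][a][b][t] * v[t] for t in range(n))
                    for b in range(2)
                ]
                for a in range(2)
            ]
            new_v.append(value_2x2(game))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# One-state example: the same matrix game is replayed forever with beta = 0.5.
reward = [[[2.0, 0.0], [0.0, 1.0]]]
trans = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
v = shapley_iteration(reward, trans, beta=0.5)
print(v)
```

In this one-state example the stage game has value 2/3, so the fixed point of the iteration is (2/3) / (1 - 0.5) = 4/3, which the printed result approaches. Games with more than two actions per player would need a linear-programming solver in place of `value_2x2`.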

References

Shapley, L. S., Stochastic Games, Proceedings of the National Academy of Sciences USA, Vol. 39, pp. 1095–1100, 1953.
Zachrisson, L. E., Markov Games, Advances in Game Theory, Edited by M. Dresher, L. S. Shapley, and A. W. Tucker, Princeton University Press, Princeton, New Jersey, pp. 211–253, 1964.
Charnes, A., and Schroeder, R. G., On Some Stochastic Tactical Anti-Submarine Games, Naval Research Logistics Quarterly, Vol. 14, pp. 291–311, 1967.
MacQueen, J., A Modified Dynamic Programming Method for Markovian Decision Problems, Journal of Mathematical Analysis and Applications, Vol. 14, pp. 38–43, 1966.
Porteus, E. L., Some Bounds for Discounted Sequential Decision Processes, Management Science, Vol. 18, pp. 7–11, 1971.
Van der Wal, J., Discounted Markov Games; Successive Approximations and Stopping Times, International Journal of Game Theory, Vol. 6, pp. 11–22, 1977.
Howard, R. A., Dynamic Programming and Markov Processes, MIT Press, Cambridge, Massachusetts, 1960.
Hoffman, A. J., and Karp, R. M., On Nonterminating Stochastic Games, Management Science, Vol. 12, pp. 359–370, 1966.
Pollatschek, M. A., and Avi-Itzhak, B., Algorithms for Stochastic Games with Geometrical Interpretation, Management Science, Vol. 15, pp. 399–415, 1969.
Van der Wal, J., and Wessels, J., On Markov Games, Statistica Neerlandica, Vol. 30, pp. 51–71, 1976.
Rao, S. S., Chandrasekaran, R., and Nair, K. P. K., Algorithms for Discounted Stochastic Games, Journal of Optimization Theory and Applications, Vol. 11, pp. 627–637, 1973.
Van Nunen, J. A. E. E., A Set of Successive Approximation Methods for Discounted Markov Decision Processes, Zeitschrift für Operations Research, Vol. 20, pp. 203–208, 1976.
Van Nunen, J. A. E. E., Contracting Markov Decision Processes, Mathematical Centre Tract 71, Mathematisch Centrum, Amsterdam, Holland, 1976.
Wessels, J., Stopping Times and Markov Programming, Transactions of the Seventh Prague Conference and 1974 EMS, Academia, Prague, Czechoslovakia, pp. 575–585, 1977.

Author information

Authors and Affiliations

Department of Mathematics, Technological University Eindhoven, Eindhoven, Holland
J. Van der Wal (Research Associate)

Additional information

Communicated by R. A. Howard

About this article

Cite this article

Van der Wal, J. Discounted Markov games: Generalized policy iteration method. J Optim Theory Appl 25, 125–138 (1978). https://doi.org/10.1007/BF00933260

Issue Date: May 1978
DOI: https://doi.org/10.1007/BF00933260
