Neural Fictitious Self-Play in Imperfect Information Games with Many Players

Kawamura, Keigo; Mizukami, Naoki; Tsuruoka, Yoshimasa

doi:10.1007/978-3-319-75931-9_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 818)

Included in the following conference series:

Workshop on Computer Games

Abstract

Computing Nash equilibrium solutions is an important problem in the domain of imperfect information games. Counterfactual Regret Minimization+ (CFR+) can be used to (essentially weakly) solve two-player limit Texas Hold’em, but it cannot be applied to large multi-player games due to the problem of space complexity. In this paper, we use Neural Fictitious Self-Play (NFSP) to calculate approximate Nash equilibrium solutions for imperfect information games with more than two players. Although there are no theoretical guarantees of convergence for NFSP in such games, we empirically demonstrate that NFSP enables us to calculate strategy profiles that are significantly less exploitable than random players in simple poker variants with three or more players.
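To make the notion of "less exploitable" concrete, the following is a minimal sketch of how exploitability can be measured in a small three-player normal-form game. It is not the paper's evaluation code: the toy "odd one out" game, the function names, and the choice to average best-response gains over players are all assumptions made for illustration, and a poker variant would require traversing an extensive-form game tree instead.

```python
import numpy as np


def expected_payoffs(payoff, profile, player):
    """Value of each pure action of `player` when all others follow `profile`.

    payoff: array of shape (A_0, ..., A_{n-1}) with `player`'s payoff
            for every joint action.
    profile: list of 1-D probability vectors, one per player.
    """
    values = payoff
    # Contract the other players' axes from last to first so that the
    # indices of the axes still to be contracted remain valid.
    for other in reversed(range(len(profile))):
        if other != player:
            values = np.tensordot(values, profile[other], axes=([other], [0]))
    return values  # 1-D vector indexed by `player`'s own actions


def exploitability(payoffs, profile):
    """Average gain a player obtains by switching to a best response.

    Assumption: exploitability is taken as the mean best-response gain
    over players; other normalisations are possible.
    """
    gains = []
    for player, payoff in enumerate(payoffs):
        action_values = expected_payoffs(payoff, profile, player)
        current = float(action_values @ profile[player])
        gains.append(float(action_values.max()) - current)
    return sum(gains) / len(gains)


def odd_one_out_payoffs():
    """Hypothetical toy game: three players each pick 0 or 1.

    The player whose choice differs from the other two wins 2 and the
    others lose 1 each; if all three match, everyone gets 0.
    """
    payoffs = [np.zeros((2, 2, 2)) for _ in range(3)]
    for joint in np.ndindex(2, 2, 2):
        if len(set(joint)) == 1:
            continue  # all three matched: payoffs stay at 0
        odd = next(i for i in range(3) if joint.count(joint[i]) == 1)
        for i in range(3):
            payoffs[i][joint] = 2.0 if i == odd else -1.0
    return payoffs


payoffs = odd_one_out_payoffs()
uniform = [np.array([0.5, 0.5]) for _ in range(3)]
always_zero = [np.array([1.0, 0.0]) for _ in range(3)]

print(exploitability(payoffs, uniform))      # uniform play is an equilibrium here
print(exploitability(payoffs, always_zero))  # any single deviator becomes the odd one out
```

In this toy game the uniform profile cannot be improved on by any single player, while the deterministic profile can, so a lower value indicates a profile closer to a Nash equilibrium. The abstract's comparison against random players reads in the same spirit: a learned profile is judged by how much smaller its best-response gain is than that of a baseline profile.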

Author information

Authors and Affiliations

The University of Tokyo, Tokyo, Japan
Keigo Kawamura, Naoki Mizukami & Yoshimasa Tsuruoka

Corresponding author

Correspondence to Keigo Kawamura.

Editor information

Editors and Affiliations

Université Paris-Dauphine, Paris, France
Tristan Cazenave
Maastricht University, Maastricht, The Netherlands
Mark H.M. Winands
The University of New South Wales, Sydney, New South Wales, Australia
Abdallah Saffidine

About this paper

Cite this paper

Kawamura, K., Mizukami, N., Tsuruoka, Y. (2018). Neural Fictitious Self-Play in Imperfect Information Games with Many Players. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_5

DOI: https://doi.org/10.1007/978-3-319-75931-9_5
Published: 15 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75930-2
Online ISBN: 978-3-319-75931-9
eBook Packages: Computer Science, Computer Science (R0)