Approximating Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret Minimization

  • Todd W. Neller
  • Steven Hnath
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7168)

Abstract

Using the bluffing dice game Dudo as a challenge domain, we abstract information sets by an imperfect recall of actions. Even with such abstraction, the standard Counterfactual Regret Minimization (CFR) algorithm proves impractical for Dudo, since the number of recursive visits to the same abstracted information sets increases exponentially with the depth of the game graph. By holding strategies fixed across each training iteration, we show how CFR training iterations may be transformed from an exponential-time recursive algorithm into a polynomial-time dynamic-programming algorithm, making computation of an approximate Nash equilibrium for the full 2-player game of Dudo possible for the first time.
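
The abstract compresses the paper's central idea: once strategies are held fixed for an entire training iteration, reach probabilities can be summed across all paths into each abstracted information set in a single forward pass over the game DAG, and counterfactual regrets can be updated in a single backward pass, so each node is processed once per iteration rather than once per path. The following is a minimal Python sketch of that general scheme, not the authors' implementation; the Node fields, the edges mapping, the terminal_utility interface, and the function name fsicfr_iteration are all illustrative assumptions.

class Node:
    def __init__(self, player, num_actions):
        self.player = player            # 0 or 1; which player acts here
        self.num_actions = num_actions
        self.regret_sum = [0.0] * num_actions
        self.strategy_sum = [0.0] * num_actions
        self.strategy = [1.0 / num_actions] * num_actions
        self.p_reach = [0.0, 0.0]       # reach probability, summed over all paths
        self.utility = 0.0              # expected utility for player 0

    def regret_match(self):
        # Current strategy proportional to positive cumulative regret.
        pos = [max(r, 0.0) for r in self.regret_sum]
        total = sum(pos)
        self.strategy = ([p / total for p in pos] if total > 0
                         else [1.0 / self.num_actions] * self.num_actions)


def fsicfr_iteration(nodes, edges, roots, terminal_utility):
    """One training iteration over `nodes` in topological order.
    `edges[n]` maps action index -> child node (empty dict for leaves);
    `terminal_utility(n)` scores leaves from player 0's perspective.
    This whole interface is an assumption for the sketch."""
    for n in nodes:                      # fix every strategy up front
        n.regret_match()
        n.p_reach = [0.0, 0.0]
    for r in roots:
        r.p_reach = [1.0, 1.0]

    # Forward pass: accumulate reach probabilities along every path,
    # so each node is visited once however many paths lead to it.
    for n in nodes:
        for a, child in edges[n].items():
            for p in (0, 1):
                f = n.strategy[a] if p == n.player else 1.0
                child.p_reach[p] += n.p_reach[p] * f

    # Backward pass: propagate expected utilities and update regrets,
    # weighted by the opponent's accumulated reach probability.
    for n in reversed(nodes):
        if not edges[n]:                 # terminal node
            n.utility = terminal_utility(n)
            continue
        child_u = [edges[n][a].utility for a in range(n.num_actions)]
        n.utility = sum(s * u for s, u in zip(n.strategy, child_u))
        sign = 1.0 if n.player == 0 else -1.0   # zero-sum: player 1 minimizes
        opp_reach = n.p_reach[1 - n.player]
        for a in range(n.num_actions):
            n.regret_sum[a] += sign * opp_reach * (child_u[a] - n.utility)
            n.strategy_sum[a] += n.p_reach[n.player] * n.strategy[a]

Repeating this iteration and normalizing each node's strategy_sum would yield the average strategy, which is what converges toward an approximate Nash equilibrium; both passes touch each DAG node a constant number of times, which is the source of the polynomial-time bound claimed above.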

Keywords

Directed Acyclic Graph · Multiagent System · Game Graph · Extensive Game · Training Iteration



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Todd W. Neller (1)
  • Steven Hnath (1)

  1. Dept. of Computer Science, Gettysburg College, Gettysburg, USA
