Faster algorithms for extensive-form game solving via improved smoothing functions

  • Christian Kroer
  • Kevin Waugh
  • Fatma Kılınç-Karzan
  • Tuomas Sandholm
Full Length Paper Series A

Abstract

Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate both the theoretical and practical performance improvement of first-order methods (FOMs) for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions related to the domains associated with extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the overall convergence rate of several FOMs that use the dilated entropy function by a factor of \(\Omega(b^d d)\), where \(b\) is the branching factor of the player and \(d\) is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than FOMs despite their theoretically inferior convergence rates. Using our new weighting scheme and a practical parameter tuning procedure, we show that, for the first time, the excessive gap technique, a classical FOM, can be made faster than the counterfactual regret minimization algorithm in practice for large games, and that the aggressive stepsize scheme of CFR+ is the only reason that the algorithm is faster in practice.
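
To make the notion of a dilated entropy function more concrete, the following is a minimal Python sketch that evaluates one on a tiny sequence-form strategy. It assumes the standard dilated construction, in which each decision point contributes a weighted negative entropy of its local behavioral strategy, scaled by the probability mass of the parent sequence. The tree encoding, the function names, and the weights are assumptions made for this illustration; in particular, the weights are placeholders and not the weighting scheme introduced in the paper.

```python
import math


def neg_entropy(probs):
    """Negative entropy sum_i p_i * log(p_i), using the convention 0 * log(0) = 0."""
    return sum(p * math.log(p) for p in probs if p > 0.0)


def dilated_entropy(decision_points, weights, x):
    """Evaluate a dilated entropy function at the sequence-form vector x.

    decision_points: list of (parent, children) pairs, where parent is the index
        of the parent sequence in x (None at the root) and children are the
        indices of the sequences available at that decision point.
    weights: one positive weight per decision point (hypothetical values here).
    """
    total = 0.0
    for (parent, children), beta in zip(decision_points, weights):
        mass = 1.0 if parent is None else x[parent]
        if mass <= 0.0:
            continue  # an unreached decision point contributes nothing
        local = [x[i] / mass for i in children]  # local behavioral strategy
        total += beta * mass * neg_entropy(local)
    return total


# Hypothetical two-level example: a root decision over sequences 0 and 1,
# followed by a second decision (reached via sequence 0) over sequences 2 and 3.
decision_points = [(None, [0, 1]), (0, [2, 3])]
weights = [2.0, 1.0]  # placeholder weights, chosen only for illustration
x = [0.5, 0.5, 0.25, 0.25]  # uniform play at both decision points

value = dilated_entropy(decision_points, weights, x)
```

With uniform play at both decision points, each local negative entropy equals \(-\ln 2\), so the sketch evaluates to \(2 \cdot (-\ln 2) + 1 \cdot 0.5 \cdot (-\ln 2) = -2.5 \ln 2\). Changing the entries of `weights` changes how strongly each decision point contributes, which is the design choice the abstract refers to.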

Keywords

Extensive-form game, Bilinear saddle-point problem, First-order method, Nash equilibrium, Zero-sum game

Mathematics Subject Classification

91A05 91A18 90C06 90C25 90C47 65K05 52A41 

Notes

Acknowledgements

The first and last authors are supported by the National Science Foundation under Grants IIS-1617590, IIS-1320620, and IIS-1546752 and the ARO under Awards W911NF-16-1-0061 and W911NF-17-1-0082. The first author is supported by the Facebook Fellowship in Economics and Computation. The third author is supported by the National Science Foundation Grant CMMI 1454548.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2018

Authors and Affiliations

  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Computing Science, University of Alberta, Edmonton, Canada
  3. Tepper School of Business, Carnegie Mellon University, Pittsburgh, USA
