# Faster algorithms for extensive-form game solving via improved smoothing functions

## Abstract

Sparse iterative methods, in particular first-order methods, are known to be among the most effective for solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function on which they are based. We investigate both the theoretical and the practical performance improvement of first-order methods (FOMs) for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions related to the domains associated with extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the overall convergence rate of several FOMs working with the dilated entropy function by a factor of \(\Omega(b^d d)\), where *b* is the branching factor of the player and *d* is the depth of the game tree. Thus far, counterfactual regret minimization (CFR) methods have been faster in practice, and more popular, than FOMs despite their theoretically inferior convergence rates. Using our new weighting scheme and a practical parameter tuning procedure, we show that, for the first time, the excessive gap technique, a classical FOM, can be made faster than the counterfactual regret minimization algorithm in practice for large games, and that the aggressive stepsize scheme of CFR+ is the only reason that the algorithm is faster in practice.
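As a rough illustration of what a distance-generating function does in a first-order method, the sketch below treats the simplest case only: a single simplex (one decision point) with the plain entropy function. It is not the dilated entropy function or the weighting scheme described above; the function names `smoothed_best_response` and `smoothed_value` and the parameter `mu` are assumptions chosen for this example. With entropy smoothing, the maximizer of the smoothed linear objective is a softmax, and the smoothed value exceeds the true maximum by at most `mu * log(n)` for `n` actions.

```python
import math


def smoothed_best_response(gradient, mu):
    """Maximizer over the probability simplex of
    <gradient, x> - mu * sum_i x_i * log(x_i).

    For the entropy function this maximizer is softmax(gradient / mu).
    `mu` is an illustrative smoothing parameter and must be positive.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    shift = max(gradient)  # subtract the max for numerical stability
    weights = [math.exp((g - shift) / mu) for g in gradient]
    total = sum(weights)
    return [w / total for w in weights]


def smoothed_value(gradient, mu):
    """Optimal value of the smoothed problem: mu * log(sum_i exp(g_i / mu))."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    shift = max(gradient)
    return shift + mu * math.log(sum(math.exp((g - shift) / mu) for g in gradient))


if __name__ == "__main__":
    g = [1.0, 2.0, 3.0]
    x = smoothed_best_response(g, mu=0.5)
    print("smoothed best response:", x)
    print("smoothed value:", smoothed_value(g, mu=0.5))
```

The gap between the smoothed value and `max(gradient)` grows only logarithmically in the number of actions, which is the single-simplex analogue of the logarithmic dependence on the branching factor mentioned in the abstract.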

## Keywords

Extensive-form game, Bilinear saddle-point problem, First-order method, Nash equilibrium, Zero-sum game

## Mathematics Subject Classification

91A05, 91A18, 90C06, 90C25, 90C47, 65K05, 52A41

## Notes

### Acknowledgements

The first and last authors are supported by the National Science Foundation under Grants IIS-1617590, IIS-1320620, and IIS-1546752 and the ARO under Awards W911NF-16-1-0061 and W911NF-17-1-0082. The first author is supported by the Facebook Fellowship in Economics and Computation. The third author is supported by the National Science Foundation Grant CMMI 1454548.
