Faster algorithms for extensive-form game solving via improved smoothing functions

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate both the theoretical and practical performance improvement of first-order methods (FOMs) for solving extensive-form games through better design of the dilated entropy function—a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the overall convergence rate of several FOMs working with the dilated entropy function by a factor of \(\Omega (b^dd)\), where b is the branching factor of the player, and d is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than FOMs despite their theoretically inferior convergence rates. Using our new weighting scheme and a practical parameter tuning procedure we show that, for the first time, the excessive gap technique, a classical FOM, can be made faster than the counterfactual regret minimization algorithm in practice for large games, and that the aggressive stepsize scheme of CFR+ is the only reason that the algorithm is faster in practice.

Notes

  1. Confirmed through author communication.

  2. This variation uses the current iterate rather than the average iterate due to decreased memory usage. It has inferior practical iteration complexity.

References

  1. Bošanský, B., Čermák, J.: Sequence-form algorithm for computing Stackelberg equilibria in extensive-form games. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  2. Bošanský, B., Kiekintveld, C., Lisý, V., Pěchouček, M.: An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. J. Artif. Intell. Res. 51, 829–866 (2014)

  3. Bowling, M., Burch, N., Johanson, M., Tammelin, O.: Heads-up limit hold’em poker is solved. Science 347(6218), 145–149 (2015)

  4. Brown, N., Sandholm, T.: Strategy-based warm starting for regret minimization in games. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 432–438 (2016)

  5. Brown, N., Ganzfried, S., Sandholm, T.: Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit texas hold’em agent. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp. 7–15 (2015)

  6. Brown, N., Kroer, C., Sandholm, T.: Dynamic thresholding and pruning for regret minimization. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 421–429 (2017)

  7. Daskalakis, C., Goldberg, P.W., Papadimitriou, C.H.: The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1), 195–259 (2009)

  8. Daskalakis, C., Deckelbaum, A., Kim, A.: Near-optimal no-regret algorithms for zero-sum games. Games Econ. Behav. 92, 327–348 (2015)

  9. Gilpin, A., Sandholm, T.: Lossless abstraction of imperfect information games. J. ACM 54(5), 25 (2007)

  10. Gilpin, A., Peña, J., Sandholm, T.: First-order algorithm with \(\cal{O}(\rm ln(1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133(1–2), 279–298 (2012)

  11. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, New York (2001)

  12. Hoda, S., Gilpin, A., Peña, J., Sandholm, T.: Smoothing techniques for computing Nash equilibria of sequential games. Math. Oper. Res. 35(2), 494–512 (2010)

  13. Jiang, A., Leyton-Brown, K.: Polynomial-time computation of exact correlated equilibrium in compact games. In: Proceedings of the ACM Conference on Electronic Commerce (EC), pp. 119–126 (2011)

  14. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–148. MIT Press (2012)

  15. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problems structure. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 149–183. MIT Press (2012)

  16. Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011)

  17. Koller, D., Megiddo, N., von Stengel, B.: Efficient computation of equilibria for extensive two-person games. Games Econ. Behav. 14(2), 247–259 (1996)

  18. Kroer, C., Sandholm, T.: Extensive-form game abstraction with bounds. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 621–638. ACM (2014)

  19. Kroer, C., Sandholm, T.: Imperfect-recall abstractions with bounds in games. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 459–476. ACM (2016)

  20. Kroer, C., Waugh, K., Kılınç-Karzan, F., Sandholm, T.: Faster first-order methods for extensive-form game solving. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 817–834. ACM (2015)

  21. Kroer, C., Farina, G., Sandholm, T.: Smoothing method for approximate extensive-form perfect equilibrium. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2017)

  22. Kroer, C., Farina, G., Sandholm, T.: Robust Stackelberg equilibria in extensive-form games and extension to limited lookahead. In: AAAI Conference on Artificial Intelligence (AAAI) (2018)

  23. Lanctot, M., Waugh, K., Zinkevich, M., Bowling, M.: Monte Carlo sampling for regret minimization in extensive games. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 1078–1086 (2009)

  24. Lanctot, M., Gibson, R., Burch, N., Zinkevich, M., Bowling, M.: No-regret learning in extensive-form games with imperfect recall. In: International Conference on Machine Learning (ICML), pp. 65–72 (2012)

  25. Lipton, R., Markakis, E., Mehta, A.: Playing large games using simple strategies. In: Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), San Diego, CA, pp. 36–41. ACM (2003)

  26. Littman, M., Stone, P.: A polynomial-time Nash equilibrium algorithm for repeated games. In: Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), San Diego, CA, pp. 48–54 (2003)

  27. Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: DeepStack: expert-level artificial intelligence in no-limit poker (2017). arXiv preprint arXiv:1701.01724

  28. Nemirovski, A.: Prox-method with rate of convergence \(\cal{O}(1/t )\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  29. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1), 235–249 (2005)

  30. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  31. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)

  32. Romanovskii, I.: Reduction of a game with complete memory to a matrix game. Sov. Math. 3, 678–681 (1962)

  33. Sandholm, T.: The state of solving large incomplete-information games, and application to poker. AI Magazine, pp. 13–32, special issue on Algorithmic Game Theory (2010)

  34. Shi, J., Littman, M.: Abstraction methods for game theoretic poker. In: CG ’00: Revised Papers from the Second International Conference on Computers and Games, pp. 333–345. Springer, London (2002)

  35. Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., Rayner, C.: Bayes’ bluff: opponent modelling in poker. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI), pp. 550–558 (2005)

  36. Tammelin, O., Burch, N., Johanson, M., Bowling, M.: Solving heads-up limit Texas hold’em. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 645–652 (2015)

  37. von Stengel, B.: Efficient computation of behavior strategies. Games Econ. Behav. 14(2), 220–246 (1996)

  38. Waugh, K., Bagnell, D.: A unified view of large-scale zero-sum equilibrium computation. In: Computer Poker and Imperfect Information Workshop at the AAAI Conference on Artificial Intelligence (AAAI) (2015)

  39. Zinkevich, M., Johanson, M., Bowling, M.H., Piccione, C.: Regret minimization in games with incomplete information. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 1729–1736 (2007)

Acknowledgements

The first and last authors are supported by the National Science Foundation under Grants IIS-1617590, IIS-1320620, and IIS-1546752 and the ARO under Awards W911NF-16-1-0061 and W911NF-17-1-0082. The first author is supported by the Facebook Fellowship in Economics and Computation. The third author is supported by the National Science Foundation Grant CMMI 1454548.

Author information

Corresponding author

Correspondence to Christian Kroer.

Additional information

A one-page abstract describing a preliminary version of the results in this paper was published at the 18th ACM Conference on Economics and Computation under the title “Theoretical and Practical Advances on Smoothing for Extensive-Form Games”.

A Notation and results described from an EFG perspective

In the body of the paper we described our results using notation oriented toward the convex-optimization perspective on treeplexes: the results hold for general treeplexes, and so we viewed each treeplex as a general convex set. In this section we give an overview of our results from a standard EFG-specific perspective, in the hope that it may be helpful for researchers who are familiar with that literature but not with the first-order methods literature and convex analysis.

First we define notation, then we describe how our treeplex results map onto traditional EFG notation, and finally we describe the dilated-entropy smoothed-best-response tree traversal and EGT in terms of this notation.

A.1 Extensive-form games and the sequence form

An extensive-form game (EFG) can be thought of as a game tree, where each node corresponds to some history of actions taken by all players. Each node belongs to some player, and the actions available to the player at that node are represented by the branches. Uncertainty is modeled by a special player, Nature, who moves with a predefined, fixed probability distribution over actions at each node belonging to Nature. EFGs model imperfect information by grouping nodes into information sets: an information set is a group of nodes, all belonging to the same player, among which that player cannot distinguish. Finally, we assume perfect recall, which requires that no player ever forgets their past actions (equivalently, for each information set there is only a single possible last action taken by the player to whom the information set belongs).

Definition 4

A two-player extensive-form game with imperfect information and perfect recall \(\Gamma \) is a tuple \((H, Z, A, P, \sigma _c, \mathscr {I}, u)\) composed of:

  • H: a finite set of possible sequences (or histories) of actions, such that the empty sequence \(\emptyset \in H\) and every prefix of every \(h\in H\) is also in H.

  • \(Z\subseteq H\): the set of terminal histories, i.e. those histories that are not a proper prefix of any other history in H.

  • A: a function mapping \(h \in H{\setminus } Z\) to the set of available actions at non-terminal history h.

  • P: the player function, mapping each non-terminal history \(h \in H{\setminus } Z\) to \(\{1, 2, c\}\), representing the player who takes action after h. If \(P(h)=c\), the player is Nature.

  • \(\sigma _c\): a function assigning to each \(h\in H{\setminus } Z\) such that \(P(h) = c\) a probability mass function over A(h).

  • \(\mathscr {I}_i\), for \(i\in \{1,2\}\): a partition of \(\{h\in H: P(h)=i\}\) with the property that \(A(h)=A(h')\) for each \(h,h'\) in the same set of the partition. For notational convenience, we will write A(I) to mean A(h) for any \(h\in I\), where \(I\in \mathscr {I}_i\). \(\mathscr {I}_i\) is the information partition of player i, and the sets in \(\mathscr {I}_i\) are called the information sets of player i.

  • \(u_i\): the utility function mapping each \(z\in Z\) to the utility (a real number) gained by player i when terminal history z is reached.

We further assume that all players have perfect recall.

A strategy for a player i is usually represented in behavioral form, which consists of probability distributions over actions at each information set in \(\mathscr {I}_i\). In this paper we focus on an alternative, but strategically equivalent, representation of the set of strategies, called the sequence form [17, 32, 37]. In the sequence form, actions are instead represented by sequences. A sequence \(\sigma _i\) is an ordered list of actions taken by player i on the path to some history h. In perfect-recall games, all nodes in an information set \(I\in \mathscr {I}_i\) correspond to the same sequence for player i; we let \({\mathrm{seq}}(I)\) denote this sequence. Given a sequence \(\sigma _i\) and an action a that Player i can take immediately after \(\sigma _i\), we let \(\sigma _ia\) denote the resulting new sequence. Instead of directly choosing the probability to put on an action, in the sequence form one chooses the probability of playing the entire sequence; this is called the realization probability and is denoted by \(r_i(\sigma _i)\). A choice of realization probabilities for every sequence belonging to Player i is called a realization plan and is denoted \(r_i:\Sigma _i \rightarrow [0,1]\), where \(\Sigma _i\) is the set of sequences of Player i. This representation relies on perfect recall: for any information set \(I\in {\mathscr {I}}_i\), each action \(a\in A(I)\) is uniquely represented by the single sequence \(\sigma _i = {\mathrm{seq}}(I)a\), since \({\mathrm{seq}}(I)\) is well defined. In particular, this gives us a simple way to convert any strategy in sequence form to a behavioral strategy: the probability of playing action \(a\in A(I)\) at information set I is simply \(\frac{r_i({\mathrm{seq}}(I)a)}{r_i({\mathrm{seq}}(I))}\).
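
To make the sequence-form-to-behavioral conversion concrete, here is a minimal illustrative snippet (not from the paper); the dictionary-based representation of a realization plan and the names used below are assumptions made only for this example.

```python
# Hypothetical representation: a realization plan as a dict mapping a player's
# sequences (tuples of actions) to realization probabilities r_i(sigma_i).
def behavioral_prob(realization: dict, parent_seq: tuple, action: str) -> float:
    """Behavioral probability of `action` at an information set I with
    seq(I) = parent_seq, i.e. r_i(seq(I)a) / r_i(seq(I))."""
    parent = realization[parent_seq]               # r_i(seq(I))
    child = realization[parent_seq + (action,)]    # r_i(seq(I)a)
    if parent == 0.0:
        return 0.0  # information set reached with probability zero; any choice is fine
    return child / parent

# Example: the player plays "raise" with probability 0.3 after the empty sequence.
r = {(): 1.0, ("raise",): 0.3, ("fold",): 0.7}
assert abs(behavioral_prob(r, (), "raise") - 0.3) < 1e-12
```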

A.2 Mapping between convex analysis notation and EFG notation

First we describe the treeplex. A treeplex Q is used to model the sequence-form strategy space of a player; in an EFG, the treeplexes are thus the sets of realization plans \(\Sigma _1,\Sigma _2\). In the bilinear saddle-point problem (BSPP) (1) the typical representation would be \(\mathcal{X}=\Sigma _1\) and \(\mathcal{Y}= \Sigma _2\). For the remainder we describe the notation in terms of Player 1 and \(\Sigma _1\). The set of simplexes in \(\Sigma _1\), with indices denoted by \(S_{\Sigma _1}\), is the set of information sets \(\mathscr {I}_1\) where Player 1 acts. The set \(\mathcal{D}_j^i\) of simplexes reached immediately after taking branch i in simplex j is the set of potential information sets where Player 1 may have to act next; which one is reached depends on the actions taken by Nature and Player 2. Another way to put this is that \(\mathcal{D}_j^i\) corresponds to the set of information sets \(I\in \mathscr {I}_1\) such that \({\mathrm{seq}}(I) = \sigma _1\), where \(\sigma _1\) is the sequence corresponding to taking action i at simplex j. The list below gives an overview of treeplex notation and how it corresponds to EFG notation and concepts. Not all of our concepts map easily to existing EFG ideas. For example \(M_Q\) and \(M_{Q,r}\), the maximum \(\ell _1\) norm of Q and the r-depth-limited maximum \(\ell _1\) norm, are still most easily thought of in terms of norms: \(M_{\Sigma _1}\) is the maximum number of information sets with nonzero probability of being reached when Player 1 has to follow a pure strategy while the other player may follow a mixed strategy. Intuitively, the maximum \(\ell _1\) norm of \(\Sigma _1\) measures the branching factor associated with observable opponent actions and Nature actions that cause Player 1 to reach different information sets, while not measuring the branching factor associated with Player 1 choosing actions at an information set (since the \(\ell _1\) norm sums to one at each such information set).

  • \(Q\): \(\Sigma _1\), the set of realization plans

  • \(S_{\Sigma _1}\): \(\mathscr {I}_1\), the set of information-set indices into the treeplex \(\Sigma _1\)

  • \(\mathcal{D}_j^i\): the set of information sets in \(\Sigma _1\) such that the sequence corresponding to branch i at simplex j is the parent sequence

  • \(q_{p_j}\): the parent sequence \({\mathrm{seq}}(I_j)\), where \(I_j\) is the information set corresponding to simplex j

  • \(d_j\): the length of the longest possible sequence of actions starting at \(I_j\)

  • \(b_{\Sigma _1}^j\): the length of \({\mathrm{seq}}(I_j)\)

Our dilated entropy construction using the weights described in recurrence (6) can now be described in terms of EFG notation as follows:

$$\begin{aligned}
\alpha _j &= 1 + \max _{a\in A(I_j)}\sum _{k \in \mathcal{D}^a_{I_j}} \frac{\alpha _k\beta _k}{\beta _k - \alpha _k}, &&\forall \, I_j \in \mathscr {I}_1,\\
\beta _j &> \alpha _j, &&\forall \, I_j \in \mathscr {I}_1~\text {s.t.}~\mathrm{length}({\mathrm{seq}}(I_j)) > 0,\\
\beta _j &= \alpha _j, &&\forall \, I_j \in \mathscr {I}_1~\text {s.t.}~\mathrm{length}({\mathrm{seq}}(I_j)) = 0.
\end{aligned}$$
(18)
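
As a concrete illustration of recurrence (18), the following sketch computes the weights bottom-up over Player 1's information sets. The `InfoSet` data structure and the particular choice \(\beta _j = 2\alpha _j\) at non-root information sets are assumptions made only for this example; any choice with \(\beta _j > \alpha _j\) is admissible there.

```python
# Illustrative sketch, not the authors' implementation.
from dataclasses import dataclass, field

@dataclass
class InfoSet:
    actions: list                                  # A(I_j)
    children: dict = field(default_factory=dict)   # action a -> list of InfoSets in D^a_{I_j}
    is_root_level: bool = False                    # True iff seq(I_j) is the empty sequence
    alpha: float = 0.0
    beta: float = 0.0

def compute_weights(I: InfoSet) -> None:
    """Fill in alpha and beta for I and all information sets below it."""
    for a in I.actions:
        for child in I.children.get(a, []):
            compute_weights(child)                 # children first (bottom-up)
    # alpha_j = 1 + max_a sum_{k in D^a_{I_j}} alpha_k * beta_k / (beta_k - alpha_k).
    # Children are never root-level, so beta_k > alpha_k and each ratio is finite.
    I.alpha = 1.0 + max(
        (sum(c.alpha * c.beta / (c.beta - c.alpha) for c in I.children.get(a, []))
         for a in I.actions),
        default=0.0,
    )
    # beta_j = alpha_j at root-level information sets; beta_j > alpha_j otherwise
    # (here, arbitrarily, twice alpha_j).
    I.beta = I.alpha if I.is_root_level else 2.0 * I.alpha
```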

If we then instantiate EGT with this DGF and use Theorem 3 we get the following convergence rate:

$$\begin{aligned} \frac{\max _{\sigma _1 \in \Sigma _1, \sigma _2 \in \Sigma _2}|g_1(\sigma _1,\sigma _2)|\, \sqrt{M_{\Sigma _1}^2\, 2^{d_{\Sigma _1}+2}\, M_{\Sigma _2}^2\, 2^{d_{\Sigma _2}+2}}\, \max _{I\in \mathscr {I}}\log |A(I)|}{\varepsilon }. \end{aligned}$$

A.3 EGT described as a tree traversal

Here we explain how to implement the \(\mathop {\mathrm{Prox}}\) operation when using the dilated entropy function, as well as how to compute smoothed best responses, i.e. \(x_{\mu _1}(y)\) or \(y_{\mu _2}(x)\) in Algorithm 2. Throughout we will present algorithms for computing everything from the perspective of a player trying to minimize their opponent’s utility, rather than maximize their own.

First, given \(y^t\), the gradient for Player 1 is \(Ay^t\), where A is the payoff matrix for Player 2. This can be implemented as follows: create an all-zero vector g of dimension \(|\Sigma _1|\), traverse the game tree, and for each leaf z add \(\pi _0(z)\, y^t[\sigma _2(z)]\, u_2(z)\) to the entry in g corresponding to \(\sigma _1(z)\), where \(\pi _0(z)\) is the product of Nature's action probabilities on the path to z and \(\sigma _1(z),\sigma _2(z)\) are the sequences of Players 1 and 2 leading to z.
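
A rough sketch of this traversal is shown below; the `Node` fields and sequence indexing are assumptions made for illustration, not the paper's data structures.

```python
import numpy as np

def gradient_for_player1(root, y, num_p1_sequences):
    """Accumulate g[sigma_1(z)] += pi_0(z) * y[sigma_2(z)] * u_2(z) over all leaves z."""
    g = np.zeros(num_p1_sequences)

    def traverse(node, nature_prob):
        if node.is_leaf:
            g[node.p1_seq_index] += nature_prob * y[node.p2_seq_index] * node.u2
            return
        for action, child in node.children.items():
            # Only Nature's move probabilities are accumulated along the path;
            # Player 2's probabilities enter through y[sigma_2(z)] at the leaf.
            p = node.move_probs[action] if node.player == "nature" else 1.0
            traverse(child, nature_prob * p)

    traverse(root, 1.0)
    return g
```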

Pseudocode for computing a smoothed best response is given in Algorithm 3. It implements the dilated entropy function with, at each simplex \(\Delta _n\), the negative entropy plus a constant term: \(\sum _{i=1}^n x_i\log x_i + \log n\). By adding \(\log n\) we ensure that the function is never negative; it is zero at \(x_i = \frac{1}{n}\) for all i. Since the constant does not change the second-order derivatives of the dilated entropy function, we retain the same strong convexity properties.

The smoothed best response implementation given here modifies the gradient g in place. Thus it is important that g is not used to represent the gradient after a call to the function. However, the modified g can be useful because the entry in g corresponding to the empty sequence then contains the value of the smoothed best response function, which is needed for verifying e.g. the excessive gap condition.

The algorithm for smoothed best response calculation takes as input a vector g, which is usually the gradient at the current iterate. This vector g has length equal to the number of sequences for the player. We assume that the sequences are ordered so that \(g[I_{start},I_{end}]\) denotes the subset of g corresponding to entries for all the sequences that have their last action taken at I. Note that in setting the values of the indices in x corresponding to I, \(x[I_{start},I_{end}]\), we assume that \(\exp \) is an entry-wise exponential operator and that offset is subtracted from each entry.
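
Since Algorithm 3 itself is not reproduced here, the following is a minimal sketch of a smoothed best response over a treeplex with the dilated entropy DGF. It is written against assumed data structures (an `InfoSet` object carrying its sequence indices, its parent-sequence index, and its weight `beta`, with `infosets_bottom_up` listing information sets so that children precede parents); how the smoothing parameter `mu` and the weights are factored is an implementation choice.

```python
import numpy as np

def smoothed_best_response(g, mu, infosets_bottom_up, root_seq_index=0):
    """Minimize <g, x> + mu * (dilated entropy) over the treeplex and return the
    realization plan x. Modifies g in place; afterwards g[root_seq_index] holds
    the optimal value, as described in the text."""
    x = np.zeros_like(g)
    behavioral = np.zeros_like(g)

    # Bottom-up pass: solve each simplex locally and push its value to the parent.
    for I in infosets_bottom_up:
        idx = I.sequence_indices                # sequences whose last action is taken at I
        w = mu * I.beta                         # effective entropy weight at I
        shifted = -(g[idx] - g[idx].min()) / w  # shift before exp for numerical stability
        z = np.exp(shifted)
        behavioral[idx] = z / z.sum()
        # Optimal local value in log-sum-exp form, including the + log(n) constant
        # that keeps the per-simplex entropy term nonnegative.
        value = g[idx].min() - w * np.log(z.sum()) + w * np.log(len(idx))
        g[I.parent_seq_index] += value

    # Top-down pass: convert behavioral probabilities into realization probabilities.
    x[root_seq_index] = 1.0
    for I in reversed(infosets_bottom_up):
        idx = I.sequence_indices
        x[idx] = x[I.parent_seq_index] * behavioral[idx]
    return x
```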

Given our implementation of smoothed best responses above, the proximal operator \(\mathop {\mathrm{Prox}}_{x}(g)\) is easy to compute: it is simply a smoothed best response in which the gradient g is first shifted by \(\nabla \omega (x)\), where x is the point used as the prox center; this shift can be performed in place on g.

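As an illustration of this shift, the sketch below subtracts a scaled gradient of the dilated entropy from g in place. The closed form used for \(\nabla \omega (x)\) is an assumption of this sketch (not taken from the paper) and is valid when x is a strictly positive realization plan; the data structures are the same assumed ones as in the previous snippet, and `mu` should match whatever smoothing weight the corresponding smoothed-best-response call uses (pass 1.0 if the weights are already folded into \(\beta \)).

```python
import numpy as np

def shift_by_prox_center_gradient(g, x, mu, infosets_bottom_up):
    """In place: g <- g - mu * grad omega(x), where omega is the dilated entropy
    (including the + log(n) constants) and x is the prox center."""
    for I in infosets_bottom_up:
        idx = I.sequence_indices
        p = I.parent_seq_index
        # Entropy term of information set I contributes to its own sequences ...
        g[idx] -= mu * I.beta * (np.log(x[idx] / x[p]) + 1.0)
        # ... while its constant/normalization term contributes to the parent sequence.
        g[p] -= mu * I.beta * (np.log(len(idx)) - 1.0)
```

A prox step is then simply a call to the smoothed best response on the shifted g.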

Note that this implementation of the prox mapping does not give the actual value of the objective, only the strategy vector that minimizes it. For the smoothed best response the objective value can be read off the entry of the modified g at the empty sequence, but after the prox-mapping shift that entry no longer holds the correct value. This is not an issue: unlike the smoothed-best-response value, the prox objective is not needed by any of our EGT variants.

Once these primitives have been implemented, the high-level steps of Algorithms 1 and 2 are easy to implement. First \(\mu _1\) and \(\mu _2\) are set to appropriate initial values (for example via the theory, or via the \(\mu \)-fitting approach that we use), and initial sequence-form strategies \(x^0,y^0\) are computed for Players 1 and 2 using the above procedures. Then we take repeated alternating steps for Players 1 and 2, where the stepsize \(\tau \) can either be set to \(\frac{2}{t+3}\) or chosen aggressively via heuristics. The most powerful stepsize heuristic is to check whether the excessive gap condition \(\bar{\phi }_{\mu _2}(x^t) \le \bar{\phi }_{\mu _1}(y^t)\) is maintained after every iteration, and to decrease \(\tau \) and redo the most recent step whenever the condition fails. Given an implementation of the smoothed best response, the excessive gap value can be computed as the sum of the two smoothed-best-response values (this works because the smoothed best response was implemented to give the value for each player when they are trying to minimize their opponent's utility). The Step algorithm combines these primitives, where A is the sequence-form payoff matrix for Player 2 and \(\beta _1,\beta _2\) are the information-set weights in the dilated entropy for the two players.

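As a rough illustration of the surrounding loop just described (the internals of the Step procedure are not sketched here), the snippet below alternates per-player steps and applies the aggressive stepsize heuristic. The per-player step functions and the excessive-gap check are passed in as callables, assumed to be built from the SmoothedBR and prox primitives above; their exact signatures are assumptions of this sketch.

```python
def egt_loop(step_player1, step_player2, excessive_gap_holds,
             x0, y0, mu1, mu2, num_iterations, tau0=0.5, shrink=0.5):
    """Alternate EGT steps for the two players; shrink tau and redo a step
    whenever the excessive gap condition fails (aggressive stepsizing).
    Setting tau = 2 / (t + 3) each iteration instead recovers the theoretical
    stepsize schedule."""
    x, y, tau = x0, y0, tau0
    for t in range(num_iterations):
        step = step_player1 if t % 2 == 0 else step_player2
        while True:
            x_new, y_new, mu1_new, mu2_new = step(x, y, mu1, mu2, tau)
            if excessive_gap_holds(x_new, y_new, mu1_new, mu2_new):
                break              # aggressive step accepted
            tau *= shrink          # condition failed: decrease tau and redo the step
        x, y, mu1, mu2 = x_new, y_new, mu1_new, mu2_new
    return x, y
```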

Finally, the EGT algorithm is straightforward: it just iterates calls to StepEFG. The initial points can be computed via SmoothedBR and ProxCenterGradient, just as in StepEFG.

Cite this article

Kroer, C., Waugh, K., Kılınç-Karzan, F. et al. Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Program. 179, 385–417 (2020). https://doi.org/10.1007/s10107-018-1336-7
