Faster algorithms for extensive-form game solving via improved smoothing functions

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate both the theoretical and practical performance improvement of first-order methods (FOMs) for solving extensive-form games through better design of the dilated entropy function—a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the overall convergence rate of several FOMs working with the dilated entropy function by a factor of \(\Omega (b^dd)\), where b is the branching factor of the player, and d is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than FOMs despite their theoretically inferior convergence rates. Using our new weighting scheme and a practical parameter tuning procedure we show that, for the first time, the excessive gap technique, a classical FOM, can be made faster than the counterfactual regret minimization algorithm in practice for large games, and that the aggressive stepsize scheme of CFR+ is the only reason that the algorithm is faster in practice.

Notes

  1. Confirmed through author communication.

  2. This variation uses the current iterate rather than the average iterate due to decreased memory usage. It has inferior practical iteration complexity.

References

  1. Bošanský, B., Čermák, J.: Sequence-form algorithm for computing Stackelberg equilibria in extensive-form games. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  2. Bošanský, B., Kiekintveld, C., Lisý, V., Pěchouček, M.: An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. J. Artif. Intell. Res. 51, 829–866 (2014)

  3. Bowling, M., Burch, N., Johanson, M., Tammelin, O.: Heads-up limit hold’em poker is solved. Science 347(6218), 145–149 (2015)

  4. Brown, N., Sandholm, T.: Strategy-based warm starting for regret minimization in games. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 432–438 (2016)

  5. Brown, N., Ganzfried, S., Sandholm, T.: Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit texas hold’em agent. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp. 7–15 (2015)

  6. Brown, N., Kroer, C., Sandholm, T.: Dynamic thresholding and pruning for regret minimization. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 421–429 (2017)

  7. Daskalakis, C., Goldberg, P.W., Papadimitriou, C.H.: The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1), 195–259 (2009)

  8. Daskalakis, C., Deckelbaum, A., Kim, A.: Near-optimal no-regret algorithms for zero-sum games. Games Econ. Behav. 92, 327–348 (2015)

  9. Gilpin, A., Sandholm, T.: Lossless abstraction of imperfect information games. J. ACM 54(5), 25 (2007)

  10. Gilpin, A., Peña, J., Sandholm, T.: First-order algorithm with \(\cal{O}(\rm ln(1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133(1–2), 279–298 (2012)

  11. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, New York (2001)

  12. Hoda, S., Gilpin, A., Peña, J., Sandholm, T.: Smoothing techniques for computing Nash equilibria of sequential games. Math. Oper. Res. 35(2), 494–512 (2010)

  13. Jiang, A., Leyton-Brown, K.: Polynomial-time computation of exact correlated equilibrium in compact games. In: Proceedings of the ACM Conference on Electronic Commerce (EC), pp. 119–126 (2011)

  14. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–148. MIT Press (2012)

  15. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problems structure. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 149–183. MIT Press (2012)

  16. Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011)

  17. Koller, D., Megiddo, N., von Stengel, B.: Efficient computation of equilibria for extensive two-person games. Games Econ. Behav. 14(2), 247–259 (1996)

  18. Kroer, C., Sandholm, T.: Extensive-form game abstraction with bounds. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 621–638. ACM (2014)

  19. Kroer, C., Sandholm, T.: Imperfect-recall abstractions with bounds in games. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 459–476. ACM (2016)

  20. Kroer, C., Waugh, K., Kılınç-Karzan, F., Sandholm, T.: Faster first-order methods for extensive-form game solving. In: Proceedings of the ACM Conference on Economics and Computation (EC), pp. 817–834. ACM (2015)

  21. Kroer, C., Farina, G., Sandholm, T.: Smoothing method for approximate extensive-form perfect equilibrium. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2017)

  22. Kroer, C., Farina, G., Sandholm, T.: Robust Stackelberg equilibria in extensive-form games and extension to limited lookahead. In: AAAI Conference on Artificial Intelligence (AAAI) (2018)

  23. Lanctot, M., Waugh, K., Zinkevich, M., Bowling, M.: Monte Carlo sampling for regret minimization in extensive games. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 1078–1086 (2009)

  24. Lanctot, M., Gibson, R., Burch, N., Zinkevich, M., Bowling, M.: No-regret learning in extensive-form games with imperfect recall. In: International Conference on Machine Learning (ICML), pp. 65–72 (2012)

  25. Lipton, R., Markakis, E., Mehta, A.: Playing large games using simple strategies. In: Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), San Diego, CA, pp. 36–41. ACM (2003)

  26. Littman, M., Stone, P.: A polynomial-time Nash equilibrium algorithm for repeated games. In: Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), San Diego, CA, pp. 48–54 (2003)

  27. Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: DeepStack: expert-level artificial intelligence in no-limit poker (2017). arXiv preprint arXiv:1701.01724

  28. Nemirovski, A.: Prox-method with rate of convergence \(\cal{O}(1/t )\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  29. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1), 235–249 (2005)

  30. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  31. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)

  32. Romanovskii, I.: Reduction of a game with complete memory to a matrix game. Sov. Math. 3, 678–681 (1962)

  33. Sandholm, T.: The state of solving large incomplete-information games, and application to poker. AI Magazine, pp. 13–32, special issue on Algorithmic Game Theory (2010)

  34. Shi, J., Littman, M.: Abstraction methods for game theoretic poker. In: CG ’00: Revised Papers from the Second International Conference on Computers and Games, pp. 333–345. Springer, London (2002)

  35. Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., Rayner, C.: Bayes’ bluff: opponent modelling in poker. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI), pp. 550–558 (2005)

  36. Tammelin, O., Burch, N., Johanson, M., Bowling, M.: Solving heads-up limit Texas hold’em. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 645–652 (2015)

  37. von Stengel, B.: Efficient computation of behavior strategies. Games Econ. Behav. 14(2), 220–246 (1996)

  38. Waugh, K., Bagnell, D.: A unified view of large-scale zero-sum equilibrium computation. In: Computer Poker and Imperfect Information Workshop at the AAAI Conference on Artificial Intelligence (AAAI) (2015)

  39. Zinkevich, M., Johanson, M., Bowling, M.H., Piccione, C.: Regret minimization in games with incomplete information. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 1729–1736 (2007)

Acknowledgements

The first and last authors are supported by the National Science Foundation under Grants IIS-1617590, IIS-1320620, and IIS-1546752 and the ARO under Awards W911NF-16-1-0061 and W911NF-17-1-0082. The first author is supported by the Facebook Fellowship in Economics and Computation. The third author is supported by the National Science Foundation Grant CMMI 1454548.

Author information

Corresponding author

Correspondence to Christian Kroer.

Additional information

A one-page abstract describing a preliminary version of the results in this paper was published at the 18th ACM Conference on Economics and Computation under the title “Theoretical and Practical Advances on Smoothing for Extensive-Form Games”.

A Notation and results described from an EFG perspective

In the body of the paper we described our results using notation oriented toward the convex-optimization perspective on treeplexes: the results hold for general treeplexes, and so we viewed each treeplex as a general convex set. In this section we give an overview of our results from a standard EFG-specific perspective, in the hope that it may be helpful for researchers who are familiar with that literature but not with the first-order methods literature and convex analysis.

First we define notation, then we describe how our treeplex results map onto traditional EFG notation, and finally we describe the dilated-entropy smoothed-best-response tree traversal and EGT in terms of this notation.

A.1 Extensive-form games and the sequence form

An extensive-form game (EFG) can be thought of as a game tree, where each node corresponds to some history of actions taken by all players. Each node belongs to some player, and the actions available to the player at that node are represented by the branches. Uncertainty is modeled by a special player, Nature, who moves with a predefined, fixed probability distribution over actions at each node belonging to Nature. EFGs model imperfect information by grouping nodes into information sets: an information set is a group of nodes, all belonging to the same player, among which that player cannot distinguish. Finally, we assume perfect recall, which requires that no player ever forgets their past actions (equivalently, for each information set there is only a single possible last action taken by the player to whom the information set belongs).

Definition 4

A two-player extensive-form game with imperfect information and perfect recall \(\Gamma \) is a tuple \((H, Z, A, P, \sigma _c, \mathscr {I}, u)\) composed of:

  • H: a finite set of possible sequences (or histories) of actions, such that the empty sequence \(\emptyset \in H\) and every prefix of every \(h\in H\) is also in H.

  • \(Z\subseteq H\): the set of terminal histories, i.e. those histories that are not a proper prefix of any other history in H.

  • A: a function mapping \(h \in H{\setminus } Z\) to the set of available actions at non-terminal history h.

  • P: the player function, mapping each non-terminal history \(h \in H{\setminus } Z\) to \(\{1, 2, c\}\), representing the player who takes action after h. If \(P(h)=c\), the player is Nature.

  • \(\sigma _c\): a function assigning to each \(h\in H{\setminus } Z\) such that \(P(h) = c\) a probability mass function over A(h).

  • \(\mathscr {I}_i\), for \(i\in \{1,2\}\): a partition of \(\{h\in H: P(h)=i\}\) with the property that \(A(h)=A(h')\) for each \(h,h'\) in the same set of the partition. For notational convenience, we will write A(I) to mean A(h) for any \(h\in I\), where \(I\in \mathscr {I}_i\). \(\mathscr {I}_i\) is the information partition of player i, and the sets in \(\mathscr {I}_i\) are called the information sets of player i.

  • \(u_i\): the utility function mapping each \(z\in Z\) to the utility (a real number) gained by player i when terminal history z is reached.

We further assume that all players have perfect recall.

A strategy for a player i is usually represented in behavioral form, which consists of probability distributions over actions at each information set in \(\mathscr {I}_i\). In this paper we focus on an alternative, but strategically equivalent, representation of the set of strategies, called the sequence form [17, 32, 37]. In the sequence form, actions are instead represented by sequences. A sequence \(\sigma _i\) is an ordered list of actions taken by player i on the path to some history h. In perfect-recall games, all nodes in an information set \(I\in \mathscr {I}_i\) correspond to the same sequence for player i; we let \({\mathrm{seq}}(I)\) denote this sequence. Given a sequence \(\sigma _i\) and an action a that Player i can take immediately after \(\sigma _i\), we let \(\sigma _ia\) denote the resulting new sequence. Instead of directly choosing the probability to put on an action, in the sequence form one chooses the probability of playing the entire sequence; this is called the realization probability and is denoted by \(r_i(\sigma _i)\). A choice of realization probabilities for every sequence belonging to Player i is called a realization plan and is denoted \(r_i:\Sigma _i \rightarrow [0,1]\), where \(\Sigma _i\) is the set of sequences of Player i. This representation relies on perfect recall: for any information set \(I\in {\mathscr {I}}_i\), each action \(a\in A(I)\) is uniquely represented by the single sequence \(\sigma _i = {\mathrm{seq}}(I)a\), since \({\mathrm{seq}}(I)\) is well defined. In particular, this gives us a simple way to convert any strategy in sequence form to a behavioral strategy: the probability of playing action \(a\in A(I)\) at information set I is simply \(\frac{r_i({\mathrm{seq}}(I)a)}{r_i({\mathrm{seq}}(I))}\).
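
To make the sequence-form-to-behavioral conversion concrete, here is a minimal illustrative snippet (not from the paper); the dictionary-based representation of a realization plan and the names used below are assumptions made only for this example.

```python
# Hypothetical representation: a realization plan as a dict mapping a player's
# sequences (tuples of actions) to realization probabilities r_i(sigma_i).
def behavioral_prob(realization: dict, parent_seq: tuple, action: str) -> float:
    """Behavioral probability of `action` at an information set I with
    seq(I) = parent_seq, i.e. r_i(seq(I)a) / r_i(seq(I))."""
    parent = realization[parent_seq]               # r_i(seq(I))
    child = realization[parent_seq + (action,)]    # r_i(seq(I)a)
    if parent == 0.0:
        return 0.0  # information set reached with probability zero; any choice is fine
    return child / parent

# Example: the player plays "raise" with probability 0.3 after the empty sequence.
r = {(): 1.0, ("raise",): 0.3, ("fold",): 0.7}
assert abs(behavioral_prob(r, (), "raise") - 0.3) < 1e-12
```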

A.2 Mapping between convex analysis notation and EFG notation

First we describe the treeplex. A treeplex Q is used to model the sequence-form strategy space of a player; in an EFG, the treeplexes are thus the sets of realization plans \(\Sigma _1,\Sigma _2\). In the bilinear saddle-point problem (BSPP) (1) the typical representation would be \(\mathcal{X}=\Sigma _1\) and \(\mathcal{Y}= \Sigma _2\). For the remainder we describe the notation in terms of Player 1 and \(\Sigma _1\). The set of simplexes in \(\Sigma _1\), with indices denoted by \(S_{\Sigma _1}\), is the set of information sets \(\mathscr {I}_1\) where Player 1 acts. The set \(\mathcal{D}_j^i\) of simplexes reached immediately after taking branch i in simplex j is the set of potential information sets where Player 1 may have to act next; which one is reached depends on the actions taken by Nature and Player 2. Another way to put this is that \(\mathcal{D}_j^i\) corresponds to the set of information sets \(I\in \mathscr {I}_1\) such that \({\mathrm{seq}}(I) = \sigma _1\), where \(\sigma _1\) is the sequence corresponding to taking action i at simplex j. The list below gives an overview of treeplex notation and how it corresponds to EFG notation and concepts. Not all of our concepts map easily to existing EFG ideas. For example \(M_Q\) and \(M_{Q,r}\), the maximum \(\ell _1\) norm of Q and the r-depth-limited maximum \(\ell _1\) norm, are still most easily thought of in terms of norms: \(M_{\Sigma _1}\) is the maximum number of information sets with nonzero probability of being reached when Player 1 has to follow a pure strategy while the other player may follow a mixed strategy. Intuitively, the maximum \(\ell _1\) norm of \(\Sigma _1\) measures the branching factor associated with observable opponent actions and Nature actions that cause Player 1 to reach different information sets, while not measuring the branching factor associated with Player 1 choosing actions at an information set (since the \(\ell _1\) norm sums to one at each such information set).

  • \(Q\): \(\Sigma _1\), the set of realization plans

  • \(S_{\Sigma _1}\): \(\mathscr {I}_1\), the set of information-set indices into the treeplex \(\Sigma _1\)

  • \(\mathcal{D}_j^i\): the set of information sets in \(\Sigma _1\) such that the sequence corresponding to branch i at simplex j is the parent sequence

  • \(q_{p_j}\): the parent sequence \({\mathrm{seq}}(I_j)\), where \(I_j\) is the information set corresponding to simplex j

  • \(d_j\): the length of the longest possible sequence of actions starting at \(I_j\)

  • \(b_{\Sigma _1}^j\): the length of \({\mathrm{seq}}(I_j)\)

Our dilated entropy construction using the weights described in recurrence (6) can now be described in terms of EFG notation as follows:

$$\begin{aligned}
\alpha _j &= 1 + \max _{a\in A(I_j)}\sum _{k \in \mathcal{D}^a_{I_j}} \frac{\alpha _k\beta _k}{\beta _k - \alpha _k}, &&\forall \, I_j \in \mathscr {I}_1,\\
\beta _j &> \alpha _j, &&\forall \, I_j \in \mathscr {I}_1~\text {s.t.}~\mathrm{length}({\mathrm{seq}}(I_j)) > 0,\\
\beta _j &= \alpha _j, &&\forall \, I_j \in \mathscr {I}_1~\text {s.t.}~\mathrm{length}({\mathrm{seq}}(I_j)) = 0.
\end{aligned}$$
(18)
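
As a concrete illustration of recurrence (18), the following sketch computes the weights bottom-up over Player 1's information sets. The `InfoSet` data structure and the particular choice \(\beta _j = 2\alpha _j\) at non-root information sets are assumptions made only for this example; any choice with \(\beta _j > \alpha _j\) is admissible there.

```python
# Illustrative sketch, not the authors' implementation.
from dataclasses import dataclass, field

@dataclass
class InfoSet:
    actions: list                                  # A(I_j)
    children: dict = field(default_factory=dict)   # action a -> list of InfoSets in D^a_{I_j}
    is_root_level: bool = False                    # True iff seq(I_j) is the empty sequence
    alpha: float = 0.0
    beta: float = 0.0

def compute_weights(I: InfoSet) -> None:
    """Fill in alpha and beta for I and all information sets below it."""
    for a in I.actions:
        for child in I.children.get(a, []):
            compute_weights(child)                 # children first (bottom-up)
    # alpha_j = 1 + max_a sum_{k in D^a_{I_j}} alpha_k * beta_k / (beta_k - alpha_k).
    # Children are never root-level, so beta_k > alpha_k and each ratio is finite.
    I.alpha = 1.0 + max(
        (sum(c.alpha * c.beta / (c.beta - c.alpha) for c in I.children.get(a, []))
         for a in I.actions),
        default=0.0,
    )
    # beta_j = alpha_j at root-level information sets; beta_j > alpha_j otherwise
    # (here, arbitrarily, twice alpha_j).
    I.beta = I.alpha if I.is_root_level else 2.0 * I.alpha
```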

If we then instantiate EGT with this DGF and use Theorem 3 we get the following convergence rate:

$$\begin{aligned} \frac{\max _{\sigma _1 \in \Sigma _1, \sigma _2 \in \Sigma _2}|g_1(\sigma _1,\sigma _2)|\, \sqrt{M_{\Sigma _1}^2\, 2^{d_{\Sigma _1}+2}\, M_{\Sigma _2}^2\, 2^{d_{\Sigma _2}+2}}\, \max _{I\in \mathscr {I}}\log |A(I)|}{\varepsilon }. \end{aligned}$$

A.3 EGT described as a tree traversal

Here we explain how to implement the \(\mathop {\mathrm{Prox}}\) operation when using the dilated entropy function, as well as how to compute smoothed best responses, i.e. \(x_{\mu _1}(y)\) or \(y_{\mu _2}(x)\) in Algorithm 2. Throughout we will present algorithms for computing everything from the perspective of a player trying to minimize their opponent’s utility, rather than maximize their own.

First, given \(y^t\), the gradient for Player 1 is \(Ay^t\), where A is the payoff matrix for Player 2. This can be implemented as follows: create an all-zero vector g of dimension \(|\Sigma _1|\), traverse the game tree, and for each leaf z add \(\pi _0(z)\, y^t[\sigma _2(z)]\, u_2(z)\) to the entry in g corresponding to \(\sigma _1(z)\), where \(\pi _0(z)\) is the product of Nature's action probabilities on the path to z and \(\sigma _1(z),\sigma _2(z)\) are the sequences of Players 1 and 2 leading to z.
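
A rough sketch of this traversal is shown below; the `Node` fields and sequence indexing are assumptions made for illustration, not the paper's data structures.

```python
import numpy as np

def gradient_for_player1(root, y, num_p1_sequences):
    """Accumulate g[sigma_1(z)] += pi_0(z) * y[sigma_2(z)] * u_2(z) over all leaves z."""
    g = np.zeros(num_p1_sequences)

    def traverse(node, nature_prob):
        if node.is_leaf:
            g[node.p1_seq_index] += nature_prob * y[node.p2_seq_index] * node.u2
            return
        for action, child in node.children.items():
            # Only Nature's move probabilities are accumulated along the path;
            # Player 2's probabilities enter through y[sigma_2(z)] at the leaf.
            p = node.move_probs[action] if node.player == "nature" else 1.0
            traverse(child, nature_prob * p)

    traverse(root, 1.0)
    return g
```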

Pseudocode for computing a smoothed best response is given in Algorithm 3. It implements the dilated entropy function with, at each simplex \(\Delta _n\), the negative entropy plus a constant term: \(\sum _{i=1}^n x_i\log x_i + \log n\). By adding \(\log n\) we ensure that the function is never negative; it is zero at \(x_i = \frac{1}{n}\) for all i. Since the constant does not change the second-order derivatives of the dilated entropy function, we retain the same strong convexity properties.

The smoothed best response implementation given here modifies the gradient g in place. Thus it is important that g is not used to represent the gradient after a call to the function. However, the modified g can be useful because the entry in g corresponding to the empty sequence then contains the value of the smoothed best response function, which is needed for verifying e.g. the excessive gap condition.

The algorithm for smoothed best response calculation takes as input a vector g, which is usually the gradient at the current iterate. This vector g has length equal to the number of sequences for the player. We assume that the sequences are ordered so that \(g[I_{start},I_{end}]\) denotes the subset of g corresponding to entries for all the sequences that have their last action taken at I. Note that in setting the values of the indices in x corresponding to I, \(x[I_{start},I_{end}]\), we assume that \(\exp \) is an entry-wise exponential operator and that offset is subtracted from each entry.
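
Since Algorithm 3 itself is not reproduced here, the following is a minimal sketch of a smoothed best response over a treeplex with the dilated entropy DGF. It is written against assumed data structures (an `InfoSet` object carrying its sequence indices, its parent-sequence index, and its weight `beta`, with `infosets_bottom_up` listing information sets so that children precede parents); how the smoothing parameter `mu` and the weights are factored is an implementation choice.

```python
import numpy as np

def smoothed_best_response(g, mu, infosets_bottom_up, root_seq_index=0):
    """Minimize <g, x> + mu * (dilated entropy) over the treeplex and return the
    realization plan x. Modifies g in place; afterwards g[root_seq_index] holds
    the optimal value, as described in the text."""
    x = np.zeros_like(g)
    behavioral = np.zeros_like(g)

    # Bottom-up pass: solve each simplex locally and push its value to the parent.
    for I in infosets_bottom_up:
        idx = I.sequence_indices                # sequences whose last action is taken at I
        w = mu * I.beta                         # effective entropy weight at I
        shifted = -(g[idx] - g[idx].min()) / w  # shift before exp for numerical stability
        z = np.exp(shifted)
        behavioral[idx] = z / z.sum()
        # Optimal local value in log-sum-exp form, including the + log(n) constant
        # that keeps the per-simplex entropy term nonnegative.
        value = g[idx].min() - w * np.log(z.sum()) + w * np.log(len(idx))
        g[I.parent_seq_index] += value

    # Top-down pass: convert behavioral probabilities into realization probabilities.
    x[root_seq_index] = 1.0
    for I in reversed(infosets_bottom_up):
        idx = I.sequence_indices
        x[idx] = x[I.parent_seq_index] * behavioral[idx]
    return x
```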

Given our implementation of smoothed best responses above, the proximal operator \(\mathop {\mathrm{Prox}}_{x}(g)\) is easy to compute: it is simply a smoothed best response in which the gradient g is first shifted by \(\nabla \omega (x)\), where x is the point used as the prox center; this shift can be performed in place on g.

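As an illustration of this shift, the sketch below subtracts a scaled gradient of the dilated entropy from g in place. The closed form used for \(\nabla \omega (x)\) is an assumption of this sketch (not taken from the paper) and is valid when x is a strictly positive realization plan; the data structures are the same assumed ones as in the previous snippet, and `mu` should match whatever smoothing weight the corresponding smoothed-best-response call uses (pass 1.0 if the weights are already folded into \(\beta \)).

```python
import numpy as np

def shift_by_prox_center_gradient(g, x, mu, infosets_bottom_up):
    """In place: g <- g - mu * grad omega(x), where omega is the dilated entropy
    (including the + log(n) constants) and x is the prox center."""
    for I in infosets_bottom_up:
        idx = I.sequence_indices
        p = I.parent_seq_index
        # Entropy term of information set I contributes to its own sequences ...
        g[idx] -= mu * I.beta * (np.log(x[idx] / x[p]) + 1.0)
        # ... while its constant/normalization term contributes to the parent sequence.
        g[p] -= mu * I.beta * (np.log(len(idx)) - 1.0)
```

A prox step is then simply a call to the smoothed best response on the shifted g.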

Note that this implementation of the prox mapping does not give the actual value of the objective, only the strategy vector that minimizes it. For the smoothed best response the objective value can be read off the entry of the modified g at the empty sequence, but after the prox-mapping shift that entry no longer holds the correct value. This is not an issue: unlike the smoothed-best-response value, the prox objective is not needed by any of our EGT variants.

Once these primitives have been implemented, the high-level steps of Algorithms 1 and 2 are easy to implement. First \(\mu _1\) and \(\mu _2\) are set to appropriate initial values (for example via the theory, or via the \(\mu \)-fitting approach that we use), and initial sequence-form strategies \(x^0,y^0\) are computed for Players 1 and 2 using the above procedures. Then we take repeated alternating steps for Players 1 and 2, where the stepsize \(\tau \) can either be set to \(\frac{2}{t+3}\) or chosen aggressively via heuristics. The most powerful stepsize heuristic is to check whether the excessive gap condition \(\bar{\phi }_{\mu _2}(x^t) \le \bar{\phi }_{\mu _1}(y^t)\) is maintained after every iteration, and to decrease \(\tau \) and redo the most recent step whenever the condition fails. Given an implementation of the smoothed best response, the excessive gap value can be computed as the sum of the two smoothed-best-response values (this works because the smoothed best response was implemented to give the value for each player when they are trying to minimize their opponent's utility). The Step algorithm combines these primitives, where A is the sequence-form payoff matrix for Player 2 and \(\beta _1,\beta _2\) are the information-set weights in the dilated entropy for the two players.

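As a rough illustration of the surrounding loop just described (the internals of the Step procedure are not sketched here), the snippet below alternates per-player steps and applies the aggressive stepsize heuristic. The per-player step functions and the excessive-gap check are passed in as callables, assumed to be built from the SmoothedBR and prox primitives above; their exact signatures are assumptions of this sketch.

```python
def egt_loop(step_player1, step_player2, excessive_gap_holds,
             x0, y0, mu1, mu2, num_iterations, tau0=0.5, shrink=0.5):
    """Alternate EGT steps for the two players; shrink tau and redo a step
    whenever the excessive gap condition fails (aggressive stepsizing).
    Setting tau = 2 / (t + 3) each iteration instead recovers the theoretical
    stepsize schedule."""
    x, y, tau = x0, y0, tau0
    for t in range(num_iterations):
        step = step_player1 if t % 2 == 0 else step_player2
        while True:
            x_new, y_new, mu1_new, mu2_new = step(x, y, mu1, mu2, tau)
            if excessive_gap_holds(x_new, y_new, mu1_new, mu2_new):
                break              # aggressive step accepted
            tau *= shrink          # condition failed: decrease tau and redo the step
        x, y, mu1, mu2 = x_new, y_new, mu1_new, mu2_new
    return x, y
```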

Finally, the EGT algorithm is straightforward: it just iterates calls to StepEFG. The initial points can be computed via SmoothedBR and ProxCenterGradient, just as in StepEFG.

Cite this article

Kroer, C., Waugh, K., Kılınç-Karzan, F. et al. Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Program. 179, 385–417 (2020). https://doi.org/10.1007/s10107-018-1336-7
