In this section we address two extensions to our previous results:
- (Section 6.1) We extend the algorithm UCN to large games with a more general largeness parameter \(\gamma = \frac {c}{n} \in [0, 1]\), where c is a constant.
- (Section 6.2) We consider large games with k actions and largeness parameter \(\frac {c}{n}\) (previously we focused on k = 2). Our algorithm uses a new uncoupled approach that is substantially different from the ones we have presented so far.
Continuous Dynamics for Binary-action Games with Arbitrary γ
We recall that for large games, the largeness parameter γ denotes the extent to which players can affect each other's utilities. Instead of assuming that \(\gamma = \frac {1}{n}\), we now let \(\gamma = \frac {c}{n} \in [0, 1]\) for some constant c. We show that we can extend UCN and still guarantee an approximation quality better than \(\frac {1}{2}\). We recall that for the original UCN, players converge to a linear subspace of strategy/payoff states and achieve bounded regret. For arbitrary \(\gamma = \frac {c}{n}\), we can extend this subspace of strategy/payoff states as follows:
$$\mathcal{P}_{\gamma} = \left\{ (p^{*},D) \ | \ p^{*} = \min \left( \frac{1}{2} + \frac{D}{2c}, 1 \right) \right\} $$
where D and p∗ represent respectively a player’s discrepancy and probability allocated to the best response. For c = 1 we recover the subspace \(\mathcal {P}\) as in UCN. Furthermore, if \(|\dot {{p}^{*}}| \leq 1\) for each player, then \(| \dot {D} | \leq 2c\), which means that we can implement an update as follows:
$$\dot{{p}^{*}} = \frac{\dot{D}}{2c} $$
This leads us to the following natural extension to Theorem 3:
Theorem 8
Under the initial conditions \(p_{i}(0) = \frac {1}{2}\) for all i, the following continuous dynamic, UCN-γ, has all players reach \(\mathcal {P}_{\gamma }\) in at most \(\frac {1}{2}\) time units. Furthermore, upon reaching \(\mathcal {P}_{\gamma }\) a player never leaves.
$$\dot{{p}^{*}_{i}}(t) = f(D_{i}(t), \dot{D}_{i}(t)) = \left\{\begin{array}{cl} 1 & \text{ if } s_{i} \notin \mathcal{P}_{\gamma} \\ 0 & \text{ if } s_{i} \in \mathcal{P}_{\gamma} \text{ and } p^{*}_{i} > \frac{1}{2} + \frac{D_{i}}{2c} \\ \frac{\dot{D_{i}}}{2c} & \text{ otherwise } \end{array}\right. $$
Notice that unlike UCN, this dynamic is no longer necessarily a continuously differentiable function with respect to time when c > 1. However, it is still continuous.
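To make the dynamic concrete, the following is a minimal Euler-discretisation sketch in Python. It is illustrative only: rather than simulating a full game, it drives the dynamic with a synthetic discrepancy trajectory satisfying \(|\dot {D}| \leq 2c\) (the names `target` and `hit_time` are ours, not the paper's notation), and checks that p∗ reaches \(\mathcal {P}_{\gamma }\) within \(\frac {1}{2}\) a time unit and then tracks it.

```python
import numpy as np

c = 1.5       # largeness constant, gamma = c / n
dt = 1e-4     # Euler step size
tol = 1e-3    # tolerance for "on the manifold" (absorbs discretisation error)

def target(D):
    """p* on the manifold P_gamma for discrepancy D."""
    return min(0.5 + D / (2 * c), 1.0)

# Synthetic discrepancy trajectory with |dD/dt| <= 1.2 <= 2c = 3.
ts = np.arange(0.0, 1.0, dt)
D = 0.5 + 0.4 * np.sin(2 * c * ts)
Ddot = np.gradient(D, dt)

p, hit_time = 0.5, None                   # initial condition p(0) = 1/2
for t, d, ddot in zip(ts, D, Ddot):
    on_manifold = abs(p - target(d)) <= tol
    if on_manifold and hit_time is None:
        hit_time = t
    if not on_manifold:                   # off P_gamma: climb at full speed
        pdot = 1.0
    elif p > 0.5 + d / (2 * c):           # on the capped part (p = 1, D > c)
        pdot = 0.0
    else:                                 # ride the manifold
        pdot = ddot / (2 * c)
    p = min(1.0, p + pdot * dt)

print(f"reached P_gamma at t = {hit_time:.3f} (Theorem 8 bound: 0.5)")
```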
Once again, we note that for all strategy/payoff states, regret can be expressed as
$$R = (1 - p^{*}) D $$
from which we can prove the following:
Theorem 9
Suppose that \(\gamma = \frac {c}{n}\) and that a player's strategy/payoff state lies on \(\mathcal {P}_{\gamma }\). Then her regret is at most \(\frac {c}{8}\) for c ≤ 2, and at most \(\frac {1}{2} - \frac {1}{2c}\) for c > 2. Furthermore, the equilibria obtained are also c-WSNE.
Proof
If c ≤ 2, then regret is maximised when \(D = \frac {c}{2}\) and consequently when \(p^{*} = \frac {3}{4}\). This results in a regret of \(\frac {c}{8}\). On the other hand, if c > 2, then regret is maximised when D = 1 and consequently \(p^{*} = \frac {1}{2} + \frac {1}{2c}\). This results in a regret of \(\frac {1}{2} - \frac {1}{2c}\).
As for the second part of the theorem, from the definition of \(\mathcal {P}_{\gamma }\) and from the definition of ε-WSNE in Section 2 it is straightforward to see that when D ≥ c we have p∗ = 1, which means that no weight is put on a strategy whose utility is more than c below that of the best response. □
Thus we obtain a regret that is better than that of simply randomising uniformly between the two strategies, although, as should be expected, the advantage vanishes as the largeness parameter increases.
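As a quick numeric cross-check of Theorem 9 (our own snippet, using the expression \(R = (1 - p^{*})D\) with \(p^{*} = \min (\frac {1}{2} + \frac {D}{2c}, 1)\) on \(\mathcal {P}_{\gamma }\)):

```python
import numpy as np

# Maximise R(D) = (1 - p*) D over D in [0, 1] on the subspace P_gamma.
for c in (0.5, 1.0, 2.0, 4.0):
    D = np.linspace(0.0, 1.0, 10_001)
    R = D * np.maximum(0.0, 0.5 - D / (2 * c))   # (1 - p*) vanishes once D >= c
    bound = c / 8 if c <= 2 else 0.5 - 1 / (2 * c)
    print(f"c = {c}: max regret {R.max():.4f}, Theorem 9 bound {bound:.4f}")
```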
Discretisation and Query Complexity
In the same way that UN-(α, η) discretised UN, Theorem 8 can be discretised to yield the following result.
Theorem 10
For a given accuracy parameter α and correctness probability η, we can implement a query-based discretisation of UCN-γ that with probability 1 − η correctly computes an ε-approximate Nash equilibrium for
$$\varepsilon = \left\{\begin{array}{cl} \frac{c}{8} + \alpha & \text{ if } c \leq 2 \\ \frac{1}{2} - \frac{1}{2c} + \alpha \hfill & \text{ if } c > 2 \end{array}\right. $$
Furthermore, the discretisation uses \(O \left (\frac {1}{\alpha ^{4}} \log \left (\frac {n}{\alpha \eta } \right ) \right )\) queries.
Equilibrium Computation for k-action Games
When the number of pure strategies per player is k > 2, the initial “strawman” idea corresponding to Observation 2 is to have all n players randomise uniformly over their k strategies. Notice that the resulting regret may in general be as high as \(1-\frac {1}{k}\). In this section we give a new uncoupled-dynamics approach for computing approximate equilibria in k-action games where (for largeness parameter \(\gamma =\frac {1}{n}\)) the worst-case regret approaches \(\frac {3}{4}\) as k increases, hence improving over uniform randomisation over all strategies. Recall that in general we are considering \(\gamma = \frac {c}{n}\) for fixed c ∈ [0, n]. The following is just a simple extension of the payoff oracle \(\mathcal {Q}_{\beta ,\delta }\) to the setting with k actions: for any input mixed strategy profile p, the oracle will with probability at least 1 − δ, output payoff estimates for p with error at most β for all n players.
Estimating Payoffs for Mixed Profiles in k-action Games
Given a payoff oracle \(\mathcal {Q}\) and any target accuracy parameter β and confidence parameter δ, consider the following procedure to implement an oracle \(\mathcal {Q}_{\beta , \delta }\):
- For any input mixed strategy profile p, compute a new mixed strategy profile \(p^{\prime } = (1 - \frac {\beta }{2})p + (\frac {\beta }{2k})\mathbf {1}\) such that each player i is playing the uniform distribution with probability \(\frac {\beta }{2}\) and playing distribution pi with probability \(1 - \frac {\beta }{2}\).
- Let \(m = \frac {64k^{2}}{\beta ^{3}} \log \left (8n /\delta \right )\), sample m payoff queries randomly from p′, and call the oracle \(\mathcal {Q}\) with each query as input to obtain a payoff vector.
- Let \(\widehat u_{i,j}\) be the average sampled payoff to player i for playing action j. Output the payoff vector \((\widehat {u}_{i,j})_{i\in [n], j\in [k]}\).
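As a concrete reference, here is a Python sketch of the three steps above. The function `payoff_oracle`, returning every player's payoff for a pure action profile, is a hypothetical stand-in for \(\mathcal {Q}\); the sample size m follows the expression above (and is, of course, far too large to actually run for small β).

```python
import numpy as np

def estimate_payoffs(p, payoff_oracle, beta, delta, rng=np.random.default_rng(0)):
    """Sketch of Q_{beta,delta}. p is an (n, k) array of mixed strategies;
    payoff_oracle maps a pure profile of shape (n,) to payoffs of shape (n,).
    Returns (n, k) estimates u_hat[i, j] of i's payoff for playing j."""
    n, k = p.shape
    # Step 1: mix beta/2 of the uniform distribution into every strategy.
    p_mix = (1 - beta / 2) * p + (beta / (2 * k)) * np.ones((n, k))
    # Step 2: sample m pure profiles from p_mix and query Q on each.
    m = int(np.ceil(64 * k**2 / beta**3 * np.log(8 * n / delta)))
    sums, counts = np.zeros((n, k)), np.zeros((n, k))
    for _ in range(m):
        profile = np.array([rng.choice(k, p=p_mix[i]) for i in range(n)])
        payoffs = payoff_oracle(profile)          # one query to Q
        sums[np.arange(n), profile] += payoffs
        counts[np.arange(n), profile] += 1
    # Step 3: average the sampled payoffs per (player, action) pair.
    return sums / np.maximum(counts, 1)
```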
As in previous sections, we begin by assuming that our algorithm has access to \(\mathcal {Q}_{M}\), the more powerful query oracle that returns exact expected payoffs for mixed strategies. We will eventually show in Section 6.2.1 that this does not result in a loss of generality: when utilising \(\mathcal {Q}_{\beta , \delta }\) we incur only a bounded additive loss in the quality of the approximate equilibria we obtain.
The general idea of Algorithm 2 is as follows. For a parameter \(N\in \mathbb {N}\), every player uses a mixed strategy consisting of a discretised distribution in which a player’s probability is divided into N quanta of probability \(\frac {1}{N}\), each of which is allocated to a single pure strategy. We refer to these quanta as “blocks” and label them B1, …, BN. Initially, blocks may be allocated arbitrarily to pure strategies. Then in time step t, for t = 1, …, N, block t is reallocated to the player’s best response to the other players’ current mixed strategies.
The general idea of the analysis of Algorithm 2 is the following. In each time step, a player's utilities change by at most nγ/N = c/N. Hence, at the completion of Algorithm 2, block N is allocated to a nearly-optimal strategy and, in general, block N − r is allocated to a strategy whose distance from optimality grows as r increases; averaging over the blocks nevertheless enables us to derive the improved overall performance of each player's mixed strategy.
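Before the formal analysis, here is a compact Python sketch of the block procedure. The function `expected_payoffs` is a hypothetical stand-in for the exact mixed-strategy oracle \(\mathcal {Q}_{M}\), returning an (n, k) array of expected utilities under a mixed profile.

```python
import numpy as np

def block_update(expected_payoffs, n, k, N, rng=np.random.default_rng(0)):
    """Sketch of BU: blocks[i][t] is the pure strategy currently holding
    player i's t-th probability quantum of mass 1/N."""
    blocks = rng.integers(0, k, size=(n, N))       # arbitrary initial allocation
    for t in range(N):
        # Current mixed profile induced by the block allocation.
        P = np.stack([np.bincount(blocks[i], minlength=k) / N for i in range(n)])
        best = expected_payoffs(P).argmax(axis=1)  # every player's best response
        blocks[:, t] = best                        # reallocate block t for all players
    return np.stack([np.bincount(blocks[i], minlength=k) / N for i in range(n)])
```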
Theorem 11
BU returns a mixed strategy profile \((\vec {p}_{i})_{i \in [n]}\) that is an ε-NE when:
$$\varepsilon = \left\{\begin{array}{cl} c \left( 1 + \frac{1}{N} \right) & \text{ if } c \leq \frac{1}{2} \\ 1 - \frac{1}{4c} + \frac{1}{2N} & \text{ if } c > \frac{1}{2} \end{array}\right. $$
Notice for example that for \(\gamma =\frac {1}{n}\) (i.e. putting c = 1), each player’s regret is at most \(\frac {3}{4}+\frac {1}{2N}\), so we can make this arbitrarily close to \(\frac {3}{4}\) since N is a parameter of the algorithm.
Proof
For an arbitrary player i ∈ [n], in each step t = 1, ..., N, probability block Bt is re-assigned to i's current best response. Since every player performs the same transfer of probability, by the largeness condition of the game every block's assigned strategy incurs a regret that increases by at most \(\frac {2c}{N}\) in every time step. Hence at the end of the N rounds, the block re-assigned j steps before termination is at worst assigned to a strategy that has \(\min \left \{1, \frac {2cj}{N} \right \}\) regret. This means we can bound a player's total regret as follows:
$$R \leq \sum\limits_{i = 1}^{N} \min \left\{1, \frac{2ci}{N} \right\} \cdot \frac{1}{N} $$
There are two important cases for this sum: when 2c ≤ 1 and when 2c > 1. In the first case:
$$R \leq \sum\limits_{i = 1}^{N} \frac{2c i}{N^{2}} = c \left( 1 + \frac{1}{N} \right) $$
And in the second:
$$R \leq \left( \sum\limits_{i = 1}^{N/2c} \frac{2c i}{N^{2}} \right) + \left( N - \frac{N}{2c} \right) \cdot \frac{1}{N} = 1 - \frac{1}{4c} + \frac{1}{2N} $$
□
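Both closed forms can be checked mechanically; the following snippet (our own sanity check, in exact rational arithmetic) confirms them for sample values where N is divisible by 2c.

```python
from fractions import Fraction

def regret_bound(c, N):
    """The displayed sum, evaluated exactly."""
    return sum(min(Fraction(1), 2 * c * Fraction(i, N)) * Fraction(1, N)
               for i in range(1, N + 1))

c, N = Fraction(1, 4), 1000   # case 2c <= 1
assert regret_bound(c, N) == c * (1 + Fraction(1, N))

c, N = Fraction(3, 2), 1200   # case 2c > 1, with N divisible by 2c
assert regret_bound(c, N) == 1 - 1 / (4 * c) + Fraction(1, 2 * N)
```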
In fact, we can slightly improve the bounds in Theorem 11 by introducing a dependence on k. In order to do so, we need to introduce some definitions first.
Definition 8
We denote by \(\mathcal {A}^{b,h}\) the truncated triangle in the Cartesian plane under the line segment from (0, 0) to (b, h), i.e. under \(y = \frac {h}{b}x\) for x ∈ [0, b], with height capped at y = 1. Note that if h ≤ 1 the truncated triangle is the entire triangle, unlike the case where h > 1. See Fig. 2 for a visualisation.
Definition 9
For a given truncated triangle \(\mathcal {A}^{b,h}\) and a partition of the base, \(\mathcal {P} = \{x_{1}, ...,x_{r}\}\) where 0 ≤ x1 ≤… ≤ xr ≤ b, we denote the left sum of \(\mathcal {A}^{b,h}\) under \(\mathcal {P}\) by \(LS(\mathcal {A}^{b,h}, \mathcal {P})\) (for reference see Fig. 3) and, with the convention \(x_{r+1} = b\), define it as follows:
$$LS(\mathcal{A}^{b,h}, \mathcal{P}) = \sum\limits_{i = 1}^{|\mathcal{P}|} \min\left\{1, \frac{h}{b}x_{i}\right\}(x_{i + 1} - x_{i}) $$
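Definition 9 translates directly into code. The following helper (ours, not part of the paper) computes left sums under the slope-\(\frac {h}{b}\) line with the unit-height cap; it is reused in the numeric checks below.

```python
import numpy as np

def left_sum(xs, b, h):
    """Left sum of A^{b,h} (base b, height h, capped at 1) under the
    partition xs of [0, b], with the convention x_{r+1} = b."""
    xs = np.asarray(sorted(xs) + [b], dtype=float)
    heights = np.minimum(1.0, (h / b) * xs[:-1])   # capped height at each x_i
    return float(np.sum(heights * np.diff(xs)))
```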
With these definitions in hand, we can set up a correspondence between the worst case regret of BU and left sums of \(\mathcal {A}^{(1+\frac {1}{N}), 2c}\). Suppose in the process of BU a player has blocks B1, ..., BN in the queue. Furthermore, without loss of generality, suppose that her k strategies are sorted in descending order of utility, u1 ≥ ... ≥ uk, where uj is the expected utility of the j-th strategy at the end of the process. Furthermore, let Rj = u1 − uj (i.e. the regret of strategy j), so that we also have 0 = R1 ≤ R2 ≤ ... ≤ Rk ≤ 1. If N is much larger than k, then by the pigeon-hole principle, many blocks will be assigned to the same strategy, and hence will incur the same regret. However, as in the analysis of the previous bounds, each block has restrictions as to how much regret its assigned strategy can incur due to the largeness condition of the game. In particular, block Bb can only be assigned to a strategy j such that \(R_{j} \leq \min \left \{1, (2c) \left (\frac {b}{N} \right ) \right \}\). For such an assignment, since the block has probability mass \(\frac {1}{N}\), it contributes a value of \(R_{j} \cdot \left (\frac {1}{N} \right )\) to the overall regret of a player. Hence for fixed regret values (R1, ..., Rk), we can pick a valid assignment of these values to blocks and get an expression for total regret that can be visualised geometrically in Fig. 4.
The next important question is which valid assignment of blocks to regret values results in the maximal amount of total regret for a player. In Fig. 4, block 1 is assigned to strategy 1, blocks 2, 3, and 7 are assigned to strategy 2, blocks 4 and 5 are assigned to strategy 3, block 6 is assigned to strategy 4 and finally blocks 8 and 9 are assigned to strategy 5.
One can see that this does not result in maximal regret. Rather, it is simple to see that a greedy allotment of blocks to regret values results in maximal total regret. Such a greedy allotment can be described as follows: assign as many blocks as possible (their regret constraints permitting) at the end of the queue to Rk, then repeat this process one-by-one for the Ri earlier in the queue. This is visualised in Fig. 5, and naturally leads to the following result:
Theorem 12
For any fixed R1, ..., Rk, the worst case assignment of probability blocks Bb to strategies corresponds to a left sum of \(\mathcal {A}^{(1+\frac {1}{N}), 2c}\) for some partition of \([0, 1+\frac {1}{N}]\) with cardinality at most k − 1.
This previous theorem reduces the problem of computing worst case regret to that of computing maximal left sums under arbitrary partitions. To that end, we define the precise worst-case partition value we will be interested in.
Definition 10
For a given \(\mathcal {A}^{b,h}\), let us denote the maximal left sum under partitions of cardinality k by \(\mathcal {A}^{b,h}_{k}\). Mathematically, the value is defined as follows:
$$\mathcal{A}^{b,h}_{k} = \sup\limits_{|\mathcal{P}| = k} LS(\mathcal{A}^{b,h}, \mathcal{P}) $$
We can explicitly compute these values, which in turn will bound a player's maximal regret.
Lemma 8
\(\mathcal {A}^{1,1}_{k} = \left (\frac {1}{2} \right ) \left (\frac {k}{k + 1} \right )\), which is obtained on the partition \(\mathcal {P} = \{\frac {1}{k + 1}, \frac {2}{k + 1}, ...,\frac {k}{k + 1}\}\).
Proof
This result follows from induction and self-similarity of the original triangle. For k = 1, our partitions consist of a single point x ∈ [0, 1], hence the left sum will be \(\mathcal {A}^{1,1}_{1}(x) = (1-x)x\), which as a quadratic function of x has a maximum at \(x = \frac {1}{2}\). At this point we get \(\mathcal {A}^{1,1}_{1}(x) = \frac {1}{2} \cdot \frac {1}{2}\) as desired.
Now let us assume that the lemma holds for k = n; we wish to show that it holds for k = n + 1. Any (n + 1)-element partition must have a left-most element, x1. We let \(\mathcal {A}^{\prime }(x)\) be the maximal left sum for an (n + 1)-element partition, given that x1 = x. By fixing x we contribute a term of x(1 − x) to the left sum, and we are left with n points to partition [x, 1]. We notice however that we are thus maximising the left sum under a copy of the original triangle scaled by a factor of (1 − x). We can therefore use the inductive assumption and get the following expression:
$$\mathcal{A}^{\prime}(x) = (1-x)x + (1-x)^{2} \mathcal{A}^{1,1}_{n} = (1-x)x + \frac{1}{2} (1-x)^{2} \left( \frac{n}{n + 1} \right) $$
It is straightforward to see that \(\mathcal {A}^{\prime }(x)\) is maximised when \(x = \frac {1}{n + 2}\). Consequently the maximal left sum arises from the partition where \(x_{i} = \frac {i}{n + 2}\), which in turn proves our claim. □
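As a sanity check of Lemma 8 (using the `left_sum` helper above): the claimed partition attains the stated value, and random partitions do not beat it.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
claimed = 0.5 * k / (k + 1)
opt = [i / (k + 1) for i in range(1, k + 1)]
assert abs(left_sum(opt, 1, 1) - claimed) < 1e-9

best_random = max(left_sum(list(rng.random(k)), 1, 1) for _ in range(100_000))
print(claimed, best_random)   # random search stays at or below the optimum
```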
Via linear scaling, one can extend the above result to arbitrary base and height values b, h.
Corollary 2
For h ≤ 1, \(\mathcal {A}^{b,h}_{k} = \left (\frac {bh}{2} \right ) \left (\frac {k}{k + 1} \right )\), which is obtained on the partition \(\mathcal {P} = \{\frac {b}{k + 1}, \frac {2b}{k + 1}, ...,\frac {kb}{k + 1}\}\).
Corollary 3
For h > 1, we obtain the following expressions for \(\mathcal {A}^{b,h}_{k}\):
$$\mathcal{A}^{b,h}_{k} = \left\{\begin{array}{cl} \left( \frac{bh}{2} \right) \left( \frac{k}{k + 1} \right) & \text{ if } \frac{k}{k + 1} \leq \frac{1}{h} \\ b \left( 1 - \frac{1}{2h} - \frac{1}{2hk} \right) & \text{ otherwise } \end{array}\right. $$
Proof
For the first case (when \(\frac {k}{k + 1} \leq \frac {1}{h}\)), let us consider \(\mathcal {B}^{b,h}\), the triangle with base b and height h that, unlike \(\mathcal {A}^{b,h}\), is not truncated at unit height. From scaling our previous result from Corollary 2, the largest k-element left sum for \(\mathcal {B}^{b,h}\) occurs for the partition \(\mathcal {P} = \{\frac {b}{k + 1}, \frac {2b}{k + 1}, ...,\frac {bk}{k + 1} \}\). Since \(\frac {k}{k + 1} \leq \frac {1}{h}\), the height of \(\mathcal {B}^{b,h}\) at each of these partition points is at most \(\frac {hk}{k + 1} \leq 1\), so \(\mathcal {A}^{b,h}\) and \(\mathcal {B}^{b,h}\) coincide there and the left sums of \(\mathcal {P}\) for both geometric figures are equal. It follows that this partition also gives a maximal k-element partition for left sums of \(\mathcal {A}^{b,h}\) and thus the claim holds.
On the other hand, let us now consider the case where \(\frac {k}{k + 1} > \frac {1}{h}\). In a similar spirit to previous proofs, let us define \(\mathcal {A}(x): [0,b] \rightarrow \mathbb {R}\) to be the maximal left sum under \(\mathcal {A}^{b,h}\) over partitions \(\mathcal {P}\) whose right-most element is x. From Figs. 4 and 5, it should be clear that we need only consider \(x \in [0,\frac {b}{h}]\): if ever we had \(x \geq \frac {b}{h}\), that would correspond to some block being assigned a regret value of Rj = 1 for some strategy j. However with the existence of such a maximal-regret strategy, the greedy allotment of blocks to strategies would assign the most blocks possible to strategy j (or some other maximal-regret strategy), which would correspond again to the final element in our partition being \(\frac {b}{h}\).
Now that we have restricted our focus to \(x \in [0,\frac {b}{h}]\), we wish to consider the triangle \(\mathcal {B}^{\ell ,\frac {k + 1}{k}}\) of base length \(\ell = \frac {(k + 1)b}{kh}\) and height \(\frac {k + 1}{k}\), which is not truncated at height 1 and has the same slope \(\frac {h}{b}\) as \(\mathcal {A}^{b,h}\). Let us define \(\mathcal {B}(x)\) to be the analogous function that computes the maximal k-element left sum under \(\mathcal {B}^{\ell ,\frac {k + 1}{k}}\) given that the right-most partition element is \(x \in [0, \frac {b}{h}]\). Geometrically, one can see that we get the following identity:
$$\mathcal{A}(x) = \mathcal{B}(x) + \frac{hx}{b} \left( b - \ell \right) $$
where \(b - \ell \geq 0\) precisely because \(\frac {k}{k + 1} > \frac {1}{h}\). By the same scaling argument as Corollary 2, the optimal k-element partition on \(\mathcal {B}^{\ell ,\frac {k + 1}{k}}\) has a right-most element of \(\frac {\ell k}{k + 1} = \frac {b}{h}\); it follows that \(\mathcal {B}(x)\) is maximised at \(x = \frac {b}{h}\). Furthermore, the second term of the above sum is non-negative and increasing in x, so it too is maximised at this value; therefore \(\mathcal {A}(x)\) is maximised at \(\frac {b}{h}\). Concretely, this means that the maximal k-element partition for \(\mathcal {A}^{b,h}\) is \(\mathcal {P} = \{\frac {b}{hk}, \frac {2b}{hk}, ...,\frac {(k-1)b}{hk}, \frac {b}{h}\}\). This partition results in a maximal left sum of \(\mathcal {A}^{\frac {b}{h},1}_{k-1} + \left (b - \frac {b}{h} \right )\), which after simplification gives us the value \(b \left (1 - \frac {1}{2h} - \frac {1}{2hk} \right )\) as desired. □
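Since the constants in the second case are easy to miscompute, a quick numeric check (again with the `left_sum` helper, and our own choice of parameters) is reassuring: the stated partition attains \(b(1 - \frac {1}{2h} - \frac {1}{2hk})\), and random partitions do not exceed it.

```python
import numpy as np

rng = np.random.default_rng(1)
b, h, k = 1.2, 3.0, 4                     # here k/(k+1) = 0.8 > 1/h
part = [i * b / (h * k) for i in range(1, k)] + [b / h]
claimed = b * (1 - 1 / (2 * h) - 1 / (2 * h * k))
assert abs(left_sum(part, b, h) - claimed) < 1e-9

best_random = max(left_sum(list(b * rng.random(k)), b, h) for _ in range(100_000))
print(claimed, best_random)               # the claimed optimum is never exceeded
```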
Finally, we can combine everything above to obtain:
Theorem 13
With access to a query oracle that computes exact expected utilities for mixed strategy profiles, BU returns an ε-approximate Nash equilibrium for
$$\varepsilon = \left\{\begin{array}{cl} c \left( \frac{k-1}{k} \right) \left( 1 + \frac{1}{N} \right) & \text{ if } c \leq \frac{1}{2} \\ c \left( \frac{k-1}{k} \right)\left( 1 + \frac{1}{N} \right) & \text{ if } c > \frac{1}{2} \text{ and } \frac{k-1}{k} \leq \frac{1}{2c} \\ \left( 1 - \frac{1}{4c} - \frac{1}{4c (k-1)} \right) \left( 1 + \frac{1}{N} \right) & \text{ if } c > \frac{1}{2} \text{ and } \frac{k-1}{k} > \frac{1}{2c} \end{array}\right. $$
Proof
This is just a straightforward application of Theorem 12 and Corollaries 2 and 3. □
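To see how the three cases fit together, here is a small evaluator of the bound (our own helper, directly encoding Theorem 13):

```python
def bu_epsilon(c, k, N):
    """Regret bound of Theorem 13 for largeness constant c, k actions, N blocks."""
    base = 1 + 1 / N
    if c <= 0.5 or (k - 1) / k <= 1 / (2 * c):
        return c * ((k - 1) / k) * base
    return (1 - 1 / (4 * c) - 1 / (4 * c * (k - 1))) * base

print(bu_epsilon(1.0, 2, 10**6))    # ~0.5 for binary actions at c = 1
print(bu_epsilon(1.0, 100, 10**6))  # ~0.747: approaches 3/4 as k grows
```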
Query Complexity of Block Method
In the above analysis we assumed access to a mixed strategy oracle, since we computed expected payoffs at each time step for all players. When using \(\mathcal {Q}_{\beta , \delta }\), however, there are an additive error and a bounded correctness probability to take into account.
In terms of the additive error, if we assume that there is an additive error of β on each of the N queries in BU, then at any time step the b-th block will be assigned to a strategy that incurs at most \(\min \left \{1, \frac {2cb}{N} \right \} + \beta\) regret, which can be visualised geometrically in Fig. 6, and which leads to the following extension of Theorem 12.
Theorem 14
In BU, if queries incorporate an additive error of β on expected utilities, then for any fixed choice of R1, ..., Rk, the worst case assignment of probability blocks Bb to strategies corresponds to a left sum of \(\mathcal {A}^{(1+\frac {1}{N} + \frac {\beta }{2c}), 2c}\) for some partition of \([0, 1+\frac {1}{N} + \frac {\beta }{2c}]\) with cardinality at most k − 1.
Finally, since our approximate query oracle is correct only with bounded probability, in order to ensure that the same additive error of β holds on all N rounds of BU we impose a failure probability of \(\frac {\delta }{N}\) on each call, so that the desired guarantee follows from a union bound. This leads to the following query complexity result for BU.
Theorem 15
For any α, η > 0, if we implement BU using \(\mathcal {Q}_{\beta ,\delta }\) with β = α and \(\delta = \frac {\eta }{N}\), then with probability 1 − η we obtain an ε-approximate Nash equilibrium for
$$\varepsilon = \left\{\begin{array}{cl} c \left( \frac{k-1}{k} \right) \left( 1 + \frac{1}{N} + \frac{\alpha}{2c} \right)& \text{ if } c \leq \frac{1}{2} \\ c \left( \frac{k-1}{k} \right)\left( 1 + \frac{1}{N} + \frac{\alpha}{2c}\right)& \text{ if } c > \frac{1}{2} \text{ and } \frac{k-1}{k} \leq \frac{1}{2c} \\ \left( 1 - \frac{1}{4c} - \frac{1}{4c (k-1)} \right) \left( 1 + \frac{1}{N} + \frac{\alpha}{2c} \right)& \text{ if } c > \frac{1}{2} \text{ and } \frac{k-1}{k} > \frac{1}{2c} \end{array}\right. $$
The total number of queries used is \(\frac {64k^{2}N}{\alpha ^{3}} \log \left (\frac {8nN}{\eta } \right )\).
Once again, it is interesting to note that the first regret bounds we derived do not depend on k. It is also important to note that the regret has an extra additive term of order \(O(\frac {1}{N})\), where N is the number of probability blocks. Although this term can be made arbitrarily small by increasing N, there is a price to be paid in query complexity, as this involves a larger number of rounds in the computation of approximate equilibria.
Comparison Between Both Methods
We can compare the guarantees of our methods from Sections 6.1 and 6.2 when the number of strategies is k = 2 and the largeness parameter is \(\gamma = \frac {c}{n} \in [0, 1]\). Furthermore, we consider how both methods compare as N →∞.
| | c ≤ 1 | 1 ≤ c ≤ 2 | c ≥ 2 |
|---|---|---|---|
| UCN | \(\frac {c}{8}\) | \(\frac {c}{8}\) | \(\frac {1}{2} - \frac {1}{2c}\) |
| BU | \(\frac {c}{2}\) | \(1 - \frac {1}{2c}\) | \(1 - \frac {1}{2c}\) |
One can see that UCN does better by a multiplicative factor of 4 (its bound is a quarter of BU's) in the case of small c, and better by an additive \(\frac {1}{2}\) for large c.
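For a concrete feel of the gap, here is a tiny evaluation of both guarantees (our own helper, encoding the table above):

```python
def ucn_regret(c):
    return c / 8 if c <= 2 else 0.5 - 1 / (2 * c)

def bu_regret(c):                       # k = 2, N -> infinity
    return c / 2 if c <= 1 else 1 - 1 / (2 * c)

for c in (0.5, 1.0, 2.0, 4.0):
    print(f"c = {c}: UCN {ucn_regret(c):.3f} vs BU {bu_regret(c):.3f}")
```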