Lipschitz Continuity and Approximate Equilibria

In this paper, we study games with continuous action spaces and non-linear payoff functions. Our key insight is that Lipschitz continuity of the payoff function allows us to provide algorithms for finding approximate equilibria in these games. We begin by studying Lipschitz games, which encompass, for example, all concave games with Lipschitz continuous payoff functions. We provide an efficient algorithm for computing approximate equilibria in these games. Then we turn our attention to penalty games, which encompass biased games and games in which players take risk into account. Here we show that if the penalty function is Lipschitz continuous, then we can provide a quasi-polynomial time approximation scheme. Finally, we study distance biased games, where we present simple strongly polynomial time algorithms for finding best responses in $L_1$ and $L_2^2$ biased games, and then use these algorithms to provide strongly polynomial algorithms that find 2/3 and 5/7 approximate equilibria for these norms, respectively.


INTRODUCTION
The Nash equilibrium [Nash 1951] is the central solution concept that is studied in game theory. However, recent advances have shown that computing an exact Nash equilibrium is PPAD-complete [Chen et al. 2009; Daskalakis et al. 2009], and so there are unlikely to be polynomial time algorithms for this problem. The hardness of computing exact equilibria has led to the study of approximate equilibria: while an exact equilibrium requires that all players have no incentive to deviate from their current strategy, an $\epsilon$-approximate equilibrium requires only that their incentive to deviate is less than $\epsilon$.
A fruitful line of work has developed studying the best approximations that can be found in polynomial time for bimatrix games, which are two-player strategic form games. There, after a number of papers [Bosse et al. 2010; Daskalakis et al. 2007, 2009], the best known algorithm was given by Tsaknakis and Spirakis [2008], who provide a polynomial time algorithm that finds a 0.3393-equilibrium. A prominent open problem is whether there exists a PTAS for this problem. The existence of an FPTAS was ruled out by Chen et al. [2009] unless PPAD = P. While the existence of a PTAS remains open, there is a quasi-polynomial approximation scheme given by Lipton et al. [2003].
In a strategic form game, the game is specified by giving each player a finite number of strategies, and then specifying a table of payoffs that contains one entry for every possible combination of strategies that the players might pick. The players are allowed to use mixed strategies, and so ultimately the payoff function is a convex combination of the payoffs given in the table. However, some games can only be modelled in a more general setting where the action spaces are continuous, or the payoff functions are non-linear.
For example, Rosen's seminal work [Rosen 1965] considered a more general setting of games, called concave games, where each player picks a vector from a convex set. The payoff to each player is specified by a function that satisfies the following condition: if every other player's strategy is fixed, then the payoff to a player is a concave function over his strategy space. Rosen proved that concave games always have an equilibrium. A natural subclass of concave games, studied by Caragiannis et al. [2014], is the class of biased games. A biased game is defined by a strategic form game, a base strategy and a penalty function. The players play the strategic form game as normal, but they all suffer a penalty for deviating from their base strategy. This penalty can be a non-linear function, such as the $L_2^2$ norm.
In this paper, we study the computation of approximate equilibria in such games. Our main observation is that Lipschitz continuity of the players' payoff functions allows us to provide algorithms that find approximate equilibria.
Several papers have studied how the Lipschitz continuity of the players' payoff functions affects the existence, the quality, and the complexity of the equilibria of the underlying game. Azrieli and Shmaya [2013] studied many-player games and derived bounds for the Lipschitz constant of the players' utility functions that guarantee the existence of a pure approximate equilibrium. Daskalakis and Papadimitriou [2014] proved that anonymous games possess pure approximate equilibria whose quality depends on the Lipschitz constant of the payoff functions and the number of pure strategies the players have, and proved that such an approximate equilibrium can be computed in polynomial time. Furthermore, they gave a polynomial-time approximation scheme for anonymous games with many players and a constant number of pure strategies. Babichenko [2013] presented a best-reply dynamic for $n$-player Lipschitz anonymous games with two strategies which reaches an approximate pure equilibrium in $O(n \log n)$ steps. Recently, Chen et al. [2015] proved that it is PPAD-complete to compute an $\epsilon$-equilibrium in anonymous games with seven pure strategies, when $\epsilon$ is exponentially small in the number of players. Deb and Kalai [2015] studied how some variants of the Lipschitz continuity of the utility functions are sufficient to guarantee hindsight stability of equilibria.

Our contribution.
Lipschitz games. We begin by studying a very general class of games, where each player's strategy space is continuous, and represented by a convex set of vectors, and where the only restriction is that the payoff function is Lipschitz continuous. This class encompasses, for example, every concave game in which the payoffs are Lipschitz continuous. This class is so general that exact equilibria, and even approximate equilibria, may not exist. Nevertheless, we give an efficient algorithm that either outputs an $\epsilon$-equilibrium, or determines that the game has no exact equilibria. More precisely, for $M$-player games that are $\lambda$-Lipschitz continuous in the $L_p$ norm, for $p \ge 2$, and where $\gamma = \max \|x\|_p$ over all $x$ in the strategy space, we either compute an $\epsilon$-equilibrium or determine that no exact equilibrium exists in time $O(M n^{Mk+l})$, where $k = O(\lambda^2 M p \gamma^2 / \epsilon^2)$ and $l = O(\lambda^2 p \gamma^2 / \epsilon^2)$. Observe that this is a polynomial time algorithm when $\lambda$, $p$, $\gamma$, $M$, and $\epsilon$ are constant.
To prove this result, we utilize a recent result of Barman [2015], which states that for every vector in a convex set, there is another vector that is $\epsilon$-close to the original in the $L_p$ norm, and is a convex combination of $b$ points on the convex hull, where $b$ depends on $p$ and $\epsilon$, but does not depend on the dimension. Using this result, and the Lipschitz continuity of the payoffs, allows us to reduce the task of finding an $\epsilon$-equilibrium to checking only a small number of strategy profiles, and thus we get a brute-force algorithm that is reminiscent of the QPTAS given by Lipton et al. [2003] for bimatrix games.
However, life is not so simple for us. Since we study a very general class of games, verifying whether a given strategy profile is an $\epsilon$-equilibrium is a non-trivial task. It requires us to compute a regret for each player, which is the difference between the player's best response payoff and their actual payoff. Computing a best response in a bimatrix game is trivial, but for Lipschitz games, computing a best response may be a hard problem. We get around this problem by instead giving an algorithm to compute approximate best responses. Hence we find approximate regrets, and it turns out that this is sufficient for our algorithm to work.

Penalty games.
We then turn our attention to penalty games. In these games, the players play a strategic form game, and their utility is the payoff achieved in the game minus a penalty. The penalty function can be an arbitrary function that depends on the player's strategy. This is a general class of games that encompasses a number of games that have been studied before. The biased games studied by Caragiannis et al. [2014] are penalty games where the penalty is determined by the amount that a player deviates from a specified base strategy. The biased model was studied in the past by psychologists [Tversky and Kahneman 1974] and it is close to what they call anchoring [Chapman and Johnson 1999; Kahneman 1992]. In their seminal paper, Fiat and Papadimitriou [2010] introduced a model for risk prone games. This model resembles penalty games since the risk component can be encoded in the penalty function. Mavronicolas and Monien [2015] followed this line of research and provided results on the complexity of deciding if such games possess an equilibrium.
We again show that Lipschitz continuity helps us to find approximate equilibria. The only assumption that we make is that the penalty function is Lipschitz continuous in an $L_p$ norm with $p \ge 2$. Again, this is a weak restriction, and it does not guarantee that exact equilibria exist. Even so, we give a quasi-polynomial time algorithm that either finds an $\epsilon$-equilibrium, or verifies that the game has no exact equilibrium.
Our result can be seen as a generalisation of the QPTAS given by Lipton et al. [2003] for bimatrix games. Their approach is to show the existence of an approximate equilibrium with a logarithmic support. They proved this via the probabilistic method: if we know an exact equilibrium of a bimatrix game, then we can take logarithmically many samples from the strategies, and with positive probability playing the sampled strategies uniformly will be an approximate equilibrium.
We take a similar approach, but since our games are more complicated, our proof is necessarily more involved. In particular, for Lipton et al. [2003], proving that the sampled strategies are an approximate equilibrium only requires showing that the expected payoff is close to the payoff of a pure best response. In penalty games, best response strategies are not necessarily pure, and so the events that we must consider are more complex.
Distance biased games. Finally, we consider distance biased games, which are a subclass of penalty games that have been studied recently by Caragiannis et al. [2014]. They showed that, under very mild assumptions on the bias function, biased games always have an exact equilibrium. Furthermore, for the case where the bias function is either the $L_1$ norm or the $L_2^2$ norm, they give an exponential time algorithm for finding an exact equilibrium.
Our results for penalty games already give a QPTAS for biased games, but we are also interested in whether there are polynomial-time algorithms that can find non-trivial approximations. We give a positive answer to this question for games where the bias is the $L_1$ norm, the $L_2^2$ norm, or the $L_\infty$ norm. We follow the well-known approach of Daskalakis et al. [2009], who gave a simple algorithm for finding a 0.5-approximate equilibrium in a bimatrix game. Their approach is as follows: start with an arbitrary strategy $x$ for player 1, compute a best response $j$ for player 2 against $x$, and then compute a best response $i$ for player 1 against $j$. Player 1 mixes uniformly between $x$ and $i$, while player 2 plays $j$.
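The three steps of this procedure can be sketched in a few lines of Python. This is an illustrative sketch for plain bimatrix games only, not the algorithm for biased games: the names `dmp_half_approx` and `max_regret` are hypothetical, payoffs are assumed to lie in [0, 1], and the default starting strategy (the first pure strategy) is an arbitrary choice.

```python
import random

def dmp_half_approx(R, C, x=None):
    """Sketch of the 0.5-approximation scheme for a bimatrix game (R, C).

    R and C are lists of rows with entries in [0, 1] (an assumption).
    Returns a pair of mixed strategies (x_mix, y).
    """
    n, m = len(R), len(R[0])
    if x is None:
        # Arbitrary starting strategy: the first pure strategy (an assumption).
        x = [1.0] + [0.0] * (n - 1)
    # Best response j of player 2 against x.
    col_payoff = [sum(x[a] * C[a][b] for a in range(n)) for b in range(m)]
    j = max(range(m), key=lambda b: col_payoff[b])
    # Best response i of player 1 against j.
    i = max(range(n), key=lambda a: R[a][j])
    # Player 1 mixes uniformly between x and i; player 2 plays j.
    x_mix = [0.5 * xa for xa in x]
    x_mix[i] += 0.5
    y = [0.0] * m
    y[j] = 1.0
    return x_mix, y

def max_regret(R, C, x, y):
    """Largest regret of either player under the profile (x, y)."""
    n, m = len(R), len(R[0])
    row_vals = [sum(R[a][b] * y[b] for b in range(m)) for a in range(n)]
    col_vals = [sum(x[a] * C[a][b] for a in range(n)) for b in range(m)]
    row_regret = max(row_vals) - sum(x[a] * row_vals[a] for a in range(n))
    col_regret = max(col_vals) - sum(col_vals[b] * y[b] for b in range(m))
    return max(row_regret, col_regret)
```

In a bimatrix game both best responses are simple maximisations over pure strategies, which is exactly the step that stops being trivial once a penalty term is added.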
We show that this algorithm also works for biased games, although the generalisation is not entirely trivial. Again, this is because best responses cannot be trivially computed in biased games. For the $L_1$ and $L_\infty$ norms, best responses can be computed via linear programming, and for the $L_2^2$ norm, best responses can be formulated as a quadratic program, and it turns out that this particular QP can be solved in polynomial time by the ellipsoid method. However, none of these algorithms are strongly polynomial. We show that, for each of the norms, best responses can be found by a simple strongly-polynomial combinatorial algorithm. We then analyse the quality of approximation provided by the technique of Daskalakis et al. [2009]. We obtain a strongly polynomial algorithm for finding a 2/3 approximation in $L_1$ and $L_\infty$ biased games, and a strongly polynomial algorithm for finding a 5/7 approximation in $L_2^2$ biased games. For the latter result, in the special case where the bias function is the inner product of the player's strategy, we find a 13/21 approximation.

PRELIMINARIES
We start by fixing some notation. For each positive integer $n$ we use $[n]$ to denote the set $\{1, 2, \ldots, n\}$, we use $\Delta^n$ to denote the $(n-1)$-dimensional simplex, and $\|x\|_p$ to denote the $p$-norm of a vector $x \in \mathbb{R}^d$, i.e. $\|x\|_p = \left(\sum_{i \in [d]} |x_i|^p\right)^{1/p}$. Given a set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, we use $\mathrm{conv}(X)$ to denote the convex hull of $X$.

Games and strategies.
A game with $M$ players can be described by a set of available actions for each player and a utility function for each player that depends both on his chosen action and on the actions the rest of the players chose. For each player $i \in [M]$ we use $S_i$ to denote his set of available actions and we call it his strategy space. We will use $x_i \in S_i$ to denote a specific action chosen by player $i$ and we will call it the strategy of player $i$. Furthermore, we use $x = (x_1, \ldots, x_M)$ to denote a strategy profile of the game. We use $T_i(x_i, x_{-i})$ to denote the utility of player $i$ when he plays the strategy $x_i$ and the rest of the players play according to the strategy profile $x_{-i}$. A strategy $\hat{x}_i$ is a best response against the strategy profile $x_{-i}$ if $T_i(\hat{x}_i, x_{-i}) \ge T_i(x_i, x_{-i})$ for all $x_i \in S_i$. The regret player $i$ suffers under a strategy profile $x$ is the difference between the utility of his best response and his utility under $x$, i.e. $T_i(\hat{x}_i, x_{-i}) - T_i(x_i, x_{-i})$.
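$\lambda_p$-Lipschitz Games. We will use the notion of $\lambda_p$-Lipschitz continuity.
DEFINITION 2.1 ($\lambda_p$-Lipschitz). A function $f : A \to \mathbb{R}$, with $A \subseteq \mathbb{R}^d$, is $\lambda_p$-Lipschitz continuous if for every $x$ and $y$ in $A$ it holds that $|f(x) - f(y)| \le \lambda \cdot \|x - y\|_p$.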
We call the game $\mathcal{L} := (M, n, \lambda, p, \gamma, \mathcal{T})$ $\lambda_p$-Lipschitz if for each player $i \in [M]$:
- the strategy space $S_i$ is the convex hull of $n$ vectors $y_1, \ldots, y_n$ in $\mathbb{R}^d$,
- $\max_{x_i \in S_i} \|x_i\|_p \le \gamma$,
- the utility function $T_i(x) \in \mathcal{T}$ is $\lambda_p$-Lipschitz continuous.
Two Player Penalty Games. A two player penalty game $\mathcal{P}$ is defined by a tuple $\langle R, C, f_r(x), f_c(y) \rangle$, where $(R, C)$ is a bimatrix game and $f_r(x)$ and $f_c(y)$ are the penalty functions for the row and the column player respectively. The utilities for the players under a strategy profile $(x, y)$, denoted by $T_r(x, y)$ and $T_c(x, y)$, are given by $T_r(x, y) = x^T R y - f_r(x)$ and $T_c(x, y) = x^T C y - f_c(y)$. We will use $\mathcal{P}_\lambda$ to denote two player penalty games with $\lambda_p$-Lipschitz penalty functions.
A special class of penalty games is when $f_r(x) = x^T x$ and $f_c(y) = y^T y$. We call these games inner product penalty games.
Two Player Biased Games. This is a subclass of penalty games, where extra constraints are added to the penalty functions $f_r(x)$ and $f_c(y)$ of the players. In this class of games there is a base strategy for each player, and the penalty they receive increases with the distance between the strategy they choose and their base strategy. Formally, the row player has a base strategy $p \in \Delta^n$, the column player has a base strategy $q$, and their strictly increasing penalty functions are defined as $f_r(\|x - p\|_s^t)$ and $f_c(\|y - q\|_l^m)$ respectively.
Two Player Distance Biased Games. This is a special class of biased games where the penalty function is a fraction of the distance between the base strategy of the player and his chosen strategy. Formally, a two player distance biased game $\mathcal{B}$ is defined by a tuple $\langle R, C, b_r(x, p), b_c(y, q), d_r, d_c \rangle$, where $(R, C)$ is a bimatrix game, $p \in \Delta^n$ is a base strategy for the row player, $q \in \Delta^n$ is a base strategy for the column player, and $b_r(x, p) = \|x - p\|_s^t$ and $b_c(y, q) = \|y - q\|_l^m$ are penalty functions for the row and the column player respectively. The utilities for the players under a strategy profile $(x, y)$, denoted by $T_r(x, y)$ and $T_c(x, y)$, are given by $T_r(x, y) = x^T R y - d_r \cdot b_r(x, p)$ and $T_c(x, y) = x^T C y - d_c \cdot b_c(y, q)$, where $d_r$ and $d_c$ are non-negative constants.
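As a small illustration of these definitions, the following Python sketch evaluates the utilities of a two player distance biased game. It assumes that each utility is the bimatrix payoff minus the constant $d_r$ (or $d_c$) times the distance penalty; all function names are hypothetical, and the three penalty helpers correspond to the $L_1$, $L_2^2$ and $L_\infty$ distances.

```python
def l1(v):
    """L_1 norm of a vector."""
    return sum(abs(a) for a in v)

def l2_squared(v):
    """Squared L_2 norm of a vector."""
    return sum(a * a for a in v)

def linf(v):
    """L_infinity norm of a vector."""
    return max(abs(a) for a in v)

def bilinear(x, A, y):
    """Expected payoff x^T A y of a bimatrix game."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def biased_utilities(R, C, x, y, p, q, d_r, d_c, b_r=l1, b_c=l1):
    """Utilities (T_r, T_c), assuming T = payoff - d * b(strategy - base)."""
    pen_r = d_r * b_r([xi - pi for xi, pi in zip(x, p)])
    pen_c = d_c * b_c([yi - qi for yi, qi in zip(y, q)])
    return bilinear(x, R, y) - pen_r, bilinear(x, C, y) - pen_c
```

For example, a row player who plays the first pure strategy against the base strategy `p = [0.5, 0.5]` is at $L_1$ distance 1 from the base, so with `d_r = 0.5` half a unit of payoff is lost to the penalty.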
Solution Concepts. The standard solution concept in game theory is the notion of equilibrium. A strategy profile is an equilibrium if no player can increase his utility by unilaterally changing his strategy. A relaxed version of this concept is the approximate equilibrium, or $\epsilon$-equilibrium. Intuitively, a strategy profile is an $\epsilon$-equilibrium if no player can increase his utility by more than $\epsilon$ by unilaterally changing his strategy. Formally, a strategy profile $x$ is an $\epsilon$-equilibrium in a game $\mathcal{L}$ if for every player $i \in [M]$ it holds that $T_i(x_i, x_{-i}) \ge T_i(x'_i, x_{-i}) - \epsilon$ for all $x'_i \in S_i$.
In [Chen et al. 2009] it was proven that, unless P = PPAD, there is no FPTAS for computing an $\epsilon$-NE in bimatrix games. The same result holds for the class of penalty games where the penalty functions $f$ for the players depend on $n$, the size of the underlying bimatrix game, and $\lim_{n \to \infty} f = 0$ for every player. Let $\mathcal{P}'$ denote this class of games.
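THEOREM 2.2. Unless P = PPAD, there is no FPTAS for computing an $\epsilon$-equilibrium in penalty games in $\mathcal{P}'$.
PROOF. For the sake of contradiction suppose that there is an FPTAS for computing an $\epsilon$-equilibrium for penalty games in $\mathcal{P}'$. Then given an $n \times n$ bimatrix game $(R, C)$, define the penalty game $\langle R, C, f_r(x), f_c(y) \rangle$ from the family $\mathcal{P}'$, where $\lim_{n \to \infty} f_r(x) = 0$ and $\lim_{n \to \infty} f_c(y) = 0$. Let $(x^*, y^*)$ be an $\epsilon$-equilibrium for the penalty game. This means that for all $x \in \Delta^n$ we have $x^{*T} R y^* - f_r(x^*) \ge x^T R y^* - f_r(x) - \epsilon$, and the analogous inequality holds for the column player. Since the penalties vanish as $n$ grows, in the limit $(x^*, y^*)$ is an $\epsilon$-NE for the bimatrix game $(R, C)$. This means that if there is an FPTAS for computing an $\epsilon$-equilibrium in a penalty game in $\mathcal{P}'$ then there is an FPTAS for computing an $\epsilon$-NE in $(R, C)$, which is a contradiction, unless P = PPAD.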

APPROXIMATE EQUILIBRIA IN $\lambda_p$-LIPSCHITZ GAMES
In this section, we give an algorithm for computing approximate equilibria in $\lambda_p$-Lipschitz games. Note that our definition of a $\lambda_p$-Lipschitz game does not guarantee that an equilibrium always exists. Our technique can be applied irrespective of whether an exact equilibrium exists. If an exact equilibrium does exist, then our technique will always find an $\epsilon$-equilibrium. If an exact equilibrium does not exist, then our algorithm either finds an $\epsilon$-equilibrium or reports that the game does not have an exact equilibrium.
We will utilize the following theorem that was recently proved in Barman [2015].
THEOREM 3.1 ([BARMAN 2015]). Given a set of vectors $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, let $\mathrm{conv}(X)$ denote the convex hull of $X$. Furthermore, let $\gamma := \max_{x \in X} \|x\|_p$ for some $2 \le p < \infty$. For every $\epsilon > 0$ and every $\mu \in \mathrm{conv}(X)$, there exists a $\frac{4p\gamma^2}{\epsilon^2}$-uniform vector $\mu' \in \mathrm{conv}(X)$ such that $\|\mu - \mu'\|_p \le \epsilon$.
If we combine Theorem 3.1 with Definition 2.1 we get the following lemma.
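LEMMA 3.2. Let $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, let $f : \mathrm{conv}(X) \to \mathbb{R}$ be a $\lambda_p$-Lipschitz continuous function for some $2 \le p < \infty$, let $\epsilon > 0$ and let $k = \frac{4\lambda^2 p \gamma^2}{\epsilon^2}$, where $\gamma := \max_{x \in X} \|x\|_p$. Furthermore, let $f(x^*)$ be the optimum value of $f$. Then we can compute a $k$-uniform point $x' \in \mathrm{conv}(X)$ in time $O(n^k)$ such that $|f(x^*) - f(x')| < \epsilon$.
PROOF. From Theorem 3.1 we know that for the chosen value of $k$ there exists a $k$-uniform point $x'$ such that $\|x' - x^*\|_p < \epsilon/\lambda$. Since $f$ is $\lambda_p$-Lipschitz continuous, we get that $|f(x') - f(x^*)| < \epsilon$. In order to compute this point we have to exhaustively evaluate the function $f$ at all $k$-uniform points and choose the point that maximizes or minimizes its value. Since there are $\binom{n+k-1}{k} = O(n^k)$ possible $k$-uniform points, the lemma follows.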
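The exhaustive search used in this proof can be sketched as follows. This is a minimal illustration under one assumption: a $k$-uniform point is taken to be the average of a multiset of $k$ of the generating vectors. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def k_uniform_points(vertices, k):
    """Yield every k-uniform point of conv(vertices).

    Each point is the average of a multiset of k vertices
    (assumed meaning of "k-uniform").
    """
    d = len(vertices[0])
    for multiset in combinations_with_replacement(range(len(vertices)), k):
        yield tuple(sum(vertices[j][c] for j in multiset) / k
                    for c in range(d))

def count_k_uniform(n, k):
    """Number of multisets of size k drawn from n vertices."""
    return comb(n + k - 1, k)

def argmax_k_uniform(f, vertices, k):
    """Brute-force maximiser of f over all k-uniform points."""
    return max(k_uniform_points(vertices, k), key=f)
```

The enumeration visits one point per multiset, so its cost grows as $n^k$ for fixed $k$, matching the count in the proof; distinct multisets may yield the same point when the vertices are not affinely independent.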
We now prove our result about Lipschitz games. In what follows we will study a $\lambda_p$-Lipschitz game $\mathcal{L} := (M, n, \lambda, p, \gamma, \mathcal{T})$. Assuming the existence of an exact Nash equilibrium, we establish the existence of a $k$-uniform approximate equilibrium in the game $\mathcal{L}$, where $k$ depends on $M$, $\lambda$, $p$ and $\gamma$. Note that $\lambda$ depends heavily on $p$ and the utility functions for the players.
Since by the definition of $\lambda_p$-Lipschitz games the strategy space $S_i$ for every player $i$ is the convex hull of $n$ vectors $y_1, \ldots, y_n$ in $\mathbb{R}^d$, any $x_i \in S_i$ can be written as a convex combination of the $y_j$'s. Hence, $x_i = \sum_{j=1}^{n} \alpha_j y_j$, where $\alpha_j \ge 0$ for every $j \in [n]$ and $\sum_{j=1}^{n} \alpha_j = 1$. Then, $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a probability distribution over the vectors $y_1, \ldots, y_n$, i.e. vector $y_j$ is drawn with probability $\alpha_j$. Thus, we can sample a strategy $x_i$ by the probability distribution $\alpha$.
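The sampling step described here can be sketched in Python as follows. The sketch assumes that a sampled $k$-uniform strategy is the average of $k$ independent draws from $\alpha$; the function name `sample_k_uniform` and the `rng` parameter are hypothetical.

```python
import random

def sample_k_uniform(alpha, vertices, k, rng=random):
    """Draw k vertices i.i.d. according to alpha and return their average.

    alpha    -- probabilities alpha_1..alpha_n over the vertices
    vertices -- the generating vectors y_1..y_n of the strategy space
    rng      -- any object with a `choices` method, e.g. random.Random(seed)
    """
    d = len(vertices[0])
    draws = rng.choices(range(len(vertices)), weights=alpha, k=k)
    return [sum(vertices[j][c] for j in draws) / k for c in range(d)]
```

When the vertices are the unit vectors of the simplex, every coordinate of the sample is a multiple of $1/k$, which is what makes an exhaustive search over such strategies possible.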
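So, let $x^*$ be an equilibrium for $\mathcal{L}$ and let $x'$ be a sampled uniform strategy profile from $x^*$. For each player $i$ we define the following events:
$\phi_i = \{ |T_i(x'_i, x'_{-i}) - T_i(x^*_i, x^*_{-i})| < \epsilon/2 \}$, (1)
$\pi_i = \{ T_i(x_i, x'_{-i}) < T_i(x'_i, x'_{-i}) + \epsilon \}$ for all possible $x_i$, (2)
$\psi_i = \{ \|x'_i - x^*_i\|_p < \frac{\epsilon}{2M\lambda} \}$ for some $p > 0$. (3)
Notice that if all the events $\pi_i$ occur at the same time, then the sampled profile $x'$ is an $\epsilon$-equilibrium. We will show that if for a player $i$ the events $\phi_i$ and $\bigcap_j \psi_j$ hold, then the event $\pi_i$ has to be true too.
LEMMA 3.3. For all $i \in [M]$ it holds that $\phi_i \cap \bigcap_{j \in [M]} \psi_j \subseteq \pi_i$.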
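PROOF. Suppose that both events $\phi_i$ and $\bigcap_{j \in [M]} \psi_j$ hold. We will show that the event $\pi_i$ must be true too. Let $x_i$ be an arbitrary strategy, let $x^*_{-i}$ be a strategy profile for the rest of the players, and let $x'_{-i}$ be a sampled strategy profile from $x^*_{-i}$. Since we assume that the events $\psi_j$ are true for all $j$, we get $\|x'_{-i} - x^*_{-i}\|_p \le \sum_{j \ne i} \|x'_j - x^*_j\|_p < \frac{\epsilon}{2\lambda}$. Furthermore, since by assumption the utility functions for the players are $\lambda_p$-Lipschitz continuous we have that $|T_i(x_i, x'_{-i}) - T_i(x_i, x^*_{-i})| \le \lambda \cdot \|x'_{-i} - x^*_{-i}\|_p < \epsilon/2$. This means that
$T_i(x_i, x'_{-i}) < T_i(x_i, x^*_{-i}) + \epsilon/2 \le T_i(x^*_i, x^*_{-i}) + \epsilon/2$, (4)
since $(x^*_i, x^*_{-i})$ is an equilibrium of the game. Furthermore, since by assumption the event $\phi_i$ is true we get that
$T_i(x^*_i, x^*_{-i}) < T_i(x'_i, x'_{-i}) + \epsilon/2$. (5)
Hence, if we combine the inequalities (4) and (5) we get that $T_i(x_i, x'_{-i}) < T_i(x'_i, x'_{-i}) + \epsilon$ for all possible $x_i$. Thus, if the events $\phi_i$ and $\psi_j$ for every $j \in [M]$ hold, then the event $\pi_i$ holds too.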
We are ready to prove the main result of the section.
THEOREM 3.4. In any $\lambda_p$-Lipschitz game $\mathcal{L}$ that possesses an equilibrium and any $\epsilon > 0$, there is a $k$-uniform strategy profile, with $k = \frac{16 M^2 \lambda^2 p \gamma^2}{\epsilon^2}$, that is an $\epsilon$-equilibrium.
PROOF. In order to prove the claim, it suffices to show that there is a strategy profile where every player plays a $k$-uniform strategy, for the chosen value of $k$, such that the events $\pi_i$ hold for all $i \in [M]$. Since the utility functions in $\mathcal{L}$ are $\lambda_p$-Lipschitz continuous it holds that $\bigcap_{i \in [M]} \psi_i \subseteq \bigcap_{i \in [M]} \phi_i$. Furthermore, combining that with Lemma 3.3 we get that $\bigcap_{i \in [M]} \psi_i \subseteq \bigcap_{i \in [M]} \pi_i$. From Theorem 3.1 we get that, for the chosen value of $k$, for each $i$ there is a $k$-uniform strategy $x'_i$ such that the event $\psi_i$ occurs with positive probability. The claim follows.
Theorem 3.4 establishes the existence of a $k$-uniform approximate equilibrium, but this does not immediately give us our approximation algorithm. The obvious approach is to perform a brute force check of all $k$-uniform strategies, and then output the one that provides the best approximation. There is a problem with this, however, since computing the quality of approximation requires us to compute the regret for each player, which in turn requires us to compute a best response for each player. Computing an exact best response in a Lipschitz game is a hard problem in general, since we make no assumptions about the utility functions of the players. Fortunately, it is sufficient to instead compute an approximate best response for each player, and Lemma 3.2 can be used to do this. The following lemma is a consequence of Lemma 3.2.
LEMMA 3.5. Let $x$ be a strategy profile for a $\lambda_p$-Lipschitz game $\mathcal{L}$, and let $\hat{x}_i$ be a best response for player $i$ against the profile $x_{-i}$. There is a $\frac{4\lambda^2 p \gamma^2}{\epsilon^2}$-uniform strategy $x'_i$ that is an $\epsilon$-best response against $x_{-i}$, i.e. $|T_i(\hat{x}_i, x_{-i}) - T_i(x'_i, x_{-i})| < \epsilon$.
Our goal is to approximate the approximation guarantee for a given strategy profile. More formally, given a strategy profile $x$ that is an $\epsilon$-equilibrium, and a constant $\delta > 0$, we want an algorithm that outputs a number within the range $[\epsilon - \delta, \epsilon + \delta]$. Lemma 3.5 allows us to do this. For a given strategy profile $x$, we first compute $\delta$-approximate best responses for each player, then we can use these to compute $\delta$-approximate regrets for each player. The maximum over the $\delta$-approximate regrets then gives us an approximation of $\epsilon$ with a tolerance of $\delta$. This is formalised in the following algorithm.

Algorithm 1. Evaluation of approximation guarantee
Input: A strategy profile x for L, and a constant δ > 0.
Utilising the above algorithm, we can now produce an algorithm to find an approximate equilibrium in Lipschitz games. The algorithm checks all $k$-uniform strategy profiles, using the value of $k$ given by Theorem 3.4, and for each one computes an approximation of the quality of approximation using the algorithm given above.

Algorithm 2. $3\epsilon$-equilibrium for $\lambda_p$-Lipschitz game $\mathcal{L}$
Input: Game $\mathcal{L}$ and $\epsilon > 0$.
Output: A $3\epsilon$-equilibrium for $\mathcal{L}$.
If the algorithm returns a strategy profile $x$, then it must be a $3\epsilon$-equilibrium. This is because we check that an $\epsilon$-approximation of $\alpha(x)$ is less than $2\epsilon$, and therefore $\alpha(x) \le 3\epsilon$. Secondly, we argue that if the game has an exact Nash equilibrium, then this procedure will always output a $3\epsilon$-approximate equilibrium. From Theorem 3.4 we know that if $k > \frac{16 M^2 \lambda^2 p \gamma^2}{\epsilon^2}$, then there is a $k$-uniform strategy profile $x$ that is an $\epsilon$-equilibrium for $\mathcal{L}$. When we apply our approximate regret algorithm to $x$, to find an $\epsilon$-approximation of $\alpha(x)$, the algorithm will return a number that is less than $2\epsilon$, hence $x$ will be returned by the algorithm.
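The evaluation step and the brute-force search discussed above can be sketched together in Python. This is a reading of the surrounding prose, not a transcription of the algorithm listings: regrets are approximated by searching over $l$-uniform deviations, and a profile is accepted when the approximate regret is below $2\epsilon$. The names `uniform_points`, `approx_regret`, `brute_force_equilibrium`, and the callback `T(i, profile)` are hypothetical, and $k$ and $l$ are passed in rather than derived from $\lambda$, $p$ and $\gamma$.

```python
from itertools import combinations_with_replacement, product

def uniform_points(vertices, k):
    """All averages of multisets of k vertices (assumed 'k-uniform' points)."""
    d = len(vertices[0])
    for ms in combinations_with_replacement(range(len(vertices)), k):
        yield tuple(sum(vertices[j][c] for j in ms) / k for c in range(d))

def approx_regret(T, spaces, profile, l):
    """Maximum over players of the best gain from an l-uniform deviation.

    T(i, profile) must return the utility of player i under `profile`,
    a tuple holding one strategy (itself a tuple) per player.
    """
    worst = 0.0
    for i, vertices in enumerate(spaces):
        current = T(i, profile)
        best = max(T(i, profile[:i] + (dev,) + profile[i + 1:])
                   for dev in uniform_points(vertices, l))
        worst = max(worst, best - current)
    return worst

def brute_force_equilibrium(T, spaces, k, l, eps):
    """Return the first k-uniform profile with approximate regret < 2*eps.

    Returns None when no k-uniform profile passes the check.
    """
    candidates = [list(uniform_points(v, k)) for v in spaces]
    for profile in product(*candidates):
        if approx_regret(T, spaces, profile, l) < 2 * eps:
            return profile
    return None
```

Returning `None` corresponds to the case where no $k$-uniform profile passes the check; with $k$ and $l$ chosen as in the analysis, that outcome is what lets the algorithm conclude that no exact equilibrium exists.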
To analyse the running time, observe that there are $\binom{n+k-1}{k} = O(n^k)$ possible $k$-uniform strategies for each player, thus $O(n^{Mk})$ $k$-uniform strategy profiles. Furthermore, our regret approximation algorithm runs in time $O(M n^l)$, where $l = \frac{4\lambda^2 p \gamma^2}{\epsilon^2}$. Hence, we get the next theorem.
THEOREM 3.6. Given a $\lambda_p$-Lipschitz game $\mathcal{L}$ that possesses an equilibrium and any $\epsilon > 0$, a $3\epsilon$-equilibrium can be computed in time $O(M n^{Mk+l})$, where $k$ is as in Theorem 3.4 and $l$ is as in Lemma 3.5.
Notice that it might be computationally hard to decide whether a game possesses an equilibrium or not. Nevertheless, our algorithm can be applied to any $\lambda_p$-Lipschitz game, whether or not an exact equilibrium exists. If the game does not possess an exact equilibrium then our algorithm either finds an approximate equilibrium or decides that there is no $k$-uniform strategy profile that is an $\epsilon$-equilibrium for the game, and thus that the game does not possess an exact equilibrium.
THEOREM 3.7. For any $\lambda_p$-Lipschitz game $\mathcal{L}$, in time $O(M n^{Mk+l})$ we can either compute a $3\epsilon$-equilibrium, or decide that $\mathcal{L}$ does not possess an exact equilibrium, where $k$ is as in Theorem 3.4 and $l$ is as in Lemma 3.5.

A QUASI-POLYNOMIAL ALGORITHM FOR PENALTY GAMES
In this section we present an algorithm that, for any $\epsilon > 0$, can compute an $\epsilon$-equilibrium for any penalty game in $\mathcal{P}_\lambda$ in quasi-polynomial time. For the algorithm, we take the same approach as we did in the previous section for Lipschitz games: we show that if an exact equilibrium exists, then a $k$-uniform approximate equilibrium always exists too, and provide a brute-force search algorithm for finding it. Once again, since best response computation may be hard for this class of games, we must provide an approximation algorithm for finding the quality of an approximate equilibrium. The majority of this section is dedicated to proving an appropriate bound for $k$, to ensure that $k$-uniform approximate equilibria always exist.
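We first focus on penalty games that possess an exact equilibrium. So, let $(x^*, y^*)$ be an equilibrium of the game and let $(x', y')$ be a $k$-uniform strategy profile sampled from this equilibrium. We define the following four events:
$\phi_r = \{ |T_r(x', y') - T_r(x^*, y^*)| < \epsilon/2 \}$,
$\pi_r = \{ T_r(x, y') < T_r(x', y') + \epsilon \}$ for all $x$,
$\phi_c = \{ |T_c(x', y') - T_c(x^*, y^*)| < \epsilon/2 \}$,
$\pi_c = \{ T_c(x', y) < T_c(x', y') + \epsilon \}$ for all $y$.
The goal is to derive a value for $k$ such that all four events above hold simultaneously with positive probability, that is, $\Pr(\phi_r \cap \pi_r \cap \phi_c \cap \pi_c) > 0$.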
Note that in order to prove that $(x', y')$ is an $\epsilon$-equilibrium we only have to consider the events $\pi_r$ and $\pi_c$. Nevertheless, as we show in Lemma 4.1, the events $\phi_r$ and $\phi_c$ are crucial in our analysis. The proof of the main theorem boils down to the events $\phi_r$ and $\phi_c$. Furthermore, proving that there is a $k$-uniform profile $(x', y')$ that fulfills the events $\phi_r$ and $\phi_c$ also proves that the approximate equilibrium we compute approximates the utilities the players receive under an exact equilibrium.
In what follows we will focus only on the row player, since a similar analysis can be applied for the column player. Firstly we study the event $\pi_r$ and we show how we can relate it to the event $\phi_r$.
LEMMA 4.1. For all penalty games it holds that $\Pr(\pi_r^c) \le n \cdot e^{-k\epsilon^2/2} + \Pr(\phi_r^c)$.
PROOF. We begin by introducing the following auxiliary events for all $i \in [n]$: $\psi_{ri} = \{ R_i y' < R_i y^* + \epsilon/2 \}$. We show how the events $\psi_{ri}$ and the event $\phi_r$ are related to the event $\pi_r$. Assume that the event $\phi_r$ and the events $\psi_{ri}$ for all $i \in [n]$ are true. Let $x$ be any mixed strategy for the row player. Since by assumption $R_i y' < R_i y^* + \epsilon/2$ and since $x$ is a probability distribution, it holds that $x^T R y' < x^T R y^* + \epsilon/2$. If we subtract $f_r(x)$ from each side we get that $x^T R y' - f_r(x) < x^T R y^* - f_r(x) + \epsilon/2$. This means that $T_r(x, y') < T_r(x, y^*) + \epsilon/2$ for all $x$. But we know that $T_r(x, y^*) \le T_r(x^*, y^*)$ for all $x \in \Delta^n$, since $(x^*, y^*)$ is an equilibrium. Thus, we get that $T_r(x, y') < T_r(x^*, y^*) + \epsilon/2$ for all possible $x$. Furthermore, since the event $\phi_r$ is true too, we get that $T_r(x, y') < T_r(x', y') + \epsilon$. Thus, if the events $\phi_r$ and $\psi_{ri}$ for all $i \in [n]$ are true, then the event $\pi_r$ must be true as well. Formally, $\phi_r \cap \bigcap_{i \in [n]} \psi_{ri} \subseteq \pi_r$. Thus, $\Pr(\pi_r^c) \le \Pr(\phi_r^c) + \sum_{i \in [n]} \Pr(\psi_{ri}^c)$. Using the Hoeffding bound, we get that $\Pr(\psi_{ri}^c) \le e^{-k\epsilon^2/2}$ for all $i \in [n]$. Our claim follows.
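To see how the first term of Lemma 4.1 drives the choice of $k$, the following Python sketch computes the smallest number of samples for which the union-bound term $n \cdot e^{-k\epsilon^2/2}$ drops below a target value. The target `delta` and the function name are hypothetical and serve only to illustrate the logarithmic dependence on $n$.

```python
import math

def samples_for_union_bound(n, eps, delta):
    """Smallest integer k >= 1 with n * exp(-k * eps**2 / 2) <= delta.

    Solving the inequality for k gives k >= 2 * ln(n / delta) / eps**2.
    """
    return max(1, math.ceil(2.0 * math.log(n / delta) / eps ** 2))
```

Because $n$ enters only through $\ln(n/\delta)$, squaring $n$ adds just $2\ln(n)/\epsilon^2$ to the sample size, which is the source of the quasi-polynomial running time of the brute-force search.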
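With Lemma 4.1 in hand, we can see that in order to compute a value for $k$ it is sufficient to study the event $\phi_r$. We introduce the following auxiliary events that we will study separately: $\phi_{ru} = \{ |x'^T R y' - x^{*T} R y^*| < \epsilon/4 \}$ and $\phi_{rb} = \{ |f_r(x') - f_r(x^*)| < \epsilon/4 \}$. It is easy to see that if both $\phi_{rb}$ and $\phi_{ru}$ are true, then the event $\phi_r$ must be true too, formally $\phi_{rb} \cap \phi_{ru} \subseteq \phi_r$. Using the analysis from [Lipton et al. 2003] we can prove that $\Pr(\phi_{ru}^c) \le 2e^{-k\epsilon^2/8}$. Thus, it remains to study the event $\phi_{rb}^c$.
LEMMA 4.2. $\Pr(\phi_{rb}^c) \le \frac{8\lambda\sqrt{p}}{\epsilon\sqrt{k}}$.
PROOF. Since we assume that the penalty function $f_r$ is $\lambda_p$-Lipschitz continuous, the event $\phi_{rb}$ can be replaced by the event $\phi_{rb'} = \{ \|x' - x^*\|_p < \epsilon/(4\lambda) \}$. It is easy to see that $\phi_{rb'} \subseteq \phi_{rb}$. Then, using the proof of Theorem 2 from [Barman 2015] we get that $E[\|x' - x^*\|_p] \le \frac{2\sqrt{p}}{\sqrt{k}}$. Thus, using Markov's inequality we get that $\Pr(\|x' - x^*\|_p \ge \frac{\epsilon}{4\lambda}) \le \frac{8\lambda\sqrt{p}}{\epsilon\sqrt{k}}$.
We are ready to prove our theorem.
THEOREM 4.3. For any equilibrium $(x^*, y^*)$ of a penalty game from the class $\mathcal{P}_\lambda$, any $\epsilon > 0$, and any $k \in \frac{\Omega(\lambda^2 \log n)}{\epsilon^2}$, there exists a $k$-uniform strategy profile $(x', y')$ that: (1) is an $\epsilon$-equilibrium for the game, (2) satisfies $|T_r(x', y') - T_r(x^*, y^*)| < \epsilon/2$, and (3) satisfies $|T_c(x', y') - T_c(x^*, y^*)| < \epsilon/2$.
PROOF. Let us define the event $\mathrm{GOOD} = \phi_r \cap \phi_c \cap \pi_r \cap \pi_c$. In order to prove our theorem it suffices to prove that $\Pr(\mathrm{GOOD}) > 0$. Notice that for the events $\phi_c$ and $\pi_c$ we can use the same analysis as for $\phi_r$ and $\pi_r$ and get the same bounds.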
Thus, using Lemma 4.1 and the analysis for the events $\phi_{ru}$ and $\phi_{rb}$ we get that $\Pr(\mathrm{GOOD}^c) \le \Pr(\phi_r^c) + \Pr(\pi_r^c) + \Pr(\phi_c^c) + \Pr(\pi_c^c) < 1$ for the chosen value of $k$.
Thus, $\Pr(\mathrm{GOOD}) > 0$ and our claim follows.
Theorem 4.3 establishes the existence of a $k$-uniform strategy profile $(x', y')$ that is an $\epsilon$-equilibrium. However, as with the previous section, we must provide an efficient method for approximating the quality of approximation provided by a given strategy profile. To do so, we first give the following lemma, which shows that approximate best responses can be computed in quasi-polynomial time for penalty games.
LEMMA 4.4. Let $(x, y)$ be a strategy profile for a penalty game $\mathcal{P}_\lambda$, and let $\hat{x}$ be a best response against $y$. There is an $l$-uniform strategy $x'$, for an appropriately chosen value of $l$, that is an $\epsilon$-best response against $y$, i.e. $T_r(\hat{x}, y) < T_r(x', y) + \epsilon$.
PROOF. We will prove that $|T_r(\hat{x}, y) - T_r(x', y)| < \epsilon$, which implies our claim. Let $\phi_1 = \{ |\hat{x}^T R y - x'^T R y| \le \epsilon/2 \}$ and $\phi_2 = \{ |f_r(\hat{x}) - f_r(x')| < \epsilon/2 \}$. Notice that Lemma 4.2 does not use anywhere the fact that $x^*$ is an equilibrium strategy, thus it holds even if $x^*$ is replaced by $\hat{x}$. Thus, $\Pr(\phi_2^c) \le \frac{4\lambda\sqrt{p}}{\epsilon\sqrt{l}}$. Furthermore, using the analysis from [Lipton et al. 2003] again, we can prove that $\Pr(\phi_1^c) \le 2e^{-l\epsilon^2/4}$, and using similar arguments as in the proof of Theorem 4.3 it can be easily proved that for the chosen value of $l$ it holds that $\Pr(\phi_1^c) + \Pr(\phi_2^c) < 1$. Thus the events $\phi_1$ and $\phi_2$ occur together with positive probability and our claim follows.
Having given this lemma, we can reuse Algorithm 1, but with $l$ set to the value from Lemma 4.4, to provide an algorithm that approximates the quality of approximation of a given strategy profile. Then, we can reuse Algorithm 2 with $k = \frac{\Omega(\lambda^2 \log n)}{\epsilon^2}$ to provide a quasi-polynomial time algorithm that finds approximate equilibria in penalty games. Notice again that our algorithm can be applied to games in which it is computationally hard to verify whether an exact equilibrium exists. Our algorithm will either compute an approximate equilibrium or fail to find one, in which case it decides that the game does not possess an exact equilibrium.
THEOREM 4.5. In any penalty game $\mathcal{P}_\lambda$ with constant number of players and any $\epsilon > 0$, in quasi-polynomial time we can either compute a $3\epsilon$-equilibrium, or decide that $\mathcal{P}_\lambda$ does not possess an exact equilibrium.

DISTANCE BIASED GAMES
In this section, we focus on three particular classes of distance biased games, and we provide polynomial-time approximation algorithms for these games. We focus on the following three penalty functions: the $L_1$ penalty $\|x - p\|_1$, the $L_2^2$ penalty $\|x - p\|_2^2$, and the $L_\infty$ penalty $\|x - p\|_\infty$. Our approach is to follow the well-known technique of Daskalakis et al. [2009] that finds a 0.5-NE in a bimatrix game. The algorithm that we will use for all three penalty functions is given below.

Algorithm 3. The Base Algorithm
(1) Compute a best response $y^*$ against $p$.
While this is a well-known technique for bimatrix games, note that it cannot immediately be applied to penalty games. This is because the algorithm requires us to compute two best response strategies, and while computing a best response is trivial in bimatrix games, this is not the case for penalty games. Best responses for $L_1$ and $L_\infty$ penalties can be computed in polynomial time via linear programming, and for $L_2^2$ penalties the ellipsoid algorithm can be applied. However, these methods do not provide strongly polynomial algorithms.
In this section, we develop a simple combinatorial algorithm for computing best response strategies for each of these penalties. Our algorithms are strongly polynomial. Then, we determine the quality of the approximation given by the base algorithm when our best response techniques are used. In what follows we make the common assumption that the payoffs of the underlying bimatrix game $(R, C)$ are in $[0, 1]$.
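The base algorithm can be sketched as follows. Only step (1) is stated in the listing of Algorithm 3 in this text; the remaining steps in the sketch are inferred from the later regret analysis, which uses a best response $x$ against $y^*$ and the mixture $x^* = \delta \cdot p + (1 - \delta) \cdot x$, and should be read as an assumption. The two best response oracles are hypothetical parameters to be supplied by the caller for the penalty at hand.

```python
from typing import Callable, List, Tuple

Strategy = List[float]


def base_algorithm(
    p: Strategy,
    best_response_column: Callable[[Strategy], Strategy],
    best_response_row: Callable[[Strategy], Strategy],
    delta: float,
) -> Tuple[Strategy, Strategy]:
    """Sketch of the base algorithm with caller-supplied best response oracles."""
    # Step (1): the column player best-responds to the row base strategy p.
    y_star = best_response_column(p)
    # Assumed step: the row player best-responds to y*.
    x = best_response_row(y_star)
    # Assumed step: mix the base strategy p with the best response x.
    x_star = [delta * p_i + (1.0 - delta) * x_i for p_i, x_i in zip(p, x)]
    return x_star, y_star
```

The parameter `delta` is left to the caller because its value is chosen separately for each penalty so as to balance the two players' regrets.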

A 2/3-approximation algorithm for L1-biased games
We start by considering $L_1$-biased games. Suppose that we want to compute a best response for the row player against a fixed strategy $y$ of the column player. We will show that best response strategies in $L_1$-biased games have a very particular form: if $b$ is the best response strategy in the (unbiased) bimatrix game $(R, C)$, then the best response places all of its probability on $b$ except for a certain set of rows $S$ where it is too costly to shift probability away from $p$. The rows $i \in S$ will be played with probability $p_i$ to avoid taking the penalty for deviating.
The characterisation for whether it is too expensive to shift away from p is given by the following lemma.
LEMMA 5.1. Let $j$ be a pure strategy, let $k$ be a pure strategy with $p_k > 0$, and let $x$ be a strategy with $x_k = p_k$. The utility for the row player increases when we shift probability from $k$ to $j$ if and only if $R_j y - R_k y - 2d_r > 0$.
PROOF. Suppose that we shift $\delta$ probability from $k$ to $j$, where $\delta \in (0, p_k]$. Then the utility for the row player is equal to $T_r(x, y) + \delta \cdot (R_j y - R_k y - 2d_r)$, where the final term $-2\delta d_r$ is the penalty for shifting away from $k$. Thus, the utility for the row player increases under this shift if and only if $R_j y - R_k y - 2d_r > 0$.
Observe that, if we are able to shift probability away from a strategy $k$, then we should shift it to a best response strategy for the (unbiased) bimatrix game, since this strategy maximizes the increase in our payoff. Hence, our characterisation of best response strategies is correct. This gives us the following simple algorithm for computing best responses.
Algorithm 4. Best Response Algorithm for $L_1$ penalty
(1) Set $S = \emptyset$.
(2) Compute a best response b against y in the unbiased bimatrix game (R, C).
(3) For each index $i \neq b$ in the range
(5) Return $x$.
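A minimal sketch of this best response computation is given below. Steps (3) and (4) of the listing are incomplete in this text, so the loop is reconstructed from Lemma 5.1 and the characterisation above: every row $i \neq b$ with $R_b y - R_i y - 2d_r > 0$ gives up all of its probability to $b$, and every other row keeps $p_i$. The function name and the input `payoffs` (the vector of values $R_i y$) are our own choices.

```python
def l1_best_response(payoffs, p, d_r):
    """Best response of the row player under an L1 penalty (sketch).

    payoffs[i] is assumed to hold R_i y for the fixed column strategy y,
    p is the base strategy, and d_r is the penalty multiplier.
    """
    n = len(payoffs)
    # Pure best response b in the unbiased bimatrix game.
    b = max(range(n), key=lambda i: payoffs[i])
    x = [0.0] * n
    kept = 0.0  # total probability that stays on rows in the set S
    for i in range(n):
        if i == b:
            continue
        if payoffs[b] - payoffs[i] - 2.0 * d_r > 0:
            # Lemma 5.1: shifting from i to b is profitable, so row i is dropped.
            x[i] = 0.0
        else:
            # Too costly to move away from p_i, so i belongs to S.
            x[i] = p[i]
            kept += p[i]
    # All remaining probability is placed on b.
    x[b] = 1.0 - kept
    return x
```

For example, with payoffs $(1, 0.5, 0)$, $p = (0.2, 0.3, 0.5)$ and $d_r = 0.3$, only the third row is worth abandoning, and the sketch returns approximately $(0.7, 0.3, 0)$.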
Our characterisation has a number of consequences. Firstly, it can be seen that if $d_r \ge 1/2$, then there is no profitable shift of probability between any two pure strategies, since $0 \le R_i y \le 1$ for all $i \in [n]$. Thus, we get the following corollary.
COROLLARY 5.2. If $d_r \ge 1/2$, then $p$ is a dominant strategy.
Moreover, since we can compute a best response in polynomial time we get the next theorem.
THEOREM 5.3. In biased games with $L_1$ penalty functions and $\max\{d_r, d_c\} \ge 1/2$, an equilibrium can be computed in polynomial time.
Finally, using the characterization of best responses we can see that there is a connection between the equilibria of the distance biased game and the well supported Nash equilibria (WSNE) of the underlying bimatrix game.
PROOF. Let $(x^*, y^*)$ be an equilibrium for $B$. From the best response algorithm for $L_1$ penalty games we can see that $x^*_i > 0$ only if $R_b y^* - R_i y^* \le 2d_r$, where $b$ is a pure best response against $y^*$. This means that for every $i \in \mathrm{supp}(x^*)$ it holds that $R_i y^* \ge \max_j R_j y^* - 2d_r$. The same argument applies to the column player with $d_c$. This is the definition of a $2d$-WSNE for the bimatrix game $(R, C)$.

Approximation algorithm.
We now analyse the approximation guarantee provided by the base algorithm for $L_1$-biased games. So, let $(x^*, y^*)$ be the strategy profile that is returned by the base algorithm. Since we have already shown that exact Nash equilibria can be found in games with either $d_c \ge 1/2$ or $d_r \ge 1/2$, we will assume that both $d_c$ and $d_r$ are less than $1/2$, since this is the only interesting case.
We start by considering the regret of the row player.The following lemma will be used in the analysis of all three of our approximation algorithms.
LEMMA 5.5. Under the strategy profile $(x^*, y^*)$ the regret for the row player is at most $\delta$.

PROOF. Notice that for all
Hence, for the payoff of the row player it holds that $T_r(x^*, y^*) \ge \delta \cdot T_r(p, y^*) + (1 - \delta) \cdot T_r(x, y^*)$. Since $x$ is a best response against $y^*$, his regret under the strategy profile $(x^*, y^*)$ is $T_r(x, y^*) - T_r(x^*, y^*) \le \delta \cdot (T_r(x, y^*) - T_r(p, y^*)) \le \delta$, where the last inequality holds because $T_r(x, y^*) \le 1$ and $T_r(p, y^*) \ge 0$.
Next, we consider the regret of the column player. The following lemma will be used for both the $L_1$ case and the $L_\infty$ case. Observe that in the $L_1$ case, the precondition $d_c \cdot b_c(y^*, q) \le 1$ always holds, since we have $\|y^* - q\|_1 \le 2$ and we are only interested in the case where $d_c \le 1/2$.
LEMMA 5.6. If $d_c \cdot b_c(y^*, q) \le 1$, then under strategy profile $(x^*, y^*)$ the column player suffers at most $2 - 2\delta$ regret.

PROOF. The regret of the column player under the strategy profile
To complete the analysis, we must select a value for $\delta$ that equalises the two regrets. It can easily be verified that setting $\delta = 2/3$ ensures that $\delta = 2 - 2\delta$, and so we have the following theorem.
THEOREM 5.7. In biased games with $L_1$ penalties a 2/3-equilibrium can be computed in polynomial time.

A 5/7-approximation algorithm for $L_2^2$-biased games
We now turn our attention to biased games with an $L_2^2$ penalty. Again, we start by giving a combinatorial algorithm for finding a best response. Throughout this section, we fix $y$ as a column player strategy, and we will show how to compute a best response for the row player.
Best responses in $L_2^2$-biased games can be found by solving a quadratic program, and this particular quadratic program can be solved via the ellipsoid algorithm [Kozlov et al. 1980]. We will give a simple combinatorial algorithm that uses the Karush-Kuhn-Tucker (KKT) conditions and produces a closed formula for the solution. Hence, we will obtain a strongly polynomial time algorithm for finding best responses.
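Our algorithm can be applied to $L_2^2$ penalty functions with any value of $d_r$, but for notational simplicity we describe our method for $d_r = 1$. Furthermore, we define $\alpha_i := R_i y + 2p_i$ and we call $\alpha_i$ the payoff of pure strategy $i$. Then, the utility for the row player can be written as
$$T_r(x, y) = \sum_{i=1}^n x_i \alpha_i - \sum_{i=1}^n x_i^2 - p^T p.$$
Notice that the term $p^T p$ is a constant and it does not affect the solution of the best response; so we can exclude it from our computations. Thus, a best response for the row player against strategy $y$ is the solution of the following quadratic program:
$$\max \sum_{i=1}^n x_i \alpha_i - \sum_{i=1}^n x_i^2 \quad \text{s.t.} \quad \sum_{i=1}^n x_i = 1, \quad x_i \ge 0 \text{ for all } i \in [n].$$
The Lagrangian function for this problem is
$$L(x, \lambda, u) = \sum_{i=1}^n x_i \alpha_i - \sum_{i=1}^n x_i^2 - \lambda \Big(\sum_{i=1}^n x_i - 1\Big) + \sum_{i=1}^n u_i x_i,$$
and the corresponding KKT conditions are
$$\alpha_i - 2x_i - \lambda + u_i = 0 \text{ for all } i \in [n] \quad (6)$$
$$\sum_{i=1}^n x_i = 1 \quad (7)$$
$$x_i \ge 0, \; u_i \ge 0 \text{ for all } i \in [n] \quad (8)$$
$$x_i \cdot u_i = 0 \text{ for all } i \in [n] \quad (9)$$
Constraints (6)-(8) are the stationarity conditions and (9) are the complementarity slackness conditions. We say that strategy $x$ is a feasible response if it satisfies the KKT conditions. The obvious way to compute a best response is by exhaustively checking all $2^n$ possible combinations for the complementarity conditions and choosing the feasible response that maximizes the utility for the player. Next we show how to bypass this brute force technique and compute all best responses in polynomial time.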
In what follows, without loss of generality, we assume that $\alpha_1 \ge \ldots \ge \alpha_n$. That is, the pure strategies are ordered according to their payoffs. In the next lemma we prove that in every best response, if a player plays pure strategy $l$ with positive probability, then he must play every pure strategy $k$ with $k < l$ with positive probability.
LEMMA 5.8. In every best response $x^*$, if $x^*_l > 0$ then $x^*_k > 0$ for all $k < l$.
PROOF. For the sake of contradiction suppose that there is a best response $x^*$ and a $k < l$ such that $x^*_l > 0$ and $x^*_k = 0$. Suppose now that we shift some probability, denoted by $\delta$, from pure strategy $l$ to pure strategy $k$. Then his utility is $T_r(x^*, y) + \delta \cdot (\alpha_k - \alpha_l + 2x^*_l) - 2\delta^2$, which is maximized for $\delta = \frac{\alpha_k - \alpha_l + 2x^*_l}{4}$. Notice that $\delta > 0$ since $\alpha_k \ge \alpha_l$ and $x^*_l > 0$, thus the row player can increase his utility by assigning positive probability to pure strategy $k$, which contradicts the fact that $x^*$ is a best response.
Lemma 5.8 implies that there are only $n$ possible supports that a best response can use. Indeed, we can exploit the KKT conditions to derive, for each candidate support, the exact probability with which each pure strategy would be played. We derive the probability as a function of the $\alpha_i$'s and of the support size. Suppose that the KKT conditions produce a feasible response when we set the support to have size $k$. From condition (6) we get that $x_i = \frac{1}{2}(\alpha_i - \lambda)$ for all $1 \le i \le k$ and zero otherwise. But we know that $\sum_{j=1}^k x_j = 1$. Thus we get that $\lambda = \frac{\sum_{j=1}^k \alpha_j - 2}{k}$. This means that for all $i \in [k]$ we get
$$x_i = \frac{1}{2}\Big(\alpha_i - \frac{\sum_{j=1}^k \alpha_j - 2}{k}\Big). \quad (10)$$
So, our algorithm does the following. It loops through all $n$ candidate supports for a best response. For each one, it uses Equation (10) to determine the probabilities, and then checks whether these satisfy the KKT conditions, and thus whether this is a feasible response. If it is, then it is saved in a list of feasible responses, otherwise it is discarded. After all $n$ possibilities have been checked, the feasible response with the highest payoff is returned.
(2) Among the feasible responses choose one with the highest utility.
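The procedure just described can be sketched in a few lines. The sketch assumes $d_r = 1$ as in the text, takes the vector of values $R_i y$ as the input `payoffs`, and checks feasibility in the usual KKT sense: probabilities inside the support must be positive, and every strategy outside the support must have $\alpha_i \le \lambda$ so that its multiplier is non-negative. The function name and the small numerical tolerance are our own choices.

```python
def l22_best_response(payoffs, p):
    """Best response of the row player under an L2^2 penalty with d_r = 1 (sketch)."""
    n = len(payoffs)
    alpha = [payoffs[i] + 2.0 * p[i] for i in range(n)]
    # Candidate supports are prefixes of the strategies sorted by alpha (Lemma 5.8).
    order = sorted(range(n), key=lambda i: alpha[i], reverse=True)
    best_x, best_value = None, float("-inf")
    prefix_sum = 0.0
    for k in range(1, n + 1):
        prefix_sum += alpha[order[k - 1]]
        lam = (prefix_sum - 2.0) / k
        x = [0.0] * n
        for i in order[:k]:
            x[i] = 0.5 * (alpha[i] - lam)  # closed formula for support size k
        inside_ok = all(x[i] > 0 for i in order[:k])
        outside_ok = all(alpha[i] <= lam + 1e-12 for i in order[k:])
        if inside_ok and outside_ok:
            # Utility up to the constant term p^T p.
            value = sum(alpha[i] * x[i] - x[i] ** 2 for i in range(n))
            if value > best_value:
                best_x, best_value = x, value
    return best_x
```

For instance, with payoffs $(1, 0.2, 0)$ and $p = (0.4, 0.3, 0.3)$ we get $\alpha = (1.8, 0.8, 0.6)$; supports of size one and two fail the check for strategies outside the support, and the full support yields approximately $(0.7, 0.2, 0.1)$.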

Approximation Algorithm.
We now show that the base algorithm gives a 5/7-approximation when applied to $L_2^2$-penalty games. For the row player's regret, we can use Lemma 5.5 to show that the regret is bounded by $\delta$. However, for the column player's regret, things are more involved. We will show that the regret of the column player is at most $2.5 - 2.5\delta$. The analysis depends on the maximum entry of the base strategy $q$, and more specifically on whether $\max_k\{q_k\} \le 1/2$ or not.
LEMMA 5.9. If $\max_k\{q_k\} \le 1/2$, then the regret the column player suffers under strategy profile $(x^*, y^*)$ is at most $2.5 - 2.5\delta$.

PROOF. Note that when $\max_k\{q_k\} \le 1/2$, then $\|y - q\|_2^2 \le 1.5$ for all possible $y$. Then, using the analysis from Lemma 5.6, along with the fact that $d_c \cdot b_c(y^*, q) \le 2$ for $L_2^2$ penalties, and since by assumption $d_c = 1$, the claim follows.
For the case where there is a $k$ such that $q_k > 1/2$ a more involved analysis is needed. The first goal is to prove that under any strategy $y^*$ that is a best response against $p$ the pure strategy $k$ is played with positive probability. In order to prove that, we first prove that there is a feasible response against strategy $p$ where pure strategy $k$ is played with positive probability. In what follows we denote $\alpha_i := C_i^T p + 2q_i$.
LEMMA 5.10. Let $q_k > 1/2$ for some $k \in [n]$. Then there is a feasible response where pure strategy $k$ is played with positive probability.
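PROOF. Note that $\alpha_k > 1$ since by assumption $q_k > 1/2$. Recall from Equation (10) that in a feasible response $y$ with support size $s$ it holds that $y_i = \frac{1}{2}\big(\alpha_i - \frac{\sum_{j=1}^s \alpha_j - 2}{s}\big)$ for every $i$ in the support. In order to prove the claim it is sufficient to show that $y_k > 0$ when in the KKT conditions is set The claim follows.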
Next it is proven that the utility of the column player increases when he adds to his support pure strategies $i$ such that $\alpha_i > 1$. Then the utility of the column player when he plays $y^k$ can be written as The goal now is to prove that $T_c(x, y^{k+1}) - T_c(x, y^k) > 0$. By the previous analysis for $T_c(x, y^k)$ and if Notice that $\alpha_k \ge 2q_k > 1$. Thus, the utility of the feasible response that assigns positive probability to pure strategy $k$ is strictly greater than the utility of any feasible response that does not assign probability to $k$. Thus strategy $k$ is always played in a best response. Hence, the next lemma follows.
LEMMA 5.12. If there is a $k \in [n]$ such that $q_k > 1/2$, then in every best response $y^*$ the pure strategy $k$ is played with positive probability.
Using now Lemma 5.12 we can provide a better bound for the regret the column player suffers, since in every best response y * the pure strategy k is played with positive probability.
LEMMA 5.13. Let $y^*$ be a best response when there is a pure strategy $k$ with $q_k > 1/2$. Then the regret for the column player under strategy profile $(x^*, y^*)$ is bounded by $2 - 2\delta$.
PROOF. Before we proceed with our analysis we assume without loss of generality that $k = 1$. Recall the expression (11) for the regret of the column player, obtained from the analysis of the base algorithm. We focus now on the term $y^{*T} y^* - 2y^{*T} q$. It can be proven that $y^{*T} y^* - 2y^{*T} q \le 1 - 2q_k$. Thus, from (11) we get that $R_c(x^*, y^*) \le 2 - 2\delta$.
Recall now that the regret for the row player is bounded by $\delta$, so if we optimize with respect to $\delta$ the regrets are equal for $\delta = 2/3$. Thus, when there is a $k$ with $q_k > 1/2$ the base algorithm produces a 2/3-equilibrium. Hence, combining this with Lemma 5.9, Theorem 5.14 follows for $\delta = 5/7$.
THEOREM 5.14. In biased games with $L_2^2$ penalties a 5/7-equilibrium can be computed in polynomial time.

Inner product penalty games
We observe that we can also tackle the case where the penalty function is the inner product of the strategy played, i.e. $p = q = 0$. For these games, which we call inner product penalty games, we replace $p$ as the starting point of the base algorithm with the fully mixed strategy $x_n$. Hence, for that case $x^* = \delta \cdot x_n + (1 - \delta) \cdot x$ for some $\delta \in [0, 1]$. In the appendix we prove the next theorem. Again, the regret the row player suffers under strategy profile $(x^*, y^*)$ is bounded by $\delta$.
LEMMA 5.15. When the penalty function is the inner product of the strategy played, then the regret for the row player under strategy profile $(x^*, y^*)$ is bounded by $\delta$.
Furthermore, using similar analysis as in Lemma 5.6 it can be proven that the regret for the column player under strategy profile $(x^*, y^*)$ is bounded by $(1 - \delta)(1 + d_c \cdot y^{*T} y^*)$. For the column player we will distinguish between the cases where $d_c \le 1/2$ and $d_c > 1/2$. For the first case, where $d_c \le 1/2$, it is easy to see that the algorithm produces a 0.6-equilibrium. For the other case, when $d_c > 1/2$, we first prove that there is no pure best response.
LEMMA 5.16. If the penalty for the column player is equal to $y^T y$ and $d_c > 1/2$, then there is no pure best response against any strategy of the row player.
PROOF. Let $C_j$ denote the payoff of the column player from his $j$-th pure strategy against some strategy $x$ played by the row player. For the sake of contradiction, assume that there is a pure best response for the column player where, without loss of generality, he plays only his first pure strategy. Suppose now that he shifts some probability to his second strategy, that is, he plays the first pure strategy with probability $x$ and the second pure strategy with probability $1 - x$. The utility for the column player under this mixed strategy is . Notice that $x > 0$, which means that the column player can deviate from the pure strategy and increase his utility. The claim follows.
With Lemma 5.16 in hand, it can be proven that when $d_c > 1/2$ the column player does not play any pure strategy with probability greater than 3/4.
LEMMA 5.17. If $d_c > 1/2$, then in $y^*$ no pure strategy is played with probability greater than 3/4.
PROOF. For the sake of contradiction suppose that there is a pure strategy $i$ in $y^*$ that is played with probability greater than 3/4. Furthermore, let $k$ be the support size of $y^*$. From Lemma 5.16, since $d_c > 1/2$, we know that there is no pure best response, thus $k \ge 2$. Then using Equation (10) we get that $\frac{3}{4} < y^*_i = \frac{1}{2}\big(\alpha_i - \frac{\sum_{j=1}^k \alpha_j - 2}{k}\big)$. If we solve for $\alpha_i$ we get that $\alpha_i > \frac{3k-4}{2k-2} > 1$, which is a contradiction since when $q = 0$ it holds that $\alpha_i \le 1$.
A direct corollary of Lemma 5.17 is that $y^{*T} y^* \le 5/8$. Hence, we can prove the following lemma.
PROOF. Firstly, note that $T_c(x^*, y^*) = \delta x_n^T C y^* + (1 - \delta) x^T C y^* - y^{*T} y^*$. Moreover, $\max_{\tilde{y} \in \Delta}\{x_n^T C \tilde{y} - \tilde{y}^T \tilde{y}\} - T_c(x_n, y^*) = 0$, since $y^*$ is a best response against $x_n$. Finally, notice that $0 \le y^T y \le 1$ for all $y$. Thus, the regret for the column player is which matches the claimed result.
If we combine Lemmas 5.15 and 5.18 and solve for $\delta$ we can see that the regrets are equal for $\delta = 13/21$. Thus, we get the following theorem for biased games where $q = 0$.
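The balancing step that recurs in this section has the same shape each time: the row player's regret is at most $\delta$, the column player's regret is at most $c \cdot (1 - \delta)$ for some constant $c$, and the two bounds meet at $\delta = c / (1 + c)$. The sketch below checks the values quoted in the text with exact arithmetic. The helper name is ours, and the constant $c = 13/8$ for the inner product case is an assumption: it is the value that reproduces $\delta = 13/21$, and it agrees with the bound $(1 - \delta)(1 + d_c \cdot y^{*T} y^*)$ when $d_c \cdot y^{*T} y^* \le 5/8$.

```python
from fractions import Fraction


def balancing_delta(c):
    """Solve delta = c * (1 - delta) for delta, i.e. delta = c / (1 + c)."""
    c = Fraction(c)
    return c / (1 + c)


# Column regret bound 2 - 2*delta (L1 and L-infinity penalties).
assert balancing_delta(2) == Fraction(2, 3)
# Column regret bound 2.5 - 2.5*delta (L2^2 penalties).
assert balancing_delta(Fraction(5, 2)) == Fraction(5, 7)
# Inner product penalties with d_c <= 1/2: bound 1.5 * (1 - delta).
assert balancing_delta(Fraction(3, 2)) == Fraction(3, 5)
# Inner product penalties with d_c > 1/2: assumed bound (13/8) * (1 - delta).
assert balancing_delta(Fraction(13, 8)) == Fraction(13, 21)
```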

A 2/3-approximation for L∞-biased games
Finally, we turn our attention to the $L_\infty$ penalty. We start by giving a combinatorial algorithm for finding best responses. Similarly to the best response algorithm for the $L_1$ penalty, the intuition is to start from the base strategy $p$ of the row player and shift probability from pure strategies with low payoff to pure strategies with higher payoff. This time though, the shifted probability will be distributed between the pure strategies with higher payoff.
Without loss of generality assume that $R_1 y \ge \ldots \ge R_n y$, i.e., that the strategies are ordered according to their payoff in the unbiased bimatrix game. The set of pure strategies of the row player can be partitioned into three disjoint sets according to the payoff they yield: $H := \{i \in [n] : R_i y = R_1 y\}$, $M := \{i \in [n] : R_1 y - d_r \le R_i y < R_1 y\}$, and $L := \{i \in [n] : R_1 y - R_i y - d_r > 0\}$. Next we give an algorithm that computes a best response for the $L_\infty$ penalty.
Algorithm 6. Best Response Algorithm for $L_\infty$ penalty
(1) For all $i \in L$, set $x_i = 0$.
(2) If $P \le |H| \cdot p_{\max}$, then set $x_i = p_i + \frac{P}{|H|}$ for all $i \in H$ and $x_j = p_j$ for $j \in M$.
Let $p_{\max} := \max_{i \in L} p_i$ and let $P := \sum_{i \in L} p_i$. Then for every best response the following lemma holds.
LEMMA 5.20. If $L \neq \emptyset$, then for any best response $x$ of the row player against strategy $y$ it holds that $\|x - p\|_\infty \ge p_{\max}$. Else $p$ is the best response.
PROOF. Using similar arguments as in Lemma 5.1, it can be proven that if there are no pure strategies $i$ and $k$ such that $R_k y - R_i y - d_r > 0$, then any shifting of probability decreases the utility of the row player. Thus, the best response of the player is $p$. On the other hand, if there are strategies $i$ and $k$ such that $R_k y - R_i y - d_r > 0$, then the utility of the row player increases if all the probability from strategy $i$ is shifted to pure strategy $k$. The set $L$ contains all these pure strategies. Let $j \in L$ be the pure strategy that defines $p_{\max}$. Then, all of the $p_{\max}$ probability can be shifted from $j$ to a pure strategy in $H$, i.e. a pure strategy that yields the highest payoff, and strictly increase the utility of the player. Thus, the strategy $j$ is played with zero probability and the claim follows.
In what follows assume that $L \neq \emptyset$, hence $p_{\max} > 0$. From Lemma 5.20 it follows that there is a best response where the strategy with the highest payoff is played with probability $p_1 + p_{\max}$. Hence, up to $p_{\max}$ probability can be shifted from pure strategies with lower payoff to each pure strategy with higher payoff, starting from the second pure strategy, and so on. After this shift of probabilities there will be a set of pure strategies where each one is played with probability $p_i + p_{\max}$ and possibly one pure strategy $j$ that is played with probability less than or equal to $p_j$. The question is whether more probability should be shifted from the low payoff strategies to strategies that yield higher payoff. The next lemma establishes that no pure strategy from $L$ is played with positive probability in any best response against $y$.
LEMMA 5.21.In every best response against strategy y all pure strategies i ∈ L are played with zero probability.
PROOF. Let $K$ denote the set of pure strategies that are played with positive probability after the first shifting of probabilities. Without loss of generality assume that each strategy $i \in K$ is played with probability $p_i + p_{\max}$. Then the utility of the player under this strategy is equal to $U = \sum_{i \in K}(p_i + p_{\max}) \cdot R_i y - d_r \cdot p_{\max}$. For the sake of contradiction, assume that there is one strategy $j$ from $L$ that belongs to $K$. Suppose that probability $\delta$ is shifted from the strategy $j$ to the first pure strategy. Then the utility for the player is equal to $U + \delta(R_1 y - R_j y - d_r) > U$, since by the definition of $L$ we have $R_1 y - R_j y - d_r > 0$. Thus, the utility of the player increases if probability is shifted. Notice that the analysis holds even if the penalty is $p_{\max} + \delta$ instead of $p_{\max}$, thus the claim follows.
Thus, all the probability $P$ from strategies in $L$ should be shifted to strategies that yield higher payoff. The question now is what is the optimal way to distribute that probability over the strategies with the higher payoff. Clearly, the same amount of probability should be shifted to all strategies in $H$, since this makes the penalty smaller. Furthermore, it is easy to see that the maximum amount of probability is shifted to strategies in $H$. Next we prove that if $P \ge p_{\max} \cdot (|H| + |M|)$ then $P$ is uniformly distributed over the pure strategies in $H \cup M$.
Let $U$ be the utility when the probability $S$ is distributed uniformly over all pure strategies in $H \cup M$. Furthermore, let $U'$ be the utility when $\delta > 0$ probability is shifted from a pure strategy $j$ to the first pure strategy, which yields the highest payoff. Then $U' = U + \delta(R_1 y - R_j y - d_r)$, but $R_1 y - R_j y - d_r \le 0$ since $j \in H \cup M$. The claim follows.
Using the previous analysis the correctness of the algorithm follows.Note that, using similar arguments as in Lemma 5.1 the next lemma can be proved.
LEMMA 5.22. If $d_r \ge 1$, then $p$ is a dominant strategy.
Furthermore, the combination of Lemma 5.22 with the fact that best responses can be computed in polynomial time gives the next theorem.
THEOREM 5.23. In biased games with $L_\infty$ penalty functions and $\max\{d_r, d_c\} \ge 1$, an equilibrium can be computed in polynomial time.
Again we can see that there is a connection between the equilibria of the distance biased game and the well supported Nash equilibria (WSNE) of the underlying bimatrix game.
OBSERVATION 1. Let $B = (R, C, b_r(x, p), b_c(y, q), d_r, d_c)$ be a distance biased game with $L_\infty$ penalties and let $d := \max\{d_r, d_c\}$. Any equilibrium of $B$ is a $d$-WSNE for the bimatrix game $(R, C)$.
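To make the well supported condition concrete, the following sketch tests whether a profile is an $\epsilon$-WSNE of a bimatrix game. It assumes the standard definition, under which every pure strategy played with positive probability must be within $\epsilon$ of the best pure payoff against the opponent's strategy; the function name and the list-of-lists matrix representation are our own choices.

```python
def is_wsne(R, C, x, y, eps):
    """Check whether (x, y) is an eps-well-supported Nash equilibrium of (R, C)."""
    n, m = len(R), len(R[0])
    # Expected payoff of each pure row strategy against y.
    row_payoffs = [sum(R[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Expected payoff of each pure column strategy against x.
    col_payoffs = [sum(x[i] * C[i][j] for i in range(n)) for j in range(m)]
    row_ok = all(
        row_payoffs[i] >= max(row_payoffs) - eps for i in range(n) if x[i] > 0
    )
    col_ok = all(
        col_payoffs[j] >= max(col_payoffs) - eps for j in range(m) if y[j] > 0
    )
    return row_ok and col_ok
```

Under the observation above, calling this check with `eps` set to $d$ on an equilibrium of an $L_\infty$-biased game is expected to succeed.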

Approximation algorithm.
For the quality of approximation, we can reuse the results that we proved for the $L_1$ penalty. Lemma 5.5 applies unchanged. For Lemma 5.6, we observe that $d_c \cdot b_c(y^*, q) \le 1$ when the penalty $b_c(y^*, q)$ is the $L_\infty$ norm, since in this case it holds that $\|y^* - q\|_\infty \le 1$ and it is assumed that $d_c \le 1$. Thus, we have the following theorem.
THEOREM 5.24. In biased games with $L_\infty$ penalties a 2/3-equilibrium can be computed in polynomial time.

CONCLUSIONS
We have studied games with infinite action spaces and non-linear payoff functions. We have shown that Lipschitz continuity of the payoff function can be exploited to provide algorithms that find approximate equilibria. For Lipschitz games, we showed that Lipschitz continuity of the payoff function allows us to provide an efficient algorithm for finding approximate equilibria. For penalty games, the Lipschitz continuity of the penalty function allows us to provide a QPTAS. Finally, we provided strongly polynomial approximation algorithms for $L_1$, $L_2^2$, and $L_\infty$ distance biased games.
Several open questions stem from our paper. The most important one is to understand the exact computational complexity of equilibrium computation in Lipschitz and penalty games. Although Theorem 2.2 states that there is no FPTAS for penalty games, the result holds only for games with penalty functions that depend on the size of the game and tend to zero as the size grows. Another interesting feature is that we cannot verify efficiently in all penalty games whether a given strategy profile is an equilibrium, and so it seems questionable whether PPAD can capture the full complexity of penalty games. On the other hand, for the distance biased games that we studied in this paper, we have shown that we can decide in polynomial time whether a strategy profile is an equilibrium. Is the equilibrium computation problem PPAD-complete for the two classes of games we studied? Are there any subclasses of penalty games, e.g. when the underlying normal form game is zero sum, that are easy to solve?
PROOF. Notice from (10) that for all $i$ we get $y_i = y_k + \frac{1}{2}(\alpha_i - \alpha_k)$. Using that, we can write the term $y^T y = \sum_i y_i^2$ as follows when $y$ has support size $s$: $\sum_{i=1}^s y_i^2 = \sum_{i=1}^s \big(y_k + \frac{1}{2}(\alpha_i - \alpha_k)\big)^2$. Then we can see that $y^{*T} y^* - 2y^*_k q_k$ is increasing as $y^*_k$ increases, since we know from Lemma 5.12 that $y^*_k > 0$.
This becomes clear if we take the partial derivative of $y^{*T} y^* - 2y^*_k q_k$ with respect to $y^*_k$ which is equal to Thus, the value of $y^{*T} y^* - 2y^*_k q_k$ is maximized when $y^*_k = 1$ and our claim follows.
THEOREM 5.4. Let $B = (R, C, b_r(x, p), b_c(y, q), d_r, d_c)$ be a distance biased game with $L_1$ penalties and let $d := \max\{d_r, d_c\}$. Any equilibrium of $B$ is a $2d$-WSNE for the bimatrix game $(R, C)$.
LEMMA 5.11. Let $y^k$ and $y^{k+1}$ be two feasible responses with support size $k$ and $k+1$ respectively, where $\alpha_{k+1} > 1$. Then $T_c(x, y^{k+1}) > T_c(x, y^k)$.
PROOF. Let $y^k$ be a feasible response with support size $k$ for the column player against strategy $p$ and let $\lambda(k) := \frac{\sum_{j=1}^k \alpha_j - 2}{2k}$
PROOF. If $P \ge p_{\max} \cdot (|H| + |M|)$ then there is a best response where the probability $P$ is uniformly distributed over the pure strategies in $H \cup M$.
PROOF. Let $|H| + |M| = k$ and $S = P - k \cdot p_{\max}$. Let $U = \sum_{i \in H \cup M}\big(p_i + p_{\max} + \frac{S}{k}\big) R_i y - d_r\big(p_{\max} + \frac{S}{k}\big)$