Bregman-Golden Ratio Algorithms for Variational Inequalities

Variational inequalities provide a framework through which many optimisation problems can be solved, in particular, saddle-point problems. In this paper, we study modifications to the so-called Golden RAtio ALgorithm (GRAAL) for variational inequalities—a method which uses a fully explicit adaptive step-size and provides convergence results under local Lipschitz assumptions without requiring backtracking. We present and analyse two Bregman modifications to GRAAL: the first uses a fixed step-size and converges under global Lipschitz assumptions, and the second uses an adaptive step-size rule. Numerical performance of the former method is demonstrated on a bimatrix game arising in network communication, and of the latter on two problems, namely, power allocation in Gaussian communication channels and N-person Cournot competition games. In all of these applications, an appropriately chosen Bregman distance simplifies the projection steps computed as part of the algorithm.


Introduction
Let X, Y be real, finite-dimensional Hilbert spaces. In this work, we consider saddle-point problems of the form

min_{x ∈ X} max_{y ∈ Y} f(x, y), (1)

where f : X × Y → R is convex-concave and continuously differentiable.
Since many non-smooth optimisation problems can be cast in the form of (1), it is a useful and heavily studied tool in itself [1,19,2,34,42,23]. However, rather than attempt to solve (1) in its current form, it is far more convenient, even within the aforementioned references, to follow in the steps of Korpelevič [31] and Popov [45], by casting it as the Variational Inequality (VI):

find z* ∈ X × Y such that ⟨F(z*), z − z*⟩ ≥ 0 for all z ∈ X × Y, (2)

where F(z) := (∇_x f(x, y), −∇_y f(x, y)), and the variable z* := (x*, y*) in (2) characterises the solution (x*, y*) to (1). Many methods (see, for instance, [44,41,20]) for solving (2) require global Lipschitz continuity of the operator F. However, this assumption is often too strong to hold in practice. Even when F is globally Lipschitz continuous, knowledge of its Lipschitz constant is usually required as input to the chosen algorithm, and determining this constant is typically more difficult than solving the original problem. Moreover, even if F is globally Lipschitz and its global Lipschitz constant is known, then, as the step-size is inversely related to the Lipschitz constant, a constant step-size rule can be too conservative. This is particularly wasteful if the generated sequence lies entirely within a region where a local Lipschitz constant is small (relative to the size of the global constant).
Therefore, it is beneficial to instead define a step-size sequence which attempts to approximate a local Lipschitz constant with respect to the point iterates. The standard approach is to generate a step-size sequence via a backtracking procedure (see [40,17,14,37,26,36] and references therein). While avoiding each of the shortcomings listed above, such methods can become expensive when considering the overall run-time of the algorithm, due to the arbitrarily large number of steps taken during the backtracking procedure within each iteration. An emerging alternative is that of adaptive step-sizes [38,39,3], which accomplish the same goals as backtracking methods without the need for backtracking, i.e., the step-size update is fully explicit. In particular, the adaptive Golden RAtio ALgorithm (aGRAAL) [38] (as stated in Algorithm 1), named as such because of its relationship with the Golden Ratio ϕ = (1 + √5)/2, solves (2), and is the method we focus on here.
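To make the preceding description concrete, the following is a sketch in Python (with our own variable names, not the authors' code) of the fully explicit aGRAAL step-size update as we read it from [38]: the ratio ‖z^k − z^{k−1}‖²/‖F(z^k) − F(z^{k−1})‖² estimates the inverse of a local Lipschitz constant, and no backtracking loop is required.

```python
import numpy as np

def agraal_step_size(z, z_prev, Fz, Fz_prev, lam_prev, theta_prev,
                     phi=1.5, lam_max=1e6):
    # Growth factor: the step-size may increase by at most a factor rho.
    rho = 1 / phi + 1 / phi**2
    num = np.linalg.norm(z - z_prev) ** 2
    den = np.linalg.norm(Fz - Fz_prev) ** 2
    # Estimate of the inverse of a local Lipschitz constant of F.
    local = phi * theta_prev * num / (4 * lam_prev * den) if den > 0 else lam_max
    lam = min(rho * lam_prev, local, lam_max)
    theta = phi * lam / lam_prev
    return lam, theta
```

Here θ_k tracks the relative growth of the step-size, while the cap λ_max and the growth factor ρ prevent the step-size from increasing too quickly.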
One other way to potentially improve methods is to replace the Euclidean distance in the proximal operator with a member of a non-Euclidean family of distance-like functions known as Bregman distances. Such methods, for solving (2), can be found in the existing literature [14,43,24,21,22,28,27,29,25]. Interestingly, most of these methods require a Lipschitz assumption but do not require knowledge of the Lipschitz constant. However, these employ a backtracking procedure and/or a non-increasing step-size sequence, whereas the step-size of our new method is fully explicit and is allowed to increase slightly at each iteration.
In this paper, we investigate Bregman modifications to Algorithm 1. To this end, we begin by proposing the Bregman-Golden RAtio ALgorithm (B-GRAAL), a Bregman version of the fixed step-size Golden RAtio ALgorithm (GRAAL), and prove convergence of our new method in full. We then present an adaptive version of B-GRAAL, equivalently, a Bregman modification of Algorithm 1, which we refer to as the Bregman-adaptive Golden RAtio ALgorithm (B-aGRAAL). Although we only provide a convergence analysis of B-aGRAAL in a restrictive setting, we observe it to work numerically outside this setting.
One advantage of our new method is the flexibility provided by the Bregman proximal operator. In the context of convex-concave games, for instance, this modified operator arises as the projection onto the probability simplex, which has a simple closed-form expression with respect to the Kullback-Leibler (KL) divergence but not with respect to the standard Euclidean distance. In fact, the Euclidean projection requires an O(n log n) time algorithm in n dimensions [47,16,18], whereas the KL projection only requires O(n) time (see, for instance, [10, Section 5] and [32, Section 4.4]). Another advantage of these modifications, in the constrained optimisation case, is that it is sometimes possible to choose a Bregman distance whose domain is the constraint set so as to make the feasibility of the iterates implicit.

Algorithm 1: The Adaptive Golden RAtio ALgorithm (aGRAAL) [38].

The remainder of this paper is structured as follows. In Section 2, we collect preliminary results for use in our analysis. In Section 3, we present our fixed step-size method and a proof of convergence. In Section 4, we present our adaptive method with some partial analysis. Section 5 contains experimental results. Firstly, we compare the fixed step-size method with the Euclidean distance and the KL divergence on a matrix game between two players. Secondly, we make the same comparison for the adaptive method on a power allocation problem in a Gaussian communication channel. Finally, we apply the adaptive method to an N-person Cournot oligopoly model with appropriately chosen Bregman distances over a closed box. We then conclude this paper by presenting some directions for further research.
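To illustrate the complexity gap between the two simplex projections discussed above, the following sketch (our own implementation, not the paper's code) contrasts the sorting-based O(n log n) Euclidean projection with the O(n) KL proximal step on the unit simplex.

```python
import numpy as np

def euclidean_simplex_proj(y):
    # Euclidean projection onto the unit simplex: O(n log n) via sorting
    # (the standard algorithm of Held et al. / Duchi et al.).
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    cond = u + (1 - css) / ks > 0
    k = ks[cond][-1]
    tau = (css[k - 1] - 1) / k
    return np.maximum(y - tau, 0)

def kl_simplex_prox(x_bar, grad, lam):
    # Bregman proximal step on the simplex under the KL divergence:
    # a multiplicative update followed by a single O(n) normalisation.
    w = x_bar * np.exp(-lam * grad)
    return w / w.sum()
```

The KL step is a component-wise multiplicative update followed by one normalisation, which is the source of its O(n) cost.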

Preliminaries
Throughout this work, H denotes a real, finite-dimensional Hilbert space with inner product ⟨·, ·⟩ and induced norm ‖·‖. Given an extended real-valued function f : H → (−∞, +∞], its subdifferential at x ∈ dom f is the set ∂f(x) := {v ∈ H : f(y) ≥ f(x) + ⟨v, y − x⟩ for all y ∈ H}, and ∂f(x) := ∅ for x ∉ dom f. The indicator function of a set K ⊆ H is written ι_K and takes the value 0 on K and +∞ otherwise.
A proper, l.s.c., convex function h : H → (−∞, +∞] is called Legendre [12, Definition 7.1.1] if it is strictly convex on every convex subset of dom ∂h := {x ∈ H : ∂h(x) ≠ ∅} and differentiable on int dom h ≠ ∅ with ‖∇h(x)‖ → ∞ whenever x approaches the boundary of dom h. The convex conjugate of h, written h* : H → (−∞, +∞], is given by h*(v) := sup_{x ∈ H} {⟨v, x⟩ − h(x)}. When h is merely differentiable on int dom h, the Bregman distance generated by h is the function D_h(x, y) := h(x) − h(y) − ⟨∇h(y), x − y⟩ for x ∈ dom h and y ∈ int dom h. When h is also convex, D_h is non-negative and, when h is σ-strongly convex, D_h satisfies D_h(x, y) ≥ (σ/2)‖x − y‖². We begin by collecting some general properties of the Bregman distance.
Proposition 2.1 (Bregman distance). Let h : H → (−∞, +∞] be Legendre. Then the following assertions hold.

(a) (three point identity) For all x, y ∈ int dom h and z ∈ dom h, we have
D_h(z, x) = D_h(z, y) + D_h(y, x) + ⟨∇h(y) − ∇h(x), z − y⟩.

(b) For all x, y ∈ int dom h, we have
D_h(x, y) + D_h(y, x) = ⟨∇h(x) − ∇h(y), x − y⟩.

(c) For all x ∈ dom h and y, u, v ∈ int dom h such that ∇h(y) = α∇h(u) + (1 − α)∇h(v) for some α ∈ R, we have
D_h(x, y) = α D_h(x, u) + (1 − α) D_h(x, v) − α D_h(y, u) − (1 − α) D_h(y, v).

Proof. (a) See, for instance, [46, Lemma 2.2] and the paragraph immediately after. (b) See, for instance, [7, Theorem 3.7(v)]. (c) By using the definition of D_h, together with the assumption ∇h(y) = α∇h(u) + (1 − α)∇h(v), the stated identity follows by direct expansion. This completes the proof.
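The three point identity can be checked numerically, for example for the negative entropy h(x) = Σ_i x_i log x_i; the following sketch (function names our own) verifies it on random positive vectors.

```python
import numpy as np

def h(x):
    # negative entropy
    return np.sum(x * np.log(x))

def grad_h(x):
    return np.log(x) + 1

def D(x, y):
    # Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>
    return h(x) - h(y) - grad_h(y) @ (x - y)

rng = np.random.default_rng(0)
x, y, z = rng.uniform(0.1, 1.0, (3, 5))
# three point identity:
# D(z, x) = D(z, y) + D(y, x) + <grad_h(y) - grad_h(x), z - y>
lhs = D(z, x)
rhs = D(z, y) + D(y, x) + (grad_h(y) - grad_h(x)) @ (z - y)
```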
We now turn our attention to operators defined in terms of the Bregman distance. The (left) Bregman proximal operator of a function f : H → (−∞, +∞] is the (potentially set-valued) operator given by

prox^h_f(u) := argmin_{x ∈ H} { f(x) + D_h(x, u) }.

Since we will only require the left Bregman proximal operator in this work, we omit the qualifier "left" from here onwards. For further details on the analogous "right Bregman proximal operator", the reader is referred to [11]. The (left) Bregman projection onto C is the (left) Bregman proximal operator of ι_C, that is, P^h_C := prox^h_{ι_C}. Next, we collect properties of the Bregman proximal operator for use in our subsequent algorithm analysis.

Proposition 2.3 (Bregman proximal operator). Let f : H → (−∞, +∞] be proper, l.s.c., convex and let h : H → (−∞, +∞] be Legendre such that int dom h ∩ dom f ≠ ∅. Then, for all u ∈ H, we have (9)-(10). Here (10) follows from convexity of h. To show the second inequality, we apply (9) with u = x, and similarly with u = y; adding the resulting inequalities gives the desired result.
Remark 2.4. Parts (a), (b), (d) of Proposition 2.3 also apply to the Bregman resolvent [11], which is defined for a set-valued operator A : H ⇉ H as (∇h + A)^{-1} ∘ ∇h. To see that the resolvent generalises the proximal operator, we refer to [6, Proposition 3.22(ii)(a)].

The Bregman-Golden Ratio Algorithm
In this section, we consider the Variational Inequality (VI) problem:

find z* ∈ H such that ⟨F(z*), z − z*⟩ + g(z) − g(z*) ≥ 0 for all z ∈ H, (11)

where we assume that:

A.2 h : H → (−∞, +∞] is continuously differentiable, Legendre, and σ-strongly convex. In addition, we will also require that D_h(x, x_n) → 0 for every sequence (x_n) ⊆ int dom h that converges to some x ∈ dom h.
Remark 3.1. Assumption A.2 is common in the literature concerning Bregman first-order methods [46,15,5,4]. In particular, the limit condition holds when ∇h is continuous and dom h is open. Indeed, in this case, σ-strong convexity of h implies that h* is (1/σ)-smooth [30, Theorem 6], and so applying Proposition 2.1(b) gives D_h(x, x_n) ≤ ⟨∇h(x) − ∇h(x_n), x − x_n⟩ → 0. The significance of dom h being open here is that h is differentiable at x ∈ dom h; however, we observe that the same condition can still hold if dom h is closed. In particular, it also holds for the KL divergence (see, for instance, [15, Example 2.1]).

Algorithm 2:
The Bregman-Golden RAtio ALgorithm (B-GRAAL).

The following lemma establishes the well-definedness of the sequences generated by the Bregman GRAAL.

Lemma 3.2. Suppose Assumptions A.1-A.2 hold. Then the sequences (z^k) and (z̄^k) generated by Algorithm 2 are well-defined. Moreover, (z̄^k) ⊆ int dom h and (z^k) ⊆ int dom h ∩ dom g.

Proof. Suppose, by way of induction, that z̄^k ∈ int dom h. By [8, Corollary 11.16], Proposition 2.3(a)-(b) shows that prox^h_{λf} is single-valued with range(prox^h_{λf}) ⊆ int dom h ∩ dom g, and therefore z^{k+1} ∈ int dom h ∩ dom g.
Remark 3.3. The Bregman proximal step shown in (13) can be expressed in terms of the Bregman proximal operator as z^{k+1} = prox^h_{λg}((∇h)^{-1}(∇h(z̄^k) − λF(z^k))), due to the first-order optimality condition in Proposition 2.3(c).
The following lemma is key in our convergence analysis.

Lemma 3.4. Suppose Assumptions A.1-A.4 hold and that F is L-Lipschitz continuous on dom g ∩ int dom h. Let z* ∈ Ω be arbitrary. Then the sequences (z^k), (z̄^k) generated by Algorithm 2 satisfy the inequalities in (14).

Proof. By applying Proposition 2.3(c) with u := z ∈ dom h ∩ dom g arbitrary, x := z^{k+1} and y := z̄^k, followed by the three point identity (Proposition 2.1(a)), we obtain (15). Shifting the index in (15) (by setting k ≡ k − 1), setting z := z^{k+1}, and using ϕ∇h(z̄^k) = (ϕ − 1)∇h(z^k) + ∇h(z̄^{k−1}) followed by the three point identity (Proposition 2.1(a)) gives (16). Let z* ∈ S ∩ dom h. Setting z := z* in (15), summing with (16) and rearranging yields (17). We observe that the left-hand side of (17) is non-negative as a consequence of (2) and A.3; this gives (18). To estimate the final term in (17), we use the Cauchy-Schwarz inequality, L-Lipschitz continuity of F, σ-strong convexity of h and the inequality λ ≤ σϕ/(2L) to obtain (19). Combining (17), (18) and (19) gives (20). Now applying Proposition 2.1(c) with ∇h(z^{k+1}) = (ϕ∇h(z̄^{k+1}) − ∇h(z̄^k))/(ϕ − 1) and rearranging yields (21). Combining (20) and (21), followed by collecting like terms, gives (22). Since ∇h(z̄^{k+1}) = ((ϕ − 1)/ϕ)∇h(z^{k+1}) + (1/ϕ)∇h(z̄^k), Proposition 2.1(c) gives (23). Combining (22) and (23) establishes the second inequality in (14). To show the first inequality in (14), we apply the three point identity (Proposition 2.1(a)) to obtain (24). Using L-Lipschitz continuity of F and σ-strong convexity of h gives (25). On substituting (25) back into (24) and rearranging, we obtain (26), which establishes the first inequality of (14) and thus completes the proof.
The following is our main result regarding convergence of the Bregman GRAAL with fixed step-size.

Theorem 3.5. Suppose Assumptions A.1-A.4 hold and that F is L-Lipschitz continuous on dom g ∩ int dom h. Then the sequences (z^k) and (z̄^k) generated by Algorithm 2 converge to a point in S ∩ dom h.
Next, using σ-strong convexity of h, we deduce that z^{k+1} − z^k → 0, and that (z^k) and (z̄^k) are bounded. Thus, let z ∈ H be a cluster point of (z^k). Then there exists a subsequence (z^{k_j}) such that z^{k_j} → z and z^{k_j+1} → z as j → ∞. Now recalling (15) and taking the limit inferior of both sides as j → ∞ shows that z ∈ Ω. Since z* ∈ Ω was chosen in Lemma 3.4 to be arbitrary, we can now set z* = z. It then follows that lim_{j→∞} D_h(z*, z^{k_j}) = 0, and consequently, lim_{j→∞} η_{k_j} = 0. Also note that, for n ≥ k_j, we have η_n ≤ η_{k_j} from Lemma 3.4, and therefore (ϕ + 1) lim_{k→∞} D_h(z*, z^k) = 0, so that z^k → z* by strong convexity. The fact that z̄^k → z* follows since z̄^k − z^k → 0.
Remark 3.6. In the special case where h = ½‖·‖², Algorithm 2 recovers the Euclidean GRAAL with fixed step-size from [38, Section 2] and the conclusions of Theorem 3.5 recover [38, Theorem 1]. Despite this, the proof provided here is new and differs from the one in [38, Theorem 1] even when specialised to the Euclidean case. Indeed, [38, Theorem 1] proceeds by establishing the inequality (27), which is different to Lemma 3.4. Interestingly, (27) can be deduced from (22) in Lemma 3.4 by using the equality which applies in the Euclidean case, in place of (23), followed by the identity ϕ² = ϕ + 1. Note also that the inequality (23) is already weaker than its Euclidean counterpart.

The Adaptive Bregman Golden Ratio Algorithm

In this section, we present an adaptive modification to Algorithm 2 and analyse its convergence.
As with the Euclidean aGRAAL, our Bregman adaptive modification has a fully explicit step-size rule. The method is presented in Algorithm 3.
Algorithm 3: The Bregman adaptive Golden RAtio ALgorithm (B-aGRAAL).

Observe that the step-size sequence (λ_k) in Algorithm 3 approximates the inverse of a local Lipschitz constant in the sense of the estimate (32). Before giving our main convergence result for Algorithm 3, we require some preparatory lemmas. The first two are concerned with well-definedness of the algorithm and boundedness of the step-size sequence.
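Since the listing of Algorithm 3 did not survive extraction, the following Python sketch shows one way a B-aGRAAL-style iteration can be organised for a VI over the unit simplex with the KL divergence, based on the description in the text; it is an illustration under our own naming, not the authors' exact pseudocode.

```python
import numpy as np

def kl_prox_simplex(z_bar, v, lam):
    # argmin over the unit simplex of lam*<v, z> + KL(z, z_bar)
    w = z_bar * np.exp(-lam * v)
    return w / w.sum()

def b_agraal_kl(F, z0, n_iter=2000, lam0=0.5, lam_max=1e6, phi=1.5):
    rho = 1 / phi + 1 / phi**2
    z = z_prev = z0
    z_bar = z0.copy()
    lam, theta = lam0, 1.0
    Fz_prev = F(z0)
    for _ in range(n_iter):
        Fz = F(z)
        # fully explicit adaptive step-size (no backtracking)
        num = np.sum((z - z_prev) ** 2)
        den = np.sum((Fz - Fz_prev) ** 2)
        local = phi * theta * num / (4 * lam * den) if den > 1e-16 else lam_max
        lam_new = min(rho * lam, local, lam_max)
        theta = phi * lam_new / lam
        # grad h(z_bar_next) = ((phi - 1) grad h(z) + grad h(z_bar)) / phi;
        # for the negative entropy this is a weighted geometric mean
        z_bar = z ** ((phi - 1) / phi) * z_bar ** (1 / phi)
        z_prev, Fz_prev, lam = z, Fz, lam_new
        z = kl_prox_simplex(z_bar, Fz, lam_new)
    return z
```

The z̄ update averages in the dual space of h; when h = ½‖·‖², it reduces to the Euclidean z̄ update of aGRAAL.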

Proof.
Follows by an argument analogous to that of Lemma 3.2, but with λ replaced by λ_k and ϕ replaced by φ.

Lemma 4.2. If (z^k) generated by Algorithm 3 is bounded and F is locally Lipschitz, then both (λ_k) and (θ_k) are bounded and separated from 0.

Proof. First we note that λ_k ≤ λ_max by definition. Then an argument analogous to that of [39, Lemma 2] shows that λ_k ≥ σφ²/(4L²λ_max), and consequently, (θ_k) is also bounded and separated from 0.

Our next result establishes an inequality similar, but not completely analogous, to its fixed step-size counterpart in Lemma 3.4.
Proof. We proceed in a similar fashion to the proof of Lemma 3.4. By first applying Proposition 2.3(c) with u := z ∈ dom h ∩ dom g arbitrary, x := z^{k+1} and y := z̄^k, followed by the three point identity (Proposition 2.1(a)), we obtain (34). Shifting the index in (34) (by setting k ≡ k − 1), setting z := z^{k+1}, and using the fact that φ∇h(z̄^k) = (φ − 1)∇h(z^k) + ∇h(z̄^{k−1}) followed by the three point identity (Proposition 2.1(a)) gives (35). Now multiplying both sides of (35) by λ_k/λ_{k−1} gives (36). Let z* ∈ S ∩ dom h. Setting z = z* in (34), summing with (35) and rearranging yields (37). We observe that the left-hand side of (37) is non-negative as a consequence of (2) and A.3; this gives (38). To estimate the final term of (37), we use the Cauchy-Schwarz inequality, the local Lipschitz estimate (32), and σ-strong convexity of h to obtain (39). Combining (37), (38) and (39) gives (40). Now applying Proposition 2.1(c) with ∇h(z^{k+1}) = (φ∇h(z̄^{k+1}) − ∇h(z̄^k))/(φ − 1) and rearranging yields (41). Combining (40) and (41), followed by collecting like terms and rearranging, gives (42). Next we apply Proposition 2.1(c) once again to see (43). The final line of (42) can therefore be estimated as (44). Substituting (44) into (42) gives (33), which completes the proof.
The following is our main result regarding convergence of the Bregman adaptive GRAAL.

Theorem 4.4. Suppose Assumptions A.1-A.4 hold, that dom g ∩ int dom h is bounded, that F is locally Lipschitz, and that λ_max is sufficiently small. Then the sequences (z^k) and (z̄^k) generated by Algorithm 3 converge to a point in S ∩ dom h.

Proof. Since dom g ∩ int dom h is bounded and F is locally Lipschitz, there exists L > 0 such that F is L-Lipschitz continuous on dom g ∩ int dom h. Suppose λ_max is sufficiently small so that there exists an ε > 0 with θ_k − 1 − 1/φ ≤ −ε for all k ∈ N. Now let z* ∈ Ω be arbitrary and denote by (η_k) the sequence defined in (45). Next, we show that (η_k) is bounded below. To this end, using the three point identity (Proposition 2.1(a)) followed by (34) gives (46). Now, as with (17), the final term in (46) is non-negative. Since (λ_k) and (θ_k) are bounded and separated from 0 by Lemma 4.2, there exists a constant M > 0 such that (47) holds. Combining (46) and (47), noting that φ < φ/(φ − 1), then gives (48), which establishes that (η_k) is bounded below. Next, by telescoping the inequality (45), we deduce that (η_k) is bounded and D_h(z^{k+1}, z^k) → 0 as k → ∞. Referring to (43), it follows that D_h(z^{k+1}, z̄^{k+1}) → 0. Also, by applying Proposition 2.1(c) with the identity ∇h(z^k) = (φ∇h(z̄^k) − ∇h(z̄^{k−1}))/(φ − 1), we obtain a corresponding estimate. Altogether, noting that (θ_k) is bounded, we deduce that, in particular, lim_{k→∞} D_h(z*, z^k) exists.
Next, using σ-strong convexity of h, we deduce that z^{k+1} − z^k → 0. Let z ∈ H be a cluster point of the bounded sequence (z^k). Then there exists a subsequence (z^{k_j}) such that z^{k_j} → z and z^{k_j+1} → z as j → ∞. Now recalling (34) and taking the limit inferior of both sides as j → ∞ shows that z ∈ Ω. Since z* ∈ Ω was chosen in Lemma 4.3 to be arbitrary, we can now set z* = z. It then follows that lim_{j→∞} D_h(z*, z^{k_j}) = 0, and consequently, lim_{j→∞} η_{k_j} = 0. Also note that, for n ≥ k_j, we have η_n ≤ η_{k_j} from Lemma 4.3, and therefore z^k → z* by strong convexity. The fact that z̄^k → z* follows since z̄^k − z^k → 0.

Although it is unlikely to be useful in practice, another interesting feature of the proof is that it requires the maximum step-size to satisfy a certain upper bound. Note that, when φ > σ, this upper bound is looser than the bound λ ≤ σϕ/(2L) required for B-GRAAL (Algorithm 2). Thus, it is possible for the maximum step-size of B-aGRAAL to be larger than the step-size required for B-GRAAL. This is the case, for instance, if D_h is the KL divergence on the simplex, in which case σ = 1 < φ.
Also, the proof of Theorem 4.4 required L to be a Lipschitz constant for F. However, thanks to Lemma 4.2, it would have sufficed for L to satisfy the weaker Lipschitz-like estimate (32) along the generated sequences. In turn, this would allow for a greater upper bound on λ_max, which is inversely related to L.

Numerical Experiments
In this section, we present some experimental results for Algorithms 2 and 3, to compare the respective original Euclidean methods against our new methods with respect to various choices of Bregman distances. All experiments are run in Python 3 on a Windows 10 machine with 8GB memory and an Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz processor.
We consider three different problems, all of which can be formulated as the variational inequality (11) for different choices of the function g and the operator F. Note that the solutions of the variational inequality (11) can also be characterised via the monotone inclusion: find z* ∈ H such that 0 ∈ F(z*) + ∂g(z*). Thus, by noting that (30) implies 0 ∈ F(z^k) + ∂g(z^{k+1}) + (1/λ_k)(∇h(z^{k+1}) − ∇h(z̄^k)), we monitor the quantity J_k given by

J_k := F(z^{k+1}) − F(z^k) − (1/λ_k)(∇h(z^{k+1}) − ∇h(z̄^k)) ∈ F(z^{k+1}) + ∂g(z^{k+1}) (49)

as a natural residual for Algorithm 3. The analogous expression for Algorithm 2 is given by replacing λ_k in (49) with λ.
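As a sketch (with our own naming, and with z_bar denoting the point at which the proximal step is taken), the residual can be computed as follows; it is an element of F(z^{k+1}) + ∂g(z^{k+1}) and vanishes at a solution.

```python
import numpy as np

def residual(F, grad_h, z_next, z, z_bar, lam):
    # Natural residual of the Bregman proximal step: an element of
    # F(z_next) + dg(z_next), obtained from the optimality inclusion.
    return F(z_next) - F(z) - (grad_h(z_next) - grad_h(z_bar)) / lam
```

For instance, with h = ½‖·‖² and g ≡ 0, the proximal step is z_next = z_bar − λF(z), and the residual reduces to F(z_next).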
In all experiments, we run each algorithm for the same (fixed) number of iterations on 10 random instances of the chosen problem. The figures in each section show the decrease in residual over time, with each faint line representing one instance and the bold line showing the mean behaviour. We use the parameters φ = 1.5 and λ_max = 10^6 throughout, with initial step-sizes and iterates as described in each section.

Matrix Games
To test the fixed step-size B-GRAAL (Algorithm 2), we first consider the following matrix game between two players:

min_{x ∈ Δ_n} max_{y ∈ Δ_n} ⟨Mx, y⟩, (50)

where Δ_n := {x ∈ R^n_+ : Σ_{i=1}^n x_i = 1} denotes the unit simplex and M ∈ R^{n×n} is a given matrix. This problem is of the form specified in (1) and so can be formulated as the variational inequality (2).
In particular, we consider the specific problem in the form of (50) of placing a server on a network G = (V, E) with n vertices in a way that minimises its response time. In this problem, a request will originate at some vertex v_j ∈ V, which is not known ahead of time, and the objective is to place the server at a vertex v_i ∈ V such that the response time, as measured by the graphical distance d(v_i, v_j), is minimised. We consider the case where the request location v_j ∈ V is a decision made by an adversary. The decision variable x ∈ Δ_n (resp. y ∈ Δ_n) models mixed strategies for placement of the server (resp. request origin). In other words, x_t (resp. y_t) is the probability of the server (resp. request origin) being located at the node v_t for t = 1, . . ., n. The matrix M is taken to be the distance matrix of the graph G, that is, M_{ij} = d(v_i, v_j) for all vertices v_i, v_j ∈ V. In this way, the objective function in (50) measures the expected response time, which we would like to minimise while our adversary seeks to maximise it.
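For an unweighted graph, the distance matrix M and the VI operator F(z) = (Mᵀy, −Mx) of the game can be assembled as follows (a sketch with our own naming; the saddle function here is f(x, y) = ⟨Mx, y⟩).

```python
from collections import deque
import numpy as np

def distance_matrix(adj):
    # All-pairs distances of an unweighted, connected graph via BFS
    # from each vertex; adj is an adjacency list.
    n = len(adj)
    M = np.zeros((n, n))
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        M[s] = dist
    return M

def F_game(z, M):
    # VI operator of min_x max_y <Mx, y>: F(x, y) = (M^T y, -M x)
    n = M.shape[0]
    x, y = z[:n], z[n:]
    return np.concatenate([M.T @ y, -M @ x])
```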
We compare Algorithm 2 with the squared norm h(z) = ½‖z‖², which generates the squared Euclidean distance D_h(u, v) = ½‖u − v‖², and the negative entropy h(z) = Σ_i z_i log z_i, where z = (x, y) ∈ R^{2n} is the concatenation of the two vectors x and y. Both of these choices for h are 1-strongly convex on Δ_n. A potential advantage of the KL projection onto the simplex is that it has the simple closed-form expression x ↦ x/‖x‖_1, whereas the Euclidean projection has no closed form and takes O(n log n) time (see, for instance, [47,16,18]). We run two experiments with n = 500 and n = 1000, and the results are shown respectively in Figures 1 and 2. Initial points are chosen as x_0 = y_0 = (1/n, . . ., 1/n) ∈ Δ_n and z_0 = (x_0, y_0), with z̄_0 a random perturbation of z_0. The Lipschitz constant of F is computed as L = ‖M‖_2, and the algorithm step-size is taken as λ = ϕ/(2L). Under these conditions, convergence to a solution is guaranteed by Theorem 3.5. The residual, given by min_{i≤k} ‖J_i‖_2, is shown in Figures 1a and 2a. The time per iteration is also shown in Figures 1b and 2b. Despite the KL projection being faster than the Euclidean projection, overall, the Euclidean method performed better in this instance.
We now move on to experiments for the adaptive Algorithm 3.

Gaussian Communication
We now turn our attention to maximising the information capacity of a noisy Gaussian communication channel [35, Chapter 9]. In this problem, the goal is to allocate a total power of P across m channels, represented by p ∈ R^m_+, to maximise the total information capacity of the channels in the presence of allocated noise, represented by n ∈ R^m_+. The information capacity of the ith channel, denoted C_i(p_i, n_i), is a function of the power p_i and the noise level n_i given by

C_i(p_i, n_i) = log(1 + β_i p_i/(µ_i + n_i)),

where µ_i > 0 and β_i > 0 are given constants. The power and noise vectors are constrained to scaled simplices, where Δ^m_T := {x ∈ R^m_+ : Σ_{i=1}^m x_i = T} denotes the simplex scaled by T > 0. This problem is also of the form specified in (1) and so can also be formulated as the variational inequality (2).
The Lipschitz constant of the operator F for this problem is not straightforward to compute, so we apply the adaptive algorithms. Similarly to Section 5.1, we compare the Euclidean and KL versions of Algorithm 3. Since x ↦ x log x is (1/M)-strongly convex for 0 < x ≤ M, the strong convexity constant of the negative entropy over Δ^m_P × Δ^m_N is min{1/P, 1/N}. In our experiments, we set (P, N) = (500, 50) and generate β ∈ (0, P]^m and µ ∈ (1, N + 1]^m uniformly. The initial points are chosen as p_0 = (P/m, . . ., P/m) ∈ Δ^m_P, n_0 = (N/m, . . ., N/m) ∈ Δ^m_N and z_0 = (p_0, n_0), and the initial step-size is taken as λ_0 = ‖z̄_0 − z_0‖_2/‖F(z̄_0) − F(z_0)‖_2, where z̄_0 is again a small random perturbation of z_0. It is also worth noting that, for the KL version, we had to multiply λ_0 by a small constant (10^{-2}) to avoid numerical instability issues.
We run two experiments, for m = 100 and m = 200, and plot the results in Figures 3 and 4 respectively.The KL method is slower than the Euclidean method in terms of the number of iterations and time.However, unlike in the previous section, both methods reach a similar final accuracy.

Cournot Competition
Our final example is a standard N-player Cournot oligopoly model [13, Example 2.1]. This is a system in which N independent firms supply the market with a quantity of some common good or service. More formally, each firm seeks to maximise its utility subject to its capacity, that is,

max_{0 ≤ x_i ≤ C_i} x_i P(x_T) − c_i x_i,

where
• x_T := Σ_{i=1}^N x_i is the total amount supplied.
• C_i > 0 is the production capacity of the ith firm.
• c i > 0 is the production cost of the ith firm.
• P : R + → R is the inverse demand curve.
In this section, we consider solutions to this problem in the sense of Nash equilibria, which is equivalent to the variational inequality (11) with

F(x) := (c_i − P(x_T) − x_i P'(x_T))_{i=1}^N and g := ι_K, where K := [0, C_1] × · · · × [0, C_N].

By choosing a function h such that dom h = K, it is possible to implicitly enforce the capacity constraint in this problem and avoid performing projections. Two examples over a single closed interval [α, β] which satisfy our assumptions present themselves:

h_1(x) = (x − α) log(x − α) + (β − x) log(β − x) and h_2(x) = −√((x − α)(β − x)).

When α = 0 and β = 1, h_1 is called the Fermi-Dirac entropy, and similarly, when α = −1 and β = 1, h_2 is called the Hellinger entropy. We then sum over these independent functions of one variable, with appropriately set intervals α = 0, β = C_i, to create the Bregman distance over the closed box K. We also compute the strong convexity constants by minimising ∇²h_1 over (α, β), which gives σ = 4/(β − α), and similarly for h_2, σ = 2/(β − α). In our experiments, P is taken to be a linear inverse demand curve given by P(x) = a − bx for a, b > 0. All parameters are generated by a log-normal distribution, except the cost vector. We run two experiments, with N = 2000 and N = 5000, the results of which are shown in Figures 5 and 6. The initial points are chosen as z_0 = C/2 and λ_0 = 1. We observe here that the final accuracy depends heavily on the choice of Bregman distance.
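Closed forms for ∇h and (∇h)^{-1} for the two interval entropies can be sketched as follows (our own derivation, consistent with the strong convexity constants stated above); since (∇h)^{-1} maps all of R into the open interval (α, β), the capacity constraints are enforced implicitly.

```python
import numpy as np

def fermi_dirac(alpha, beta):
    # h1(x) = (x - alpha)log(x - alpha) + (beta - x)log(beta - x)
    def grad(x):
        return np.log(x - alpha) - np.log(beta - x)
    def grad_inv(v):
        # a scaled logistic: maps R into (alpha, beta)
        return alpha + (beta - alpha) / (1 + np.exp(-v))
    return grad, grad_inv

def hellinger(alpha, beta):
    # h2(x) = -sqrt((x - alpha)(beta - x))
    m, r = (alpha + beta) / 2, (beta - alpha) / 2
    def grad(x):
        return (x - m) / np.sqrt(r**2 - (x - m)**2)
    def grad_inv(v):
        return m + r * v / np.sqrt(1 + v**2)
    return grad, grad_inv
```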
In both figures, we observe that the Hellinger method makes no further progress after approximately 200 iterations, while the Euclidean method achieves a modest final accuracy. Meanwhile, the Bit method is very fast and accurate, converging to a near-zero tolerance in roughly 300 iterations in all instances. Finally, we note that all methods are roughly equal in terms of time per iteration. This is to be expected, since the Euclidean method requires a projection which evaluates min{max{x_i, 0}, C_i} for each component, whereas the Bregman methods require evaluating ∇h and (∇h)^{-1} over each component; all of these take O(N) time.

Conclusion
In this paper, we extended the adaptive method aGRAAL (Algorithm 1) to the Bregman distance setting. We proposed two such extensions: the first, Algorithm 2, generalises the fixed step-size GRAAL and converges under the same assumptions for a strongly convex Bregman function h : H → R. The second, Algorithm 3, generalises Algorithm 1 and converges in a more restrictive setting. We first examined the performance of Algorithm 2, and found that the KL version is less favourable than the Euclidean version, despite the reduced time per iteration. We then tested Algorithm 3 on a convex-concave game for Gaussian communication channels, where our new method performed worse with respect to the KL divergence when compared to the Euclidean method, although the run-time per iteration was again significantly shorter, as expected. Finally, we examined a Cournot competition model, where one of the Bregman-based methods reached a much higher accuracy very quickly.
We conclude by outlining directions for further research: • It would be interesting to know whether Algorithm 3 can be shown to converge in a more general setting than what we have shown, and if so, under what circumstances. The difficulties in our analysis arose from two issues: first, the estimate derived in (43) for the Bregman case is weaker than the Euclidean equality ‖z^{k+1} − z̄^{k+1}‖² = (1/φ²)‖z^{k+1} − z̄^k‖² used in [38]; and second, the inability to bound θ_k below without additional assumptions (as the bound in Lemma 4.2 can be arbitrarily small in general).
• Throughout this paper, h was assumed strongly convex, but whether or not this assumption can be relaxed is unclear. One potential replacement for strong convexity is considered in [9,33], where h is twice differentiable such that the Hessian matrix ∇²h(z) is positive definite for all z ∈ int dom h. Within the scope of twice differentiable functions, such a condition lies between strict and strong convexity, the main consequence being that ∇h is locally Lipschitz and h is locally strongly convex. Indeed, for σ-strongly convex h with η-Lipschitz gradient, one can derive the estimates (σ/2)‖x − y‖² ≤ D_h(x, y) ≤ (η/2)‖x − y‖². If such inequalities hold on a local scale, then it would remain to be seen whether these coefficients can be estimated and used in the same way that λ_k approximates an inverse of the local Lipschitz constant of F.
• In the context of convex composite optimisation problems of the form min_{x∈H} f(x) + g(x), where g is nonsmooth and f is smooth, the Bregman proximal gradient algorithm [46,5] is known to converge in a more general setting than Lipschitz continuity. Specifically, L-Lipschitz continuity of ∇f can be relaxed to convexity of the function Lh − f, which indeed holds if ∇f is Lipschitz and h is strongly convex. It would be interesting to see if a similar relaxation of Lipschitz continuity can be used in the context of the algorithms discussed here.

Lemma 4.1. Suppose Assumptions A.1-A.2 hold. Then the sequences (z^k) and (z̄^k) generated by Algorithm 3 are well-defined. Moreover, (z̄^k) ⊆ int dom h and (z^k) ⊆ int dom h ∩ dom g.

Remark 4.5. The energy (45) used in the proof of Theorem 4.4 is not completely analogous to the one used in the fixed step-size case given in Lemma 3.4. Notably, the final term of η_k in (45) has coefficient −1, whereas the corresponding term in Lemma 3.4 has coefficient −ϕ.

Figure 1: Matrix game results for n = 500.

Figure 2: Matrix game results for n = 1000.

Figure 3: Gaussian communication channel results for m = 100.

Figure 4: Gaussian communication channel results for m = 200. (a) Residual over iterations. (b) Time per iteration.