Quantitative analysis of a subgradient-type method for equilibrium problems

We use techniques originating from the subdiscipline of mathematical logic called ‘proof mining’ to provide rates of metastability and—under a metric regularity assumption—rates of convergence for a subgradient-type algorithm solving the equilibrium problem in convex optimization over fixed-point sets of firmly nonexpansive mappings. The algorithm is due to H. Iiduka and I. Yamada who in 2009 gave a noneffective proof of its convergence. This case study illustrates the applicability of the logic-based abstract quantitative analysis of general forms of Fejér monotonicity as given by the second author in previous papers.


Introduction
In [11], a general logic-based analysis of abstract forms of convergence theorems based on general forms of Fejér monotonicity is given. That paper uses methods from the subdiscipline of mathematical logic called 'proof mining', which aims at the extraction of effective bounds from prima facie nonconstructive proofs by logical transformations (see [9] for a book treatment and [10] for a recent survey).
Even in the simplest cases of ordinary Fejér monotonicity on the real line, with all the data involved trivially computable, there are in general no computable rates of convergence, as one can show using methods from computability theory (see the discussion in [11] and, in particular, [14]). These methods sharpen the known 'arbitrarily slow convergence' phenomena discussed in optimization to noncomputability results.
Logically speaking, this is because the formulation of the Cauchy property of a sequence $(x_n)_{n\in\mathbb{N}}$ (say in a metric space $(X, d)$), that is
$$\forall k \in \mathbb{N}\ \exists N \in \mathbb{N}\ \forall n, m \geq N \left( d(x_n, x_m) < \frac{1}{k+1} \right),$$
is of the form $\forall\exists\forall$ which, in general, is of too high logical complexity (and thus not covered by the general logical metatheorems used in proof mining to extract bounds from noneffective proofs).
What can be achieved by the aforementioned logical metatheorems, however, are effective rates of so-called metastability which, moreover, are highly uniform. Metastability is based on a (noneffectively equivalent but constructively weakened) reformulation of the convergence or Cauchy statements into what is known in logic as Herbrand normal form. In the context of the above example, this reformulation is given by
$$\forall k \in \mathbb{N}\ \forall g \in \mathbb{N}^{\mathbb{N}}\ \exists n \in \mathbb{N}\ \forall i, j \in [n; n + g(n)] \left( d(x_i, x_j) < \frac{1}{k+1} \right)$$
(here $[n; n + g(n)] := \{n, n+1, n+2, \ldots, n + g(n)\}$). This statement is of the general form $\forall\exists$ (considering the leading two universal quantifiers as one and disregarding the last universal quantifier as it is bounded) and for statements of the above form, the logical metatheorems of proof mining guarantee the extractability of highly uniform effective bounds on '$\exists n \in \mathbb{N}$' (see [9]). Such bounds are by now well known in the literature under the name of rates of metastability (after Tao, see e.g. [17,16]).

One important consequence of the Fejér monotonicity (in the very general sense of [11]) of an iterative sequence $(x_n)_{n\in\mathbb{N}}$ is that effective rates of convergence can be established if some general form of regularity is given (see [13]). This regularity is provided quantitatively by a so-called modulus of regularity, which generalizes many concepts of regularity used in optimization. The existence of regularity usually requires one to be in a rather special ('tame') context where the sets in question are, e.g., semialgebraic so that tools from the model theory of o-minimal structures can be utilized (see e.g. [6,3] and, for a concrete example, [4]).
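To make the metastable reformulation of the Cauchy property concrete, the following minimal sketch searches for a witness $n$ for given $k$ and counterfunction $g$ on a real sequence. The function name `metastable_index`, the `search_limit` parameter and the sample sequence are illustrative assumptions and do not come from the paper; the brute-force search is also not a rate of metastability, it only exhibits one witness.

```python
from typing import Callable, Optional

def metastable_index(x: Callable[[int], float],
                     g: Callable[[int], int],
                     k: int,
                     search_limit: int = 10_000) -> Optional[int]:
    """Return the least n <= search_limit such that |x(i) - x(j)| < 1/(k+1)
    for all i, j in [n; n + g(n)], or None if no such n is found."""
    eps = 1.0 / (k + 1)
    for n in range(search_limit + 1):
        window = [x(i) for i in range(n, n + g(n) + 1)]
        # For real numbers, all pairwise distances are < eps
        # exactly when max - min < eps.
        if max(window) - min(window) < eps:
            return n
    return None

# Illustrative data (hypothetical): x_n = 1/(n+1), counterfunction g(n) = n + 5.
n = metastable_index(lambda i: 1.0 / (i + 1), lambda m: m + 5, k=9)
```

The point of the reformulation is visible here: the search only has to control the finite window $[n; n + g(n)]$, never the whole tail of the sequence.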
Our paper is intended as a case study to illustrate how the abstract approach from [11,13] can be used in a very concrete situation to give a perspicuous quantitative analysis of the algorithm being considered. The logic-based notions used in these papers now have a concrete mathematical meaning so that the whole treatment can be given without any reference to logic. We will indicate in Section 1.3 that we expect that many other algorithms can be analyzed in a similar way.
In [5], the following equilibrium problem over the fixed point set of a firmly nonexpansive mapping is studied: let $f : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ be a function such that:
(1) $f(x, x) = 0$ for any $x \in \mathbb{R}^N$;
(2) $f(\cdot, y)$ is continuous for any $y$;
(3) $f(x, \cdot)$ is convex for any $x$.
Now we can state the equilibrium problem for $f$ (a so-called equilibrium function) over the fixed point set $\mathrm{Fix}(T)$ of $T$, where $T : \mathbb{R}^N \to \mathbb{R}^N$ is a firmly nonexpansive mapping with a nonempty fixed point set.
Problem 1 (Equilibrium problem of $f$ over $\mathrm{Fix}(T)$). Find a point $u \in \mathrm{EP}(\mathrm{Fix}(T), f)$.

Iiduka and Yamada proposed a subgradient-type iterative algorithm $(x_n)_{n\in\mathbb{N}}$ (see also [7] for a related algorithm also discussed in [5]) and showed that it converges to a point in $\mathrm{EP}(\mathrm{Fix}(T), f)$. However, no quantitative information is given in their theorem. In this paper we provide explicit quantitative versions of this result as well as of several intermediate convergence results such as $\lim_{n\to\infty} f(y_n, x_n) = 0$ and $\lim_{n\to\infty} \|x_n - Tx_n\| = 0$.

As mentioned already, even for $N = 1$ and $f \equiv 0$ one can construct a simple computable $T$ such that $(x_n)_{n\in\mathbb{N}}$ has no computable rate of convergence by adapting a counterexample from [14].
Nevertheless, we present a fully effective and highly uniform rate of metastability for the algorithm providing a complete finitary account of the main result in [5].
Moreover, in Section 3 we even give a rate of convergence, modulo an additional metric regularity assumption.
1.1. Analytical preliminaries. Throughout, we consider $\mathbb{R}^N$ ($N \geq 1$) as the Euclidean space with the usual inner product $\langle \cdot, \cdot \rangle$ and the induced Euclidean norm $\|\cdot\|$. With $B_r(x)$ and $\overline{B}_r(x)$, we denote the open and closed ball with radius $r > 0$ and center $x \in \mathbb{R}^N$ with respect to $\|\cdot\|$, respectively. Throughout, if not stated otherwise, let $T : \mathbb{R}^N \to \mathbb{R}^N$ be a firmly nonexpansive mapping, that is for all $x, y \in \mathbb{R}^N$:
$$\|Tx - Ty\|^2 \leq \langle x - y, Tx - Ty \rangle.$$
In particular, $T$ is also nonexpansive.
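The step from firm nonexpansiveness to nonexpansiveness is short and can be spelled out as follows. The sketch assumes the standard inner-product formulation of firm nonexpansiveness, $\|Tx - Ty\|^2 \leq \langle x - y, Tx - Ty \rangle$, and uses only the Cauchy-Schwarz inequality.

```latex
\begin{align*}
\|Tx - Ty\|^2
  &\leq \langle x - y,\, Tx - Ty \rangle
   && \text{(firm nonexpansiveness)} \\
  &\leq \|x - y\| \, \|Tx - Ty\|
   && \text{(Cauchy--Schwarz)}.
\end{align*}
% If Tx = Ty the claim is trivial; otherwise divide by \|Tx - Ty\| > 0:
\[
  \|Tx - Ty\| \leq \|x - y\| .
\]
```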

1.2. The subgradient method of Iiduka and Yamada. In [5], Iiduka and Yamada propose the following method for the equilibrium problem utilizing the subgradient of the equilibrium function $f$:

Algorithm 2 (Subgradient-type method for Problem 1). Choose $\varepsilon_0 \geq 0$, $\lambda_0 > 0$ and $x_0 \in \mathbb{R}^N$ arbitrarily and define $\rho_0 := \|x_0\|$ and set $n = 0$. Then, repeat:

As discussed in [5], this algorithm is based on the combination of ideas from two well-known algorithms, namely the hybrid steepest descent method of Yamada [19] and the scheme of Iusem and Sosa from [7]. Note that the approximate maximum point $y_n$ can be computed effectively due to the error $\varepsilon_n$ whenever the latter is strictly positive and $f(\cdot, x_n)$ comes with a modulus of uniform continuity on $K_n$, while for $\varepsilon_n = 0$ there would in general be no computable point $y_n$ (see [8] for a discussion of this point in terms of complexity theory).
In [5], the following theorem on the correctness of the algorithm is established:

Theorem 3 (Iiduka and Yamada, [5]). Let $\mathrm{Fix}(T) \neq \emptyset$. Assume that there is an $M > 0$ with $\|\xi_n\| \leq M$ for all $n \in \mathbb{N}$. Then the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by the algorithm satisfy:
(a) For all $u \in \Omega_n := \{u' \in \mathrm{Fix}(T) \mid f(y_n, u') \leq 0\}$: In particular, if $\lambda_n \in [0, 2/M^2]$: for some $a, b \geq 0$ and all $n \in \mathbb{N}$, then the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ are bounded and $\lim_{n\to\infty} f(y_n, x_n) = 0$ as well as $\lim_{n\to\infty} \|x_n - Tx_n\| = 0$.
(c) If $\varepsilon_n \geq 0$ for all $n$ with $\lim_{n\to\infty} \varepsilon_n = 0$, in addition to the requirements for (b), then $(x_n)_{n\in\mathbb{N}}$ converges to a point in $\mathrm{EP}(\mathrm{Fix}(T), f)$.
In this paper we establish quantitative versions of the claims in this theorem.
1.3. The range of the results. First, let us stress that considering only sets of fixed points $\mathrm{Fix}(T)$ of a firmly nonexpansive mapping $T$ in Problem 1 is not limiting: it still allows us to consider arbitrary closed and convex sets $C$ in place of $\mathrm{Fix}(T)$, since for any such $C$ the metric projection $P_C$ is a firmly nonexpansive mapping (see e.g. [1]) with $\mathrm{Fix}(P_C) = C$. See [5] for further considerations on this version of Problem 1 over $C$.
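As a numerical sanity check of the fact that a metric projection is firmly nonexpansive and fixes exactly the points of $C$, the following sketch samples random pairs and evaluates the defect $\langle x - y, P_C x - P_C y \rangle - \|P_C x - P_C y\|^2$ for $C$ the closed unit ball. The choice of $C$, the helper names and the sample size are assumptions made for illustration only; a random test of this kind supports but does not prove the property.

```python
import math
import random

def project_ball(x, radius=1.0):
    """Metric projection of x onto the closed Euclidean ball of the given radius around 0."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)
    return [radius * c / norm for c in x]

def firm_gap(T, x, y):
    """<x - y, Tx - Ty> - ||Tx - Ty||^2; nonnegative whenever T is firmly nonexpansive."""
    tx, ty = T(x), T(y)
    d = [a - b for a, b in zip(tx, ty)]
    inner = sum((a - b) * c for a, b, c in zip(x, y, d))
    return inner - sum(c * c for c in d)

random.seed(0)
pairs = [([random.uniform(-3, 3) for _ in range(3)],
          [random.uniform(-3, 3) for _ in range(3)]) for _ in range(1000)]
# Smallest observed defect; should be >= 0 up to rounding errors.
worst_gap = min(firm_gap(project_ball, x, y) for x, y in pairs)
```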
From that perspective, Problem 1 encompasses many general notions and problems from convex optimization as special cases, including in particular the famous Nash-equilibrium problem (as treated in [5]) as well as the convex minimization problem, the variational inequality problem and the vector minimization problem, among others (see [7]).
Moreover, allowing arbitrary firmly nonexpansive mappings $T$ in place of plain projections $P_C$ can be beneficial in the concrete practical formulation of particular equilibrium problems, as e.g. Iiduka and Yamada show in their work [5] for the example of the previously mentioned Nash-equilibrium problem. Here, one deals with sets $C$ where, on the one hand, $P_C$ may be computationally intractable while, on the other hand, $C$ can be given by (the intersection of) simple closed convex sets $C_i$ whose projections $P_{C_i}$ are tractable. In that situation, a firmly nonexpansive mapping $T$ can be defined using the tractable projections $P_{C_i}$ which is not a projection itself but fulfills $\mathrm{Fix}(T) = C$ and inherits tractability from the $P_{C_i}$.
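One standard way to build such a mapping, not necessarily the construction used in [5], is to average the tractable projections: a convex combination of firmly nonexpansive mappings is again firmly nonexpansive, and for a nonempty intersection its fixed point set is $C_1 \cap C_2$. The sketch below uses two hypothetical sets in $\mathbb{R}^2$ (the unit ball and a half-space) and iterates the averaged map; all names and the starting point are illustrative assumptions.

```python
import math

def project_ball(x):
    """Projection onto C1 = closed unit ball in R^2."""
    norm = math.hypot(x[0], x[1])
    return x if norm <= 1.0 else (x[0] / norm, x[1] / norm)

def project_halfspace(x):
    """Projection onto C2 = {x in R^2 : x[0] >= 0.5}."""
    return (max(x[0], 0.5), x[1])

def T(x):
    """Average of the two projections (an illustrative choice, not a projection itself)."""
    p, q = project_ball(x), project_halfspace(x)
    return (0.5 * (p[0] + q[0]), 0.5 * (p[1] + q[1]))

# Iterating T from an arbitrary start approaches a point of C1 ∩ C2 = Fix(T).
x = (-3.0, 2.0)
for _ in range(2000):
    x = T(x)
```

Each evaluation of `T` costs only the two cheap projections, whereas the projection onto the intersection itself has no comparably simple closed form in general.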
Furthermore, many practical choices of such sets $C$ from convex optimization already lend themselves to representations as fixed point sets of firmly nonexpansive mappings, a prime example being the set of zeros $\mathrm{zer}\,A$ of a monotone (or accretive) operator $A$. These zero sets can be expressed as the set of fixed points of the resolvent $J_A$ corresponding to $A$ which is, in particular, firmly nonexpansive (see [1] for a comprehensive reference on monotone operators).
We expect that various other algorithms for equilibrium problems over suitable sets $C$ can be treated by an analysis similar to the one provided in this paper (using [11,13]).
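For a concrete feeling for the resolvent, the following sketch treats the one-dimensional monotone operator $A(x) = x^3$, an example chosen here for illustration and not taken from the paper. The resolvent $J_A = (\mathrm{Id} + A)^{-1}$ is evaluated by bisection, and its only fixed point is the zero of $A$.

```python
def A(x: float) -> float:
    """A monotone operator on R (hypothetical example): A(x) = x^3, with zer A = {0}."""
    return x ** 3

def resolvent(x: float, iters: int = 200) -> float:
    """J_A(x) = (Id + A)^{-1}(x): the unique y with y + A(y) = x, found by bisection."""
    lo, hi = min(0.0, x), max(0.0, x)   # the solution lies between 0 and x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + A(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def firm_gap_1d(x: float, y: float) -> float:
    """(x - y)(Jx - Jy) - (Jx - Jy)^2; nonnegative for a firmly nonexpansive J."""
    d = resolvent(x) - resolvent(y)
    return (x - y) * d - d * d
```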

A first quantitative analysis
A first consequence of Theorem 3 is the following reformulation of (parts of) part (a).
As the required sequence is monotone, we can obtain a direct rate of metastability for the sequence from the next lemma which follows immediately from [9], Proposition 2.27 and Remark 2.29.
Here $g^{(n)}(K)$ denotes the $n$-th iteration of $g$ starting from $K$. For the special case of $K = 0$, we simply write

Proof. The proof given in [9], Proposition 2.27 and Remark 2.29, only provides the case for $K = 0$. It is, however, immediately apparent from the proof given there that the argument of $g^{(\lceil c_u(k+1) \rceil)}$ can be chosen to be an arbitrary $K \in \mathbb{N}$. Similarly, it follows from said proof that the resulting $n$ is then of the form $n = g^{(i)}(K)$ for some $i \leq \lceil c_u(k+1) \rceil$. Therefore in particular also $n \geq K$ by construction of $g$.
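The counting argument behind such a rate for bounded monotone sequences can be sketched in code. The version below is a simplified reading, not the exact statement of the lemma: it assumes a nonincreasing sequence in $[0, c]$ and iterates the auxiliary function $\tilde{g}(m) := m + g(m)$ (a name introduced here), stopping at the first iterate where the sequence drops by less than $1/(k+1)$ over the window. Since each failed step costs at least $1/(k+1)$ of the total budget $c$, at most $\lceil c(k+1) \rceil$ steps can fail.

```python
import math
from typing import Callable, Tuple

def monotone_metastability(a: Callable[[int], float],
                           g: Callable[[int], int],
                           k: int, c: float, K: int = 0) -> Tuple[int, int]:
    """For a nonincreasing sequence a in [0, c], find i <= ceil(c*(k+1)) and
    n = g_tilde^(i)(K) with a(n) - a(n + g(n)) < 1/(k+1), where
    g_tilde(m) = m + g(m).  Returns the pair (n, i)."""
    bound = math.ceil(c * (k + 1))
    n = K
    for i in range(bound + 1):
        # a is nonincreasing, so this difference is the largest
        # fluctuation of a on the window [n; n + g(n)].
        if a(n) - a(n + g(n)) < 1.0 / (k + 1):
            return n, i
        n = n + g(n)          # n becomes g_tilde^(i+1)(K)
    raise ValueError("a is not a nonincreasing sequence in [0, c]")

# Illustrative data (hypothetical): a_n = 1/(n+1) in [0, 1], g(n) = n + 1.
n, i = monotone_metastability(lambda m: 1.0 / (m + 1), lambda m: m + 1, k=3, c=1.0)
```

Note that the returned $n$ is always at least $K$, and that the bound on $i$ depends only on $k$ and $c$, not on the particular sequence.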
A lemma used in the proof (in [5]) of Theorem 3, parts (b) and (c), is the following, which is a direct corollary of part (a).

Lemma 6 (Iiduka and Yamada, [5]).

Using this lemma, we obtain the following quantitative analysis of the convergence of $f(y_n, x_n)$ towards 0.
with $\Phi_1$ as in Lemma 5.

Proof. Let $k \in \mathbb{N}$ and $g \in \mathbb{N}^{\mathbb{N}}$ be arbitrary. By Lemma 5 where the first inequality follows from Lemma 6. From this the claim is immediate.
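Lemma 8 (Iiduka and Yamada, [5], p. 257). Let $u \in \Omega \neq \emptyset$ be arbitrary and
(i) For all $n \in \mathbb{N}$: In particular, if $L \geq \mathrm{diam}\{x_n \mid n \in \mathbb{N}\}$:
(ii) For all $n \in \mathbb{N}$:

Then, for all $k \in \mathbb{N}$ and all $g \in \mathbb{N}^{\mathbb{N}}$: where we have with $\Phi_1$ as in Lemma 5.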
Proof. Let $k \in \mathbb{N}$ and $g \in \mathbb{N}^{\mathbb{N}}$ be arbitrary. As an abbreviation, we write Using Lemma 6, we at first have Further, we have (using Lemma 8) for any $u \in \Omega$: and therefore (using $(*_1)$): for all $n \in \mathbb{N}$. By Lemma 5, we have that $\exists m \leq \Phi_3(k, g, c_u) - 1$ such that (using $(*_2)$) for all $i \in [m; m + g(m+1)]$: and thus by the above we have

Remark 10. Note that a bound $L$ on the diameter of $(x_n)_{n\in\mathbb{N}}$ as used in Proposition 9 as an input can actually be obtained in terms of $c_u$ by setting $L := 2\sqrt{c_u}$ as we have:

To obtain a rate of metastability for the sequence $(x_n)_{n\in\mathbb{N}}$, we apply recent results of Kohlenbach, Leuştean and Nicolae [11] on Fejér-monotone sequences. Another application of these recent results is, in particular, the derivation of a quantitative version of asymptotic regularity of compositions of two mappings (see [12]). We recall the definition of Fejér monotonicity.

Definition 11. Let $(X, d)$ be a metric space, $F \subseteq X$ nonempty and $(x_n)_{n\in\mathbb{N}}$ be a sequence in $X$. $(x_n)_{n\in\mathbb{N}}$ is called Fejér-monotone with respect to $F$ if for all $n \in \mathbb{N}$ and all $p \in F$:
$$d(x_{n+1}, p) \leq d(x_n, p).$$
The authors in [11] actually introduce a generalized form of Fejér monotonicity, but for the purpose of this work, the above is enough. However, we pass to the notion of uniform Fejér monotonicity, as introduced in [11], to formulate the quantitative results that follow.
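Ordinary Fejér monotonicity, in the usual sense $d(x_{n+1}, p) \leq d(x_n, p)$ for all $p \in F$, can be checked numerically on a toy sequence. The sketch below uses a hypothetical one-dimensional map $T = \frac{1}{2}(\mathrm{Id} + P_{[0,1]})$, which is firmly nonexpansive with $\mathrm{Fix}(T) = [0, 1]$; the map, the sampled points and the helper names are illustrative assumptions, and sampling finitely many $p$ only tests the property rather than proving it.

```python
def clamp01(x: float) -> float:
    """Metric projection of x onto the interval [0, 1]."""
    return min(1.0, max(0.0, x))

def T(x: float) -> float:
    """Hypothetical firmly nonexpansive map on R with Fix(T) = [0, 1]."""
    return 0.5 * (x + clamp01(x))

def is_fejer_monotone(xs, points) -> bool:
    """Check |x_{n+1} - p| <= |x_n - p| for consecutive terms and all sampled p."""
    return all(abs(b - p) <= abs(a - p)
               for a, b in zip(xs, xs[1:]) for p in points)

# Iterates x_{n+1} = T(x_n) starting from x_0 = 5.
xs = [5.0]
for _ in range(30):
    xs.append(T(xs[-1]))
```

The sequence is Fejér monotone with respect to sampled points of $[0, 1]$ but not with respect to a point outside the fixed point set such as $p = 2$, which the iterates first approach and then move away from.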
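For this, one considers approximations of the approached set $F$ in form of a descending sequence of sets $(AF_k)_{k\in\mathbb{N}}$ with $F = \bigcap_{k\in\mathbb{N}} AF_k$.

Definition 12. $(x_n)_{n\in\mathbb{N}}$ is called uniformly Fejér monotone with respect to $F$ and $(AF_k)_{k\in\mathbb{N}}$ if for all $r, n, m \in \mathbb{N}$:
$$\exists k \in \mathbb{N}\ \forall p \in AF_k\ \forall l \leq m \left( d(x_{n+l}, p) < d(x_n, p) + \frac{1}{r+1} \right).$$
Any function $\chi(n, m, r)$ producing such a $k \in \mathbb{N}$ is called a modulus of $(x_n)_{n\in\mathbb{N}}$ being uniformly Fejér monotone.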
Under the assumption of $(X, d)$ being boundedly compact, the authors of [11] obtain (in a slightly generalized setting) an explicit effective rate of metastability for the sequence $(x_n)_{n\in\mathbb{N}}$. This rate only depends on the particular uniform quantitative reformulations of the assumptions of the setting such as a modulus of uniform Fejér monotonicity and some further quantitative information on the space $(X, d)$ and on how the sequence $(x_n)_{n\in\mathbb{N}}$ approaches the set $F$.
By quantitative information on the space $(X, d)$ we mean explicitly a modulus of total boundedness (as defined in [11]) or a 'modulus of bounded compactness'. This will be discussed in the proof of Theorem 17.
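By quantitative information on how the sequence $(x_n)_{n\in\mathbb{N}}$ approaches the set $F$, we mean a bound on $(x_n)_{n\in\mathbb{N}}$ having approximate $F$-points. For this, recall the following definition from [11]:

Definition 13. $(x_n)_{n\in\mathbb{N}}$ has approximate $F$-points (with respect to $(AF_k)_{k\in\mathbb{N}}$) if $\forall k \in \mathbb{N}\ \exists n \in \mathbb{N}\ (x_n \in AF_k)$. A bound $\Phi(k)$ on '$\exists n \in \mathbb{N}$' is called an approximate $F$-point bound.

As a feasibility check on whether these results can be applied and whether the setup of [5] fits into the above framework, note that part (a) of Theorem 3 can be seen (modulo some refinement of the approximations $\Omega_n$) as hinting at the uniform Fejér monotonicity of $(x_n)_{n\in\mathbb{N}}$ (as the sequence from Algorithm 2) with respect to the set $\Omega$ being taken as $F$.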
We at first focus on whether (quantitative versions of) these properties of uniform Fejér monotonicity and approximate $F$-/$\Omega$-points can be obtained by suitably modifying the approximations $\Omega_n$.
For this, we need to weaken the conditions of $\Omega_n$ to allow $(x_n)_{n\in\mathbb{N}}$ to lie in them further along the approximation. As none of these $x_n$ is expected to be a fixed point of $T$ or to satisfy $f(y_m, x_n) \leq 0$, we weaken these properties to that of being an approximate fixed point and to $f(y_m, u) \leq \frac{1}{k+1}$, respectively. Part (b) of Theorem 3 gives, as a feasibility check, that in the long run $f(y_n, x_n)$ is expected to decrease and that the sequence $(x_n)_{n\in\mathbb{N}}$ contains better and better approximate fixed points of $T$.
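The weakening just described can be turned into a membership test. The sketch below assumes one particular reading of the $k$-th approximation, namely that $u$ is a $\frac{1}{k+1}$-approximate fixed point of $T$ and satisfies $f(y_i, u) \leq \frac{1}{k+1}$ for all $i \leq k$; this reading, the function name `in_approx_set` and the toy data are assumptions for illustration and are not confirmed by the text at this point.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def in_approx_set(u: Vector, k: int,
                  T: Callable[[Vector], Vector],
                  f: Callable[[Vector, Vector], float],
                  ys: Sequence[Vector]) -> bool:
    """Assumed reading of the k-th approximation: u is a 1/(k+1)-approximate
    fixed point of T and f(y_i, u) <= 1/(k+1) for all i <= k.
    Requires len(ys) >= k + 1."""
    eps = 1.0 / (k + 1)
    tu = T(u)
    residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, tu)))
    return residual <= eps and all(f(ys[i], u) <= eps for i in range(k + 1))

# Toy data (hypothetical): T projects onto [0, 1] in R^1, f(x, y) = x * (y - x).
T_toy = lambda u: [min(1.0, max(0.0, u[0]))]
f_toy = lambda x, y: x[0] * (y[0] - x[0])
ys_toy = [[1.0]] * 10
coarse = in_approx_set([1.2], 2, T_toy, f_toy, ys_toy)
fine = in_approx_set([1.2], 9, T_toy, f_toy, ys_toy)
```

Under this reading the sets shrink as $k$ grows: the point $1.2$ passes the coarse test for $k = 2$ but fails the finer one for $k = 9$.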
Using this motivation, we define the sets $\Omega'_k$, which play the role of $AF_k$. By construction, we naturally have that $(\Omega'_k)_{k\in\mathbb{N}}$ is descending and $\Omega = \bigcap_{k\in\mathbb{N}} \Omega'_k$. Further, we obtain the following lemma giving a quantitative version of $(x_n)_{n\in\mathbb{N}}$ having approximate $\Omega$-points with respect to $(\Omega'_k)_{k\in\mathbb{N}}$ (modulo some quantitative reformulations of the parameters of Algorithm 2).

Proof. Let $k \in \mathbb{N}$. We again write Again by Lemma 6, we have and as in the proof of Proposition 9, we obtain Note that for the constant function $2 : n \mapsto 2$ we have Hence by Lemma 5 (applied to $j := i + 1$ and $K := \max\{k, \tau(2k+1)\}$), we have that $\exists n$ By $n + 1 \geq n \geq \max\{k, \tau(2k+1)\}$ we get $\varepsilon_{n+1} \leq \frac{1}{2(k+1)}$. Therefore, using $(*_1)$ and $(*_2)$: By the above, we have separately as $f(y_{n+1}, x_{n+1}) \geq 0$ by definition from Algorithm 2. Also by the definition of Algorithm 2, we have
By definition of the $K_j$, as $K_j \subseteq K_{j+1}$ and $y_j \in K_j$, we have $y_0, \ldots, y_{n+1} \in K_{n+1}$. Therefore, we have especially

The next two lemmas now give the quantitative version of the uniform Fejér monotonicity of $(x_n)_{n\in\mathbb{N}}$ with respect to $\Omega$ and $(\Omega'_k)_{k\in\mathbb{N}}$.

Lemma 15. Let $M > 0$ with $M \geq \|\xi_n\|$ for all $n \in \mathbb{N}$ and let $\lambda_n \in [a, b] \subseteq (0, 2/M^2)$ for all $n \in \mathbb{N}$. Now, let $n \in \mathbb{N}$ be fixed. For any $k \geq n$ and any $u \in \Omega'_k$: In particular, we have for all $u \in \Omega'_k$ and for all $l \in \mathbb{N}$ with $n + l \leq k + 1$ (where for $l = 0$ the sum is 0).

Proof. We give a quantitative analysis of the proof of (3.6) in [5]. At first, note that $\xi_n \in \partial f(y_n, \cdot)(x_n)$ by the definition of Algorithm 2. Thus, by the definition of the subgradient, we have especially Therefore, we have: From this, it naturally follows that The claim follows from this by induction on $l \geq 1$ with $n + l \leq k + 1$ (the case of $l = 0$ is trivial).
Lemma 16. Let $M > 0$ with $M \geq \|\xi_n\|$ for all $n \in \mathbb{N}$ and let $\lambda_n \in [a, b] \subseteq (0, 2/M^2)$ for all $n \in \mathbb{N}$. Further let $e \geq f(y_n, x_n)$ for all $n \in \mathbb{N}$. Then $(x_n)_{n\in\mathbb{N}}$ is uniformly Fejér monotone with modulus $\chi(n, m, r)$, that is for all $r, n, m \in \mathbb{N}$:

Proof. Fix $r, n, m \in \mathbb{N}$ and assume $m \geq 1$ without loss of generality. Let $k = \chi(n, m, r)$, $u \in \Omega'_k$ and $l \leq m$. By Lemma 15, we have (as and by using $f(y_n, x_n) \leq e$, we obtain

Applying Theorem 5.1 from [11], we now obtain a rate of metastability for $(x_n)_{n\in\mathbb{N}}$.
Theorem 17 (Quantitative version of Theorem 3, part (c), I). Let $e \geq f(y_n, x_n)$ and $M > 0$ with $M \geq \|\xi_n\|$ for all $n \in \mathbb{N}$. Also, let $\lambda_n \in [a, b] \subseteq (0, 2/M^2)$ for all $n \in \mathbb{N}$ as well as $L \geq \mathrm{diam}\{x_n \mid n \in \mathbb{N}\}$. Further, let $\varepsilon_n \geq 0$, $\varepsilon_n \to 0$ ($n \to \infty$) and $\tau$ be a nondecreasing rate of convergence for $(\varepsilon_n)_{n\in\mathbb{N}}$. Then, for all $k \in \mathbb{N}$ and all $g \in \mathbb{N}^{\mathbb{N}}$: where we define

Proof. The proof is an application of Theorem 5.1 from [11] with $X := \overline{B}_L(x_0)$ and $F := \Omega \cap X$, $AF_k := \Omega'_k \cap X$ (and $G := H := \mathrm{Id}$). Here we use that we have $x_n \in X$ for all $n \in \mathbb{N}$, as we assume $L \geq \mathrm{diam}\{x_n \mid n \in \mathbb{N}\}$ and we have $\|x_n - x_0\| \leq \mathrm{diam}\{x_n \mid n \in \mathbb{N}\}$ by definition of the diameter. By Example 2.8 of [11], the function $\gamma(k)$ and, considering the definition of (II)-moduli of total boundedness from [11], it is straightforward to see that these moduli are 'translation-invariant' in the case of normed vector spaces, i.e. any (II)-modulus of total boundedness for a set $A \subseteq \mathbb{R}^N$ is also a (II)-modulus of total boundedness for every translate $A + x$ ($x \in \mathbb{R}^N$). In our situation, we thus have that the particular $\gamma$ is also a (II)-modulus of total boundedness for $\overline{B}_L(x_0) = \overline{B}_L(0) + x_0$. By Lemma 14, $\Phi$ is an approximate $F$-point bound and by Lemma 16, $\chi$ is a modulus for $(x_n)_{n\in\mathbb{N}}$ being uniformly Fejér monotone w.r.t. $F$ (and $(AF_k)_{k\in\mathbb{N}}$). Applying Theorem 5.1 with $\gamma$, $\Phi$, $\chi$ gives the result.
Remark 18. In the above theorem, we can obtain a bound $e$ on $f(y_n, x_n)$ in terms of $c_u$, $a$, $b$ and $M$ by setting as we have (using Lemma 6):

Remark 19. The complexity of our rate of metastability is mainly due to the fact that the function $\Phi \circ \chi_g$ and so, in particular, the 'counterfunction' $g$ gets iterated in the definition of $\Sigma_0$. Some iteration of this sort, however, is unavoidable: the counterexample given in [15] to the computability of the rate of convergence (already for $N = 1$ and $f = 0$) shows that an extremely special case of the algorithm studied computes the limit of a decreasing sequence in $[0, 1]$ whose rate of metastability necessarily needs this iteration process (see the discussion on p. 4 of [11]). In the next section we show that a low-complexity rate of full convergence results under an additional metric regularity assumption.

Adding further assumptions
In this section, we investigate two sets of assumptions to strengthen Theorem 17.
3.1. Uniform closedness. In [11], the authors introduce the notion of uniform closedness, an additional assumption on the way the sets $AF_k$ approach the set $F$ (using the notation of the previous general setting of Definition 13).
We recall the corresponding definition.
Definition 20. Let $(X, d)$ be a metric space and $F \subseteq X$ be nonempty. Let $AF_k \subseteq X$ be closed with

Under the assumption of uniform closedness, the authors obtain Theorem 5.3 in [11] as a strengthening of Theorem 5.1. In the following, we will observe that, under further quantitative assumptions on the equilibrium function $f$, $\Omega$ is uniformly closed with the previously defined approximations $(\Omega'_k)_{k\in\mathbb{N}}$ and compute the corresponding moduli of uniform closedness.

This notion of modulus of uniform continuity differs from the commonly known modulus of continuity in (numerical) analysis but is commonly used in computable and constructive analysis as well as in proof mining (see e.g. [2,9,18]).
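To illustrate the kind of modulus meant here, the sketch below assumes the convention common in proof mining: a function $\sigma : \mathbb{N} \to \mathbb{N}$ such that $\|x - y\| \leq \frac{1}{\sigma(k)+1}$ implies $|h(x) - h(y)| \leq \frac{1}{k+1}$. This convention is an assumption of the sketch, as are the helper name and the example function. For a Lipschitz function such a modulus can be written down directly.

```python
import math

def lipschitz_modulus(lip: float):
    """Return sigma with: |x - y| <= 1/(sigma(k)+1)  ==>  |h(x) - h(y)| <= 1/(k+1),
    valid for any h that is Lipschitz with constant lip >= 0."""
    def sigma(k: int) -> int:
        # lip / (sigma(k) + 1) <= lip / (lip*(k+1) + 1) < 1/(k+1)
        return math.ceil(lip * (k + 1))
    return sigma

# Hypothetical example: h(x) = x^2 on [-2, 2] is Lipschitz with constant 4.
h = lambda x: x * x
sigma = lipschitz_modulus(4.0)
x0 = 1.5
y0 = x0 + 1.0 / (sigma(9) + 1)        # a point at distance 1/(sigma(9)+1) from x0
deviation = abs(h(x0) - h(y0))        # should not exceed 1/(9+1)
```

In contrast to a classical modulus of continuity, which maps a distance to a bound on the oscillation, this modulus maps the desired accuracy $k$ to the required closeness of the arguments.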
We then obtain the following result giving corresponding moduli of uniform closedness in terms of moduli of uniform continuity.
Lemma 22. Let $(y_j)_{j\in\mathbb{N}}$ be a sequence in $\mathbb{R}^N$ (defining $\Omega'_k$) and let $\sigma_j$ be moduli of uniform continuity for $f(y_j, \cdot)$ for all $j \in \mathbb{N}$ on some subset $D \subseteq \mathbb{R}^N$. Then and

As $T$ is especially nonexpansive, we have $\|Tu - Tu'\| \leq \|u - u'\|$. As $\omega_\Omega(k) \geq 4k + 3$, we have further and thus As $\omega_\Omega(k) \geq \sigma^{\max}_k(2k+1) \geq \sigma_i(2k+1)$ for all $i \leq k$ by assumption, we have for all $i \leq k$ and as $\sigma_i$ is a modulus of uniform continuity for $f(y_i, \cdot)$, we have for all $i \leq k$. Thus, by definition we have $u \in \Omega'_k$.

Using these moduli, we obtain the following strengthening of Theorem 17 in correspondence to Theorem 5.3 instead of Theorem 5.1 (of [11]).
Theorem 23 (Quantitative version of Theorem 3, part (c), II). In addition to the assumptions of Theorem 17, let $\sigma_j$ be a modulus of uniform continuity for $f(y_j, \cdot)$ for any $j \in \mathbb{N}$ on $\overline{B}_L(x_0)$.
Then, for all $k \in \mathbb{N}$ and all $g \in \mathbb{N}^{\mathbb{N}}$: for $\Sigma(k, g) := \Sigma_0(P(k_0), k_0, g, \chi_k, \Phi)$ with $P$, $\chi$, $\Phi$ and $\Sigma_0$ as in Theorem 17 as well as where we define

Proof. Apply Theorem 5.3 of [11] under the same considerations as in the proof of Theorem 17, using Lemma 22 with $D := \overline{B}_L(x_0)$. Lemmas 14 and 16 apply as before.
Remark 24. This theorem is a finitization of Theorem 3, part (c), as it (ineffectively, but elementarily) implies back the statement of (c): the metastability trivially implies the Cauchy property (and thus convergence) of $(x_n)_{n\in\mathbb{N}}$. Further, for $M \in \mathbb{N}$ and $g : n \mapsto M$, Theorem 23 gives $\exists i \geq M\ (x_i \in \Omega'_k)$. Thus, as $\Omega'_k$ is closed, we have $x := \lim_{n\to\infty} x_n \in \Omega'_k$ and, as $k$ was arbitrary, we have $x \in \Omega$ by $\Omega = \bigcap_{k\in\mathbb{N}} \Omega'_k$. As in [5], p. 258, it follows elementarily that $x \in \mathrm{EP}(\mathrm{Fix}(T), f)$.
3.2. Regularity conditions. Using the recent quantitative treatment [13] of very general scenarios of regularity conditions in the context of Fejér monotone sequences, we can improve Theorems 17 and 23 by adding assumptions on (a quantitative version of) a regularity condition for $\Omega$ and obtain (under this assumption) even rates of convergence for the sequence approximating an equilibrium point.
Central for the further results is the following quantitative version of regularity, defined as modulus of regularity in [13]. For this, given a function $F : \mathbb{R}^N \to \mathbb{R}$, we write $\mathrm{zer}\, F$ for the set of zeros of $F$.

Definition 25 ([13]). Let $F : \mathbb{R}^N \to \mathbb{R}$ be a function with $\mathrm{zer}\, F \neq \emptyset$ and fix $z \in \mathrm{zer}\, F$ and $r > 0$. A function $\varphi : (0, \infty) \to (0, \infty)$ is a modulus of regularity for $F$ w.r.t. $\mathrm{zer}\, F$ and $B_r(z)$ if for all $\varepsilon > 0$ and all $x \in B_r(z)$:
$$|F(x)| < \varphi(\varepsilon) \implies \mathrm{dist}(x, \mathrm{zer}\, F) < \varepsilon.$$
The setting for these regularity conditions in [13] is far more general, e.g. being in the context of abstract metric spaces. We will, however, only need the above version for functions over $\mathbb{R}^N$.
Under the assumption of a modulus of regularity for $F$ together with the Fejér monotonicity of a sequence $(x_n)_{n\in\mathbb{N}}$ w.r.t. $\mathrm{zer}\, F$ and some further assumptions on quantitative information on how the sequence $(x_n)_{n\in\mathbb{N}}$ interacts with $F$, the authors obtain effective rates of convergence for the sequence $(x_n)_{n\in\mathbb{N}}$.
By quantitative information on the interaction of $(x_n)_{n\in\mathbb{N}}$ with $F$, we mean precisely that $(x_n)_{n\in\mathbb{N}}$ has approximate $F$ zeros. For this, recall the following (modification of the) definition from [13].
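The mechanism by which regularity plus Fejér monotonicity yields a rate of convergence can be sketched on a toy example. This is a simplified version of the idea and not the statement of the theorem from [13]: assuming a modulus $\varphi$ with $|F(x)| < \varphi(\varepsilon) \Rightarrow \mathrm{dist}(x, \mathrm{zer}\, F) < \varepsilon$, any index $n$ with $|F(x_n)| < \varphi(\varepsilon/2)$ puts $x_n$ within $\varepsilon/2$ of some zero $p$, Fejér monotonicity keeps all later terms within $\varepsilon/2$ of $p$, and so all later terms stay within $\varepsilon$ of $x_n$. All names and the sample data below are illustrative assumptions.

```python
from typing import Callable, List

def convergence_index(xs: List[float],
                      F: Callable[[float], float],
                      phi: Callable[[float], float],
                      eps: float) -> int:
    """Least n with |F(x_n)| < phi(eps/2).  If phi is a modulus of regularity
    and xs is Fejer monotone w.r.t. zer F, then |x_m - x_n| < eps for m >= n."""
    threshold = phi(eps / 2.0)
    for n, x in enumerate(xs):
        if abs(F(x)) < threshold:
            return n
    raise ValueError("no approximate zero found in the given prefix")

# Toy instance (hypothetical): zer F = [0, 1], F = distance to [0, 1], phi = identity.
F = lambda x: max(0.0, -x, x - 1.0)
phi = lambda e: e
xs = [5.0]
for _ in range(40):
    # x_{n+1} = (x_n + P_[0,1](x_n)) / 2, a Fejer monotone sequence w.r.t. [0, 1]
    xs.append(0.5 * (xs[-1] + min(1.0, max(0.0, xs[-1]))))
N = convergence_index(xs, F, phi, eps=0.01)
```

In this sketch the index is found by search; in the quantitative setting it is bounded a priori by an approximate zero bound evaluated at the accuracy $\varphi(\varepsilon/2)$, which is what turns the argument into an explicit rate.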
Definition 26. Let $F$ be as above. We say that a sequence $(x_n)_{n\in\mathbb{N}}$ has approximate $F$ zeros if A bound on '$\exists n \in \mathbb{N}$' is called an approximate zero bound.

Notice the similarity of approximate zeros with approximate $F$-points from Definition 13 (although there $F$ has a different meaning). Guided by this similarity, the fact that the particular sequence $(x_n)_{n\in\mathbb{N}}$ from Algorithm 2 has approximate $\Omega$-points (relative to the representation $(\Omega'_k)_{k\in\mathbb{N}}$) and the fact that $(x_n)_{n\in\mathbb{N}}$ is Fejér monotone w.r.t. $\Omega$, we are particularly interested in a function $F$ where (1) $\mathrm{zer}\, F = \Omega$ and (2) Towards a particular choice, we first define the set-valued function $\gamma : \mathbb{R}^N \to \mathcal{P}(\mathbb{N})$ through otherwise.
Given a mapping $T : \mathbb{R}^N \to \mathbb{R}^N$, we may further define the function This function is now an adequate choice which fulfills the previously desired requirements (1) and (2). For this, note first that $\gamma(x)$ has the following property whose proof is immediate:

Lemma 27. For any $x \in \mathbb{R}^N$ and any $k \in \mathbb{N}$: $k \in \gamma(x)$ implies $j \in \gamma(x)$ for all $j \geq k$.
Together with the previous results on approximate $\Omega$-points from Lemma 14, we obtain the following result regarding approximate $F$ zeros.

Proof. By Lemma 14, we have $\exists n \leq \Phi(k, a, b, M, L, \tau, c_u)\ (x_n \in \Omega'_k)$. By Lemma 28, this implies $F(x_n) \leq \frac{1}{k+1}$.

As the function $F$ may be perceived to be quite artificial, it is of interest to see equivalent characterizations for the existence of a modulus of regularity for $F$. The following easy consequence of Lemma 28 gives a result in this vein.
Using this lemma, we obtain the following rate of convergence for the sequence $(x_n)_{n\in\mathbb{N}}$ generated by Algorithm 2 under the assumption of a modulus of regularity for $F$.

Proof. The proof is an application of Theorem 4.1, (i) of [13]. Fejér monotonicity of $(x_n)_{n\in\mathbb{N}}$ w.r.t. $\Omega$ is already contained in (a) of Theorem 3. By Lemma 30, (2), $\varphi$ is a modulus of regularity for $F$. Now, let $\varepsilon > 0$. Then, by Lemma 29, we obtain As $\varepsilon$ was arbitrary, Theorem 4.1 of [13] applies as $\Omega$ is closed and we obtain $x = \lim_{n\to\infty} x_n \in \Omega$ with the desired rate of convergence. We obtain $x \in \mathrm{EP}(\mathrm{Fix}(T), f)$ as in [5], p. 258.