Ergodic, primal convergence in dual subgradient schemes for convex programming, II: the case of inconsistent primal problems

Consider the utilization of a Lagrangian dual method which is convergent for consistent convex optimization problems. When it is used to solve an infeasible optimization problem, its inconsistency will then manifest itself through the divergence of the sequence of dual iterates. Will the sequence of primal subproblem solutions then still yield relevant information regarding the primal program? We answer this question in the affirmative for a convex program and an associated subgradient algorithm for its Lagrange dual. We show that the primal-dual pair of programs corresponding to an associated homogeneous dual function is in turn associated with a saddle-point problem, in which, in the inconsistent case, the primal part amounts to finding a solution in the primal space such that the Euclidean norm of the infeasibility in the relaxed constraints is minimized; the dual part amounts to identifying a feasible steepest ascent direction for the Lagrangian dual function. We present convergence results for a conditional ε-subgradient optimization algorithm applied to the Lagrangian dual problem, and the construction of an ergodic sequence of primal subproblem solutions; this composite algorithm yields convergence of the primal-dual sequence to the set of saddle-points of the associated homogeneous Lagrangian function; for linear programs, convergence to the subset in which the primal objective is at minimum is also achieved.


Introduction and motivation
Lagrangian relaxation, together with a search in the Lagrangian dual space of multipliers, has a long history as a popular means to attack complex mathematical optimization problems. Lagrangian relaxation is especially popular in cases when an inherent problem structure is present, such that a suitable relaxation is much easier to solve than the original problem, and where the result from optimizing the multipliers is acceptable even if the final primal solution is only near-feasible; examples are found, among others, in economics and logistics applications where the relaxed constraints are associated with capacity or budget constraints. Lagrangian relaxation is also frequently applied in combinatorial optimization, as a starting phase or as a heuristic. In the history of mathematical optimization, several classical works are built on the use of Lagrangian relaxation; see, e.g., the work by Held and Karp [15,16] on the traveling salesperson problem. For textbook coverage and tutorials on Lagrangian relaxation, see, e.g., [2,3,28,32] and [10,11,13,29], respectively.
The convergence theory of Lagrangian relaxation is quite well developed for the cases in which the original, primal, problem has an optimal solution, or at least exhibits feasible solutions, even in cases when strong duality fails to hold. For the case when strong duality holds, several techniques have been developed in order to "translate" a dual optimal solution to a primal optimal one; this translation is supported by a consistent primal-dual system of equations and inequalities, sometimes referred to as characterizations of "saddle-point optimality" (cf. [28, Sect. 1.3.3] and [2, Thm. 6.2.5]).
In linear programming, decomposition-coordination techniques, like Dantzig-Wolfe decomposition and its dual equivalent Benders decomposition, ensure the convergence to a primal-dual optimal solution. In convex programming, ascent methods for the Lagrange dual, such as (proximal) bundle methods, can be equipped with the construction of an additional, convergent sequence of primal points which are provided by the optimality certificate of the ascent direction-finding quadratic subproblems (e.g., [19,20,33]). When utilizing classical subgradient methods from the "Russian school" (e.g., [9,35,36,40]), in which one subgradient, calculated at the current dual iterate, is utilized as a search direction and combined with a pre-defined step length rule, a convergent sequence of primal vectors can also be constructed as a convex combination of primal subproblem solutions (see, e.g., [40, pp. 116-118] and [39] for linear programs, and [1,14,25-27] for general convex programs). In the case where strong duality does not hold, the "translation" from an optimal Lagrangian dual solution to a primal optimal solution is much more involved, since the primal and dual optimal solutions may then violate both Lagrangian optimality and any complementarity conditions (cf. [22]).
What is hitherto an insufficiently explored question is to what the sequence of above-mentioned simple convex combinations of primal subproblem solutions converges, if at all, when the original primal problem is inconsistent, in which case the Slater constraint qualification (CQ) assumed in [25] cannot hold. The purpose of this article is to investigate this issue for convex programming; for the special case of linear programming quite strong results are obtained.

Preliminaries and main result
Consider the problem to

minimize f(x), (2.1a)
subject to g_i(x) ≤ 0, i ∈ I, (2.1b)
x ∈ X, (2.1c)

where the set X ⊂ R^n is nonempty, convex and compact, I = {1, . . ., m}, and the functions f : R^n → R and g_i : R^n → R, i ∈ I, are convex and, thus, continuous; these properties are assumed to hold throughout the article. The notation g(x) is in the sequel used for the vector [g_i(x)]_{i∈I}. Moreover, whenever f and g_i, i ∈ I, are affine functions, and X is polyhedral, we denote the program (2.1) as a linear program. The corresponding Lagrange function L_f : R^n × R^m → R with respect to the relaxation of the constraints (2.1b) is defined by L_f(x, u) := f(x) + u^T g(x). The Lagrangian dual objective function θ_f : R^m → R is the concave function defined by

θ_f(u) := min_{x∈X} L_f(x, u). (2.2)

With no further assumptions on the properties of the program (2.1), the minimization problem defined in (2.2) can be solved in finite time only to ε-optimality ([2, Ch. 7]).
For any approximation error ε ≥ 0, an ε-optimal solution, x_f^ε(u), to the minimization problem in (2.2) at u ∈ R^m is denoted an ε-optimal Lagrangian subproblem solution, and fulfils the inclusion

x_f^ε(u) ∈ X_f^ε(u) := { x ∈ X | L_f(x, u) ≤ θ_f(u) + ε }. (2.3)

The Lagrange dual to the program (2.1) with respect to the relaxation of the constraints (2.1b) is the convex program to find

sup_{u ∈ R^m_+} θ_f(u). (2.4)
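As a minimal numerical sketch (a toy instance of our own, not taken from the article), the dual function θ_f and the ε-optimal set X_f^ε(u) of (2.3) can be evaluated by brute force over a discretized compact X = [0, 1], here with f(x) = x² and g(x) = x + 1:

```python
# Toy instance (assumed for illustration): X = [0, 1], f(x) = x^2,
# g(x) = x + 1, so the constraint g(x) <= 0 cannot be met on X.

def f(x):
    return x * x            # convex objective

def g(x):
    return x + 1.0          # relaxed constraint function

X = [i / 1000.0 for i in range(1001)]    # discretization of X = [0, 1]

def theta_f(u):
    """Lagrangian dual function value theta_f(u) = min_x { f(x) + u*g(x) }."""
    return min(f(x) + u * g(x) for x in X)

def X_f_eps(u, eps):
    """All grid points x with L_f(x, u) <= theta_f(u) + eps, cf. (2.3)."""
    bound = theta_f(u) + eps
    return [x for x in X if f(x) + u * g(x) <= bound]

print(theta_f(0.0))                  # 0.0: the unconstrained minimum of f
print(theta_f(2.0))                  # 2.0, attained at x = 0
print(0.0 in X_f_eps(2.0, 1e-6))     # True
```

In practice the subproblem is of course solved by exploiting problem structure rather than by enumeration; the grid only serves to make the definitions concrete.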

Primal and dual convergence in the case of consistency
We first recall some convergence results for dual subgradient methods for the case when the feasible set of the program (2.1) is nonempty, while assuming a constraint qualification (e.g., the Slater CQ, which for the problem (2.1) is stated as { x ∈ X | g(x) < 0^m } ≠ ∅; see [25]). Denote the optimal objective value of the program (2.1) by θ_f^* > −∞, and its solution set by

X_f^* := { x ∈ X | g(x) ≤ 0^m, f(x) ≤ θ_f^* }. (2.5)

By the continuity of f and g_i, i ∈ I, and the compactness of X, we have, according to [37, Thm. 30.4(g)], that the primal optimal objective value equals the value obtained when solving the Lagrangian dual program, i.e., that θ_f^* equals the optimal value of the program to

maximize θ_f(u), subject to u ∈ R^m_+. (2.6)

We denote the solution set to the Lagrange dual as

U_f^* := arg max_{u ∈ R^m_+} θ_f(u), (2.7)

the nonemptiness of which can be assured by presuming, e.g., the Slater CQ or that the program (2.1) is linearly constrained ([2, Sect. 5]).
With respect to a convex set U ⊆ R^m and an ε ≥ 0, the conditional ε-subdifferential of θ_f at u ∈ U is defined by

∂_U^ε θ_f(u) := { γ ∈ R^m | θ_f(v) ≤ θ_f(u) + γ^T(v − u) + ε, v ∈ U }. (2.8)

The normal cone of a convex set U ⊆ R^m at u ∈ U is defined by

N_U(u) := { η ∈ R^m | η^T(v − u) ≤ 0, v ∈ U }. (2.9)

We consider solving the Lagrangian dual program (2.6) by the conditional ε-subgradient optimization algorithm ([26, Sect. 2.1]). It starts at some initial vector u^0 ∈ R^m_+ and computes iterates u^t according to

u^{t+1} := [ u^t + α_t ( g(x_f^{ε_t}(u^t)) − η^t ) ]_+ , t = 0, 1, . . ., (2.11)

where [ • ]_+ denotes the Euclidean projection onto the nonnegative orthant, the sequence { η^t } obeys the inclusion η^t ∈ N_{R^m_+}(u^t) for all t, α_t > 0 is the step length chosen, and ε_t ≥ 0 denotes the approximation error at iteration t. The Euclidean projection of a vector v onto a closed convex set S ⊆ R^m is denoted by

proj(v; S) := arg min_{s ∈ S} ‖v − s‖. (2.12)

To simplify the presentation, the cumulative step lengths Λ_t are defined by Λ_t := Σ_{s=0}^{t−1} α_s, t ≥ 1; throughout, ‖ • ‖ denotes the Euclidean norm. For a sequence { x^t } ⊂ R^n and a vector y ∈ R^n, the notation x^t → y means that the sequence { x^t } converges to the point y.
The following proposition specializes [26,Thm. 8] to the setting at hand.
At each iteration of the method (2.11) an ε_t-optimal Lagrangian subproblem solution x_f^{ε_t}(u^t) is computed; an ergodic (that is, averaged) sequence { x̄_f^t } is then defined by

x̄_f^t := Λ_t^{−1} Σ_{s=0}^{t−1} α_s x_f^{ε_s}(u^s), t ≥ 1. (2.14)

The following result is a special case of that in [26, Thm. 19].

Proposition 2.2 (Convergence to the primal optimal set) Let the method (2.11), (2.13) be applied to the program (2.6), the sequence { η^t } be bounded, and the sequence { x̄_f^t } be defined by (2.14). Then dist(x̄_f^t; X_f^*) → 0 as t → ∞.
Proof As in the proof of Proposition 2.1, the condition U_f^* ≠ ∅ ensures the existence of a saddle-point of L_f. The compactness assumptions (on the dual solution set U_f^*) in [26, Thm. 20] are then fulfilled, and the result follows.
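The scheme (2.11) with ergodic averaging (2.14) can be sketched on a small consistent instance (a toy example of our own, not from the article): minimize x² subject to 0.5 − x ≤ 0 over X = [0, 1], whose optimum is x* = 0.5 with dual optimum u* = 1. The subproblem has a closed form, so ε_t = 0, and η^t = 0 is a valid normal-cone element at every u^t ≥ 0.

```python
# Toy instance (assumed for illustration): min x^2 s.t. 0.5 - x <= 0,
# X = [0, 1]; exact subproblem solutions, eps_t = 0 and eta^t = 0.

def subproblem(u):
    """argmin over [0, 1] of x**2 + u * (0.5 - x), in closed form."""
    return min(1.0, max(0.0, u / 2.0))

u = 0.0
num, den = 0.0, 0.0                 # running sums for the ergodic average
for t in range(20000):
    x = subproblem(u)
    alpha = 1.0 / (t + 1) ** 0.5    # divergent-series step lengths
    num += alpha * x
    den += alpha
    u = max(0.0, u + alpha * (0.5 - x))   # projection onto R_+

x_erg = num / den                   # ergodic primal iterate, cf. (2.14)
print(u, x_erg)   # dual iterate near u* = 1, ergodic iterate near x* = 0.5
```

The raw subproblem solutions x(u^t) oscillate around the optimum, while the averaged iterate x_erg settles down, which is exactly the point of the ergodic construction.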

Outline and main result
Section 2.1 considers the consistent case of the program (2.1) and presents convergence results for the primal and dual sequences ({ x̄_f^t } and { u^t }, respectively) obtained when the method (2.11) is applied to its Lagrange dual. In the remainder of the article we will analyze the properties of these two sequences when the primal problem (2.1) is inconsistent, i.e., when { x ∈ X | g(x) ≤ 0^m } = ∅, in which case the Slater CQ cannot be assumed.
The remainder of the article is structured as follows. In Sect. 3 we show that, during the course of the iterative scheme (2.11) for solving the program (2.4), the sequence { u^t } of dual iterates diverges when employing step lengths (α_t) and approximation errors (ε_t) fulfilling (2.13). As the sequence diverges, i.e., as ‖u^t‖ → ∞, the term f(x) of the Lagrange function L_f(x, u^t) loses significance in the definition (2.3) of the ε_t-optimal subproblem solution, x_f^{ε_t}(u^t) ∈ X_f^{ε_t}(u^t). In Sect. 4 we characterize the homogeneous dual function, which is the Lagrangian dual function obtained when f ≡ 0. We show that there is a primal-dual problem associated with the homogeneous dual in which the primal part amounts to finding the set X_0^* of points in X with minimum infeasibility with respect to the relaxed constraints, i.e.,

X_0^* := arg min_{x∈X} ‖[g(x)]_+‖. (2.15)

In Sect. 5 we show that a sequence of scaled dual iterates will in fact converge to the optimal set of the homogeneous dual problem. Section 6.1 presents the corresponding primal convergence results, i.e., that the sequence of primal iterates { x̄_f^t } converges to the set X_0^*. To simplify notation we redefine the primal optimal set X_f^* [defined in (2.5)] as the optimal set for the so-called selection problem, i.e.,

X_f^* := arg min_{x ∈ X_0^*} f(x). (2.16)

Note that, when { x ∈ X | g(x) ≤ 0^m } ≠ ∅, the equivalence X_0^* = { x ∈ X | g(x) ≤ 0^m } holds, then implying that X_f^* equals the optimal set for the program (2.1). Here lies the main point of departure when differentiating the convex program (2.1) from its linear programming special case (i.e., when f and g_i, i ∈ I, are affine functions and X is a polyhedral set), in which the selection problem (2.16) is a linear program (possessing Lagrange multipliers). For general convex programming, however, this may not be the case. The stronger convergence results achieved for the linear programming case are presented in Sect. 6.2.
Our analysis leads to the main contribution of this article, which is then formulated as the following generalization of Proposition 2.2 to hold also for inconsistent convex programs.

Theorem 2.3 (Convergence to the primal optimal set in the inconsistent case) Let the method (2.11), (2.13) be applied to the program (2.4), the sequence { η^t } be bounded, and the sequence { x̄_f^t } be defined by (2.14). Then: (a) dist(x̄_f^t; X_0^*) → 0 as t → ∞; (b) if, in addition, the program (2.1) is a linear program, then dist(x̄_f^t; X_f^*) → 0 as t → ∞, where X_f^* is the optimal set of the selection problem (2.16).
Within the context of mathematical optimization, studies of characterizations of inconsistent systems are of course as old as the history of theorems of the alternative and the associated theory of optimality in linear and nonlinear optimization.
An inconsistent system of linear inequalities is studied in [7], which establishes a primal-dual theory on a strictly convex quadratic least-correction problem in which the left-hand sides of the linear inequalities possess negative slacks, the sum of squares of which is minimized. The article [6] is related to ours, in that it studies the behaviour of an augmented Lagrangian algorithm applied to a convex quadratic optimization problem with an inconsistent system of linear inequalities. The algorithm converges, at a linear rate, to a primal vector that minimizes the original objective function over a set defined by minimally shifted constraints through negative slacks.
For optimization problems involving (twice) differentiable, possibly nonconvex functions, the methods in [5,31] are able to detect infeasibility and find solutions in which the norm of the infeasibility is at minimum. Filter methods (see [12] and references therein, and, for the nondifferentiable case, [18,34]) employ feasibility restoration steps to reduce the value of a constraint violation function. In [4] the dynamical steering of exact penalty methods toward feasibility and optimality is reviewed and analysed. While these references consider infeasibility within traditional nonlinear programming (inspired) algorithms, our work is devoted to the study of corresponding issues within subgradient-based methods applied to Lagrange duals.
As stated in Theorem 2.3(a), for the general convex case we can only establish convergence to the set of minimum infeasibility points, while for the case of linear programs we also show, in Theorem 2.3(b), that all primal limit points minimize the objective over the set of minimum infeasibility points.
Then, in Sect. 7 we make some further remarks and present an illustrative example. Finally, in Sect. 8, we draw conclusions and suggest further research.

Dual divergence in the case of inconsistency
Consider the inconsistent program (2.1) and its Lagrangian dual function θ_f defined in (2.2). We begin by establishing that the emptiness of the set { x ∈ X | g(x) ≤ 0^m } implies the existence of a nonempty cone C ⊂ R^m_+, such that the value of the function θ_f increases in every direction v ∈ C. Consequently, for this case the Lagrangian dual solution set (defined in (2.7) for the consistent case) fulfils U_f^* = ∅.

Proposition 3.1 (A characterization of primal inconsistency) Define the cone C := { w ∈ R^m_+ | min_{x∈X} w^T g(x) > 0 }. Then { x ∈ X | g(x) ≤ 0^m } = ∅ if and only if C ≠ ∅.

Proof Consider the set K := { g(x) + z | x ∈ X, z ∈ R^m_+ }. Since X is convex and the functions g_i, i ∈ I, are convex, the set K = { g(x) | x ∈ X } + R^m_+ is also convex. Since g is continuous the set K is closed, and from its definition follows that { x ∈ X | g(x) ≤ 0^m } = ∅ if and only if K can be separated strictly from 0^m.
Assume first that C ≠ ∅ and let w ∈ C. The inequality w^T g(x) > 0 then holds for all x ∈ X. Hence, for each x ∈ X, g_i(x) > 0 must hold for at least one i ∈ I, implying that 0^m ∉ K. Assume then that 0^m ∉ K. Then there exist a w ∈ R^m and a δ ∈ R such that w^T z ≥ δ > 0 = w^T 0^m holds for all z ∈ K. By the definition of the set K, letting e^i ∈ R^m denote the ith unit vector, it follows that g(x) + e^i γ ∈ K for all x ∈ X and all γ ∈ R_+. Hence, w^T g(x) + w_i γ > 0 holds for all x ∈ X and γ ∈ R_+. Letting γ → ∞ yields that w_i ≥ 0 for all i ∈ I. Choosing γ = 0 yields that min_{x∈X} { w^T g(x) } > 0. It follows that w ∈ C, whence C ≠ ∅. The proposition follows.

Proposition 3.2 (The cone of ascent directions of the dual function) For every u ∈ R^m and every v ∈ C, it holds that θ_f(u + v) > θ_f(u).
Proof The proposition follows by the definition (2.2) of the function θ_f, and the relations

θ_f(u + v) = min_{x∈X} { f(x) + (u + v)^T g(x) } ≥ θ_f(u) + min_{x∈X} { v^T g(x) } > θ_f(u),

where the strict inequality follows from the definition of C in Proposition 3.1.
We next utilize the fact that the cone C is independent of the objective function f to show that in the inconsistent case the sequence { u^t } diverges.

Proposition 3.3 (Divergence of the dual sequence) Let the sequence { u^t } be generated by the method (2.11), (2.13a) applied to the program (2.4), the sequence { η^t } be bounded, and the set { x ∈ X | g(x) ≤ 0^m } be empty. Then ‖u^t‖ → ∞.

Proof Let w ∈ C ≠ ∅ and define δ := min_{x∈X} { w^T g(x) } > 0 and β_t := w^T u^t for all t. Then

β_{t+1} = w^T [ u^t + α_t ( g(x_f^{ε_t}(u^t)) − η^t ) ]_+ ≥ w^T u^t + α_t w^T g(x_f^{ε_t}(u^t)) − α_t w^T η^t ≥ w^T u^t + α_t w^T g(x_f^{ε_t}(u^t)) ≥ β_t + α_t δ,

where the first inequality holds since w ∈ R^m_+, the second since η^t ∈ R^m_−, and the third since w^T g(x) ≥ δ for all x ∈ X. From (2.13a) then follows that β_t → ∞, and hence ‖u^t‖ → ∞.
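The divergence mechanism in Proposition 3.3 can be made concrete on a toy instance of our own (not from the article): X = [0, 1], f(x) = x², g(x) = x + 1 ≥ 1 on X, so the problem is inconsistent, w = 1 lies in C, and δ = min_x w·g(x) = 1.

```python
# Toy instance (assumed for illustration) of dual divergence under (2.11)
# with divergent-series step lengths, cf. Proposition 3.3.

def subproblem(u):
    """argmin over [0, 1] of x**2 + u * (x + 1); increasing in x for u >= 0."""
    return 0.0

u, Lambda = 0.0, 0.0
for t in range(100000):
    alpha = 1.0 / (t + 1)              # divergent series
    Lambda += alpha                    # cumulative step lengths Lambda_t
    x = subproblem(u)
    u = max(0.0, u + alpha * (x + 1.0))   # subgradient value g(x) = x + 1

# beta_t = w^T u^t grows at least like delta * Lambda_t, hence diverges;
# in this one-dimensional toy case u^t equals Lambda_t exactly.
print(u, Lambda)
```

With α_t = 1/(t+1) the growth is only logarithmic (the harmonic sum), which is why the divergence can be slow in practice even though it is guaranteed.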

A homogeneous dual and an associated saddle-point problem in the case of inconsistency
We next use the result of Proposition 3.3 to establish that for large dual variable values the dual objective function can be closely approximated by an associated homogeneous dual function.Associated with this homogeneous dual is a saddle-point problem, in which the primal part amounts to finding the points in the primal space having minimum total infeasibility in the relaxed constraints.
Consider the Lagrange function associated with the program (2.1), i.e., L_f(x, u) = f(x) + u^T g(x). As the value of ‖u‖ increases, the term f(x) in the computation of the subproblem solution x_f(u) in (2.3) loses significance. Hence, according to Proposition 3.3, for large values of t the method (2.11), (2.13a) will tackle an approximation of the homogeneous dual problem to maximize θ_0 over R^m_+. In what follows, unless otherwise stated, we will assume that { x ∈ X | g(x) ≤ 0^m } = ∅.

The homogeneous version of the Lagrange dual
Consider the problem to find an x ∈ X such that g(x) ≤ 0^m. To this feasibility problem we associate the homogeneous Lagrangian dual problem to find

sup_{u ∈ R^m_+} θ_0(u), (4.1)

where θ_0 : R^m → R is defined by (2.2), for f ≡ 0 (i.e., θ_0(u) = min_{x∈X} { u^T g(x) }). A corresponding (optimal) subproblem solution x_0^0(u) and the subdifferential ∂_{R^m_+}^0 θ_0(u) are analogously defined.
According to its definition, the function θ_0 is superlinear (e.g., [17, Proposition V:1.1.3]), meaning that its hypograph is a nonempty and convex cone in R^{m+1}, and implying that θ_0(δu) = δθ_0(u) holds for all (δ, u) ∈ R_+ × R^m. The definition of the directional derivative, θ_0′(u; d), of θ_0 at u in the direction of d (e.g., [17, Rem. I:4.1.4]) then yields that θ_0′(0^m; d) = θ_0(d) holds for all d ∈ R^m. The program (4.1) can thus be interpreted as the search for a steepest feasible ascent direction of θ_0. Such a search requires that the argument of θ_0 is restricted. Hence, we define V := { v ∈ R^m_+ | ‖v‖ ≤ 1 } and the restricted homogeneous dual problem to find

θ_0^{V*} := max_{v ∈ V} θ_0(v). (4.2)

Defining V using the unit ball is somewhat arbitrary. Owing to the homogeneity of θ_0, the unit ball, viewed as the convex hull of the projective space, is, however, a natural choice. As shown in Sect. 4.2, for this choice the dual mapping yields a singleton set.

An associated saddle-point problem
From the definition (2.2) of the function θ_f and the definition (4.2) of θ_0^{V*} follow the relations

θ_0^{V*} = max_{v∈V} min_{x∈X} L_0(x, v) = min_{x∈X} max_{v∈V} L_0(x, v), (4.3)

where the homogeneous Lagrange function L_0 : R^n × R^m → R is defined by L_0(x, v) := v^T g(x). By the definition of the set X_f^* in (2.5), for the case when the program (2.1) is consistent, X_0^* = { x ∈ X | g(x) ≤ 0^m } ≠ ∅ denotes its feasible set. For the inconsistent case, whenever x ∈ X it holds that g_i(x) > 0 for at least one i ∈ I, implying that ‖[g(x)]_+‖ > 0.

Lemma 4.2 (A homogeneous dual mapping) For each x ∈ X with ‖[g(x)]_+‖ > 0, the dual mapping V(x) := arg max_{v∈V} v^T g(x) is the singleton set { v̄ }, where v̄ := [g(x)]_+ / ‖[g(x)]_+‖.
Proof For any v ∈ V the relations

v^T g(x) ≤ v^T [g(x)]_+ ≤ ‖v‖ ‖[g(x)]_+‖ ≤ ‖[g(x)]_+‖ = v̄^T g(x) (4.4)

hold, where the first inequality holds since v ≥ 0^m and g(x) ≤ [g(x)]_+, the second follows from the Cauchy-Schwarz inequality, and the third holds since ‖v‖ ≤ 1. Since v ∈ V is arbitrary, it follows that v̄ ∈ V(x). That the set V(x) is a singleton follows from the fact that equality holds in each of the inequalities in (4.4) only when both ‖v‖ = 1 holds and the vectors v and [g(x)]_+ are parallel, in which case v = v̄. The lemma follows.
Since for all x ∈ X and { v̄ } = V(x) the equality v̄^T g(x) = ‖[g(x)]_+‖ holds, the right-hand side of (4.3) may be interpreted as the minimal total deviation from feasibility in the constraints g(x) ≤ 0^m over x ∈ X, that is,

θ_0^{V*} = min_{x∈X} ‖[g(x)]_+‖. (4.5)

The set X_0^* × V_0^* of saddle-points of L_0 on X × V is thus given by

X_0^* = arg min_{x∈X} ‖[g(x)]_+‖ and V_0^* = arg max_{v∈V} θ_0(v). (4.6)

Note that this definition of X_0^* agrees with (2.15) and is valid regardless of the consistency or inconsistency of the program (2.1). Since V_0^* is a singleton, we define the vector v^* by

{ v^* } := V_0^*. (4.7)

Note the equivalence v^* = [g(x)]_+ / ‖[g(x)]_+‖ for every x ∈ X_0^*.
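The saddle-point quantities of this section can be computed by brute force on a toy instance of our own (not from the article): X = [0, 1]² (discretized) and g(x) = (x₁ + 1, x₂ + 2), so that every x ∈ X is infeasible, X_0^* = {(0, 0)}, θ_0^{V*} = √5, and v^* = (1, 2)/√5.

```python
import math

# Toy instance (assumed for illustration): grid search for
# X_0^* = argmin_x ||[g(x)]_+|| and v* = [g(x)]_+ / ||[g(x)]_+||.

def g(x):
    return (x[0] + 1.0, x[1] + 2.0)

def infeasibility(x):
    """Euclidean norm of the positive part [g(x)]_+."""
    return math.hypot(max(g(x)[0], 0.0), max(g(x)[1], 0.0))

grid = [(i / 50.0, j / 50.0) for i in range(51) for j in range(51)]
x_star = min(grid, key=infeasibility)            # a point of X_0^*
gp = tuple(max(z, 0.0) for z in g(x_star))
theta = math.hypot(*gp)                          # theta_0^{V*} = sqrt(5)
v_star = (gp[0] / theta, gp[1] / theta)          # steepest ascent direction

print(x_star)    # (0.0, 0.0): the minimum-infeasibility point
print(v_star)    # (1, 2) / sqrt(5)
```

The normalization of [g(x)]_+ mirrors Lemma 4.2: the maximizing v over the unit ball V is the scaled positive part of g at the minimizer.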

Convergence to the homogeneous dual optimal set in the inconsistent case
We have characterized the Cartesian product set X * 0 × { v * } of saddle-points of the homogeneous Lagrange function L 0 over X × V .Next, we will show that a sequence of simply scaled dual iterates obtained from the subgradient scheme converges to the point v * .
To simplify the notation in the analysis to follow, we define, for t = 0, 1, . . ., the quantities ε̄_t and the scaled dual iterate v^t by (5.1); the scaled iterates are constructed in tandem with the iterations of the conditional ε_t-subgradient algorithm (2.11). We next show that the conditional (with respect to R^m_+) ε_t-subgradients g(x_f^{ε_t}(u^t)) − η^t, used in the algorithm (2.11), are also conditional (with respect to V) ε̄_t-subgradients of the homogeneous dual function θ_0 at the scaled iterate v^t, with ε̄_t defined in (5.1a).

Lemma 5.1 (Conditional ε̄_t-subgradients of the homogeneous dual function) Let the sequence { u^t } be generated by the method (2.11), (2.13a) applied to the program (2.4), let the sequence { η^t } be bounded, and the sequences { ε̄_t } and { v^t } be defined by (5.1). Then, g(x_f^{ε_t}(u^t)) − η^t ∈ ∂_V^{ε̄_t} θ_0(v^t) holds for all t.

Proof From the definitions (2.2) and (2.3) (for ε = 0) the required relations follow. The combination of these relations with (2.8)-(2.9) (for ε = ε_t), (2.10)-(2.11), and the definition of ε̄_t in (5.1a) yields the required inequalities. By (2.8) and (5.1), and the superlinearity of the function θ_0, the result follows.
The following two lemmas are needed for the analysis to follow.
Lemma 5.2 (Normalized divergent series step lengths form a divergent series) Aggregating these inequalities for r = 1, . . ., t then yields the desired inequality, and the lemma follows.

Lemma 5.3 (Projection onto R^m_+ and onto V)
Proof The result follows by applying the optimality conditions (e.g., [2, Thm. 4.2.13]) to the convex and differentiable optimization problems defined by the projection operator in (2.12) for S = R^m_+ and S = V, respectively.
We now establish the convergence characteristics of the scaled dual sequence {v t } defined in (5.1b) to the dual part of the set of saddle-points for L 0 .
Theorem 5.4 (Convergence of a scaled dual sequence) Let the sequence { u^t } be generated by the method (2.11), (2.13a), (2.13c) applied to the program (2.4), let the sequence { η^t } be bounded, let the sequence { v^t } be defined by (5.1b), and let the optimal solution to the homogeneous dual, v^*, be defined by (4.7). Then v^t → v^* as t → ∞.
Proof Let γ^t := g(x_f^{ε_t}(u^t)) − η^t for all t ≥ 0. From the definition (2.11) and the triangle inequality it follows that ‖u^{t+1}‖ ≤ ‖u^t‖ + α_t ‖γ^t‖. Since X is compact, g is continuous, and the sequence { η^r } is bounded, it holds that Γ := L_0 + sup_{r≥0} ‖γ^r‖ < ∞. From the definition (5.1a) of L_t the stated bound then follows. The main idea utilized in the proof of Theorem 5.4 is that the scaled sequence { v^t } obtained from the subgradient method defines a conditional (with respect to V) ε̄_t-subgradient algorithm, as applied to the homogeneous Lagrange dual (4.2). Hence, by tackling the Lagrange dual (2.4) by a subgradient method, we receive, in the case of an inconsistent primal problem, a solution to its homogeneous version (4.1).
Next follow two technical corollaries, to be used in the primal convergence analysis.
Corollary 5.5 (Convergence of a normalized dual sequence) Under the assumptions of Theorem 5.4 it holds that ‖u^t‖^{−1} u^t → v^* as t → ∞.

Corollary 5.6 (Convergence to the optimal value of the homogeneous dual) Under the assumptions of Theorem 5.4 it holds that
Proof For each t ≥ 0 and each x ∈ X, let ρ_t(x) be defined accordingly. Since X is compact, it follows that ρ_t(x) → 0 for all x ∈ X. Using the definition (2.2) and the equivalence (4.3), and by separating the minimization over x ∈ X, the first inequality follows. On the other hand, since X_0(v^*) ⊆ X, and (v^*)^T g(x) = θ_0^{V*} holds for any x ∈ X_0(v^*), the reverse inequality follows. By the left-most equality in (5.3) and (2.3) the required inequality holds. The corollary follows.
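The normalized convergence of Corollary 5.5 can be observed numerically on a toy instance of our own (not from the article): X = [0, 1]², f(x) = x₁ + x₂, g(x) = (x₁ + 1, x₂ + 2), hence v^* = (1, 2)/√5; we take η^t = 0 throughout.

```python
import math

# Toy instance (assumed for illustration): u^t / ||u^t|| -> v*.

def subproblem(u):
    # f + u^T g is increasing in both coordinates whenever u >= 0, so the
    # Lagrangian subproblem is solved at the corner (0, 0) of X = [0, 1]^2.
    return (0.0, 0.0)

u = [0.1, 0.0]                        # arbitrary nonnegative starting point
for t in range(5000):
    alpha = 1.0 / (t + 1)             # divergent-series step lengths
    x = subproblem(u)
    gx = (x[0] + 1.0, x[1] + 2.0)     # subgradient value g(x)
    u = [max(0.0, u[i] + alpha * gx[i]) for i in range(2)]

nrm = math.hypot(*u)
v = (u[0] / nrm, u[1] / nrm)
v_star = (1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0))
print(v)         # approaches v_star = (0.447..., 0.894...)
```

Although ‖u^t‖ itself diverges (Proposition 3.3), its direction stabilizes, which is exactly the content of the corollary.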

Primal convergence in the case of inconsistency
We apply the conditional ε-subgradient scheme (2.11) to the Lagrange dual of the program (2.1). In each iteration we construct an ergodic primal iterate x̄_f^t, according to the scheme defined in (2.14). We here aim to analyze the convergence of the ergodic sequence { x̄_f^t } when the primal program (2.1) is inconsistent. In Sect. 6.1 we establish convergence of the ergodic sequence to the feasible set of the selection problem (2.16) for the case of convex programming [i.e., Theorem 2.3(a)]. In Sect. 6.2 we specialize this result to the case of linear programming, in which case the stronger result of convergence to optimal solutions to the selection problem (2.16) is obtained [i.e., Theorem 2.3(b)].
The set of indices of the strictly positive elements of the vector v is denoted by I_+(v) := { i ∈ I | v_i > 0 }.

Convergence results for general convex programming
To simplify the notation, we let the ergodic sequence, { γ̄^t }, of conditional ε-subgradients be defined by

γ̄^t := Λ_t^{−1} Σ_{s=0}^{t−1} α_s ( g(x_f^{ε_s}(u^s)) − η^s ), t ≥ 1,

for some choices of step lengths α_s > 0 and approximation errors ε_s ≥ 0, s = 0, 1, . . ., t − 1. We will also need the following technical lemma (see [21, p. 35] for its proof).
We are now set up to establish the first part of the main result of this article.

Proof of Theorem 2.3(a) Consider the case when { x ∈ X | g(x) ≤ 0^m } = ∅. We will show that the ergodic sequence { x̄_f^t } converges to the set X_0^* = arg min_{x∈X} ‖[g(x)]_+‖. Since X is convex and compact, any limit point of the sequence { x̄_f^t } belongs to X. Then, by the continuity of g and ‖ • ‖, and the equivalence in (4.5), the stated relations hold in the iteration formula (2.11). Hence, for N ≥ 0 large enough, the corresponding identity holds. By rearranging this equation and dividing the resulting terms by ‖u^t‖, the claim follows, since, by Proposition 3.3, ‖u^t‖ → ∞ and, by Corollary 5.5, ‖u^t‖^{−1} u^t → v^*. By combining this result with (6.2) it then follows, by the convexity of the functions g_i, that for each i ∈ I_+(v^*) the lim sup condition (6.3) holds. From (5.2) follows that the inequality ‖u^t‖ ≤ ‖u^0‖ + Λ_t Γ holds for every t ≥ 1, which yields (6.4). Then, for each j ∈ I\I_+(v^*) and all t ≥ N, the stated relations hold, where the first inequality follows from the convexity of g_j, the second from (2.11) and the fact that η_j^t ≤ 0, the equality by telescoping, and the final inequality by (6.4). As t → ∞, by Proposition 3.3, ‖u^t‖ → ∞, and by Corollary 5.5, ‖u^t‖^{−1} u^t → v^*. From (6.3) and (6.5) we then conclude (6.1). The theorem follows.

Properties of and convergence results for the linear programming case
We now analyze the special case when the program (2.1) is a linear program, i.e., when the program can be formulated as the problem to

minimize c^T x, (6.6a)
subject to Ax ≥ b, (6.6b)
x ∈ X, (6.6c)

where c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, and X ⊂ R^n is a nonempty and bounded polyhedron. The aim of this subsection is to provide a proof of Theorem 2.3(b), stating that the ergodic sequence { x̄_f^t } [defined in (2.14)] converges to the optimal set of the selection problem (2.16).² For this linear case, the Lagrangian subproblems in (2.2) can be solved exactly in finite time; hereafter we thus let ε_t := 0, t ≥ 0.
Let A_i denote the ith row of the matrix A and let x^0 ∈ X_0(v^*). The selection problem (2.16) can then be expressed as the linear program to minimize c^T x, (6.7a) subject to the accordingly shifted constraints (6.7b) and (6.7c). Using that [b_i − A_i x^0]_+ = 0 for all i ∈ I\I_+(v^*), we define the (projected) Lagrangian dual function, θ_c^+ : R^m → R, to the program (6.7) with respect to the relaxation of the constraints (6.7b) and (6.7c), as in (6.8); the corresponding Lagrange dual is then given by the maximization problem (6.10). We will show that when applying the conditional ε-subgradient optimization algorithm (2.11) to the Lagrange dual (2.4) of the inconsistent linear program (6.6) with respect to the relaxation of the constraints (6.6b), a subgradient scheme is obtained for the Lagrange dual (6.10) of the selection problem (6.7), which is a consistent linear program. We will then deduce that the ergodic sequence { x̄_f^t } converges to the set of optimal solutions to (6.7). But first we introduce some definitions needed for the analysis to follow.

² For the linear program (6.6), the selection problem [cf. (2.16)] is defined as min_{x ∈ X_0^*} { c^T x }, where the set X_0^* is the subset of X possessing minimum infeasibility in the relaxed constraints (6.6b).
A decomposition of any vector u ∈ R^m into two vectors being parallel and orthogonal, respectively, to v^*, is given by the maps β : R^m → R and ω : R^m → R^m, according to

β(u) := (v^*)^T u and ω(u) := u − β(u) v^*. (6.11)

Here, β(u) equals the length of the projection of u ∈ R^m onto v^*, while ω(u) equals the projection of u onto the orthogonal complement to v^*. Both maps β and ω define projections onto linear subspaces.
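The decomposition (6.11) is straightforward to verify numerically; the sketch below (with a toy value v^* = (1, 2)/√5 chosen for illustration, not from the article) checks the reconstruction u = β(u)v^* + ω(u) and the orthogonality of ω(u) to v^*.

```python
import math

# Sketch of the maps (6.11); since ||v*|| = 1, beta(u) = (v*)^T u and
# omega(u) = u - beta(u) v* split u into parallel/orthogonal parts.

v_star = (1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0))   # toy unit vector

def beta(u):
    return sum(vi * ui for vi, ui in zip(v_star, u))

def omega(u):
    b = beta(u)
    return tuple(ui - b * vi for vi, ui in zip(v_star, u))

u = (3.0, -1.0)
b, w = beta(u), omega(u)
recon = [b * vi + wi for vi, wi in zip(v_star, w)]   # recovers u
dot = sum(vi * wi for vi, wi in zip(v_star, w))      # 0: omega(u) is
print(recon, dot)                                    # orthogonal to v*
```

Both maps are linear, which is what makes the later transfer of subgradients between θ_f and θ_c^+ possible.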

Property 6.2 (Properties of maps)
The following properties of the maps β and ω hold.
Using the decomposition (6.11), we can rewrite the Lagrangian dual function, defined in (6.8), accordingly. The following lemma follows from Property 6.2(a) and establishes that the value of θ_c^+ at u ∈ R^m depends solely on the component ω(u) of u that is perpendicular to v^*.

Lemma 6.3 (A characterization of a projected dual function) For any u ∈ R^m, the equality θ_c^+(ω(u)) = θ_c^+(u) holds.
Given constants δ > 0, p > 1, and q > (p − 1)^{−1} p, we define in (6.12) the set U_pq^δ of vectors possessing a large enough norm and a small enough angle with the direction of v^*. The following lemma ensures that after a finite number N of iterations, all of the dual iterates u^t are contained in the set U_pq^δ; it follows from Proposition 3.3 and Corollary 5.5.

Lemma 6.4 (The dual iterates are eventually in the set U_pq^δ) Let the sequence { u^t } be generated by the method (2.11), (2.13a) applied to the program (2.4), let the sequence { η^t } be bounded, and let { ε_t } = { 0 }. Then, for any constants δ > 0, p > 1, and q > (p − 1)^{−1} p there is an N ≥ 0 such that u^t ∈ U_pq^δ for all t ≥ N.

Propositions 6.5-6.7 below demonstrate that, for p > 1, q > (p − 1)^{−1} p, and a large enough value of δ > 0, the condition u ∈ U_pq^δ implies certain relations between the function values θ_f(u) and θ_c^+(u), as well as between their respective conditional subdifferentials. First we establish the inclusion X_f^0(u) ⊆ X_0(v^*) whenever u ∈ U_pq^δ. We then show that the value θ_f(u) of the Lagrangian dual function equals β(u)θ_0^{V*} + θ_c^+(ω(u)).

Proposition 6.5 (Inclusion of the solution set) Let p > 1 and q > (p − 1)^{−1} p. There exists a constant δ > 0 such that X_f^0(u) ⊆ X_0(v^*) holds for all u ∈ U_pq^δ.
Proof For the case when X_0(v^*) = X the proposition is immediate. Consider the case when X_0(v^*) ⊂ X. Denote by P_X, P_{X_f^0(u)}, and P_{X_0(v^*)} the (finite) sets of extreme points of X, X_f^0(u), and X_0(v^*), respectively. From (2.3) and [32, Ch. I.4, Def. 3.1] follow that X_f^0(u) and X_0(v^*) are faces of X, implying the relations P_{X_f^0(u)} ⊆ P_X, u ∈ R^m, and P_{X_0(v^*)} ⊆ P_X ⊂ X. Hence, it suffices to show that P_{X_f^0(u)} ⊆ P_{X_0(v^*)} holds whenever u ∈ U_pq^δ. Let x_0^* ∈ P_{X_0(v^*)} and x ∈ P_X\P_{X_0(v^*)} be arbitrary. Since the set P_X is finite there exists a δ > 0 such that the stated relations hold. For any u ∈ U_pq^δ it then follows that the relations (6.14) hold, where (6.14a) follows from (6.11), (6.14b) from (6.13) and the Cauchy-Schwarz inequality, and (6.14c) from the definition (6.12) and the assumptions made. It follows that x ∉ X_f^0(u), which then implies that X_f^0(u) ⊆ X_0(v^*). The proposition follows.
In the analysis to follow we choose δ > 0 such that the inclusion in Proposition 6.5 holds.

Proposition 6.6 (A decomposition of the dual function) Let p > 1 and q > (p − 1)^{−1} p. For every u ∈ U_pq^δ the identity θ_f(u) = β(u)θ_0^{V*} + θ_c^+(ω(u)) holds.

Proof The result follows since Proposition 6.5, (6.11), (4.3), and Property 6.2(a) yield the stated equalities.

We next establish that if γ is a conditional (with respect to R^m_+) subgradient of θ_f at u ∈ R^m_+, where u has a sufficiently large norm and a sufficiently small component ω(u) (being orthogonal to v^*), then ω(γ) is a conditional [with respect to R(v^*); see (6.9)] subgradient of θ_c^+ at ω(u) ∈ R(v^*).

Proposition 6.7 (Conditional subgradients of a projected dual function) Let p > 1 and q > (p − 1)^{−1} p. For each u ∈ U_pq^δ and γ ∈ ∂_{R^m_+} θ_f(u), it holds that ω(γ) ∈ ∂_{R(v^*)} θ_c^+(ω(u)).

Proof Let v ∈ R(v^*) and choose σ ≥ 0 such that v + σv^* ∈ U_pq^δ. From Lemma 6.3, Proposition 6.6, and Property 6.2(c) follow the equalities (6.16), where (6.16a) follows from (2.8), (6.16b) from (6.11), and (6.16c) from Property 6.2. Combining the relations in (6.15) and (6.16) yields an inequality which, inserted into (6.17), and utilizing Property 6.2(a) and (c), and Lemma 6.3, yields the required subgradient inequality. The proposition then follows since v ∈ R(v^*) was arbitrary.
We now define the sequences { ω^t } and { ω^{t+1/2} } according to (6.18). In each iteration t, the intermediate iterate ω^{t+1/2} is constructed, and by Proposition 6.7 the vector ω(b − Ax_f^0(u^t) − η^t) is a conditional [with respect to R(v^*); see (6.9)] subgradient of the dual function (6.8) for large enough values of t. To show that the formula (6.18) actually defines a conditional [with respect to R(v^*)] subgradient algorithm, we must also show that ω^{t+1} = proj(ω^{t+1/2}; R(v^*)).
We conclude that ω t+1 = proj(ω t+1/2 ; R(v * )) holds for all t ≥ N , and the proposition follows.
We summarize the development made in this section. Associated with the sequence { u t } ⊂ R m + of dual iterates resulting from a conditional subgradient scheme for maximizing θ f over R m + , we define in (6.18) a sequence { ω t } ≡ { ω(u t ) } ⊂ R(v * ) of iterates corresponding to the function θ + c . Proposition 6.7 shows that a conditional (with respect to R m + ) subgradient of θ f at u t ∈ R m + can be mapped to a conditional [with respect to R(v * )] subgradient of θ + c at ω t ∈ R(v * ). Then, Proposition 6.8 shows that for a large enough value of t the projection of u t+1/2 onto R m + in (2.11) has a one-to-one correspondence with the projection of ω t+1/2 onto the set R(v * ) [defined in (6.9)].
We are now prepared to establish the remaining part of the main result of this article.

Proof of Theorem 2.3(b)
For the remaining case, by Lemma 6.4 and Propositions 6.5-6.8 there is an integer N ≥ 0 such that the sequence { ω t } t≥N is the result of a conditional [with respect to R(v * )] subgradient method applied to the Lagrangian dual (6.10) of the linear program (6.7), which has a nonempty and bounded feasible set. It then follows from Proposition 2.2 that dist(x t f ; X * f ) → 0 as t → ∞. The theorem follows.
This proof of Theorem 2.3(b) contains no explicit reference to any particular choice of the weights defining the ergodic sequence (2.14). Although this article is written with reference to the formula (2.14), the result of Theorem 2.3(b) remains valid for any ergodic sequence of primal iterates that is convergent for consistent programs (see, e.g., [14]), provided that the corresponding version of Theorem 5.4 can be established.
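As an illustration of the ergodic construction discussed above, the sketch below forms weighted averages of primal subproblem solutions. It is hedged: the specific convexity weights of (2.14) are not reproduced here, and any nonnegative, not-all-zero weights (and the iterates themselves) are hypothetical data:

```python
import numpy as np

def ergodic_averages(x_iterates, weights):
    # x_bar^t = (sum_{s<=t} w_s x^s) / (sum_{s<=t} w_s): convex combinations
    # of the primal subproblem solutions, one average per iteration t.
    x = np.asarray(x_iterates, dtype=float)
    w = np.asarray(weights, dtype=float)
    num = np.cumsum(w[:, None] * x, axis=0)
    den = np.cumsum(w)[:, None]
    return num / den

xbar = ergodic_averages([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        weights=[1.0, 1.0, 2.0])
# xbar[2] = (x^0 + x^1 + 2 x^2) / 4 = (0.75, 0.75)
```

Different weight rules yield different ergodic sequences; the convergence statement above requires only that the chosen rule is convergent for consistent programs.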

Illustrations and a separation result
We next present an example which numerically illustrates the main findings made in this article and, finally, a separation result following from our analysis.

Finite attainment of a separating hyper-surface
For the case when the set X is nonempty, closed, and convex, and all functions g i are affine, we have previously, in [24, Cor. 6.4] (see also Cor. 3.24 in the survey [27]), utilized ergodic sequences of subgradient-optimization-based underestimating affine functions to finitely detect inconsistency and identify a separating hyper-plane. We now return to our original setting of convex functions f and g i , i ∈ I, and a nonempty, convex and compact set X . Provided that the feasible set { x ∈ X | g(x) ≤ 0 m } is empty and that the sets X and Y := { x ∈ R n | g(x) ≤ 0 m } are both nonempty, a hyper-surface that strongly separates the sets X and Y can be identified in a finite number of steps.

Theorem 7.1 (Finite attainment of a separating hyper-surface) Let the sequence { u t } be generated by the method (2.11), (2.13a), (2.13c) applied to the program (2.4), the sequence { η t } be bounded, and the sequence { v t } be defined by (5.1b). Suppose that the sets X and Y are nonempty, but X ∩ Y = ∅. Then there exists an integer N ≥ 0 such that the hyper-surface H (v t ) := { x ∈ R n | g(x) T v t = (1/2) θ V * 0 0 } strongly separates the sets X and Y for all t ≥ N .
Proof Since v t ≥ 0 m for all t, it holds that g(x) T v t ≤ 0 for all x ∈ Y and all t. From (2.4) and (2.2) it follows that g(x) T v t ≥ θ 0 (v t ) for all x ∈ X . From Proposition 3.1 it follows that the relations ∅ ≠ C ⊂ R m + hold and also that the strict inequality θ 0 (w) > 0 holds for all w ∈ C. Since C ∩ V ≠ ∅, (4.2) yields that θ V * 0 0 > 0. Moreover, by Theorem 5.4, θ 0 (v t ) → θ V * 0 0 as t → ∞. Hence, there is an N ≥ 0 such that θ 0 (v t ) > (1/2) θ V * 0 0 for all t ≥ N , and it follows that g(x) T v t > (1/2) θ V * 0 0 > 0 for all x ∈ X and all t ≥ N . The theorem follows.
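For intuition, the separation certificate in Theorem 7.1 can be checked numerically on a toy inconsistent instance. The sketch below assumes affine constraints g(x) = Ax − b and a box-shaped X, so that the minimum of the linear function x ↦ v T g(x) over X is attained at a vertex; the instance and the candidate multiplier v are hypothetical and not taken from the paper:

```python
import itertools
import numpy as np

def min_vTg_over_box(v, A, b, lo, hi):
    # min over x in the box [lo, hi] of v^T (A x - b); a linear function
    # attains its minimum over a box at one of its 2^n vertices.
    return min(v @ (A @ np.array(vtx) - b)
               for vtx in itertools.product(*zip(lo, hi)))

# Toy instance: X = [0, 1]^2 and Y = {x : x1 + x2 <= -1} are disjoint.
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])
v = np.array([1.0])  # candidate direction, v >= 0

level = min_vTg_over_box(v, A, b, lo=[0.0, 0.0], hi=[1.0, 1.0])
# level > 0: v^T g(x) >= level on X, while v^T g(x) <= 0 on Y (since v >= 0),
# so the level set {x : v^T g(x) = level / 2} strongly separates X and Y.
```

In the theorem, v t plays the role of v and the separation level is half the limiting dual value; the toy check only mirrors the final inequality chain of the proof.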

For any closed and convex set S ⊆ R r and any point x ∈ R r , where r ≥ 1, the convex Euclidean distance function dist : R r → R and the Euclidean projection mapping proj : R r → S are defined as dist(x; S) := min y∈S ∥y − x∥ and proj(x; S) := arg min y∈S ∥y − x∥ , (2.12) hold, since the Lagrange function, defined by L 0 (x, u) = u T g(x), is convex with respect to x, for u ∈ R m + , and linear with respect to u, and since the sets X and V ⊂ R m + are convex and compact (see, e.g., [17, Thms. VII:4.2.5 and VII:4.3.1]).
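As a concrete instance of the definitions of dist and proj in (2.12), the following sketch projects onto the ray spanned by a nonzero direction v, a closed convex set of the kind that arises when v plays the role of the steepest ascent direction v * ; the data are hypothetical:

```python
import numpy as np

def proj_ray(x, v):
    # proj(x; S) from (2.12) for S = {rho * v : rho >= 0}, a closed convex ray:
    # project onto span(v), then clip the coefficient at zero.
    rho = max(float(x @ v) / float(v @ v), 0.0)
    return rho * v

def dist_ray(x, v):
    # dist(x; S) = ||proj(x; S) - x|| from (2.12).
    return float(np.linalg.norm(proj_ray(x, v) - x))

p = proj_ray(np.array([3.0, 1.0]), np.array([1.0, 0.0]))  # -> (3.0, 0.0)
d = dist_ray(np.array([3.0, 1.0]), np.array([1.0, 0.0]))  # -> 1.0
```

For any closed convex S the minimizer in (2.12) exists and is unique, which is what makes proj a single-valued mapping here.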