1 Introduction and motivation

Lagrangian relaxation—together with a search in the Lagrangian dual space of multipliers—has a long history as a popular means to attack complex mathematical optimization problems. Lagrangian relaxation is especially popular in cases when an inherent problem structure is present, such that a suitable relaxation is much easier to solve than the original problem, and where the result from optimizing the multipliers is acceptable even if the final primal solution is only near-feasible; examples are found, among others, in economics and logistics applications where the relaxed constraints are associated with capacity or budget restrictions. Lagrangian relaxation is also frequently applied in combinatorial optimization, as a starting phase or as a heuristic. In the history of mathematical optimization, several classical works are built on the use of Lagrangian relaxation; see, e.g., the work by Held and Karp [15, 16] on the traveling salesperson problem. For textbook coverage and tutorials on Lagrangian relaxation, see, e.g., [2, 3, 28, 32] and [10, 11, 13, 29], respectively.

The convergence theory of Lagrangian relaxation is quite well developed for the cases in which the original, primal, problem has an optimal solution, or at least exhibits feasible solutions, even in cases when strong duality fails to hold. For the case when strong duality holds, several techniques have been developed in order to “translate” a dual optimal solution to a primal optimal one; this translation is supported by a consistent primal–dual system of equations and inequalities, sometimes referred to as characterizations of “saddle-point optimality” (cf. [28, Sect. 1.3.3] and [2, Thm. 6.2.5]).

In linear programming, decomposition–coordination techniques, like Dantzig–Wolfe decomposition and its dual equivalent Benders decomposition, ensure the convergence to a primal–dual optimal solution. In convex programming, ascent methods for the Lagrange dual, such as (proximal) bundle methods, can be equipped with the construction of an additional, convergent sequence of primal points which are provided by the optimality certificate of the ascent direction-finding quadratic subproblems (e.g., [19, 20, 33]). When utilizing classical subgradient methods from the “Russian school” (e.g., [9, 35, 36, 40])—in which one subgradient, calculated at the current dual iterate, is utilized as a search direction and combined with a pre-defined step length rule—a convergent sequence of primal vectors can also be constructed as a convex combination of primal subproblem solutions (see, e.g., [40, pp. 116–118] and [39] for linear programs, and [1, 14, 25–27] for general convex programs). In the case where strong duality does not hold, the “translation” from an optimal Lagrangian dual solution to a primal optimal solution is much more involved, since the primal and dual optimal solutions may then violate both Lagrangian optimality and any complementarity conditions (cf. [22]).

A hitherto insufficiently explored question is what the above-mentioned sequence of simple convex combinations of primal subproblem solutions converges to—if at all—when the original primal problem is inconsistent, in which case the Slater constraint qualification (CQ) assumed in [25] cannot hold. The purpose of this article is to investigate this issue for convex programming; for the special case of linear programming quite strong results are obtained.

2 Preliminaries and main result

Consider the problem to

$$\begin{aligned}&\text {minimize} \;\;\;\; {f}(\mathbf{x}) , \end{aligned}$$
(2.1a)
$$\begin{aligned}&\mathrm{subject~to~~}\, g_i(\mathbf{x}) \le 0, \quad i \in \mathcal {I}, \end{aligned}$$
(2.1b)
$$\begin{aligned}&\mathbf{x}\in X, \end{aligned}$$
(2.1c)

where the set \(X \subset \mathbb {R}^n\) is nonempty, convex and compact, \(\mathcal {I} = \{1,\ldots ,m\}\), and the functions \(f: \mathbb {R}^n \mapsto \mathbb {R}\) and \({g}_i: \mathbb {R}^n \mapsto \mathbb {R}\), \(i\in \mathcal {I}\), are convex and, thus, continuous; these properties are assumed to hold throughout the article. The notation \(\mathbf{g}(\mathbf{x})\) is in the sequel used for the vector \(\left[ g_i(\mathbf{x})\right] _{i \in \mathcal {I}}\). Moreover, whenever f and \(g_i\), \(i\in \mathcal {I}\), are affine functions, and X is polyhedral, we denote the program (2.1) as a linear program. The corresponding Lagrange function \(\mathcal {L}_{f}:\mathbb {R}^n\times \mathbb {R}^m\mapsto \mathbb {R}\) with respect to the relaxation of the constraints (2.1b) is defined by \(\mathcal {L}_{f}(\mathbf{x},\mathbf{u}) := {f}(\mathbf{x}) + \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x})\), \((\mathbf{x}, \mathbf{u}) \in \mathbb {R}^n \times \mathbb {R}^m\). The Lagrangian dual objective function \(\theta _{f}:\mathbb {R}^m \mapsto \mathbb {R}\) is the concave function defined by

$$\begin{aligned} \theta _{f}(\mathbf{u}) := \min _{\mathbf{x}\in X} \; \mathcal {L}_{f}(\mathbf{x},\mathbf{u}), \quad \mathbf{u}\in \mathbb {R}^m. \end{aligned}$$
(2.2)

With no further assumptions on the properties of the program (2.1), the minimization problem defined in (2.2) can be solved in finite time only to \(\varepsilon \)-optimality ([2, Ch. 7]). For any approximation error \(\varepsilon \ge 0\), an \(\varepsilon \)-optimal solution, \(\mathbf{x}_{f}^{\varepsilon }(\mathbf{u})\), to the minimization problem in (2.2) at \(\mathbf{u}\in \mathbb {R}^m\) is denoted an \(\varepsilon \)-optimal Lagrangian subproblem solution, and fulfils the inclusion

$$\begin{aligned} \mathbf{x}_{f}^{\varepsilon }(\mathbf{u}) \in X_{f}^{\varepsilon }(\mathbf{u}) := \big \{\, {\mathbf{x}\in X} \,\big |\, f(\mathbf{x}) + \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}) \le \theta _{f}(\mathbf{u}) + \varepsilon \,\big \}. \end{aligned}$$
(2.3)

The Lagrange dual to the program (2.1) with respect to the relaxation of the constraints (2.1b) is the convex program to find

$$\begin{aligned} \underset{\mathbf{u}\in \mathbb {R}^m_+}{\text {supremum}}\; \theta _{f}(\mathbf{u}). \end{aligned}$$
(2.4)

2.1 Primal and dual convergence in the case of consistency

We first recall some convergence results for dual subgradient methods for the case when the feasible set of the program (2.1) is nonempty, while assuming a constraint qualification (e.g., Slater CQ, which for the problem (2.1) is stated as \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) < \mathbf{0}^m\} \ne \emptyset \); see [25]). Denote the optimal objective value of the program (2.1) by \(\theta _f^* > -\infty \), and its solution set by

$$\begin{aligned} X_{f}^{*} := \big \{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m, \, {f}(\mathbf{x}) \le \theta _f^* \,\big \}. \end{aligned}$$
(2.5)

By the continuity of f and \(g_i\), \(i\in \mathcal {I}\), and the compactness of X, we have, according to [37, Thm. 30.4(g)], that the primal optimal objective value equals the value obtained when solving the Lagrangian dual program, i.e., that

$$\begin{aligned} \displaystyle \theta _{f}^* = \sup _{\mathbf{u}\in \mathbb {R}^m_+} \theta _{f}(\mathbf{u}). \end{aligned}$$
(2.6)

We denote the solution set to the Lagrange dual as

$$\begin{aligned} U_{f}^* := \big \{\, {\mathbf{u}\in \mathbb {R}^m_+} \,\big |\, \theta _{f}(\mathbf{u}) \ge \theta _{f}^* \,\big \}, \end{aligned}$$
(2.7)

the nonemptiness of which can be ensured by assuming, e.g., the Slater CQ or that the program (2.1) is linearly constrained ([2, Sect. 5]).

With respect to a convex set \(U \subseteq \mathbb {R}^m\) and an \(\varepsilon \ge 0\), the conditional \(\varepsilon \)-subdifferential ([2, Thms.  3.2.5 and 6.3.7], [26, Sect. 2], [8, 23]) of the concave function \(\theta _{f}\) at \(\mathbf{u}\in U\) is given by

$$\begin{aligned} \partial ^U_{\varepsilon } \theta _{f}(\mathbf{u}) := \big \{\, \varvec{\gamma }\,\in \mathbb {R}^m \,\big |\, \theta _{f}(\mathbf{v}) \le \theta _{f}(\mathbf{u}) + \varvec{\gamma }\,^\mathrm{T}(\mathbf{v}- \mathbf{u}) + \varepsilon , \; \mathbf{v}\in U \,\big \}. \end{aligned}$$
(2.8)

This definition implies the inclusions \(\partial ^{\mathbb {R}^m}_{{\varepsilon }} \theta _{f}(\mathbf{u}) \subseteq \partial ^U_{{\varepsilon }} \theta _{f}(\mathbf{u}) \subseteq \partial ^U_{{\varepsilon '}} \theta _{f}(\mathbf{u})\) for all \(\mathbf{u}\in U\) and \(0 \le {\varepsilon } \le {\varepsilon '} < \infty \). Further, from (2.2), (2.3), and (2.8) follows the inclusion

$$\begin{aligned} \mathbf{g}(\mathbf{x}_{f}^{\varepsilon }(\mathbf{u})) \in \partial _{\varepsilon }^{\mathbb {R}^m} \theta _{f}(\mathbf{u}), \quad \mathbf{u}\in \mathbb {R}^m, \quad \varepsilon \ge 0. \end{aligned}$$
(2.9)

The normal cone of a convex set \(U \subseteq \mathbb {R}^m\) at \(\mathbf{u}\in U\) is defined by

$$\begin{aligned} N_U(\mathbf{u}) := \big \{\, \varvec{\eta }\in \mathbb {R}^m \,\big |\, \varvec{\eta }^\mathrm{T}(\mathbf{v}- \mathbf{u}) \le 0, \, \mathbf{v}\in U \,\big \}. \end{aligned}$$
(2.10)

This definition implies the identities \(N_{\mathbb {R}^m_+}(\mathbf{u}) = \big \{\, \varvec{\eta }\in \mathbb {R}^m_- \,\big |\, \varvec{\eta }^\mathrm{T}\mathbf{u}= 0 \,\big \}\) for all \(\mathbf{u}\in \mathbb {R}^m_+\), and \(\partial ^U_{\varepsilon } \theta _f(\mathbf{u}) = \partial _{\varepsilon }^{\mathbb {R}^m} \theta _{f}(\mathbf{u}) - N_U(\mathbf{u})\) for all \(\mathbf{u}\in U\) and \(\varepsilon \ge 0\). Hence, the inclusion \(\mathbf{g}(\mathbf{x}_{f}^{\varepsilon }(\mathbf{u})) - \varvec{\eta }\in \partial ^{\mathbb {R}^m_+}_{\varepsilon } \theta _f(\mathbf{u})\) holds whenever \(\varvec{\eta }\le \mathbf{0}^m\), \(\varvec{\eta }^\mathrm{T}\mathbf{u}= 0\), \(\mathbf{u}\ge \mathbf{0}^m\), and \(\varepsilon \ge 0\).

We consider solving the Lagrangian dual program (2.6) by the conditional \(\varepsilon \)-subgradient optimization algorithm ([26, Sect. 2.1]). It starts at some initial vector \(\mathbf{u}^0\in \mathbb {R}^m_+\) and computes iterates \(\mathbf{u}^t\) according to

$$\begin{aligned} \mathbf{u}^{t+\frac{1}{2}} := \mathbf{u}^t + \alpha _t \big [\, \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \,\big ], \quad \mathbf{u}^{t+1} := \big [\, \mathbf{u}^{t+\frac{1}{2}} \,\big ]_+, \quad t=0,1,\ldots , \end{aligned}$$
(2.11)

where \([\, \cdot \,]_+\) denotes the Euclidean projection onto the nonnegative orthant, the sequence \(\{ \varvec{\eta }^t \}\) obeys the inclusion \(\varvec{\eta }^t\in N_{\mathbb {R}^m_+}(\mathbf{u}^t)\) for all t, and \(\alpha _t > 0\) and \(\epsilon _t \ge 0\) denote the step length and the approximation error, respectively, chosen at iteration t. To simplify the presentation, the cumulative step lengths \(\varLambda _t\) are defined by

$$\begin{aligned} \varLambda _t := { \sum _{s=0}^{t-1} \alpha _s,} \quad t=1,2,\ldots . \end{aligned}$$
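
To make the update (2.11) concrete, the following minimal Python sketch (our own illustration, not part of the original exposition) performs the projected step with the particular choices \(\varvec{\eta }^t = \mathbf{0}^m\) (which always lies in \(N_{\mathbb {R}^m_+}(\mathbf{u}^t)\)) and \(\alpha _t = a/(t+1)\), a step length rule fulfilling (2.13a)–(2.13b); the oracle solve_subproblem is a hypothetical user-supplied routine returning an \(\epsilon _t\)-optimal solution of the minimization in (2.2) together with its constraint values.

```python
import numpy as np

def conditional_subgradient(solve_subproblem, m, T, a=1.0, u0=None):
    """Minimal sketch of the projected (conditional) epsilon-subgradient
    iteration (2.11) for the Lagrange dual (2.4), with eta^t = 0 and
    alpha_t = a / (t + 1), cf. (2.13a)-(2.13b).  The hypothetical oracle
    solve_subproblem(u) returns (x_eps, g_x): an eps_t-optimal Lagrangian
    subproblem solution and the constraint values g(x_eps)."""
    u = np.zeros(m) if u0 is None else np.asarray(u0, dtype=float)
    Lambda = 0.0                              # cumulative step length Lambda_t
    for t in range(T):
        x_eps, g_x = solve_subproblem(u)      # eps_t-optimal subproblem solution
        alpha = a / (t + 1.0)
        u = np.maximum(u + alpha * g_x, 0.0)  # subgradient step + projection onto R^m_+
        Lambda += alpha
    return u, Lambda
```
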

For any closed and convex set \(S \subseteq \mathbb {R}^r\) and any point \(\mathbf{x}\in \mathbb {R}^r\), where \(r \ge 1\), the convex Euclidean distance function \(\mathrm{dist}: \mathbb {R}^r \mapsto \mathbb {R}\) and the Euclidean projection mapping \(\mathrm{proj}: \mathbb {R}^r \mapsto S\) are defined as

$$\begin{aligned} \mathrm{dist}(\mathbf{x}; S) := \min _{\mathbf{y}\in S} \, \left\{ \left\| \mathbf{y}- \mathbf{x}\right\| \right\} \quad \text{ and } \quad \mathrm{proj}(\mathbf{x}; S) := \mathop {\mathrm{arg\,min}}\limits _{\mathbf{y}\in S} \, \left\{ \left\| \mathbf{y}- \mathbf{x}\right\| \right\} , \end{aligned}$$
(2.12)

respectively, where \(\Vert \cdot \Vert \) denotes the Euclidean norm. For a sequence \(\{\mathbf{x}^t\}\subset \mathbb {R}^n\) and a vector \(\mathbf{y}\in \mathbb {R}^n\), the notation \(\mathbf{x}^t \rightarrow \mathbf{y}\) means that the sequence \(\{\, \mathbf{x}^t \,\}\) converges to the point \(\mathbf{y}\).

The following proposition specializes [26, Thm. 8] to the setting at hand.

Proposition 2.1

(Convergence to a dual optimal point) Let the method (2.11) be applied to the program (2.6), the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and the sequences of step lengths, \(\{\, \alpha _t \,\}\), and approximation errors, \(\{\, \epsilon _t \,\}\), fulfil the conditions

$$\begin{aligned}&\alpha _t > 0, \quad t=0,1,\ldots ; \quad \alpha _t \rightarrow 0 \quad \text{ and } \quad \varLambda _t \rightarrow \infty \quad \text{ as } \quad t \rightarrow \infty ; \end{aligned}$$
(2.13a)
$$\begin{aligned}&\sum _{s=0}^{\infty } \alpha _s^2 < \infty ; \end{aligned}$$
(2.13b)
$$\begin{aligned}&\epsilon _t \ge 0, \quad t=0,1,\ldots ; \quad \epsilon _t \rightarrow 0 \quad \text{ as } \quad t \rightarrow \infty ; \quad \sum _{s=0}^{\infty } \alpha _s \epsilon _s < \infty . \end{aligned}$$
(2.13c)

If \(U_f^* \ne \emptyset \), then \(\mathbf{u}^t \rightarrow \mathbf{u}^\infty \in U_{f}^*\).

Proof

Assume that \(U^*_f \ne \emptyset \). Then the primal problem (2.1) attains an optimal solution and the Lagrange function defined by \(\mathcal {L}_f({\mathbf{x}},{\mathbf{u}}) := f({\mathbf{x}}) + {\mathbf{u}}^\mathrm{T}{\mathbf{g}}(\mathbf{x})\) has a saddle-point over the set \(X\times \mathbb {R}^m_+\) ([2, Thm. 6.2.5]); the result then follows from [26, Thm. 8]. \(\square \)

At each iteration of the method (2.11) an \(\epsilon _t\)-optimal Lagrangian subproblem solution \(\mathbf{x}_{f}^{\epsilon _t}(\mathbf{u}^t)\) is computed; an ergodic (that is, averaged) sequence \(\{\, \overline{\mathbf{x}}_{f}^t \,\}\) is then defined by

$$\begin{aligned} \overline{\mathbf{x}}_{f}^t := {\frac{1}{\varLambda _t} \sum _{s=0}^{t-1} \alpha _s \mathbf{x}_{f}^{\epsilon _s}(\mathbf{u}^s),} \quad t=1,2,\ldots . \end{aligned}$$
(2.14)
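
The ergodic average (2.14) can be maintained incrementally in tandem with the dual updates (2.11); the sketch below (ours, using the same hypothetical subproblem oracle and step length rule as in the previous sketch) keeps a running weighted average, so that after iteration t the stored vector equals \(\overline{\mathbf{x}}_{f}^{t+1}\).

```python
import numpy as np

def dual_ascent_with_ergodic_average(solve_subproblem, n, m, T, a=1.0):
    """Maintains the weighted average (2.14) alongside the dual updates (2.11)."""
    u, x_bar, Lambda = np.zeros(m), np.zeros(n), 0.0
    for t in range(T):
        x_eps, g_x = solve_subproblem(u)      # x_f^{eps_t}(u^t) and g(x_f^{eps_t}(u^t))
        alpha = a / (t + 1.0)
        # Incremental form of (2.14): x_bar^{t+1} = (Lambda_t x_bar^t + alpha_t x^t) / Lambda_{t+1}
        x_bar = (Lambda * x_bar + alpha * x_eps) / (Lambda + alpha)
        Lambda += alpha
        u = np.maximum(u + alpha * g_x, 0.0)
    return x_bar, u
```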

The following result is a special case of that in [26, Thm. 19].

Proposition 2.2

(Convergence to the primal optimal set) Let the method (2.11), (2.13) be applied to the program (2.6), the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and the sequence \(\{\, \overline{\mathbf{x}}_{f}^t \,\}\) be defined by (2.14). If \(U_f^* \ne \emptyset \), then it holds that

$$\begin{aligned} \mathrm{dist}(\overline{\mathbf{x}}_f^t; X_{f}^*) \rightarrow 0 \quad \text {as} \quad t \rightarrow \infty . \end{aligned}$$

Proof

As in the proof of Proposition 2.1, the condition \(U^*_f \ne \emptyset \) ensures the existence of a saddle-point of \(\mathcal {L}_f\). The compactness assumption (on the dual solution set \(U^*_f\)) in [26, Thm. 20] is here replaced by the conditions (2.13b)–(2.13c), under which the dual sequence \(\{\, \mathbf{u}^t \,\}\) is bounded (see, e.g., the proof of [26, Thm. 8]); that is, \(\Vert \mathbf{u}^t\Vert \le M\) holds for all \(t \ge 0\), where \(M > \Vert \mathbf{u}^* \Vert \) for all \(\mathbf{u}^* \in U^*_f\). By restricting the dual space to \(\mathbb {R}^m_+\cap \{\, \mathbf{u}\,|\, \Vert \mathbf{u}\Vert \le M \,\}\), the result in [26, Thm. 19] applies. \(\square \)

Proposition 2.2 states that whenever the dual solution set \(U_f^*\) is nonempty, the sequence \(\{\, \overline{\mathbf{x}}_f^t \,\}\) of primal iterates defined in (2.14) will converge towards the primal optimal set \(X_f^*\), provided that the sequence \(\{\, \alpha _t \,\}\) of step lengths and the sequence \(\{\, \epsilon _t \,\}\) are chosen such that the assumptions (2.13) are fulfilled. For convergence results when more general constructions of the ergodic sequence (2.14) are employed, see [14].

2.2 Outline and main result

Section 2.1 considers the consistent case of the program (2.1) and presents convergence results for the primal and dual sequences (\(\{\, \overline{\mathbf{x}}_f^t \,\}\) and \(\{\, \mathbf{u}^t \,\}\), respectively) obtained when the method (2.11) is applied to its Lagrange dual. In the remainder of the article we will analyze the properties of these two sequences when the primal problem (2.1) is inconsistent, i.e., when \(\{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \), in which case the Slater CQ cannot be assumed.

The remainder of the article is structured as follows. In Sect. 3 we show that, during the course of the iterative scheme (2.11) for solving the program (2.4), the sequence \(\{\, \mathbf{u}^t \,\}\) of dual iterates diverges when employing step lengths (\(\alpha _t\)) and approximation errors (\(\epsilon _t\)) fulfilling (2.13). As the sequence diverges, i.e., as \(\Vert \mathbf{u}^t\Vert \rightarrow \infty \), the term \(f(\mathbf{x})\) of the Lagrange function \(\mathcal {L}_f(\mathbf{x},\mathbf{u}^t)\) loses significance in the definition (2.3) of the \(\epsilon _t\)-optimal subproblem solution, \(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t) \in X_f^{\epsilon _t}(\mathbf{u}^t)\). In Sect. 4 we characterize the homogeneous dual function, which is the Lagrangian dual function obtained when \(f \equiv 0\). We show that there is a primal–dual problem associated with the homogeneous dual in which the primal part amounts to finding the set \(X_0^*\) of points in X with minimum infeasibility with respect to the relaxed constraints, i.e.,

$$\begin{aligned} X_0^* := \mathop {\mathrm{arg\,min}}\limits _{\mathbf{x}\in X} \, \left\{ \, \left\| \left[ \mathbf{g}(\mathbf{x}) \right] _+ \right\| \,\right\} . \end{aligned}$$
(2.15)

In Sect. 5 we show that a sequence of scaled dual iterates will in fact converge to the optimal set of the homogeneous dual problem.

Section 6.1 presents the corresponding primal convergence results, i.e., that the sequence of primal iterates \(\{\, \overline{\mathbf{x}}_f^t \,\}\) converges to the set \(X_0^*\). To simplify notation we redefine the primal optimal set \(X_{f}^*\) (defined in (2.5)) as the optimal set for the so-called selection problem, i.e.,

$$\begin{aligned} X_{f}^* := \mathop {\mathrm{arg\,min}}\limits _{\mathbf{x}\in X_0^*} \, \left\{ \, {f}( \mathbf{x}) \,\right\} . \end{aligned}$$
(2.16)

Note that, when \(\big \{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\big \} \ne \emptyset \), the equivalence \(X_0^* = \big \{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x})\le \mathbf{0}^m\,\big \}\) holds, implying that \(X_{f}^*\) equals the optimal set for the program (2.1). Here lies the main point of departure when differentiating the convex program (2.1) from its linear programming special case (i.e., when f and \(g_i\), \(i \in \mathcal {I}\), are affine functions and X is a polyhedral set), in which the selection problem (2.16) is a linear program (possessing Lagrange multipliers). For general convex programming, however, this may not be the case. The stronger convergence results achieved for the linear programming case are presented in Sect. 6.2.

Our analysis leads to the main contribution of this article, which is then formulated as the following generalization of Proposition 2.2 to hold also for inconsistent convex programs.

Theorem 2.3

(Main result) Apply the method (2.11), (2.13a), (2.13b) to the program (2.4), let the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and let the sequence \(\{\, \overline{\mathbf{x}}_{f}^t \,\}\) be defined by (2.14).

  (a)

    Let \(X_0^*\) be defined by (2.15). If (2.13c) holds, then

    $$\begin{aligned} \mathrm{dist}(\overline{\mathbf{x}}_f^t; X_0^*) \rightarrow 0 \quad \text {as} \quad t \rightarrow \infty . \end{aligned}$$
  (b)

    Let \(X_f^*\) be defined by (2.16) and \(\{\, \epsilon _t \,\} = \{\, 0 \,\}\). If the program (2.1) is linear, then

    $$\begin{aligned} \mathrm{dist}(\overline{\mathbf{x}}_f^t; X_f^*) \rightarrow 0 \quad \text {as} \quad t \rightarrow \infty . \end{aligned}$$

\(\square \)

Within the context of mathematical optimization, studies of characterizations of inconsistent systems are of course as old as the history of theorems of the alternative and the associated theory of optimality in linear and nonlinear optimization.

An inconsistent system of linear inequalities is studied in [7], which establishes a primal–dual theory for a strictly convex quadratic least-correction problem in which the left-hand sides of the linear inequalities possess negative slacks, the sum of squares of which is minimized. The article [6] is related to ours, in that it studies the behaviour of an augmented Lagrangian algorithm applied to a convex quadratic optimization problem with an inconsistent system of linear inequalities. The algorithm converges—at a linear rate—to a primal vector that minimizes the original objective function over a set defined by constraints that are minimally shifted through negative slacks.

For optimization problems involving (twice) differentiable, possibly nonconvex functions, the methods in [5, 31] are able to detect infeasibility and to find solutions for which the norm of the infeasibility is minimal. Filter methods (see [12] and references therein, and—for the nondifferentiable case—[18, 34]) employ feasibility restoration steps to reduce the value of a constraint violation function. In [4], the dynamic steering of exact penalty methods toward feasibility and optimality is reviewed and analysed. While these references consider infeasibility within traditional nonlinear programming (inspired) algorithms, our work is devoted to the study of corresponding issues within subgradient-based methods applied to Lagrange duals.

As stated in Theorem 2.3(a), for the general convex case we can only establish convergence to the set of minimum infeasibility points, while for the case of linear programs we also show—in Theorem 2.3(b)—that all primal limit points minimize the objective over the set of minimum infeasibility points.

Then, in Sect. 7 we make some further remarks and present an illustrative example. Finally, in Sect. 8, we draw conclusions and suggest further research.

3 Dual divergence in the case of inconsistency

Consider the inconsistent program (2.1) and its Lagrangian dual function \(\theta _f\) defined in (2.2). We begin by establishing that the emptiness of the set \(\{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x})\le \mathbf{0}^m\,\}\) implies the existence of a nonempty cone \(C \subset \mathbb {R}^m_+\), such that the value of the function \(\theta _{f}\) increases in every direction \(\mathbf{v}\in C\). Consequently, for this case the Lagrangian dual solution set (defined in (2.7) for the consistent case) fulfils \(U_{f}^* = \emptyset \), and \(\theta _f^* = \infty \) holds.

Proposition 3.1

(A theorem of the alternative) The set \(\big \{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\big \}\) is empty if and only if the cone \(C := \big \{\, \mathbf{w}\in \mathbb {R}^m_+ \,\big |\, \min _{\mathbf{x}\in X} \, \{ \mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x})\} > 0 \,\big \}\) is nonempty.

Proof

By convexity of X and \(\mathbf{g}\), the set \(\{\, (\mathbf{x},\mathbf{z}) \in X \times \mathbb {R}^m \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{z}\,\}\) is convex. Hence, its projection defined by \(K := \{\, \mathbf{z}\in \mathbb {R}^m \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{z}, \; \mathbf{x}\in X \,\} = \{\, \mathbf{g}(\mathbf{x}) + \mathbb {R}^m_+ \,|\, \mathbf{x}\in X \,\}\) is also convex. Since \(\mathbf{g}\) is continuous and X is compact, the set K is closed, and from its definition it follows that \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \) if and only if K can be separated strictly from \(\mathbf{0}^m\).

Assume that \(C \not = \emptyset \) and let \(\mathbf{w}\in C\). The inequality \(\mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}) > 0\) then holds for all \(\mathbf{x}\in X\). Hence, for each \(\mathbf{x}\in X\), \(g_i(\mathbf{x}) > 0\) must hold for at least one \(i \in \mathcal {I}\), implying that \(\mathbf{0}^m\not \in K\).

Assume then that \(\mathbf{0}^m\not \in K\). Then there exist a \(\mathbf{w}\in \mathbb {R}^m\) and a \(\delta \in \mathbb {R}\) such that \(\mathbf{w}^\mathrm{T}\mathbf{z}\ge \delta > 0 = \mathbf{w}^\mathrm{T}\mathbf{0}^m\) holds for all \(\mathbf{z}\in K\). By definition of the set K, letting \(\mathbf{e}_i \in \mathbb {R}^m\) denote the ith unit vector, it follows that \(\mathbf{g}(\mathbf{x}) + \mathbf{e}_i \gamma \in K\) for all \(\mathbf{x}\in X\) and all \(\gamma \in \mathbb {R}_+\). Hence, \(\mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}) + w_i \gamma > 0\) holds for all \(\mathbf{x}\in X\) and \(\gamma \in \mathbb {R}_+\). Letting \(\gamma \rightarrow \infty \) yields that \(w_i \ge 0\) for all \(i \in \mathcal {I}\). Choosing \(\gamma = 0\) yields that \(\min _{\mathbf{x}\in X} \{\, \mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\} > 0\). It follows that \(\mathbf{w}\in C \not = \emptyset \). The proposition follows. \(\square \)
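
As a small illustration of Proposition 3.1 (a toy instance of our own, not taken from the source), let \(n = m = 1\), \(X = [0,1]\), and \(g(x) = x + 1\). Then

$$\begin{aligned} \big \{\, x \in X \,\big |\, g(x) \le 0 \,\big \} = \emptyset \quad \text{ and } \quad \min _{x \in X} \, \{\, w\, g(x) \,\} = w > 0 \quad \text{ for every } w > 0, \end{aligned}$$

so that \(C = \{\, w \in \mathbb {R}_+ \,|\, w > 0 \,\} \ne \emptyset \), in agreement with the proposition.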

Proposition 3.2

(The cone of ascent directions of the dual function) If \(\big \{\, \mathbf{x}\in X \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\big \} = \emptyset \) then \(\theta _f(\mathbf{u}+ \mathbf{v}) > \theta _f(\mathbf{u})\) holds for all \(\mathbf{u}\in \mathbb {R}^m\) and all \(\mathbf{v}\in C\).

Proof

The proposition follows by the definition (2.2) of the function \(\theta _f\), and the relations \(\theta _f(\mathbf{u}+ \mathbf{v}) = \min _{\mathbf{x}\in X} \{ f(\mathbf{x}) + (\mathbf{u}+ \mathbf{v})^\mathrm{T}\mathbf{g}(\mathbf{x}) \} \ge \min _{\mathbf{x}\in X} \{ f(\mathbf{x}) + \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}) \} + \min _{\mathbf{x}\in X} \{ \mathbf{v}^\mathrm{T}\mathbf{g}(\mathbf{x}) \} > \theta _f(\mathbf{u})\), where the strict inequality follows from the definition of C in Proposition 3.1. \(\square \)

We next utilize the fact that the cone C is independent of the objective function f to show that in the inconsistent case the sequence \(\{ \mathbf{u}^t \}\) diverges.

Proposition 3.3

(Divergence of the dual sequence) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a) applied to the program (2.4), the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and the sequence \(\{\, \epsilon _t \,\} \subset \mathbb {R}_+\). If \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \) then \(\Vert \mathbf{u}^t \Vert \rightarrow \infty \).

Proof

Let \(\mathbf{w}\in C \ne \emptyset \) and define \(\delta := \min _{\mathbf{x}\in X} \{\, \mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\} > 0\) and \(\beta _t := \mathbf{w}^\mathrm{T}\mathbf{u}^t\) for all t. Then

$$\begin{aligned} \beta _{t+1}&= \mathbf{w}^\mathrm{T}\big (\, \mathbf{u}^t+\alpha _t \big [ \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \big ] \,\big )_+ \ge \mathbf{w}^\mathrm{T}\big ( \mathbf{u}^t+\alpha _t \big [ \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \big ] \big ) \\&\ge \beta _t + \alpha _t \mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) \; \ge \; \beta _t + \alpha _t \delta , \end{aligned}$$

where the first inequality holds since \(\mathbf{w}\in \mathbb {R}^m_+\), the second since \(\varvec{\eta }^t \in \mathbb {R}^m_-\), and the third since \(\mathbf{w}^\mathrm{T}\mathbf{g}(\mathbf{x}) \ge \delta \) for all \(\mathbf{x}\in X\). From (2.13a) then follows that \(\beta _t \rightarrow \infty \), and hence \(\Vert \mathbf{u}^t\Vert \rightarrow \infty \). \(\square \)

4 A homogeneous dual and an associated saddle-point problem in the case of inconsistency

We next use the result of Proposition 3.3 to establish that for large dual variable values the dual objective function can be closely approximated by an associated homogeneous dual function. Associated with this homogeneous dual is a saddle-point problem, in which the primal part amounts to finding the points in the primal space having minimum total infeasibility in the relaxed constraints.

Consider the Lagrange function associated with the program (2.1), i.e.,

$$\begin{aligned} \mathcal {L}_f(\mathbf{x},\mathbf{u}) = \Vert \mathbf{u}\Vert \left( \frac{f(\mathbf{x})}{\Vert \mathbf{u}\Vert } + \frac{\mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x})}{\Vert \mathbf{u}\Vert } \right) , \quad (\mathbf{x},\mathbf{u}) \in \mathbb {R}^n \times \{ \mathbb {R}^m {\setminus } \{ \mathbf{0}^m\} \}. \end{aligned}$$

As the value of \(\Vert \mathbf{u}\Vert \) increases, the term \(f( \mathbf{x})\) in the computation of the subproblem solution \(\mathbf{x}_{f}^{\varepsilon }(\mathbf{u})\) in (2.3) loses significance. Hence, according to Proposition 3.3, for large values of t the method (2.11), (2.13a) will in effect address an approximation of the homogeneous dual problem, namely that of maximizing \(\theta _0\) over \(\mathbb {R}^m_+\).

In what follows, unless otherwise stated, we will assume that \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \).

4.1 The homogeneous version of the Lagrange dual

Consider the problem to find an \(\mathbf{x}\in X\) such that \(\mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\). To this feasibility problem we associate the homogeneous Lagrangian dual problem to find

$$\begin{aligned} \displaystyle \mathop {\hbox {supremum}}\limits _{\mathbf{u}\in \mathbb {R}^m_+} \; \theta _0(\mathbf{u}), \end{aligned}$$
(4.1)

where \(\theta _0:\mathbb {R}^m \mapsto \mathbb {R}\) is defined by (2.2), for \({f \equiv 0}\) (i.e., \(\theta _0(\mathbf{u}) = \min _{\mathbf{x}\in X} \{\, \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\}\)). A corresponding (optimal) subproblem solution \(\mathbf{x}_0^0(\mathbf{u})\) and the subdifferential \(\partial _0^{\mathbb {R}^m} \theta _0(\mathbf{u})\) are analogously defined.

According to its definition, the function \(\theta _0\) is superlinear (e.g., [17, Proposition V:1.1.3]), meaning that its hypograph is a nonempty and convex cone in \(\mathbb {R}^{m+1}\), and implying that \(\theta _0(\delta \mathbf{u}) = \delta \theta _0(\mathbf{u})\) holds for all \((\delta , \mathbf{u}) \in \mathbb {R}_+ \times \mathbb {R}^m\). The definition of the directional derivative, \(\theta _0'(\mathbf{u}; \mathbf{d})\), of \(\theta _0\) at \(\mathbf{u}\) in the direction of \(\mathbf{d}\) (e.g., [17, Rem. I:4.1.4]), then yields that \(\theta _0'(\mathbf{0}^m; \mathbf{d}) = \theta _0(\mathbf{d})\) holds for all \(\mathbf{d}\in \mathbb {R}^m\). The program (4.1) can thus be interpreted as the search for a steepest feasible ascent direction of \(\theta _0\). Such a search requires that the argument of \(\theta _0\) is restricted. Hence, we define

$$\begin{aligned} \theta _0^{V_0^{*}}:= \max _{\mathbf{u}\in V} \big \{\, \theta _0(\mathbf{u}) \,\big \} = \max _{\mathbf{d}\in V} \big \{\, \theta _0'(\mathbf{0}^m; \mathbf{d}) \,\big \}, \quad \text{ where } \quad V := \big \{\, \mathbf{u}\in \mathbb {R}^m_+ \,\big |\, \Vert \mathbf{u}\Vert \le 1 \,\big \}. \end{aligned}$$
(4.2)

Defining V using the unit ball is somewhat arbitrary. Owing to the homogeneity of \(\theta _0\), the unit ball—viewed as the convex hull of the projective space—is, however, a natural choice. As shown in Sect. 4.2, for this choice the dual mapping yields a singleton set.
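
As a concrete illustration (ours only), when \(\mathbf{g}\) is affine, say \(\mathbf{g}(\mathbf{x}) = \mathbf{G}\mathbf{x} + \mathbf{h}\), and X is a box, the homogeneous dual function can be evaluated in closed form by a componentwise minimization; the Python sketch below (the affine g and the box X are assumptions of this sketch only, not of the text) also verifies the positive homogeneity \(\theta _0(\delta \mathbf{u}) = \delta \theta _0(\mathbf{u})\) numerically.

```python
import numpy as np

def theta_0(u, G, h, lo, hi):
    """Evaluates theta_0(u) = min_{x in X} u^T g(x) for the illustrative
    special case g(x) = G x + h over a box X = [lo, hi]^n."""
    w = G.T @ u                          # linear coefficients of x in u^T g(x)
    x_min = np.where(w >= 0.0, lo, hi)   # componentwise minimizer over the box
    return u @ h + w @ x_min

# Positive homogeneity check, theta_0(delta u) = delta theta_0(u) for delta >= 0,
# on an inconsistent toy instance g(x) = x + 1 over [0, 1]^2:
G, h, u = np.eye(2), np.ones(2), np.array([0.3, 0.7])
assert abs(theta_0(2.0 * u, G, h, 0.0, 1.0) - 2.0 * theta_0(u, G, h, 0.0, 1.0)) < 1e-12
```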

4.2 An associated saddle-point problem

From the definition (2.2) of the function \(\theta _f\) and the definition (4.2) of \(\theta _0^{V_0^{*}}\) it follows that

$$\begin{aligned} \theta _0^{V_0^{*}}= \max _{\mathbf{u}\in V} \Big \{\, \min _{\mathbf{x}\in X} \, \big \{\, \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\big \} \,\Big \} = \min _{\mathbf{x}\in X} \Big \{\, \max _{\mathbf{u}\in V} \, \big \{\, \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\big \} \,\Big \} \end{aligned}$$
(4.3)

hold, since the Lagrange function, defined by \(\mathcal {L}_0(\mathbf{x},\mathbf{u}) = \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x})\), is convex with respect to \(\mathbf{x}\), for \(\mathbf{u}\in \mathbb {R}^m_+\), and linear with respect to \(\mathbf{u}\), and since the sets X and \(V \subset \mathbb {R}^m_+\) are convex and compact (see, e.g., [17, Thms. VII:4.2.5 and VII:4.3.1]).

Definition 4.1

(The set of saddle-points of the homogeneous Lagrange function [17, Def. VII:4.1.1]) Let the mappings \(X_0(\cdot ): V \mapsto 2^X\) and \(V(\cdot ): X \mapsto 2^V\) be defined by

$$\begin{aligned} X_0(\mathbf{v}) := \mathop {\mathrm{arg\,min}}\limits _{\mathbf{x}\in X} \, \mathcal {L}_0(\mathbf{x},\mathbf{v}), \quad \mathbf{v}\in V, \quad \text {and} \quad V(\mathbf{x}) := \mathop {\mathrm{arg\,max}}\limits _{\mathbf{v}\in V} \, \mathcal {L}_0(\mathbf{x},\mathbf{v}), \quad \mathbf{x}\in X, \end{aligned}$$

where the homogeneous Lagrange function \(\mathcal {L}_0: X \times V \mapsto \mathbb {R}\) is defined as \(\mathcal {L}_0(\mathbf{x},\mathbf{v}) := \mathbf{v}^\mathrm{T}\mathbf{g}(\mathbf{x})\). A point \((\overline{\mathbf{x}}, \overline{\mathbf{v}}) \in X \times V\) is said to be a saddle-point of the function \(\mathcal {L}_0\) on \(X \times V\) when the inclusions \(\overline{\mathbf{x}}\in X_0(\overline{\mathbf{v}})\) and \(\overline{\mathbf{v}}\in V(\overline{\mathbf{x}})\) hold. The set of saddle-points is denoted by \(X_0^* \times {V_0^{*}}\). \(\square \)

By the definition (2.5), applied with \(f \equiv 0\), for the case when the program (2.1) is consistent \(X_0^{*} = \{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x})\le \mathbf{0}^m\,\} \not = \emptyset \) is its feasible set. For the inconsistent case, for every \(\mathbf{x}\in X\) it holds that \(g_i(\mathbf{x}) > 0\) for at least one \(i \in \mathcal {I}\), implying that \(\big \Vert [ \mathbf{g}(\mathbf{x}) ]_+ \big \Vert > 0\).

Lemma 4.2

(A homogeneous dual mapping) If \(\mathbf{x}\in X\) then the set \(V(\mathbf{x})\) is a singleton, given by \(V(\mathbf{x}) = \{\, \Vert [ \mathbf{g}(\mathbf{x}) ]_+ \Vert ^{-1} [ \mathbf{g}(\mathbf{x}) ]_+ \,\}\).

Proof

Let \(\mathbf{x}\in X\) and \(\overline{\mathbf{v}} = \Vert [ \mathbf{g}(\mathbf{x}) ]_+ \Vert ^{-1} [ \mathbf{g}(\mathbf{x}) ]_+\). Then, for any \(\mathbf{v}\in V\) it holds that

$$\begin{aligned} \mathbf{v}^\mathrm{T}\mathbf{g}(\mathbf{x}) \le \mathbf{v}^\mathrm{T}[\mathbf{g}(\mathbf{x})]_+ \le \Vert \mathbf{v}\Vert \cdot \Vert [\mathbf{g}(\mathbf{x})]_+ \Vert \le \Vert [\mathbf{g}(\mathbf{x})]_+ \Vert = \overline{\mathbf{v}}^\mathrm{T}\mathbf{g}(\mathbf{x}), \end{aligned}$$
(4.4)

where the first inequality holds since \(\mathbf{v}\ge \mathbf{0}^m\) and \(\mathbf{g}(\mathbf{x}) \le [\mathbf{g}(\mathbf{x})]_+\), the second follows from the Cauchy–Schwarz inequality, and the third holds since \(\Vert \mathbf{v}\Vert \le 1\). Since \(\mathbf{x}\in X\) is arbitrary and \(\overline{\mathbf{v}}\in V\), it follows that \(\overline{\mathbf{v}}\in V(\mathbf{x})\). That the set \(V(\mathbf{x})\) is a singleton follows from the fact that equality holds in each of the inequalities in (4.4) only when both \(\Vert \mathbf{v}\Vert = 1\) holds and the vectors \(\mathbf{v}\) and \([\mathbf{g}(\mathbf{x})]_+\) are parallel, in which case \(\mathbf{v}= \overline{\mathbf{v}}\). The lemma follows. \(\square \)

Since for all \(\mathbf{x}\in X\) and \(\{\, \overline{\mathbf{v}} \,\} = V(\mathbf{x})\) the equality \(\overline{\mathbf{v}}^\mathrm{T}\mathbf{g}(\mathbf{x}) = \big \Vert [\mathbf{g}(\mathbf{x})]_+ \big \Vert \) holds, the right-hand side of (4.3) may be interpreted as the minimal total deviation from feasibility in the constraints \(\mathbf{g}(\mathbf{x})\le \mathbf{0}^m\) over \(\mathbf{x}\in X\), that is,

$$\begin{aligned} \theta _0^{V_0^{*}}= \min _{\mathbf{x}\in X} \big \{\, \big \Vert [\mathbf{g}(\mathbf{x})]_+ \big \Vert \,\big \} > 0. \end{aligned}$$
(4.5)

The set \(X_0^* \times {V_0^{*}}\) of saddle-points of \(\mathcal {L}_0\) on \(X \times V\) is thus given by

$$\begin{aligned} X_0^* := \mathop {\mathrm{arg\,min}}\limits _{\mathbf{x}\in X} \big \{\, \big \Vert [\mathbf{g}(\mathbf{x})]_+ \big \Vert \,\big \} \quad \text{ and } \quad {V_0^{*}}: = \mathop {\mathrm{arg\,max}}\limits _{\mathbf{v}\in V} \Big \{\, \min _{\mathbf{x}\in X} \{\, \mathbf{v}^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\} \,\Big \}. \end{aligned}$$
(4.6)

Note that this definition of \(X_0^*\) agrees with (2.15) and is valid regardless of the consistency or inconsistency of the program (2.1). Since \({V_0^{*}}\) is a singleton, we define the vector \(\mathbf{v}^{*}\) by

$$\begin{aligned} \big \{\, \mathbf{v}^{*}\,\big \} := {V_0^{*}}. \end{aligned}$$
(4.7)

Note the equivalence \(X_0^* = \big \{\, \mathbf{x}\in X_0(\mathbf{v}^{*}) \,\big |\, V(\mathbf{x}) = \{\, \mathbf{v}^{*}\,\} \,\big \}\).
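
The characterization (4.5)–(4.7) suggests a direct way to compute a point of \(X_0^*\) and the vector \(\mathbf{v}^{*}\) for a given instance: minimize \(\Vert [\mathbf{g}(\mathbf{x})]_+\Vert \) over X and normalize the positive part of the constraint values at a minimizer. The Python sketch below is ours only; the use of scipy.optimize.minimize and a box description of X are assumptions of the illustration, and the instance is assumed inconsistent so that the norm at the minimizer is positive.

```python
import numpy as np
from scipy.optimize import minimize

def min_infeasibility(g, x0, bounds):
    """Computes a point of X_0^* in (4.6) by minimizing ||[g(x)]_+|| over a box X,
    and recovers v^* via Lemma 4.2 / (4.7); theta_0^{V_0^*} is the optimal value, cf. (4.5)."""
    obj = lambda x: np.linalg.norm(np.maximum(g(x), 0.0))
    res = minimize(obj, x0, bounds=bounds)   # any convex solver would do here
    x_star = res.x
    gp = np.maximum(g(x_star), 0.0)          # positive; the instance is assumed inconsistent
    return x_star, gp / np.linalg.norm(gp), np.linalg.norm(gp)

# Toy instance g(x) = x + 1 on X = [0,1]^2: X_0^* = {0}, v^* = (1,1)/sqrt(2), theta_0^{V_0^*} = sqrt(2).
x_star, v_star, theta = min_infeasibility(lambda x: x + 1.0, np.array([0.5, 0.5]),
                                          [(0.0, 1.0), (0.0, 1.0)])
```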

Proposition 4.3

(Characterization of the set of saddle-points) The following hold:

  (a)

    The primal optimal set \(X_0^*\) is nonempty and compact.

  (b)

    The dual optimal set \({V_0^{*}}= V(\mathbf{x}^*)\), irrespective of \(\mathbf{x}^* \in X_0^*\).

  (c)

    The dual optimal point fulfils \(\mathbf{v}^{*}\in C\).

  (d)

    The dual optimal point fulfils \(\Vert \mathbf{v}^{*}\Vert = 1\).

  (e)

    If the program (2.1) is linear then \(X_0^*\) is polyhedral.

Proof

The statements (a) and (e) follow by identifying \(X_0^* \times \{\, \mathbf{z}^* \,\} = {\mathrm{arg\,min}}_{(\mathbf{x},\mathbf{z}) \in X \times \mathbb {R}^m} \big \{\, \Vert \mathbf{z}\Vert ^2 \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{z}\,\big \}\) ([2, Thm. 2.3.1]); by assumption, \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \) holds, implying that \(\mathbf{z}^* \ne \mathbf{0}^m\). By Definition 4.1, \({V_0^{*}}\subseteq V(\mathbf{x}^*)\) holds for all \(\mathbf{x}^* \in X_0^*\), and by Lemma 4.2, \(V(\mathbf{x})\) is a singleton for any \(\mathbf{x}\in X\); hence (b) holds. The statement (c) follows from the definitions of the sets C and \({V_0^{*}}\) in Proposition 3.1 and (4.6), respectively, while (d) follows from Lemma 4.2, statement (b), and (4.7). \(\square \)

5 Convergence to the homogeneous dual optimal set in the inconsistent case

We have characterized the Cartesian product set \(X_0^* \times \{\, \mathbf{v}^{*}\,\}\) of saddle-points of the homogeneous Lagrange function \(\mathcal {L}_0\) over \(X \times V\). Next, we will show that a sequence of simply scaled dual iterates obtained from the subgradient scheme converges to the point \(\mathbf{v}^{*}\).

To simplify the notation in the analysis to follow, we let

$$\begin{aligned}&L_t := \max _{s=0,\ldots ,t} \big \{\, \Vert \mathbf{u}^s \Vert ,1 \,\big \}, \quad \overline{\varepsilon } := \max _{\mathbf{x}, \mathbf{y}\in X} \big \{\, f(\mathbf{x})-f(\mathbf{y}) \,\big \}, \nonumber \\ {}&\quad \text{ and } \quad \varepsilon _t := \frac{\overline{\varepsilon } + \epsilon _t}{L_t}, \quad t = 0,1,\ldots , \; \end{aligned}$$
(5.1a)

where \(\epsilon _t \ge 0\), \(t=0,1,\ldots \). In tandem with the iterations of the conditional \(\epsilon _t\)-subgradient algorithm (2.11) we construct the scaled dual iterate

$$\begin{aligned} \mathbf{v}^t := \frac{\mathbf{u}^t}{L_t}, \quad t = 0, 1, \ldots . \end{aligned}$$
(5.1b)

We next show that the conditional (with respect to \(\mathbb {R}^m_+\)) \(\epsilon _t\)-subgradients \(\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t\), used in the algorithm (2.11), are also conditional (with respect to V) \(\varepsilon _t\)-subgradients of the homogeneous dual function \(\theta _0\) at the scaled iterate \(\mathbf{v}^t\), with \(\varepsilon _t = L_t^{-1}(\overline{\varepsilon } + \epsilon _t)\).
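
The scaling (5.1) is a direct bookkeeping exercise alongside the iterations (2.11); a minimal Python transcription (ours, taking the dual iterates, the errors \(\epsilon _t\), and the constant \(\overline{\varepsilon }\) as given inputs) reads as follows.

```python
import numpy as np

def scaled_dual_iterates(us, eps, eps_bar):
    """Transcription of (5.1): given dual iterates u^0, ..., u^T, subproblem errors
    eps_t, and eps_bar = max_{x,y in X} (f(x) - f(y)), returns the scaled iterates
    v^t = u^t / L_t and the scaled errors varepsilon_t = (eps_bar + eps_t) / L_t."""
    L, vs, errs = 1.0, [], []
    for u_t, e_t in zip(us, eps):
        L = max(L, float(np.linalg.norm(u_t)))   # L_t = max_{s <= t} { ||u^s||, 1 }
        vs.append(np.asarray(u_t) / L)           # v^t in (5.1b)
        errs.append((eps_bar + e_t) / L)         # varepsilon_t in (5.1a)
    return vs, errs
```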

Lemma 5.1

(Conditional \(\varepsilon _t\)-subgradients of the homogeneous dual function) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a) applied to the program (2.4), let the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and the sequences \(\{\, \varepsilon _t \,\}\) and \(\{\, \mathbf{v}^t \,\}\) be defined by (5.1). Then,

$$\begin{aligned} \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \in \partial _{\varepsilon _t}^{V} \theta _0(\mathbf{v}^t), \quad t = 0, 1, \ldots . \end{aligned}$$

Proof

From the definitions (2.2) and (2.3) (for \(\varepsilon = 0\)) it follows that the relations

$$\begin{aligned} \theta _{f}(\mathbf{u}^t)&\le f(\mathbf{x}_0^0( \mathbf{u}^t)) + (\mathbf{u}^t)^\mathrm{T}\mathbf{g}(\mathbf{x}_0^0(\mathbf{u}^t)) = f(\mathbf{x}_0^0(\mathbf{u}^t)) + \theta _0(\mathbf{u}^t), \quad t \ge 0, \end{aligned}$$

and

$$\begin{aligned} \theta _0(\mathbf{u}) \le \mathbf{u}^\mathrm{T}\mathbf{g}(\mathbf{x}_f^0(\mathbf{u})) = \theta _{f}(\mathbf{u}) - f(\mathbf{x}_f^0(\mathbf{u})), \quad \mathbf{u}\in \mathbb {R}^m_+, \end{aligned}$$

hold. The combination of these relations with (2.8)–(2.9) (for \(\varepsilon = \epsilon _t\)), (2.10)–(2.11), and the definition of \(\overline{\varepsilon }\) in (5.1a) yields that the inequalities

$$\begin{aligned} \theta _0(\mathbf{u}) - \theta _0(\mathbf{u}^t)&\le \theta _{f}(\mathbf{u}) - f(\mathbf{x}_f^0(\mathbf{u})) - \left[ \theta _{f}(\mathbf{u}^t) - f(\mathbf{x}_0^0(\mathbf{u}^t)) \right] \\ {}&\le \big [ \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \big ]^\mathrm{T}(\mathbf{u}\!-\! \mathbf{u}^t) + \overline{\varepsilon } + \epsilon _t \end{aligned}$$

hold for all \(t \ge 0\) and all \(\mathbf{u}\in \mathbb {R}^m_+\), implying that \(\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t \in \partial _{\overline{\varepsilon }+\epsilon _t}^{\mathbb {R}^m_+} \theta _0(\mathbf{u}^t)\). By (2.8) and (5.1), the superlinearity of the function \(\theta _0\), and since \(V\subset \mathbb {R}^m_+\), it holds that \(\partial _{\overline{\varepsilon }+\epsilon _t}^{\mathbb {R}^m_+} \theta _{0}(\mathbf{u}^t) = \partial _{{\varepsilon _t}}^{\mathbb {R}^m_+} \theta _{0}(\mathbf{v}^t) \subseteq \partial _{{\varepsilon _t}}^{V} \theta _{0}(\mathbf{v}^t)\), and the result follows. \(\square \)

The following two lemmas are needed for the analysis to follow.

Lemma 5.2

(Normalized divergent series step lengths form a divergent series) Assume that \(\{\, \alpha _t \,\}_{t=1}^\infty \subset \mathbb {R}_+\). If \(\varLambda _t \rightarrow \infty \) as \(t \rightarrow \infty \), then \(\big \{\, \sum _{r=1}^t (1 + \varLambda _r)^{-1} \alpha _r \,\big \} \rightarrow \infty \) as \(t \rightarrow \infty \).

Proof

Since \(\log (1+d)\le d\) whenever \(d> -1\), for any \(r \ge 1\), the relations

$$\begin{aligned} \log (1 + \varLambda _{r+1}) = \log (1 + \varLambda _r) + \log \big [ 1 + (1 + \varLambda _r)^{-1} \alpha _r \big ] \le \log (1 + \varLambda _r) + (1 + \varLambda _r)^{-1} \alpha _r \end{aligned}$$

hold. Aggregating these inequalities for \(r=1,\ldots ,t\) then yields the inequality

$$\begin{aligned} \log (1 + \varLambda _{t+1}) \le \log (1 + \alpha _0) + {\sum _{r=1}^t \big [ (1 + \varLambda _r)^{-1} \alpha _r \big ] }, \end{aligned}$$

and, since \(\varLambda _{t+1} \rightarrow \infty \) as \(t \rightarrow \infty \), the left-hand side tends to infinity; hence the sum on the right-hand side diverges, and the lemma follows. \(\square \)

Lemma 5.3

(Projection onto V) For any \(\mathbf{u}\in \mathbb {R}^m\) the equalities \(\mathrm{proj}(\mathbf{u}; V) = \mathrm{proj}([\mathbf{u}]_+; V) = \big ( \max \big \{\, 1; \big \Vert [\mathbf{u}]_+ \big \Vert \,\big \} \big )^{-1} [\mathbf{u}]_+\) hold.

Proof

The result follows by applying the optimality conditions (e.g., [2, Thm. 4.2.13]) to the convex and differentiable optimization problems defined by the projection operator in (2.12) for \(S = \mathbb {R}^m_+\) and \(S = V\), respectively. \(\square \)
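
Lemma 5.3 translates into a two-step computation: first project onto the nonnegative orthant, then rescale if the clipped vector leaves the unit ball. A minimal Python sketch of this projection (ours, not part of the original exposition) reads:

```python
import numpy as np

def proj_V(u):
    """Euclidean projection onto V = { u in R^m_+ : ||u|| <= 1 }, following
    Lemma 5.3: proj(u; V) = [u]_+ / max{ 1, ||[u]_+|| }."""
    u_plus = np.maximum(u, 0.0)
    return u_plus / max(1.0, np.linalg.norm(u_plus))
```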

We now establish the convergence of the scaled dual sequence \(\{ \mathbf{v}^t \}\), defined in (5.1b), to the dual part of the set of saddle-points of \(\mathcal {L}_0\).

Theorem 5.4

(Convergence of a scaled dual sequence) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a), (2.13c) applied to the program (2.4), let the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, let the sequence \(\{\, \mathbf{v}^t \,\}\) be defined by (5.1b), and let the optimal solution to the homogeneous dual, \(\mathbf{v}^{*}\), be defined by (4.7). If \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \), then it holds that \(\mathbf{v}^t \rightarrow \mathbf{v}^{*}\) as \(t \rightarrow \infty \).

Proof

Let \(\varvec{\gamma }\,^{t} := \mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) - \varvec{\eta }^t\) for all \(t \ge 0\). From the definition (2.11) and the triangle inequality it follows that

$$\begin{aligned} \Vert \mathbf{u}^t \Vert \le \Vert \mathbf{u}^0 \Vert + \sum _{r=0}^{t-1} \alpha _r \Vert \varvec{\gamma }\,^r \Vert , \quad t \ge 1. \end{aligned}$$
(5.2)

Since X is compact, \(\mathbf{g}\) is continuous, and the sequence \(\{\, \varvec{\eta }^r \,\}\) is bounded, it holds that \(\varGamma := L_0 + \sup _{r \ge 0} \, \big \{\, \big \Vert \varvec{\gamma }\,^r \big \Vert \,\big \} < \infty \). From the definition (5.1a) of \(L_t\) it then follows that

$$\begin{aligned} 1 \le L_t \le \max _{s=1,\ldots ,t} \big \{\, \varGamma (1 + \varLambda _s) \,\big \} = \varGamma (1 + \varLambda _t), \quad t \ge 1, \end{aligned}$$

which implies that \(L_t^{-1} \alpha _t \ge \big [ \varGamma (1 + \varLambda _t) \big ]^{-1} \alpha _t\), \(t \ge 1\). It follows that \(L_t^{-1} \alpha _t > 0\) for all t and, by Proposition 3.3, that \(\big \{\, L_t^{-1} \alpha _t \,\big \} \rightarrow 0\) as \(t \rightarrow \infty \). Since, by Assumption (2.13a), \(\varLambda _t \rightarrow \infty \) as \(t \rightarrow \infty \) it follows from Lemma 5.2 that \(\big \{\, \sum _{s=0}^{t-1} (L_s^{-1} \alpha _s) \,\big \} \rightarrow \infty \) as \(t \rightarrow \infty \). Consequently the sequence \(\big \{\, L_t^{-1} \alpha _t \,\big \}\) fulfils the conditions (2.13a).

From Lemma 5.3 it follows that

$$\begin{aligned} \mathrm{proj}(\mathbf{v}^t + L_t^{-1} \alpha _t \varvec{\gamma }\,^t; V)&= \mathrm{proj}(L_t^{-1} [\mathbf{u}^t + \alpha _t \varvec{\gamma }\,^t]_+; V) = \mathrm{proj}(L_t^{-1} \mathbf{u}^{t+1}; V). \end{aligned}$$

If \(\Vert \mathbf{u}^{t+1} \Vert \le L_t\), then \(L_{t+1} = L_t\) and \(\mathrm{proj}(L_t^{-1} \mathbf{u}^{t+1}; V) = \mathrm{proj}(\mathbf{v}^{t+1}; V) = \mathbf{v}^{t+1}\) hold. Otherwise, \(\Vert \mathbf{u}^{t+1} \Vert = L_{t+1} > L_t\) and \(\mathrm{proj}(L_t^{-1} \mathbf{u}^{t+1}; V) = L_{t+1}^{-1} \mathbf{u}^{t+1} = \mathbf{v}^{t+1}\) hold. In both cases

$$\begin{aligned} \mathbf{v}^0 = {L_0^{-1}}{\mathbf{u}^0} \quad \text {and} \quad \mathbf{v}^{t+\frac{1}{2}} = \mathbf{v}^t + {L_t^{-1}}{\alpha _t} \varvec{\gamma }\,^t, \quad \mathbf{v}^{t+1} = \mathrm{proj}(\mathbf{v}^{t+\frac{1}{2}}; V), \quad t = 0,1,\ldots , \end{aligned}$$

hold. Further, by Lemma 5.1 the inclusion \(\varvec{\gamma }\,^t \in \partial ^V_{\varepsilon _t} \theta _0(\mathbf{v}^t)\) holds for \(t \ge 1\). By (2.13c) the sequence \(\{\, \epsilon _t \,\}\) is bounded; from Proposition 3.3 and (5.1a) it then follows that \(\varepsilon _t \rightarrow 0\). The theorem then follows from [26, Thm. 3]. \(\square \)

The main idea utilized in the proof of Theorem 5.4 is that the scaled sequence \(\{\, \mathbf{v}^t \,\}\) obtained from the subgradient method defines a conditional (with respect to V) \(\varepsilon _t\)-subgradient algorithm, as applied to the homogeneous Lagrange dual (4.2). Hence, by tackling the Lagrange dual (2.4) by a subgradient method, we obtain—in the case of an inconsistent primal problem—a solution to its homogeneous version (4.1).

Next follow two technical corollaries, to be used in the primal convergence analysis.

Corollary 5.5

(Convergence of a normalized dual sequence) Under the assumptions of Theorem 5.4 it holds that \(\left\{ \, \Vert \mathbf{u}^t \Vert ^{-1} \mathbf{u}^t \,\right\} \rightarrow \mathbf{v}^{*}\) as \(t\rightarrow \infty \).

Proof

By the superlinearity of the function \(\theta _0\), it holds that \(\theta _0(\Vert \mathbf{u}^t\Vert ^{-1}\mathbf{u}^t) \ge \theta _0(L_t^{-1}\mathbf{u}^t) = \theta _0(\mathbf{v}^t)\), \(t \ge 0\). The corollary then follows since, by Theorem 5.4, \(\theta _0(\mathbf{v}^t) \rightarrow \theta _0^{V_0^{*}}\). \(\square \)

Corollary 5.6

(Convergence to the optimal value of the homogeneous dual) Under the assumptions of Theorem 5.4 it holds that \(\big \{\, (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) \,\big \} \rightarrow \theta _0^{{V_0^{*}}}\) as \(t\rightarrow \infty \).

Proof

For each \(t\ge 0\) and each \(\mathbf{x}\in X\), let \(\rho _t(\mathbf{x}) := L_t^{-1}f(\mathbf{x}) + (\mathbf{v}^t - \mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x})\). Since \(L_t \rightarrow \infty \), \(\mathbf{v}^t \rightarrow \mathbf{v}^{*}\), and X is compact, it follows that \(\rho _t(\mathbf{x}) \rightarrow 0\) for all \(\mathbf{x}\in X\). Using the definition (2.2) and the equivalence (4.3), and by separating the minimization over \(\mathbf{x}\in X\), it follows that

$$\begin{aligned} L_t^{-1} \theta _f(\mathbf{u}^t)= & {} \min _{\mathbf{x}\in X} \big \{ (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}) + \rho _t(\mathbf{x}) \big \} \ge \min _{\mathbf{x}\in X} \big \{ (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}) \big \} + \min _{\mathbf{x}\in X} \{ \rho _t(\mathbf{x}) \}\nonumber \\= & {} \theta _0^{{V_0^{*}}} + \min _{\mathbf{x}\in X} \{\, \rho _t(\mathbf{x}) \,\}. \end{aligned}$$
(5.3)

On the other hand, since \(X_0(\mathbf{v}^{*}) \subseteq X\), and \((\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}) = \theta _0^{{V_0^{*}}}\) for any \(\mathbf{x}\in X_0(\mathbf{v}^{*})\), we have that

$$\begin{aligned} L_t^{-1} \theta _f(\mathbf{u}^t) \le \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}) + \rho _t(\mathbf{x}) \,\big \} = \theta _0^{{V_0^{*}}} + \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \{\, \rho _t(\mathbf{x}) \,\}. \end{aligned}$$

It follows that \(\big \{\, L_t^{-1} \theta _f(\mathbf{u}^t) \,\big \} \rightarrow \theta _0^{{V_0^{*}}}\) as \(t \rightarrow \infty \). By the left-most equality in (5.3) and (2.3) the inequality \((\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) \le L_t^{-1} \theta _f(\mathbf{u}^t) - \rho _t(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) + L_t^{-1} \epsilon _t\) holds, while (4.3) and (4.6) yield the lower bound \((\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) \ge \min _{\mathbf{x}\in X} \{\, (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}) \,\} = \theta _0^{{V_0^{*}}}\). The corollary follows. \(\square \)

6 Primal convergence in the case of inconsistency

We apply the conditional \(\varepsilon \)-subgradient scheme (2.11) to the Lagrange dual of the program (2.1). In each iteration we construct an ergodic primal iterate \(\overline{\mathbf{x}}_f^t\), according to the scheme defined in (2.14). We here analyze the convergence of the ergodic sequence \(\{\, \overline{\mathbf{x}}_f^t \,\}\) when the primal program (2.1) is inconsistent. In Sect. 6.1 we establish convergence of the ergodic sequence to the feasible set of the selection problem (2.16) for the case of convex programming [i.e., Theorem 2.3(a)]. In Sect. 6.2 we specialize this result to the case of linear programming, in which case the stronger result of convergence to optimal solutions of the selection problem (2.16) is obtained [i.e., Theorem 2.3(b)].

The set of indices of the strictly positive elements of the vector \(\mathbf{v}\) is denoted by

$$\begin{aligned} \mathcal {I}_+(\mathbf{v}) := \big \{\, i \in \mathcal {I} \,\big |\, v_i > 0 \,\big \} \subseteq \mathcal {I}, \quad \mathbf{v}\in V. \end{aligned}$$

6.1 Convergence results for general convex programming

To simplify the notation, we let the ergodic sequence, \(\{\, \overline{\varvec{\gamma }\,}^t \,\}\), of conditional \(\varepsilon \)-subgradients be defined by

$$\begin{aligned} \overline{\varvec{\gamma }\,}^t := \frac{1}{\varLambda _t} \sum _{s=0}^{t-1} \alpha _s \big [\, \mathbf{g}(\mathbf{x}_f^{\epsilon _s}(\mathbf{u}^s)) - \varvec{\eta }^s \,\big ], \quad t=1,2,\ldots , \end{aligned}$$

for some choices of step lengths \(\alpha _s > 0\) and approximation errors \(\epsilon _s \ge 0\), \(s=0,1,\ldots , t-1\). We will also need the following technical lemma (see [21, p. 35] for its proof).

Lemma 6.1

(A convergent sequence of convex combinations) Let the sequences \(\{\, \mu _{ts} \,\} \subset \mathbb {R}_+\) and \(\{\, \mathbf{e}^s \,\} \subset \mathbb {R}^r\), where \(r \ge 1\), satisfy the relations \(\sum _{s=0}^{t-1} \mu _{ts} = 1\) for \(t=1,2,\ldots \), \(\mu _{ts}\rightarrow 0\) as \(t \rightarrow \infty \) for \(s = 0, 1, \ldots \), and \(\mathbf{e}^s \rightarrow \mathbf{e}\) as \(s\rightarrow \infty \). Then, \(\big \{\, \sum _{s = 0}^{t-1} \mu _{ts} \mathbf{e}^s \,\big \} \rightarrow \mathbf{e}\) as \(t \rightarrow \infty \). \(\square \)

We are now set up to establish the first part of the main result of this article.

Proof of Theorem 2.3(a)

The case when \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} \not = \emptyset \) is treated in Proposition 2.2.

Consider the case when \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \). We will show that the ergodic sequence \(\{\, \overline{\mathbf{x}}_f^t \,\}\) converges to the set \(X_0^* = \mathrm{arg\,min}_{\mathbf{x}\in X} \{\, \Vert [\mathbf{g}(\mathbf{x})]_+ \Vert \,\}\).

Since X is convex and compact, any limit point \(\overline{\mathbf{x}}_f^\infty \) of \(\{\, \overline{\mathbf{x}}_f^t \,\}\) fulfils \(\overline{\mathbf{x}}_f^\infty \in X\). Then, by the continuity of \(\mathbf{g}\) and \(\Vert \cdot \Vert \), and the equivalence in (4.5), the relations

$$\begin{aligned} \theta _0^{{V_0^{*}}} = \min _{\mathbf{x}\in X} \big \{\, \big \Vert [\mathbf{g}(\mathbf{x})]_+ \big \Vert \,\big \} \le \big \Vert [\mathbf{g}(\overline{\mathbf{x}}_f^\infty )]_+ \big \Vert \le \limsup _{t \rightarrow \infty } \big \{\, \big \Vert [\mathbf{g}(\overline{\mathbf{x}}_f^t)]_+ \big \Vert \,\big \} \end{aligned}$$

hold. From Lemma 4.2 and the definitions (4.6)–(4.7) it follows that the vectors \([\mathbf{g}(\mathbf{x})]_+\) and \(\mathbf{v}^{*}\) are parallel, and that \(\big \Vert [\mathbf{g}(\mathbf{x})]_+ \big \Vert = \theta _0^{{V_0^{*}}}\) holds for all \(\mathbf{x}\in X_0^*\). Further, by Proposition 4.3(d), \(\Vert \mathbf{v}^{*}\Vert = 1\) holds. Hence, it suffices to show that

$$\begin{aligned} \limsup _{t \rightarrow \infty } \, \big \{\, \big [ \mathbf{g}(\overline{\mathbf{x}}_f^t) \big ]_+ \,\big \} \le \theta _0^{{V_0^{*}}}\mathbf{v}^{*}, \end{aligned}$$
(6.1)

which will imply the equivalence \(\big \Vert [\mathbf{g}(\overline{\mathbf{x}}_f^\infty )]_+ \big \Vert = \theta _0^{V_0^{*}}\) and thus the sought inclusion \(\overline{\mathbf{x}}_f^\infty \in X_0^*\).

Since \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\} = \emptyset \), it follows from Proposition 4.3(d) that \(\mathcal {I}_+(\mathbf{v}^{*}) \not = \emptyset \). Consider any \(i \in \mathcal {I}_+(\mathbf{v}^{*})\). When \(t \ge 0\) is large enough, by Corollary 5.5, \(u^t_i > 0\) holds, implying that \(u_i^{t+1} = {u_i}^{\!\!t+\frac{1}{2}}\) holds in the iteration formula (2.11). Hence, for \(N \ge 0\) large enough, it holds that

$$\begin{aligned} u_i^t = {u_i}^{\!\!N} + \sum _{s=N}^{t-1} \alpha _s \big [\, g_i(\mathbf{x}_f^{\epsilon _s}(\mathbf{u}^s)) - \eta ^s_i \,\big ] = {u_i}^{\!\!N} + \varLambda _t {\overline{\gamma }_i}^{\!t} - \varLambda _N {\overline{\gamma }_i}^{\!N}, \quad t \ge N+1. \end{aligned}$$

By rearranging this equation and dividing the resulting terms by \(\Vert \mathbf{u}^t \Vert \), it follows that

$$\begin{aligned} \Big \{\, \big \Vert \mathbf{u}^t \big \Vert ^{-1} \varLambda _t {\overline{\gamma }_i}^{\!t} \,\Big \} = \Big \{\, \big \Vert \mathbf{u}^t \big \Vert ^{-1} \big ( u_i^t - {u_i}^{\!\!N} + \varLambda _N {\overline{\gamma }_i}^{\!N} \big ) \,\Big \} \rightarrow v_i^* \quad \text {as} \quad t \rightarrow \infty , \end{aligned}$$
(6.2)

since, by Proposition 3.3, \(\{\, \Vert \mathbf{u}^t \Vert \,\} \rightarrow \infty \) and, by Corollary 5.5, \(\{\, \Vert \mathbf{u}^t\Vert ^{-1}u_i^t \,\} \rightarrow v_i^*\). By Corollary 5.6 it holds that \(\big \{\, (\mathbf{v}^{*})^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _t}(\mathbf{u}^t)) \,\big \} \rightarrow \theta _0^{{V_0^{*}}}\) as \(t \rightarrow \infty \). Applying Lemma 6.1 with the identifications

$$\begin{aligned} r = 1, \quad \mu _{ts} = \varLambda _t^{-1} \alpha _s, \quad \mathbf{e}^s = (\mathbf{v}^*)^\mathrm{T}\mathbf{g}(\mathbf{x}_f^{\epsilon _s}(\mathbf{u}^s)), \quad \text {and} \quad \mathbf{e}= \theta _0^{{V_0^{*}}} \end{aligned}$$

then yields that \(\big \{\, (\mathbf{v}^{*})^\mathrm{T}\overline{\varvec{\gamma }\,}^t \,\big \} \rightarrow \theta _0^{{V_0^{*}}}\) as \(t \rightarrow \infty \). Since \(v_j^* = 0\) for \(j \in \mathcal {I} {\setminus } \mathcal {I}_+(\mathbf{v}^{*})\), it follows by utilizing (6.2) that

$$\begin{aligned} \big \{\, (\mathbf{v}^{*})^\mathrm{T}\overline{\varvec{\gamma }\,}^t \Vert \mathbf{u}^t \Vert ^{-1} \varLambda _t \,\big \} \rightarrow \Vert \mathbf{v}^{*}\Vert ^2 = 1 \quad \text {as} \quad t \rightarrow \infty , \end{aligned}$$

which implies that

$$\begin{aligned} \big \{\, \Vert \mathbf{u}^t \Vert ^{-1} \varLambda _t \,\big \} = \big \{\, (\mathbf{v}^{*})^\mathrm{T}\overline{\varvec{\gamma }\,}^t \Vert \mathbf{u}^t \Vert ^{-1} \varLambda _t \big [ (\mathbf{v}^{*})^\mathrm{T}\overline{\varvec{\gamma }\,}^t \big ]^{-1} \,\big \} \rightarrow \big ( \theta _0^{{V_0^{*}}} \big )^{-1} \quad \text {as} \quad t \rightarrow \infty . \end{aligned}$$

By combining this result with (6.2) it then follows that

$$\begin{aligned} {\overline{\gamma }_i}^{\!t} \rightarrow \theta _0^{{V_0^{*}}}v_i^* \quad \text {as} \quad t \rightarrow \infty , \quad i \in \mathcal {I}_+(\mathbf{v}^{*}). \end{aligned}$$

By the convexity of the functions \(g_i\), for each \(i \in \mathcal {I}_+(\mathbf{v}^{*})\) it then holds that

$$\begin{aligned} \limsup _{t \rightarrow \infty } \big \{\, g_i(\overline{\mathbf{x}}_f^t) \,\big \} \le \limsup _{t \rightarrow \infty } \bigg \{\, \frac{1}{\varLambda _t} \sum _{s=0}^{t-1} \alpha _s g_i(\mathbf{x}_f^{\epsilon _s}(\mathbf{u}^s)) \,\bigg \} \le \limsup _{t \rightarrow \infty } \big \{\, {\overline{\gamma }_i}^{\!t} \,\big \} = \theta _0^{{V_0^{*}}} v_i^*. \end{aligned}$$
(6.3)

From (5.2) it follows that the inequality \(\Vert \mathbf{u}^t \Vert \le \Vert \mathbf{u}^0 \Vert + \varLambda _t \varGamma \) holds for every \(t \ge 1\), where \(\varGamma = L_0 + \sup _{r \ge 0} \big \{\, \big \Vert \mathbf{g}(\mathbf{x}_f^{\epsilon _r}(\mathbf{u}^r)) - \varvec{\eta }^r \big \Vert \,\big \} < \infty \). By Proposition 3.3, for a large enough \(N \ge 1\), \(\Vert \mathbf{u}^t\Vert > \Vert \mathbf{u}^0\Vert \) holds for each \(t \ge N\), which implies the inequalities

$$\begin{aligned} \varLambda _t^{-1} \le \varGamma \big ( \Vert \mathbf{u}^t \Vert - \Vert \mathbf{u}^0 \Vert \big )^{-1}, \quad t \ge N. \end{aligned}$$
(6.4)

Then, for each \(j \in \mathcal {I}{\setminus } \mathcal {I}_+(\mathbf{v}^{*})\) and all \(t \ge N\), the relations

$$\begin{aligned} g_j(\overline{\mathbf{x}}_f^t)\le & {} \frac{1}{\varLambda _t} \sum _{s=0}^{t-1} \alpha _s g_j(\mathbf{x}_f^{\epsilon _s}(\mathbf{u}^s)) \le \frac{1}{\varLambda _t} \sum _{s=0}^{t-1} \big ( u_j^{s+1} - u_j^{s} \big ) = \frac{1}{\varLambda _t} \big ( u_j^t - u_j^0 \big )\\\le & {} \varGamma \frac{u_j^t - u_j^0}{\Vert \mathbf{u}^t \Vert - \Vert \mathbf{u}^0 \Vert } \end{aligned}$$

hold, where the first inequality follows from the convexity of \(g_j\), the second from (2.11) and the fact that \(\eta ^s_j \le 0\) for all \(s \ge 0\), the equality by telescoping, and the final inequality by (6.4). As \(t\rightarrow \infty \), by Proposition 3.3, \(\Vert \mathbf{u}^t \Vert \rightarrow \infty \), and by Corollary 5.5, \(\big \{\, \Vert \mathbf{u}^t \Vert ^{-1} u_j^t \,\big \} \rightarrow v_j^* = 0\). It follows that

$$\begin{aligned} \limsup _{t \rightarrow \infty } \, \big \{\, g_j(\overline{\mathbf{x}}_f^t) \,\big \} \le 0, \quad j \in \mathcal {I} {\setminus } \mathcal {I}_+(\mathbf{v}^{*}). \end{aligned}$$
(6.5)

From (6.3) and (6.5) we then conclude (6.1). The theorem follows. \(\square \)

6.2 Properties of and convergence results for the linear programming case

We now analyze the special case when the program (2.1) is a linear program, i.e., when the program can be formulated as the problem to

$$\begin{aligned}&\text {minimize} \;\; \mathbf{c}^\mathrm{T}\mathbf{x}, \end{aligned}$$
(6.6a)
$$\begin{aligned}&\mathrm{subject~to~~}\, \mathbf{A}\mathbf{x}\ge \mathbf{b}, \end{aligned}$$
(6.6b)
$$\begin{aligned}&\mathbf{x}\in X, \end{aligned}$$
(6.6c)

where \(\mathbf{c}\in \mathbb {R}^n\), \(\mathbf{A}\in \mathbb {R}^{m \times n}\), \(\mathbf{b}\in \mathbb {R}^m\), and \(X \subset \mathbb {R}^n\) is a nonempty and bounded polyhedron. The aim of this subsection is to provide a proof of Theorem 2.3(b), stating that the ergodic sequence \(\{\, \overline{\mathbf{x}}_f^t \,\}\) [defined in (2.14)] converges to the optimal set of the selection problem (2.16). For this linear case, the Lagrangian subproblems in (2.2) can be solved exactly in finite time; hereafter we thus let \(\epsilon _t := 0\), \(t \ge 0\).

Let \(\mathbf{A}_i\) denote the ith row of the matrix \(\mathbf{A}\) and let \(\mathbf{x}_0\in X_0(\mathbf{v}^{*})\). The selection problem (2.16) can then be expressed as the linear program to

$$\begin{aligned}&\text {minimize} \;\;\; \mathbf{c}^\mathrm{T}\mathbf{x}, \end{aligned}$$
(6.7a)
$$\begin{aligned}&\mathrm{subject~to~~}\, \mathbf{A}_i\mathbf{x}= b_i - [b_i - \mathbf{A}_i\mathbf{x}_0]_+, i \in \mathcal {I}_+(\mathbf{v}^{*}), \ \end{aligned}$$
(6.7b)
$$\begin{aligned}&\mathbf{A}_i\mathbf{x}\ge b_i, i \in \mathcal {I} {\setminus } \mathcal {I}_+(\mathbf{v}^{*}), \end{aligned}$$
(6.7c)
$$\begin{aligned}&\mathbf{x}\in X_0(\mathbf{v}^{*}). \end{aligned}$$
(6.7d)

Using that \([b_i - \mathbf{A}_i\mathbf{x}_0]_+ = 0\) for all \(i\in \mathcal {I}{\setminus }\mathcal {I}_+(\mathbf{v}^{*})\), we define the (projected) Lagrangian dual function, \({\theta }^+_\mathbf{c}: \mathbb {R}^m \mapsto \mathbb {R}\), to the program (6.7) with respect to the relaxation of the constraints (6.7b) and (6.7c), as

$$\begin{aligned} {\theta }^+_\mathbf{c}(\mathbf{u}) := \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, \mathbf{c}^\mathrm{T}\mathbf{x}+ \mathbf{u}^\mathrm{T}\big ( \mathbf{b}- \mathbf{A}\mathbf{x}- [\mathbf{b}- \mathbf{A}\mathbf{x}_0]_+ \big ) \,\big \}, \quad \mathbf{u}\in \mathbb {R}^m. \end{aligned}$$
(6.8)

Defining the radial cone to \(\mathbb {R}^m_+\) at \(\mathbf{v}\in \mathbb {R}^m_+\) as

$$\begin{aligned} R(\mathbf{v}) := \big \{\, \mathbf{u}\in \mathbb {R}^m \,\big |\, u_i \ge 0, \; i \in \mathcal {I} {\setminus } \mathcal {I}_+(\mathbf{v}) \,\big \}, \end{aligned}$$
(6.9)

the corresponding Lagrange dual is then given by the problem to

$$\begin{aligned} \mathop {\text {maximize}}_{\mathbf{u}\in R(\mathbf{v}^{*})} \; {\theta }^+_\mathbf{c}(\mathbf{u}). \end{aligned}$$
(6.10)

We will show that when applying the conditional \(\varepsilon \)-subgradient optimization algorithm (2.11) to the Lagrange dual (2.4) of the inconsistent linear program (6.6) with respect to the relaxation of the constraints (6.6b), a subgradient scheme is obtained for the Lagrange dual (6.10) of the selection problem (6.7), which is a consistent linear program. We will then deduce that the ergodic sequence \(\{\, \overline{\mathbf{x}}_f^t \,\}\) converges to the set of optimal solutions to (6.7). But first we introduce some definitions needed for the analysis to follow.

A decomposition of any vector \(\mathbf{u}\in \mathbb {R}^m\) into two components that are parallel and orthogonal, respectively, to \(\mathbf{v}^{*}\) is given by the maps \(\beta : \mathbb {R}^m \mapsto \mathbb {R}\) and \(\varvec{\omega }: \mathbb {R}^m \mapsto \mathbb {R}^m\), according to

$$\begin{aligned} \mathbf{u}= \mathbf{v}^{*}\beta (\mathbf{u}) + \varvec{\omega }(\mathbf{u}), \quad \beta (\mathbf{u}) := \mathbf{u}^\mathrm{T}\mathbf{v}^{*}, \quad \text {and} \quad \varvec{\omega }(\mathbf{u}) := \mathbf{u}- \mathbf{v}^{*}\beta (\mathbf{u}). \end{aligned}$$
(6.11)

Here, \(\beta (\mathbf{u})\) equals the (signed) length of the projection of \(\mathbf{u}\in \mathbb {R}^m\) onto \(\mathbf{v}^{*}\), while \(\varvec{\omega }(\mathbf{u})\) equals the projection of \(\mathbf{u}\) onto the orthogonal complement of \(\mathbf{v}^{*}\). In particular, the map \(\mathbf{u}\mapsto \mathbf{v}^{*}\beta (\mathbf{u})\) is the orthogonal projection onto the span of \(\mathbf{v}^{*}\), while \(\varvec{\omega }\) is the orthogonal projection onto its orthogonal complement.
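As a concrete illustration of the decomposition (6.11) and of Property 6.2 stated next, the following minimal Python sketch computes \(\beta \) and \(\varvec{\omega }\) for a given unit vector \(\mathbf{v}^{*}\) and numerically checks parts (b) and (c); the particular vectors used are illustrative choices and are not part of the original text.

```python
import numpy as np

def beta(u, v_star):
    # Signed length of the projection of u onto the unit vector v*; cf. (6.11).
    return float(u @ v_star)

def omega(u, v_star):
    # Component of u orthogonal to v*; cf. (6.11).
    return u - beta(u, v_star) * v_star

# Numerical check of Property 6.2(b) and (c) for arbitrary data.
v_star = np.array([2.0, 1.0]) / np.sqrt(5.0)   # a unit vector (illustrative choice)
u, rho = np.array([3.0, -1.0]), 2.5
assert np.isclose(beta(u + rho * v_star, v_star), beta(u, v_star) + rho)
assert np.allclose(omega(u + rho * v_star, v_star), omega(u, v_star))
```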

Property 6.2

(Properties of maps) The following properties of the maps \(\beta \) and \(\varvec{\omega }\) hold.

  1. (a)

    \(\varvec{\omega }(\mathbf{u})^\mathrm{T}\varvec{\omega }(\mathbf{v}) = \varvec{\omega }(\mathbf{u})^\mathrm{T}\mathbf{v}= \mathbf{u}^\mathrm{T}\varvec{\omega }(\mathbf{v})\) for all \(\mathbf{u}, \mathbf{v}\in \mathbb {R}^m\),

  2. (b)

    \(\beta (\mathbf{u}+ \varrho \mathbf{v}^{*}) = \beta (\mathbf{u}) + \varrho \) for all \(\mathbf{u}\in \mathbb {R}^m\) and all \(\varrho \in \mathbb {R}\), and

  3. (c)

    \(\varvec{\omega }(\mathbf{u}+ \varrho \mathbf{v}^{*}) = \varvec{\omega }(\mathbf{u})\) for all \(\mathbf{u}\in \mathbb {R}^m\) and all \(\varrho \in \mathbb {R}\). \(\square \)

Using the fact that \(\varvec{\omega }(\mathbf{b}- \mathbf{A}\mathbf{x}) = \mathbf{b}- \mathbf{A}\mathbf{x}- [\mathbf{b}- \mathbf{A}\mathbf{x}_0]_+\) for any \(\mathbf{x}_0 \in X_0^*\), we can rewrite the Lagrangian dual function, defined in (6.8), as

$$\begin{aligned} {\theta }^+_\mathbf{c}(\mathbf{u}) = \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, \mathbf{c}^\mathrm{T}\mathbf{x}+ \mathbf{u}^\mathrm{T}\varvec{\omega }(\mathbf{b}- \mathbf{A}\mathbf{x}) \,\big \}. \end{aligned}$$

The following lemma follows from Property 6.2(a) and establishes that the value of \({\theta }^+_\mathbf{c}\) at \(\mathbf{u}\in \mathbb {R}^m\) depends solely on the component \(\varvec{\omega }(\mathbf{u})\) of \(\mathbf{u}\) that is perpendicular to \(\mathbf{v}^{*}\).

Lemma 6.3

(A characterization of a projected dual function) For any \(\mathbf{u}\in \mathbb {R}^m\), the equality \({\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{u})) = {\theta }^+_\mathbf{c}(\mathbf{u})\) holds. \(\square \)

Given constants \(\delta > 0\), \(p > 1\), and \(q > (p-1)^{-1} p\), we define the set

$$\begin{aligned} \textstyle U^{\delta }_{pq}&:= \big \{\, \mathbf{u}\in \mathbb {R}^m \,\big |\, \beta (\mathbf{u}) \ge q \delta ^2, \; \beta (\mathbf{u}) \ge p \delta ^2 \Vert \varvec{\omega }(\mathbf{u}) \Vert \,\big \} \end{aligned}$$
(6.12)

of vectors possessing a large enough norm and a small enough angle with the direction of \(\mathbf{v}^{*}\). The following lemma ensures that after a finite number N of iterations, all of the dual iterates \(\mathbf{u}^t\) are contained in the set \(U^{\delta }_{pq}\); it follows from Proposition 3.3 and Corollary 5.5.

Lemma 6.4

(The dual iterates are eventually in the set \(U^{\delta }_{pq}\)) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a) applied to the program (2.4), let the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and let \(\{\, \epsilon _t \,\} = \{\, 0 \,\}\). Then, for any constants \(\delta > 0\), \(p>1\), and \(q > (p-1)^{-1} p\) there is an \(N \ge 0\) such that \(\mathbf{u}^t \in U^{\delta }_{pq}\) for all \(t \ge N\). \(\square \)
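For reference, a membership test for the set \(U^{\delta }_{pq}\) can be written directly from the definition (6.12). The short Python sketch below merely spells out the two defining inequalities; it is an illustration and not part of the original analysis.

```python
import numpy as np

def in_U(u, v_star, delta, p, q):
    # Membership in U^delta_pq, cf. (6.12):
    # beta(u) >= q*delta^2 and beta(u) >= p*delta^2*||omega(u)||.
    assert delta > 0.0 and p > 1.0 and q > p / (p - 1.0)
    b = float(u @ v_star)                 # beta(u)
    w = u - b * v_star                    # omega(u)
    return b >= q * delta**2 and b >= p * delta**2 * np.linalg.norm(w)
```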

Propositions 6.5–6.7 below demonstrate that, for \(p > 1\), \(q > (p-1)^{-1}p \), and a large enough value of \(\overline{\delta } > 0\), the condition \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\) implies certain relations between the function values \(\theta _f(\mathbf{u})\) and \({\theta }^+_\mathbf{c}(\mathbf{u})\), as well as between their respective conditional subdifferentials. First, we establish the inclusion \(X_{f}^0(\mathbf{u}) \subseteq X_0(\mathbf{v}^{*})\) whenever \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\). We then show that the value \(\theta _f(\mathbf{u})\) of the Lagrangian dual function equals \(\beta (\mathbf{u}) \theta _0^{V_0^{*}}+ {\theta }^+_\mathbf{c}(\mathbf{u})\) whenever \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\).

Proposition 6.5

(Inclusion of the solution set) Let \(p>1\) and \(q > (p-1)^{-1} p\). There exists a constant \(\overline{\delta } > 0\) such that \(X_{f}^0(\mathbf{u}) \subseteq X_0(\mathbf{v}^{*})\) holds for all \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\).

Proof

For the case when \(X_0(\mathbf{v}^{*}) = X\) the proposition is immediate. Consider the case when \(X_0(\mathbf{v}^{*}) \subset X\). Denote by \(P_X\), \(P_{X_f^0(\mathbf{u})}\), and \(P_{X_0(\mathbf{v}^{*})}\) the (finite) sets of extreme points of X, \(X_f^0(\mathbf{u})\), and \(X_0(\mathbf{v}^{*})\), respectively. From (2.3) and [32, Ch. I.4, Def. 3.1] it follows that \(X_f^0({\mathbf{u}})\) and \(X_0({\mathbf{v}^{*}})\) are faces of X, implying the relations \(P_{X_f^0(\mathbf{u})} \subseteq P_X\), \(\mathbf{u}\in \mathbb {R}^m\), and \(P_{X_0(\mathbf{v}^{*})} \subseteq P_X \subset X\). Hence, it suffices to show that \(P_{X_f^0(\mathbf{u})} \subseteq P_{X_0(\mathbf{v}^{*})}\) holds whenever \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\). Let \(\mathbf{x}_0^* \in P_{X_0(\mathbf{v}^{*})}\) and \(\overline{\mathbf{x}}\in {P}_X {\setminus } P_{X_0(\mathbf{v}^{*})}\) be arbitrary. Since the set \({P}_X\) is finite, there exists a \(\overline{\delta }> 0\) such that the relations

$$\begin{aligned} \mathbf{c}^\mathrm{T}\left( \overline{\mathbf{x}}- \mathbf{x}_0^* \right)\ge & {} - \overline{\delta }, \end{aligned}$$
(6.13a)
$$\begin{aligned} - (\mathbf{v}^{*})^\mathrm{T}\mathbf{A}(\overline{\mathbf{x}}- \mathbf{x}_0^*)\ge & {} \overline{\delta }^{-1}, \end{aligned}$$
(6.13b)
$$\begin{aligned} \Vert \mathbf{A}\left( \overline{\mathbf{x}}- \mathbf{x}_0^* \right) \Vert\le & {} \overline{\delta } \end{aligned}$$
(6.13c)

hold. For any \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\) it then follows that

$$\begin{aligned} \mathcal {L}_f(\overline{\mathbf{x}},\mathbf{u}) - \mathcal {L}_f(\mathbf{x}_0^*,\mathbf{u})&= \mathbf{c}^\mathrm{T}(\overline{\mathbf{x}}- \mathbf{x}_0^*) - \beta (\mathbf{u})(\mathbf{v}^{*})^\mathrm{T}\mathbf{A}(\overline{\mathbf{x}}- \mathbf{x}_0^*) - \varvec{\omega }(\mathbf{u})^\mathrm{T}\mathbf{A}(\overline{\mathbf{x}}- \mathbf{x}_0^*) \end{aligned}$$
(6.14a)
$$\begin{aligned}&\ge - \overline{\delta }+ \beta (\mathbf{u}) {\overline{\delta }}^{-1} - \overline{\delta }\Vert \varvec{\omega }(\mathbf{u}) \Vert \end{aligned}$$
(6.14b)
$$\begin{aligned}&\ge \overline{\delta }\big ( q p^{-1} [p-1] - 1 \big ) \,>\, 0, \end{aligned}$$
(6.14c)

where (6.14a) follows from (6.11), (6.14b) from (6.13) and the Cauchy–Schwarz inequality, and (6.14c) from the definition (6.12) and the assumptions made. It follows that \(\overline{\mathbf{x}}\not \in X_{f}^0(\mathbf{u})\), which then implies that \(X_{f}^0(\mathbf{u}) \subseteq X_0(\mathbf{v}^{*})\). The proposition follows. \(\square \)

In the analysis to follow we choose \(\overline{\delta }> 0\) such that the inclusion in Proposition 6.5 holds.

Proposition 6.6

(A decomposition of the dual function) Let \(p>1\) and \(q > (p-1)^{-1} p\). For every \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\) the identity \(\theta _{f}(\mathbf{u}) = \beta (\mathbf{u}) \theta _0^{V_0^{*}}+ {\theta }^+_\mathbf{c}(\mathbf{u})\) holds.

Proof

The result follows since Proposition 6.5, (6.11), (4.3), and Property 6.2(a) yield the equalities

$$\begin{aligned} \quad \theta _f(\mathbf{u})&= \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, \mathbf{c}^\mathrm{T}\mathbf{x}+ \mathbf{u}^\mathrm{T}(\mathbf{b}- \mathbf{A}\mathbf{x}) \,\big \}\\ {}&= \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, \mathbf{c}^\mathrm{T}\mathbf{x}+ \big ( \mathbf{v}^{*}\beta (\mathbf{u}) + \varvec{\omega }(\mathbf{u}) \big )^\mathrm{T}(\mathbf{b}- \mathbf{A}\mathbf{x}) \,\big \} \\&= \theta _0^{V_0^{*}}\beta (\mathbf{u}) + \min _{\mathbf{x}\in X_0(\mathbf{v}^{*})} \big \{\, \mathbf{c}^\mathrm{T}\mathbf{x}+ \varvec{\omega }(\mathbf{u})^\mathrm{T}(\mathbf{b}-\mathbf{A}\mathbf{x}) \,\big \} = \theta _0^{V_0^{*}}\beta (\mathbf{u}) + {\theta }^+_\mathbf{c}(\mathbf{u}). \quad \quad \;\; \end{aligned}$$

\(\square \)

We next establish that if \(\varvec{\gamma }\,\) is a conditional (with respect to \(\mathbb {R}^m_+\)) subgradient to \(\theta _f\) at \(\mathbf{u}\in \mathbb {R}^m_+\), where \(\mathbf{u}\) has a sufficiently large norm and a sufficiently small component \(\varvec{\omega }(\mathbf{u})\) (being orthogonal to \(\mathbf{v}^{*}\)), then \(\varvec{\omega }(\varvec{\gamma }\,)\) is a conditional [with respect to \(R(\mathbf{v}^{*})\); see (6.9)] subgradient of \({\theta }^+_\mathbf{c}\) at \(\varvec{\omega }(\mathbf{u}) \in R(\mathbf{v}^{*})\).

Proposition 6.7

(Conditional subgradients of a projected dual function) Let \(p\!>\!1\) and \(q > (p-1)^{-1} p\). For each \(\mathbf{u}\in U^{\overline{\delta }}_{pq}\) and \(\varvec{\gamma }\,\in \partial ^{\mathbb {R}^m_+}_0 \theta _f(\mathbf{u})\) the inclusion \(\varvec{\omega }(\varvec{\gamma }\,) \in \partial ^{R(\mathbf{v}^{*})}_0 {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{u}))\) holds.

Proof

Let \(\mathbf{v}\in R(\mathbf{v}^{*})\) and choose \(\varrho \ge 0\) such that \(\mathbf{v}+ \varrho \mathbf{v}^{*}\in U^{\overline{\delta }}_{pq}\). From Lemma 6.3, Proposition 6.6, and Property 6.2(c) it follows that the equalities

$$\begin{aligned} \theta _f(\mathbf{v}+ \varrho \mathbf{v}^{*}) - \theta _f(\mathbf{u})&= \big [\, \beta (\mathbf{v}+ \varrho \mathbf{v}^{*}) - \beta (\mathbf{u}) \,\big ] \theta _0^{V_0^{*}}+ {\theta }^+_\mathbf{c}(\mathbf{v}+ \varrho \mathbf{v}^{*}) - {\theta }^+_\mathbf{c}(\mathbf{u}) \end{aligned}$$
(6.15a)
$$\begin{aligned}&= \big [\, (\mathbf{v}-\mathbf{u})^\mathrm{T}\mathbf{v}^{*}+ \varrho \,\big ] \theta _0^{V_0^{*}}+ {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{v})) - {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{u})) \end{aligned}$$
(6.15b)

hold. Since \(\mathbf{u}\in U^{\overline{\delta }}_{pq} \subset \mathbb {R}^m_+\), \(\mathbf{v}+ \varrho \mathbf{v}^{*}\in U^{\overline{\delta }}_{pq}\), and \(\varvec{\gamma }\,\in \partial ^{\mathbb {R}^m_+}_0 \theta _f(\mathbf{u})\) it follows that

$$\begin{aligned} \theta _f(\mathbf{v}+ \varrho \mathbf{v}^{*}) - \theta _f(\mathbf{u})&\le \varvec{\gamma }\,^\mathrm{T}\left( \mathbf{v}+ \varrho \mathbf{v}^{*}- \mathbf{u}\right) \end{aligned}$$
(6.16a)
$$\begin{aligned}&= \varvec{\gamma }\,^\mathrm{T}\big [\, \varvec{\omega }(\mathbf{v}+ \varrho \mathbf{v}^{*}) - \varvec{\omega }(\mathbf{u}) \,\big ] + \varvec{\gamma }\,^\mathrm{T}\big [\, \beta (\mathbf{v}+ \varrho \mathbf{v}^{*}) - \beta (\mathbf{u}) \,\big ] \mathbf{v}^{*} \end{aligned}$$
(6.16b)
$$\begin{aligned}&= \varvec{\omega }(\varvec{\gamma }\,)^\mathrm{T}\big [\, \varvec{\omega }(\mathbf{v}) - \varvec{\omega }(\mathbf{u}) \,\big ] + \varvec{\gamma }\,^\mathrm{T}\big [\, (\mathbf{v}- \mathbf{u})^\mathrm{T}\mathbf{v}^{*}+ \varrho \,\big ] \mathbf{v}^{*}, \end{aligned}$$
(6.16c)

where (6.16a) follows from (2.8), (6.16b) from (6.11), and (6.16c) from Property 6.2. Combining the relations in (6.15) and (6.16) yields the inequality

$$\begin{aligned} {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{v})) - {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{u})) \le \varvec{\omega }(\varvec{\gamma }\,)^\mathrm{T}\big [ \varvec{\omega }(\mathbf{v}) - \varvec{\omega }(\mathbf{u}) \big ] + \big [ (\mathbf{v}- \mathbf{u})^\mathrm{T}\mathbf{v}^{*}+ \varrho \big ] \Big (\! \varvec{\gamma }\,^\mathrm{T}\mathbf{v}^{*}\!-\! \theta _0^{V_0^{*}}\!\Big ). \end{aligned}$$
(6.17)

Since \(\varvec{\gamma }\,\in \partial ^{\mathbb {R}^m_+}_0 \theta _f(\mathbf{u})\) and (according to Proposition 6.5) \(X_{f}^0(\mathbf{u})\subseteq X_0(\mathbf{v}^{*})\), it then follows that \(\theta _0^{V_0^{*}}= \min _{\mathbf{x}\in X} \big \{\, (\mathbf{b}-\mathbf{A}\mathbf{x})^\mathrm{T}\mathbf{v}^{*}\,\big \} = \varvec{\gamma }\,^\mathrm{T}\mathbf{v}^{*}\). By inserting this into (6.17), and utilizing Property 6.2(a) and (c) and Lemma 6.3, we then obtain the inequality

$$\begin{aligned} {\theta }^+_\mathbf{c}(\mathbf{v}) - {\theta }^+_\mathbf{c}(\varvec{\omega }(\mathbf{u})) \le \varvec{\omega }(\varvec{\gamma }\,)^\mathrm{T}\big [\, \mathbf{v}- \varvec{\omega }(\mathbf{u}) \,\big ]. \end{aligned}$$

The proposition then follows since \(\mathbf{v}\in R(\mathbf{v}^{*})\). \(\square \)

We now define the sequences \(\{\, \varvec{\omega }^t \,\}\) and \(\{\, \varvec{\omega }^{t + \frac{1}{2}} \,\}\) according to

$$\begin{aligned} \varvec{\omega }^t := \varvec{\omega }(\mathbf{u}^t) \quad \text {and} \quad \varvec{\omega }^{t + \frac{1}{2}} := \varvec{\omega }^t + \alpha _t \varvec{\omega }(\mathbf{b}- \mathbf{A}\mathbf{x}_f^0(\mathbf{u}^t) - \varvec{\eta }^t), \quad t = 0,1,\ldots . \end{aligned}$$
(6.18)

In each iteration, t, the intermediate iterate \(\varvec{\omega }^{t + \frac{1}{2}}\) is the result of a step in the direction of \(\varvec{\omega }(\mathbf{b}- \mathbf{A}\mathbf{x}_f^0(\mathbf{u}^t) - \varvec{\eta }^t)\). The vector \(\mathbf{b}- \mathbf{A}\mathbf{x}_f^0(\mathbf{u}^t) - \varvec{\eta }^t \in \mathbb {R}^m\) is a conditional (with respect to \(\mathbb {R}^m_+\)) subgradient to the Lagrangian dual function (2.2), so by Lemma 6.4 and Proposition 6.7 the vector \(\varvec{\omega }(\mathbf{b}- \mathbf{A}\mathbf{x}_f^0(\mathbf{u}^t) - \varvec{\eta }^t)\) is, for large enough values of t, a conditional [with respect to \(R(\mathbf{v}^{*})\); see (6.9)] subgradient to the dual function (6.8). To show that the formula (6.18) actually defines a conditional [with respect to \(R(\mathbf{v}^{*})\)] subgradient algorithm, we must also show that \(\varvec{\omega }^{t+1} = \mathrm{proj}(\varvec{\omega }^{t+\frac{1}{2}};\,R(\mathbf{v}^{*}))\).
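Since \(R(\mathbf{v}^{*})\) in (6.9) constrains only the components with indices outside \(\mathcal {I}_+(\mathbf{v}^{*})\), its Euclidean projection acts componentwise. The following Python sketch makes this explicit; it is an illustration only, and the tolerance used to detect the zero components of \(\mathbf{v}^{*}\) is an implementation choice.

```python
import numpy as np

def proj_R(w, v_star, tol=1e-12):
    # Projection onto the radial cone R(v*) of (6.9): components with
    # v*_i = 0 (i.e., i in I \ I_+(v*)) are clipped at zero; the remaining
    # components are left unchanged.
    w_proj = np.asarray(w, dtype=float).copy()
    inactive = v_star <= tol              # indices in I \ I_+(v*)
    w_proj[inactive] = np.maximum(w_proj[inactive], 0.0)
    return w_proj
```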

Proposition 6.8

(A subgradient method for the projected dual function) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a) applied to (6.6), and the sequences \(\{\, \varvec{\omega }^t \,\}\) and \(\big \{\, \varvec{\omega }^{t + \frac{1}{2}} \,\big \}\) by (6.18); let the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded and let \(\{\, \epsilon _t \,\} = \{\, 0 \,\}\). Then, there exists an \(N \ge 0\) such that \(\mathrm{proj}(\varvec{\omega }^{t+\frac{1}{2}};\,R(\mathbf{v}^{*})) = \varvec{\omega }^{t+1}\) for all \(t \ge N\).

Proof

By (6.18), (6.11), and (2.11), \(\varvec{\omega }^{t + \frac{1}{2}} = \varvec{\omega }(\mathbf{u}^{t + \frac{1}{2}})\) holds for all \(t \ge 0\). Define \(\overline{\varvec{\omega }}^t := \mathrm{proj}(\varvec{\omega }^{t + \frac{1}{2}}; \, R(\mathbf{v}^{*}))\) and note that \(\omega _i(\mathbf{u}) = u_i - v_i^* (\mathbf{v}^{*})^\mathrm{T}\mathbf{u}\) holds for all \(i \in \mathcal {I}\) and all \(\mathbf{u}\in \mathbb {R}^m\).

Consider \(i \in \mathcal {I} {\setminus } \mathcal {I}_+(\mathbf{v}^{*})\), so that \(v_i^* = 0\). By (6.9), (6.11), (2.11), and (6.18) it follows that

$$\begin{aligned} \overline{\omega }_i^t = \big [ {\omega _i}^{t+\frac{1}{2}} \big ]_+ = \big [ \omega _i(\mathbf{u}^{t+\frac{1}{2}}) \big ]_+ = \big [ {u_i}^{t+\frac{1}{2}} \big ]_+ = u_i^{t+1} = \omega _i(\mathbf{u}^{t+1}) = \omega _i^{t+1}, \quad t \ge 0. \end{aligned}$$

Consider \(i \in \mathcal {I}_+(\mathbf{v}^{*})\), so that \(v_i^* > 0\). Due to (6.9) and (6.11) it then holds that

$$\begin{aligned} \overline{\omega }_i^t&= {\omega _i}^{t+\frac{1}{2}} = \omega _i(\mathbf{u}^{t+\frac{1}{2}}) = {u_i}^{t+\frac{1}{2}} - v_i^* (\mathbf{v}^{*})^\mathrm{T}\mathbf{u}^{t+\frac{1}{2}}\\&= {u_i}^{t+\frac{1}{2}} - v_i^* \textstyle \sum _{j \in \mathcal {I}_+(\mathbf{v}^{*})} v_j^* {u_j}^{t+\frac{1}{2}}, \quad t \ge 0. \end{aligned}$$

For all \(t \ge N\), where \(N \ge 0\) is large enough, the relations \(0 < u_i^{t+1} = {u_i}^{t+\frac{1}{2}}\) hold, implying that \(\overline{\omega }_i^t = u_i^{t+1} - v_i^* \sum _{j \in \mathcal {I}_+(\mathbf{v}^{*})} v_j^* u_j^{t+1} = \omega _i^{t+1}\), where the latter identity is due to (6.18).

We conclude that \(\overline{\varvec{\omega }}^t = \varvec{\omega }^{t+1}\) for all \(t \ge N\), and the proposition follows. \(\square \)

We summarize the development made in this section. Associated with the sequence \(\{\, \mathbf{u}^t \,\} \subset \mathbb {R}^m_+\) of dual iterates resulting from a conditional subgradient scheme for maximizing \(\theta _f\) over \(\mathbb {R}^m_+\), we define in (6.18) a sequence \(\{\, \varvec{\omega }^t \,\} \equiv \{\, \varvec{\omega }(\mathbf{u}^t) \,\} \subset R(\mathbf{v}^{*})\) of iterates corresponding to the function \({\theta }^+_\mathbf{c}\). Proposition 6.7 shows that a conditional (with respect to \(\mathbb {R}^m_+\)) subgradient of \(\theta _f\) at \(\mathbf{u}^t \in \mathbb {R}^m_+\) can be mapped to a conditional [with respect to \(R(\mathbf{v}^{*})\)] subgradient of \({\theta }^+_\mathbf{c}\) at \(\varvec{\omega }^t \in R(\mathbf{v}^{*})\). Then, Proposition 6.8 shows that for a large enough value of t the projection of \(\mathbf{u}^{t+\frac{1}{2}}\) onto \(\mathbb {R}^m_+\) in (2.11) has a one-to-one correspondence with the projection of \(\varvec{\omega }^{t+\frac{1}{2}}\) onto the set \(R(\mathbf{v}^{*})\) [defined in (6.9)].

We are now prepared to establish the remaining part of the main result of this article.

Proof of Theorem 2.3(b)

The case when \(\{\, \mathbf{x}\in X \,|\, \mathbf{A}\mathbf{x}\ge \mathbf{b}\,\} \ne \emptyset \) is treated in Proposition 2.2.

Assume that \(\{\, \mathbf{x}\in X \,|\, \mathbf{A}\mathbf{x}\ge \mathbf{b}\,\} = \emptyset \). By Lemma 6.4 and Propositions 6.5–6.8, there is an integer \(N \ge 0\) such that the sequence \(\{\, \varvec{\omega }^t \,\}_{t \ge N}\) is the result of a conditional [with respect to \(R(\mathbf{v}^{*})\)] subgradient method applied to the Lagrangian dual (6.10) of the linear program (6.7), which has a nonempty and bounded feasible set. It then follows from Proposition 2.2 that \(\mathrm{dist}(\overline{\mathbf{x}}_{f}^t; X_{f}^*) \rightarrow 0\) as \(t \rightarrow \infty \). The theorem follows. \(\square \)

This proof of Theorem 2.3(b) contains no explicit reference to any particular choice of the weights defining the ergodic sequence (2.14). Although this article is written with reference to the formula (2.14), the result of Theorem 2.3(b) remains valid for any ergodic sequence of primal iterates that converges for consistent programs (see, e.g., [14]), provided that the corresponding version of Theorem 5.4 can be established.

7 Illustrations and a separation result

We next present an example that numerically illustrates the main findings of this article and, finally, a separation result that follows from our analysis.

Fig. 1 Illustration in the primal space for the linear programming instance (7.1) of the program (2.1). The primal ergodic iterates \(\overline{\mathbf{x}}^t_f\), \(t = 1, \ldots , 30\) (circles), tend to the optimal set \(X_0^* = \big \{\, (4, \frac{18}{5})^\mathrm{T}\,\big \}\) (bullet)

7.1 A numerical example

Consider the linear programming instance of the program (2.1) given by the problem to

$$\begin{aligned}&\text {minimize} \; \quad 4x_1 + 2x_2 , \end{aligned}$$
(7.1a)
$$\begin{aligned}&\text {subject to} \; \quad x_1 - \;\;x_2 \ge 2, \end{aligned}$$
(7.1b)
$$\begin{aligned}& -x_1 + 2x_2 \ge 4, \end{aligned}$$
(7.1c)
$$\begin{aligned}&\mathbf{x}\in X := [0, 4]^2. \end{aligned}$$
(7.1d)

Here \(\big \{\, \mathbf{x}\in X \,\big |\, \mathbf{A}\mathbf{x}\ge \mathbf{b}\,\big \} = \emptyset \); see the illustration in Fig. 1. The corresponding cone \(C = \big \{\, \mathbf{w}\in \mathbb {R}^2_+ \,\big |\, 2 w_2< 3 w_1 < 12 w_2 \,\big \}\) is illustrated in Fig. 2.

Fig. 2 Illustration in the Lagrangian dual space of the instance (7.1) of (2.1). Left: the cone C (shaded), the dual iterates \(\mathbf{u}^t\), \(t = 1, \ldots , 30\) (circles), and the direction of \(\mathbf{v}^*\) (dashed line). Right: the scaled dual iterates \(\mathbf{v}^t\), \(t = 1, \ldots , 30\) (circles), and the vector \(\mathbf{v}^*\) (dashed line)

It is straightforward to show that

$$\begin{aligned} X_0^* = X_f^*&= \left\{ \, (4, {\textstyle \frac{18}{5}})^\mathrm{T}\,\right\} , \quad \mathbf{v}^{*}= {\textstyle \frac{1}{\sqrt{5}}} (2, 1)^\mathrm{T}, \quad \text {and} \quad \theta _0^{V_0^{*}}= {\textstyle \frac{4}{\sqrt{5}}}. \end{aligned}$$

We computed the sequence \(\{\mathbf{u}^t\}\) by the method (2.11), with \(\mathbf{u}^0 = (0, 0)^\mathrm{T}\), \(\varvec{\eta }^t = (0, 0)^\mathrm{T}\) for all \(t \ge 0\), and \(\alpha _t = \frac{10}{1+t}\).

Figure 2 illustrates the dual iterates \(\mathbf{u}^t\) (left) and the scaled dual iterates \(\mathbf{v}^t\) (right), for \(t = 0, \ldots , 30\); the dashed lines indicate (the direction of) the vector \(\mathbf{v}^* = \frac{1}{\sqrt{5}} (2, 1)^\mathrm{T}\). The sequence \(\{\mathbf{u}^t\}\) diverges in the direction of \(\mathbf{v}^*\) and the sequence \(\{\mathbf{v}^t\}\) converges to \(\mathbf{v}^*\).

In Figure 1 the ergodic primal iterates \(\overline{\mathbf{x}}^t_f\) [defined in (2.14)] are illustrated for \(t = 1, \ldots , 30\). The sequence \(\{\overline{\mathbf{x}}^t_f\}\) converges to the singleton set \(X_0^* = \big \{\, ( 4, {\textstyle \frac{18}{5}} )^\mathrm{T}\,\big \}\).
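The experiment can be reproduced, at least qualitatively, by the following Python sketch. It implements the projected subgradient step (2.11) for the instance (7.1) with \(\mathbf{u}^0 = \varvec{\eta }^t = \mathbf{0}\) and \(\alpha _t = 10/(1+t)\), solves each Lagrangian subproblem exactly over the box X, and forms step-length-weighted primal averages; note that the precise weights in (2.14), as well as the tie-breaking in the subproblem, may differ from those used for Figs. 1 and 2, so only the qualitative behaviour is reproduced.

```python
import numpy as np

# Data of the inconsistent instance (7.1): minimize c^T x s.t. A x >= b, x in X = [0,4]^2.
c = np.array([4.0, 2.0])
A = np.array([[1.0, -1.0],
              [-1.0, 2.0]])
b = np.array([2.0, 4.0])

def subproblem(u):
    # Exact Lagrangian subproblem over the box X: minimize (c - A^T u)^T x,
    # attained at a vertex of [0,4]^2 (ties broken towards 0).
    return np.where(c - A.T @ u < 0.0, 4.0, 0.0)

u = np.zeros(2)                                  # u^0 = (0, 0)^T
Lambda, x_bar = 0.0, np.zeros(2)
for t in range(30):
    x_t = subproblem(u)
    alpha = 10.0 / (1.0 + t)                     # step lengths alpha_t = 10/(1+t)
    Lambda += alpha
    x_bar += alpha * (x_t - x_bar) / Lambda      # step-length-weighted primal average
    u = np.maximum(u + alpha * (b - A @ x_t), 0.0)   # projected subgradient step (2.11)

print(x_bar)                  # approaches X_0^* = {(4, 18/5)^T} as the iteration count grows
print(u / np.linalg.norm(u))  # scaled dual iterate, approaches v* = (2, 1)/sqrt(5)
```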

The appearance of the solution set \(X_0^* = \mathrm{arg\,min}_{\mathbf{x}\in X} \{\, \Vert [\mathbf{g}(\mathbf{x})]_+ \Vert \,\}\) depends on the scaling of the constraints in (2.1b), as demonstrated next. The set \(X_0^* = \big \{\, (4, \frac{18}{5})^\mathrm{T}\,\big \}\) is illustrated in Fig. 3 (left). Scaling the constraint (7.1c) to “\({\textstyle -\frac{1}{4}x_1 + \frac{1}{2}x_2 \ge 1}\)” yields the solution set \(X_0^* = \big \{\, (4, \frac{12}{5})^\mathrm{T}\,\big \}\), which is illustrated in Fig. 3 (right).

Fig. 3 Illustration of the set \(X_0^*\) (bullet) for two scalings of the constraint (7.1c) of the problem instance (7.1). Left: \(X_0^* = \big \{\, (4, \frac{18}{5})^\mathrm{T}\,\big \}\). Right: \(X_0^* = \big \{\, (4, \frac{12}{5})^\mathrm{T}\,\big \}\)

7.2 Finite attainment of a separating hyper-surface

For the case when the set X is nonempty, closed, and convex, and all functions \(g_i\) are affine, we have previously, in [24, Cor. 6.4] (see also Cor. 3.24 in the survey [27]), utilized ergodic sequences of underestimating affine functions, obtained from subgradient optimization, to finitely detect inconsistency and to identify a separating hyper-plane.

We now return to our original setting of convex functions f and \(g_i\), \(i \in \mathcal {I}\), and a nonempty, convex and compact set X. Provided that the feasible set \(\{\, \mathbf{x}\in X \,|\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\}\) is empty and that the sets X and \(Y := \big \{\, \mathbf{x}\in \mathbb {R}^n \,\big |\, \mathbf{g}(\mathbf{x}) \le \mathbf{0}^m\,\big \}\) are both nonempty, a hyper-surface that strongly separates the sets X and Y can be identified in a finite number of steps.

Theorem 7.1

(Finite attainment of a separating hyper-surface) Let the sequence \(\{\, \mathbf{u}^t \,\}\) be generated by the method (2.11), (2.13a), (2.13c) applied to the program (2.4), the sequence \(\{\, \varvec{\eta }^t \,\}\) be bounded, and the sequence \(\{\, \mathbf{v}^t \,\}\) be defined by (5.1b). Suppose that the sets X and Y are nonempty, but \(X \cap Y = \emptyset \). Then there exists an integer \(N\ge 0\) such that the hyper-surface \(H(\mathbf{v}^t) := \big \{\, \mathbf{x}\in \mathbb {R}^n \,\big |\, \mathbf{g}(\mathbf{x})^\mathrm{T}\mathbf{v}^t = \frac{1}{2} \theta _{0}^{{V_0^{*}}} \,\big \}\) strongly separates the sets X and Y for all \(t\ge N\).

Proof

Since \(\mathbf{v}^t \ge \mathbf{0}^m\) for all t, it holds that \(\mathbf{g}(\mathbf{x})^\mathrm{T}\mathbf{v}^t \le 0\) for all \(\mathbf{x}\in Y\) and all t. From (2.4) and (2.2) it follows that \(\mathbf{g}(\mathbf{x})^\mathrm{T}\mathbf{v}^t \ge \theta _0(\mathbf{v}^t)\) for all \(\mathbf{x}\in X\). From Proposition 3.1 it follows that the relations \(\emptyset \not = C \subset \mathbb {R}^m_+\) hold and also that the strict inequality \(\theta _0(\mathbf{w}) > 0\) holds for all \(\mathbf{w}\in C\). Since \(C \cap V \not = \emptyset \), (4.2) yields that \(\theta _{0}^{{V_0^{*}}} > 0\). Moreover, by Theorem 5.4, \(\theta _0(\mathbf{v}^t) \rightarrow \theta _{0}^{{V_0^{*}}}\) as \(t \rightarrow \infty \). Hence, there is an \(N \ge 0\) such that \(\theta _0(\mathbf{v}^t) > \frac{1}{2} \theta _{0}^{{V_0^{*}}}\) for all \(t \ge N\), and it follows that \(\mathbf{g}(\mathbf{x})^\mathrm{T}\mathbf{v}^t> \frac{1}{2} \theta _{0}^{{V_0^{*}}} > 0\) for all \(\mathbf{x}\in X\) and all \(t \ge N\). The theorem follows. \(\square \)
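As an illustration of Theorem 7.1, the strong-separation property can be checked numerically for the instance (7.1), for which \(\theta _{0}^{{V_0^{*}}} = 4/\sqrt{5}\). The Python sketch below is illustrative only: the sample points of X and Y and the choice of \(\mathbf{v}^t\) (here taken equal to \(\mathbf{v}^{*}\)) are assumptions, and the function simply tests the two defining inequalities of the hyper-surface \(H(\mathbf{v}^t)\) on those points.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 2.0]])
b = np.array([2.0, 4.0])
theta_star = 4.0 / np.sqrt(5.0)            # theta_0^{V_0^*} for instance (7.1)
v_t = np.array([2.0, 1.0]) / np.sqrt(5.0)  # v^t taken equal to v* for this illustration

def strongly_separates(v, X_pts, Y_pts):
    # Theorem 7.1: g(x)^T v > theta/2 on X and g(x)^T v <= 0 on Y,
    # where g(x) = b - A x for the linear instance (7.1).
    gX = (b - X_pts @ A.T) @ v
    gY = (b - Y_pts @ A.T) @ v
    return bool(np.all(gX > 0.5 * theta_star) and np.all(gY <= 0.0))

X_pts = np.random.default_rng(0).uniform(0.0, 4.0, size=(1000, 2))  # samples of X = [0,4]^2
Y_pts = np.array([[8.0, 6.0], [10.0, 8.0], [12.0, 9.0]])            # some points with A x >= b
print(strongly_separates(v_t, X_pts, Y_pts))                         # expected: True
```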

8 Conclusions and further research

In this work we apply a conditional subgradient optimization algorithm to the Lagrange dual of a (possibly) inconsistent convex program and compute an associated primal ergodic sequence. We establish the convergence of the resulting dual and primal sequences to the set of saddle-points for the Lagrange function; for the special case of linear programming the primal ergodic sequence converges to the set of minimizers of the primal objective function over the primal part of the set of saddle-points. The stronger result for the linear programming case is explained by the fact that the corresponding selection problem (6.7) is guaranteed to possess Lagrange multipliers.

The convergence rate for both the primal and the dual sequences can probably be improved via more careful choices of weights for the sequence of weighted averages of primal subproblem solutions, e.g., as suggested in [14] for consistent convex programs.

Another interesting subject for further study is the behavior of the primal–dual sequences when applying other solution schemes, such as bundle methods (see, e.g., [30]) and augmented Lagrangian methods (see, e.g., [6, 38]), to the Lagrangian dual problem.