1 Introduction and motivation

Lagrangian relaxation is a frequently utilized tool for solving large-scale convex minimization problems, due to its simplicity and its property of systematically providing optimistic estimates of the optimal value. One popular tool for solving the dual problems arising in Lagrangian relaxation schemes is subgradient optimization. The advantage of subgradient methods is that they often find near-optimal dual solutions quickly, whilst a drawback is that near-optimal primal feasible solutions cannot, in general, be obtained directly from the subgradient scheme. As the dual iterates in a subgradient scheme converge towards an optimal dual solution, primal convergence towards a near-optimal primal solution is not in general achieved by simply using the subproblem solutions as primal iterates. Even with a dual optimal solution at hand, an optimal primal solution cannot easily be obtained. The reason for this inconvenience is that the dual objective function is typically nonsmooth at an optimal point, whence an optimal primal solution is a nontrivial convex combination of extreme subproblem solutions.

This paper analyzes what is called ergodic sequences by Larsson et al. [30] or recovering primal solutions by Sherali and Choi [42]; we will use the term ergodic sequences. To guarantee primal convergence for a linear program in a subgradient scheme, Shor [44, Chapter 4] and Larsson and Liu [27] (originally developed in [26]) utilize a strategy which, rather than using the subproblem solution as the primal iterate, uses a convex combination of previously found subproblem solutions, denoted an ergodic sequence. In [44, Chapter 4] the convex combinations are determined by the step lengths used in the subgradient scheme, while in [27, Theorem 3] the mean of the previously found iterates is used. These results are extended in [42] to a more general class of convex combinations and step lengths in the subgradient algorithm applied to linear programs. Larsson et al. [30] show that the convex combinations used in [44] and [27] yield primal convergence also for general convex optimization problems.

Several other methods for generating approximate primal solutions in a subgradient scheme have been studied. Barahona and Anbil [7] propose a method for approximating the solution to a linear program by utilizing a subgradient method in which a primal solution is created as a convex combination of the previous solution and the primal iterate obtained from the subgradient method. The method is denoted the volume algorithm; it was revisited by Bahiense et al. [4] and Sherali and Lim [43], who extended it to include more information in the dual scheme. Nesterov [34] analyzes a primal–dual subgradient method where a primal feasible approximation to the optimum is obtained by using control sequences in both the primal and dual space. Nedić and Ozdaglar [32, 33] study methods which utilize the average of all previously found iterates as primal solutions. The latter algorithms employ a constant step length due to its simplicity and practical significance. For a more thorough overview of the history of strategies for the construction of primal iterates in dual subgradient schemes, see Anstreicher and Wolsey [1].

This paper generalizes the results in [42] to the class of convex programs, and extends the results in [30] to include more general convex combinations in the definition of the ergodic sequences. We present a new set of rules for constructing the convexity weights defining the ergodic sequence of primal iterates. In contrast to rules previously utilized, they exploit more information from later subproblem solutions than from earlier ones. We evaluate the new rules on a set of nonlinear multicommodity flow problems (NMFPs) and show that they clearly outperform the previously utilized ones.

The remainder of this paper is organized as follows. In Sect. 2 we introduce some basic concepts regarding Lagrangian relaxation and subgradient methods. In Sect. 3, we describe the notion of primal ergodic sequences and present an important theorem regarding their convergence when considering general convex problems. Section 3 also includes a new set of rules for choosing convexity weights when defining the ergodic sequences. The final part of Sect. 3 includes a taxonomy of previous results and their connection to the results presented in this paper. In Sect. 4 we introduce the NMFP and describe a solution approach based on Lagrangian relaxation. Computational results for a set of NMFP test instances employing the new rules for choosing the convexity weights are presented in Sect. 5. Conclusions are then drawn in Sect. 6.

2 Background

Let \(f: {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) and \(h_i: {\mathbb {R}}^n\rightarrow {\mathbb {R}}, i\in \mathcal {I} := \{1, \ldots , m\}\), be convex and (possibly) nonsmooth functions and the set \(X\subset {\mathbb {R}}^n\) be convex and compact. Consider the program

$$\begin{aligned} f^*\!:= \text {minimum} \quad \;\;f(\mathbf{x})&,\end{aligned}$$
(1a)
$$\begin{aligned} \text {subject to } \quad h_{i}(\mathbf{x})&\le 0, \quad i\in \mathcal {I},\end{aligned}$$
(1b)
$$\begin{aligned} \,\,\,\quad \qquad \mathbf{x}&\in X , \end{aligned}$$
(1c)

with solution set \(X^*\subset {\mathbb {R}}^n\). We assume that the set \(X\) is simple and that the feasible set \(\{\mathbf{x}\in X\; | \; h_i(\mathbf{x})\le 0,\,\, i\in \mathcal {I}\}\) is nonempty, implying that the solution set \(X^*\) is also nonempty. We define the Lagrange function \(\mathcal {L}: {\mathbb {R}}^n\times {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) with respect to the relaxation of the constraints (1b) as \(\mathcal {L}(\mathbf{x},\mathbf{u}) := f(\mathbf{x}) + \mathbf{u}^T\mathbf{h}(\mathbf{x})\) for \((\mathbf{x}, \mathbf{u})\in {\mathbb {R}}^n\times {\mathbb {R}}^m\). The dual objective function \(\theta : {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} \theta (\mathbf{u}) := \min _{\mathbf{x}\in X} \left[ f(\mathbf{x}) + \mathbf{u}^T\mathbf{h}(\mathbf{x})\right] , \quad \mathbf{u}\in {\mathbb {R}}^m. \end{aligned}$$
(2)

The set \(X\) is compact, which implies that \(\theta \) is continuous [8, Theorem 6.3.1] on \({\mathbb {R}}^m\). The function \(\theta \) is also concave on \({\mathbb {R}}^m\), and the nonempty and compact solution set of the subproblem in (2) at \(\mathbf{u}\in {\mathbb {R}}^m\) is

$$\begin{aligned} X(\mathbf{u}) := \left\{ \left. \mathbf{x}\in X \;\right| \; f(\mathbf{x}) + \mathbf{u}^T\mathbf{h}(\mathbf{x}) \le \theta (\mathbf{u}) \right\} \!. \end{aligned}$$
(3)

For \(\mathbf{u}\in {\mathbb {R}}^m_+\), the set \(X(\mathbf{u})\) is also convex. The Lagrangian dual problem is defined as

$$\begin{aligned} \theta ^*\! := \underset{\mathbf{u}\in {\mathbb {R}}^m_+}{\text {supremum}}\;\;\theta (\mathbf{u}), \end{aligned}$$
(4)

whose solution set \(U^*\subseteq {\mathbb {R}}^m_+\) is convex. By weak duality, the inequality \(\theta (\mathbf{u})\le f(\mathbf{x})\) holds for all \(\mathbf{u}\in {\mathbb {R}}^m_+\) and all \(\mathbf{x}\in X\) such that \(\mathbf{h}(\mathbf{x})\le \mathbf{0}^m\) [8, Theorem 6.2.1].

Let \(S\) be a nonempty, closed, and convex set. We define the projection and distance operators by

$$\begin{aligned} {\mathrm {proj}}\,(\mathbf{x}, S) :={\mathop {{{\mathrm{{\mathrm {argmin}}}}}}\limits _{{\mathbf{y}\in S}}} \Vert \mathbf{y}-\mathbf{x}\Vert _2 \quad \text {and} \quad {{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}, S) :=\min _{\mathbf{y}\in S}\Vert \mathbf{y}-\mathbf{x}\Vert _2. \end{aligned}$$
(5)

Note that the function \({{\mathrm{{\mathrm {dist}}}}}(\cdot , S)\) is convex and continuous. A point-to-set map \(X : {\mathbb {R}}^m\rightarrow 2^{{\mathbb {R}}^n}\) is said to be a closed map if, whenever \(\{\mathbf{u}^t\}\subset {\mathbb {R}}^m\), \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}\), \(\mathbf{x}^t\in X(\mathbf{u}^t)\) for all \(t\), and \(\{\mathbf{x}^t\}\rightarrow \mathbf{x}\), it holds that \(\mathbf{x}\in X(\mathbf{u})\). The following result states that the point-to-set map \(X(\cdot )\) defined in (3) is a closed map.
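As a small illustration (not part of the paper), the operators in (5) are easy to realize for elementary sets; for the nonnegative orthant, the projection is the componentwise maximum with zero, which is precisely the operator \([\,\cdot \,]_+\) used later in the subgradient iteration.

```python
import numpy as np

def proj_orthant(x):
    """Euclidean projection of x onto the nonnegative orthant R^m_+,
    i.e. the operator [.]_+ : componentwise maximum with zero."""
    return np.maximum(x, 0.0)

def dist(x, y):
    """Euclidean distance between x and a point y (e.g. the projection
    of x onto a closed convex set S)."""
    return np.linalg.norm(np.asarray(y) - np.asarray(x))

x = np.array([1.5, -2.0, 0.0])
p = proj_orthant(x)   # the closest point of R^3_+ to x
d = dist(x, p)        # the distance from x to R^3_+
```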

Lemma 1

(\(X( \cdot )\) is a closed map [30, Lemma 1]) Let the sequence \(\{ \mathbf{u}^t\}\subset {\mathbb {R}}^{m}\), the map \(X(\cdot ) : {\mathbb {R}}^m\rightarrow 2^X\) be given by the definition (3), and the sequence \(\{\mathbf{x}^t\}\) be given by the inclusion \(\mathbf{x}^t\in X(\mathbf{u}^t)\). If \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}\in {\mathbb {R}}^m\), then \({{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}^t, X(\mathbf{u}))\rightarrow 0\). If, in addition, \(X(\mathbf{u}) = \{\mathbf{x}\}\), then \(\{\mathbf{x}^t\}\rightarrow \mathbf{x}\).

For each \(\mathbf{u}\in {\mathbb {R}}^m\), we define the set of indices corresponding to strictly positive multiplier values as

$$\begin{aligned} \mathcal {I}(\mathbf{u}) := \left\{ i\in \mathcal {I} \mid u_i>0\right\} . \end{aligned}$$

Lemma 2

(affineness of the Lagrange function [30, Lemma 2]) The functions \(f\) and \(h_i,\,i\in \mathcal {I}(\mathbf{u})\), are affine on \(X(\mathbf{u})\) for every \(\mathbf{u}\in {\mathbb {R}}^m_+\). Further, if the function \(f\) is (the functions \(h_i\), \(i\in \mathcal {I}(\mathbf{u})\), are) differentiable, then \(\nabla f\) is ( \(\nabla h_i\), \(i\in \mathcal {I}(\mathbf{u})\) are) constant on \(X(\mathbf{u})\).

From Lemma 2 it follows that for all \(\mathbf{u}\in {\mathbb {R}}^m_+\) and every \(i\in \mathcal {I}(\mathbf{u})\), \(\partial h_i\) is constant on \({\mathrm {rint}}\, X(\mathbf{u})\); hence, for every \(\overline{\mathbf{x}}\in {\mathrm {rint}}\, X(\mathbf{u})\), each subgradient \(\varvec{\xi } \in \partial h_i(\overline{\mathbf{x}})\) defines a hyperplane that supports the function \(h_i\) at every \(\mathbf{x}\in {\mathrm {rint}}\,X(\mathbf{u})\). We define the subdifferential of \(\theta \) at \(\mathbf{u}\in {\mathbb {R}}^m\) as the set

$$\begin{aligned} \partial \theta (\mathbf{u}) := \left\{ \left. \varvec{\gamma } \in {\mathbb {R}}^m \;\right| \; \theta (\mathbf{v}) \le \theta (\mathbf{u}) + \varvec{\gamma }^T(\mathbf{v} - \mathbf{u}), \quad \mathbf{v}\in {\mathbb {R}}^m\right\} \!. \end{aligned}$$

Proposition 1

(subdifferential to the dual function [30, Proposition 1]) For each \(\mathbf{u}\in {\mathbb {R}}^m\), it holds that \(\partial \theta (\mathbf{u}) = \{\mathbf{h}(\mathbf{x}) \;|\; \mathbf{x}\in X(\mathbf{u})\}\). Further, \(\theta \) is differentiable at \(\mathbf{u}\) if and only if each \(h_i\) is constant on \(X(\mathbf{u})\), in which case \(\nabla \theta (\mathbf{u}) = \mathbf{h}(\mathbf{x})\) for all \(\mathbf{x}\in X(\mathbf{u})\).
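Proposition 1 can be checked numerically on a toy instance. The data below are hypothetical: a finite set stands in for \(X\) (so that the subproblem is a trivial enumeration), with a linear objective and a single constraint, and the subgradient inequality defining \(\partial \theta \) is verified over a grid of dual points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a finite set standing in for X (think: vertices of
# a polytope), objective f(x) = c^T x and one constraint h(x) = a^T x - b.
X_pts = rng.standard_normal((8, 3))
c = rng.standard_normal(3)
a = rng.standard_normal(3)
b = 0.5

def theta_and_subgradient(u):
    """Evaluate theta(u) = min_{x in X} [f(x) + u h(x)] by enumeration and
    return (theta(u), h(x(u))); by Proposition 1, h(x(u)) lies in the
    subdifferential of theta at u."""
    vals = X_pts @ c + u * (X_pts @ a - b)
    x_u = X_pts[np.argmin(vals)]
    return vals.min(), x_u @ a - b

u = 0.7
theta_u, g = theta_and_subgradient(u)
# Subgradient inequality: theta(v) <= theta(u) + g (v - u) for all v.
for v in np.linspace(-2.0, 2.0, 41):
    theta_v, _ = theta_and_subgradient(v)
    assert theta_v <= theta_u + g * (v - u) + 1e-12
```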

To obtain primal–dual optimality relations, we assume Slater’s constraint qualification as stated in Assumption 1.

Assumption 1

(Slater constraint qualification) The set \(\{\,\mathbf{x}\in X \,|\, \mathbf{h}(\mathbf{x})<\mathbf{0}^m\,\}\) is nonempty.

Under Assumption 1, the solution set \(U^*\) is nonempty and compact and, by strong duality, the equality \(\theta (\mathbf{u}^*) = f(\mathbf{x}^*)\) holds for some pair of primal–dual solutions \((\mathbf{x}^*, \mathbf{u}^*)\) fulfilling \(\mathbf{u}^*\in {\mathbb {R}}^m_+,\,\mathbf{x}^*\in X\) and \(\mathbf{h}(\mathbf{x}^*)\le \mathbf{0}^m\) ([8, Theorem 6.2.5]).

Proposition 2

(optimality conditions, [8, Theorem 6.2.5]) Let Assumption 1 hold. Then, \(\mathbf{u}\in U^*\) and \(\mathbf{x}\in X^*\) if and only if \(\mathbf{u}\in {\mathbb {R}}^m_+,\,\mathbf{x}\in X(\mathbf{u}),\,\mathbf{h}(\mathbf{x})\le \mathbf{0}^m\) and \(\mathbf{u}^T\mathbf{h}(\mathbf{x}) = 0\).

2.1 Subgradient optimization

We consider solving the Lagrangian dual program by the subgradient optimization method. We start at some \(\mathbf{u}^0\in {\mathbb {R}}^m_+\) and compute iterates \(\mathbf{u}^t\) according to

$$\begin{aligned} \mathbf{u}^{t+1} = \left[ \mathbf{u}^t + \alpha _{t}\mathbf{h}(\mathbf{x}^t)\right] _+, \quad t=0, 1, \ldots , \end{aligned}$$
(6)

where \(\mathbf{x}^t\in X(\mathbf{u}^t)\) solves the subproblem defined in (2) at \(\mathbf{u}^t\), implying that \(\mathbf{h}(\mathbf{x}^t)\in \partial \theta (\mathbf{u}^t)\); \(\alpha _{t}>0\) is the step length chosen at iteration \(t\); and \([\,\cdot \,]_+\) denotes the Euclidean projection onto the nonnegative orthant \({\mathbb {R}}^m_+\). For some early development of the theory of the subgradient optimization method, see Shor [44, Chapter 2], Polyak [37, 38], and Ermol’ev [15]. The convergence of the method (6) for the special case of a divergent series step length rule is established in the following proposition.

Proposition 3

(convergence of dual iterates, [1, Theorem 3]) Suppose that Assumption 1 holds, and let the method (6) be applied to the program (4), with the step lengths \(\alpha _{t}\) fulfilling the conditions

$$\begin{aligned} \alpha _{t}>0, \;\forall t, \quad \lim _{t\rightarrow \infty }\sum _{s=0}^{t-1}\alpha _s = \infty , \quad \lim _{t\rightarrow \infty }\sum _{s=0}^{t-1}\alpha _s^2 < \infty . \end{aligned}$$
(7)

Then \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}^\infty \in U^*\) and \(\{\theta (\mathbf{u}^t)\}\rightarrow \theta ^*\).
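A minimal sketch of the method (6) under the divergent-series conditions (7), using harmonic step lengths. The function `solve_subproblem` is a placeholder for the minimization in (2), and the small test problem at the end is a hypothetical example, not taken from the paper.

```python
import numpy as np

def dual_subgradient(solve_subproblem, h, u0, a=1.0, b=1.0, c=1.0, iters=5000):
    """Projected subgradient iteration (6) on the Lagrangian dual (4).

    solve_subproblem(u) must return some x in X(u), so that h(x) is a
    subgradient of theta at u (Proposition 1). The harmonic step lengths
    alpha_t = a/(b + c t) satisfy the divergent-series conditions (7).
    """
    u = np.asarray(u0, dtype=float)
    for t in range(iters):
        x = solve_subproblem(u)
        alpha = a / (b + c * t)
        u = np.maximum(u + alpha * h(x), 0.0)  # [.]_+: projection onto R^m_+
    return u

# Hypothetical test problem: minimize x^2 subject to 1 - x <= 0, X = [-2, 2].
# The subproblem solution is x(u) = clip(u/2, -2, 2); the dual optimum is u* = 2.
u_star = dual_subgradient(
    solve_subproblem=lambda u: np.clip(u / 2.0, -2.0, 2.0),
    h=lambda x: 1.0 - x,
    u0=[0.0],
)
```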

3 Ergodic primal convergence

In this section, we introduce the notion of an ergodic sequence and present two important results regarding the convergence of ergodic sequences depending on convexity weights and step lengths. Assume that the method (6) is applied to the problem (4). At each iteration \(t\), an ergodic primal iterate \(\overline{\mathbf{x}}^t\) is composed according to

$$\begin{aligned} \overline{\mathbf{x}}^t = \sum _{s=0}^{t-1}\mu _s^t\mathbf{x}^s, \quad \sum _{s=0}^{t-1}\mu _s^t = 1, \quad \mu _s^t\ge 0,\quad s=0, \ldots , t-1, \end{aligned}$$
(8)

where \(\mathbf{x}^s\) is the primal solution found at iteration \(s\), i.e., the solution to the subproblem defined in (2). The vector \(\overline{\mathbf{x}}^t\) is thus a convex combination of all previously found subproblem solutions. We define

$$\begin{aligned} \gamma _s^{\,t} = \mu _s^t/\alpha _s, \quad s=0, \ldots , t-1, \quad t=1, 2, \ldots , \end{aligned}$$
(9a)

and

$$\begin{aligned} \varDelta \gamma _{\text {max}}^{\,t} = \max _{s\in \{1, \ldots , t-1\}}\{\gamma _s^{\,t} - \gamma _{s-1}^{\,t}\}, \quad t=2, 3, \ldots . \end{aligned}$$
(9b)
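For the common averaging choice \(\mu _s^t = 1/t\), the ergodic iterate (8) need not be recomputed from all stored subproblem solutions: it admits the constant-memory running-average update \(\overline{\mathbf{x}}^{t+1} = (t\,\overline{\mathbf{x}}^t + \mathbf{x}^t)/(t+1)\). A short sketch (illustrative only):

```python
import numpy as np

def ergodic_update(xbar, x_new, t):
    """Running-average form of (8) for mu_s^t = 1/t:
    xbar^{t+1} = (t * xbar^t + x^t) / (t + 1)."""
    return (t * xbar + x_new) / (t + 1)

# Equivalence with the explicit convex combination (8):
rng = np.random.default_rng(1)
xs = rng.standard_normal((10, 4))   # subproblem solutions x^0, ..., x^9
xbar = xs[0]                        # xbar^1 = x^0
for t in range(1, 10):
    xbar = ergodic_update(xbar, xs[t], t)
assert np.allclose(xbar, xs.mean(axis=0))   # xbar^10 is the plain average
```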

Assumption 2

(relations between convexity weights and step lengths) The step lengths \(\alpha _{t}\) and the convexity weights \(\mu _s^t\) are chosen such that the following conditions are satisfied:

  • A1: \(\gamma _s^{\,t}\ge \gamma _{s-1}^{\,t}, \; s=1, \ldots , t-1, \; t=2, 3, \ldots \),

  • A2: \(\varDelta \gamma _{\max }^{\,t}\rightarrow 0 \text { as } t\rightarrow \infty , \text { and }\)

  • A3: \(\gamma _{0}^{\,t} \rightarrow 0 \text { as } t\rightarrow \infty \text { and, for some } \varGamma >0, \gamma _{t-1}^{\,t}\le \varGamma \text { for all } t\).

Condition A1 requires that \(\mu _s^t/\mu _{s-1}^t \ge \alpha _{s}/\alpha _{s-1}\), \(s= 1, \ldots , t-1\), \(t=2, 3, \ldots \). This can be interpreted as the requirement that whenever the step length \(\alpha _s\) at iteration \(s\) is larger than the step length \(\alpha _{s-1}\) at iteration \(s-1\), the corresponding convexity weight \(\mu _s^t\) should be larger than the weight \(\mu _{s-1}^t\). By condition A2, the difference between each pair of subsequent convexity weights tends to zero as \(t\) increases, meaning that no primal iterate should be completely neglected. Condition A3 ensures that, for decreasing step lengths, the convexity weights decrease at a rate not slower than that of the step lengths.

Remark 1

For any fixed value of \(s\in \{0, \ldots , t-1\}\), it follows from Assumption 2 that \(\gamma _s^t \le \gamma _0^t + s\varDelta \gamma _{\text {max}}^t \rightarrow 0\) as \(t\rightarrow \infty \). This implies that \(\gamma _s^t = \mu _s^t/\alpha _s \rightarrow 0\), which yields that \(\mu _s^t \rightarrow 0\) as \(t\rightarrow \infty \), since \(0< \alpha _s < \infty \). \(\square \)

One example of convexity weights and step lengths fulfilling Assumption 2 is when each ergodic iterate equals the average of all previously found subproblem solutions, i.e., \(\mu _s^t = 1/t,\,s = 0, \ldots , t-1,\,t =1, 2, \ldots \), and the step lengths are chosen according to a harmonic series, i.e., \(\alpha _t = a/(b + ct),\,t = 0, 1, \ldots \), where \(a, b, c > 0\). Note that in [42, Theorem 1], Assumption 2 is included in the hypothesis.
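This example is easy to verify numerically. Under the harmonic step lengths \(\alpha _s = a/(b+cs)\) and averaging weights \(\mu _s^t = 1/t\), the ratios in (9a) become \(\gamma _s^{\,t} = (b+cs)/(ta)\), so the conditions A1–A3 can be checked directly (a hypothetical numeric sketch):

```python
import numpy as np

a, b, c = 1.0, 1.0, 2.0   # arbitrary positive constants in alpha_t = a/(b + c t)

def gammas(t):
    """gamma_s^t = mu_s^t / alpha_s, cf. (9a), for mu_s^t = 1/t and
    harmonic step lengths alpha_s = a/(b + c s)."""
    s = np.arange(t)
    alpha = a / (b + c * s)
    mu = np.full(t, 1.0 / t)
    return mu / alpha      # equals (b + c*s) / (t*a)

for t in [10, 100, 1000, 10000]:
    g = gammas(t)
    assert np.all(np.diff(g) >= 0)                   # A1: nondecreasing in s
    assert np.diff(g).max() <= c / (t * a) + 1e-15   # A2: max gap c/(ta) -> 0
    assert g[0] <= b / (t * a) + 1e-15               # A3: gamma_0^t -> 0
    assert g[-1] <= (b + c) / a                      # A3: gamma_{t-1}^t bounded
```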

We now present a special case of a result of Silverman and Toeplitz (proven in [25]) which will be utilized in the analysis to follow.

Lemma 3

(convergence of convex combinations, [25, p. 35]) Assume that the sequence \(\{\mu _{s}^t\}\subset {\mathbb {R}}\) fulfills the conditions

$$\begin{aligned}&\mu _{s}^t \ge 0, \; s=0, \ldots , t-1, \quad \sum _{s=0}^{t-1}\mu _{s}^t = 1, \; t=1, 2, \ldots , \;\text { and }\; \; \nonumber \\&\quad \lim _{t\rightarrow \infty } \mu _{s}^t = 0, \, s=0, 1, \ldots . \end{aligned}$$

If the sequence \(\{\mathbf{b}^s\}\subset {\mathbb {R}}^r,\,r\ge 1\), is such that \(\lim _{s\rightarrow \infty } \mathbf{b}^s = \mathbf{b}\) holds, then it holds that \(\lim _{t\rightarrow \infty }\left( \sum _{s=0}^{t-1}\mu _{s}^t\mathbf{b}^s\right) = \mathbf{b}\).

3.1 Feasibility in the limit

We here show that if the subgradient method (6) converges towards a dual feasible point, and the step lengths \(\alpha _{t}\) and convexity weights \(\mu _s^t\) are chosen such that Assumption 2 is fulfilled, then the ergodic sequence of iterates \(\overline{\mathbf{x}}^t\) converges to the set of primal feasible solutions.

Proposition 4

(feasibility of \(\overline{\mathbf{x}}^t\) in the limit) Suppose that the method (6) operated with a suitable step length rule yields \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}^\infty \in {\mathbb {R}}^m_+\). If the step lengths \(\alpha _{t}\) and convexity weights \(\mu _s^t\) fulfill Assumption 2, then the sequence \(\{\overline{\mathbf{x}}^t\}\) generated according to (8) fulfills

$$\begin{aligned} \limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m, \quad \text {and} \quad \overline{\mathbf{x}}^t\in X, \; t=1, 2, \ldots . \end{aligned}$$

Proof

For all \(t\ge 2\), we have that

$$\begin{aligned} \mathbf{h}(\overline{\mathbf{x}}^t)&\le \sum _{s=0}^{t-1}\mu _s^t \mathbf{h}(\mathbf{x}^s) \le \sum _{s=0}^{t-1}\mu _s^t \frac{1}{\alpha _s}\left( \mathbf{u}^{s+1}-\mathbf{u}^s\right) = \sum _{s=0}^{t-1}\gamma _{s}^{\,t}\left( \mathbf{u}^{s+1}-\mathbf{u}^s\right) \end{aligned}$$
(10a)
$$\begin{aligned}&= -\gamma _0^{\,t}\mathbf{u}^0 - \sum _{s=1}^{t-1}\left( \gamma _s^{\,t} - \gamma _{s-1}^{\,t}\right) \mathbf{u}^s + \gamma _{t-1}^{\,t}\mathbf{u}^t\end{aligned}$$
(10b)
$$\begin{aligned}&= -\gamma _0^{\,t}\mathbf{u}^0 + \gamma _{t-1}^{\,t}\mathbf{u}^t - \mathbf{u}^\infty \sum _{s=1}^{t-1}\left( \gamma _s^{\,t} - \gamma _{s-1}^{\,t}\right) + \sum _{s=1}^{t-1}\left( \gamma _s^{\,t} - \gamma _{s-1}^{\,t}\right) \left( \mathbf{u}^\infty - \mathbf{u}^s\right) \end{aligned}$$
(10c)
$$\begin{aligned}&= \gamma _{0}^{\,t}\left( \mathbf{u}^\infty - \mathbf{u}^0\right) +\gamma _{t-1}^{\,t}(\mathbf{u}^t - \mathbf{u}^\infty )+ \sum _{s=1}^{t-1}(\gamma _s^{\,t} - \gamma _{s-1}^{\,t})\left( \mathbf{u}^\infty - \mathbf{u}^s\right) , \end{aligned}$$
(10d)

where the inequalities in (10a) follow from the convexity of the function \(\mathbf{h}\) and the iteration formula (6), respectively. By the condition A3 in Assumption 2, the first term in (10d) tends to \(\mathbf{0}^m\) as \(t\rightarrow \infty \). From the hypothesis, \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}^\infty \) and by condition A3 in Assumption 2, \(\gamma _{t-1}^{\,t}\le \varGamma \) holds, implying that the second term in (10d) tends to \(\mathbf{0}^m\) as \(t\rightarrow \infty \). Let \(\varvec{\sigma }_t = \sum _{s=1}^{t-1}(\gamma _s^{\,t} - \gamma _{s-1}^{\,t})(\mathbf{u}^\infty - \mathbf{u}^s)\). We need to show that \(\varvec{\sigma }_t\rightarrow \mathbf{0}^m\) as \(t\rightarrow \infty \). Given any \(\varepsilon > 0\), let \(\kappa \ge 1\) be large enough so that \(||\mathbf{u}^\infty -\mathbf{u}^s||\le \varepsilon /2\varGamma \) holds for all \(s \ge \kappa +1\). Then, for \(t\ge \kappa +2\) and large enough so that \(\varDelta \gamma _{\text {max}}^{\,t}\sum _{s=1}^\kappa ||\mathbf{u}^\infty -\mathbf{u}^s||\le \varepsilon /2\) holds, we have that

$$\begin{aligned} ||\varvec{\sigma }_t||&\le \sum _{s=1}^{t-1}\left( \gamma _s^{\,t} - \gamma _{s-1}^{\,t}\right) ||\mathbf{u}^\infty - \mathbf{u}^s||\end{aligned}$$
(11a)
$$\begin{aligned}&\le \varDelta \gamma _{\text {max}}^{\,t}\sum _{s=1}^{\kappa }||\mathbf{u}^\infty - \mathbf{u}^s|| + \frac{\varepsilon }{2\varGamma }\sum _{s=\kappa +1}^{t-1}\left( \gamma _s^{\,t} - \gamma _{s-1}^{\,t}\right) \end{aligned}$$
(11b)
$$\begin{aligned}&\le \frac{\varepsilon }{2} + \frac{\varepsilon }{2\varGamma }\left( \gamma _{t-1}^{\,t} - \gamma _{\kappa }^{\,t}\right) \end{aligned}$$
(11c)
$$\begin{aligned}&\le \frac{\varepsilon }{2} + \frac{\varepsilon }{2} = \varepsilon , \end{aligned}$$
(11d)

where the inequality (11a) follows from the triangle inequality and condition A1 in Assumption 2, and the inequality (11b) from condition A2 in Assumption 2 and the assumption that \(\kappa \ge 1\) is large enough. The inequality (11c) follows from the assumption that \(t\ge \kappa +2\) is large enough, and the inequality (11d) follows from the condition A3 in Assumption 2 and the fact that \(\gamma _\kappa ^{\,t}\ge 0\). Since \(\varepsilon >0\) is arbitrary, we deduce that \(\varvec{\sigma }_t \rightarrow \mathbf{0}^m\) as \(t\rightarrow \infty \). It follows that

$$\begin{aligned} \limsup _{t\rightarrow \infty } h_i(\overline{\mathbf{x}}^t) \le 0, \quad i\in \mathcal {I}. \end{aligned}$$

Furthermore, since \(X\) is convex and \(\overline{\mathbf{x}}^t\) is a convex combination of \(\mathbf{x}^s\in X(\mathbf{u}^s)\subseteq X\), it holds that \(\overline{\mathbf{x}}^t\in X\) for all \(t\). \(\square \)

Proposition 4 states that as long as the sequence \(\{\mathbf{u}^t\}\) of dual iterates converges to some feasible point in the Lagrangian dual problem (4), and the step lengths and convexity weights are appropriately chosen, the corresponding sequence of primal iterates defined by (8) will produce a primal feasible solution in the limit. If \(\overline{\mathbf{x}}\) is an accumulation point of the sequence \(\{\overline{\mathbf{x}}^t\}\), then Proposition 4 implies that \(\overline{\mathbf{x}}\) is feasible in the original problem (1). If the functions \(f\) and \(h_i\), \(i\in \mathcal {I}\), are affine, and the set \(X\) is a polytope, then Proposition 4 reduces to [42, Theorem 1].

Note that the conditions A2 and A3 of Assumption 2 are fulfilled if condition A1 in Assumption 2 holds together with the condition that \(\gamma _{t-1}^{\,t}\rightarrow 0\) as \(t\rightarrow \infty \). Below, we present a result for strengthened assumptions on the convexity weights and step lengths, but where the sequence \(\{\mathbf{u}^t\}\) is only assumed to be bounded.

Corollary 1

(bounded dual sequence) Suppose that the sequence \(\{\mathbf{u}^t\}\) generated by the formula (6) is bounded, and that condition  A1 of Assumption 2 holds along with the condition that \(\gamma _{t-1}^{\,t}\rightarrow 0\) as \(t\rightarrow \infty \). Then, the sequence \(\{\overline{\mathbf{x}}^t\}\) generated by (8) fulfills

$$\begin{aligned} \limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m, \quad \text {and} \quad \overline{\mathbf{x}}^t\in X,\quad t=1, 2, \ldots . \end{aligned}$$

Proof

From the relations (10a)–(10b) and condition A1 of Assumption 2 it follows that \(\mathbf{h}(\overline{\mathbf{x}}^t)\le \gamma _{t-1}^{\,t}\mathbf{u}^t\), \(t\ge 2\). Since \(\gamma _{t-1}^{\,t}\rightarrow 0\) and \(\{\mathbf{u}^t\}\) is bounded, \(\limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m\) holds. \(\square \)

Note that, under the assumptions of Corollary 1, any accumulation point \(\overline{\mathbf{x}}\) of the sequence \(\{\overline{\mathbf{x}}^t\}\) is feasible in (1).

3.2 Optimality in the limit

We next establish—assuming that Slater’s constraint qualification (Assumption 1) is fulfilled—primal convergence to the set of optimal solutions \(X^*\) of the problem (1) as long as the step lengths, \(\alpha _{t}\), and the convexity weights, \(\mu _s^t\), are chosen to satisfy Assumption 2.

Theorem 1

(optimality of \(\overline{\mathbf{x}}^t\) in the limit) Suppose Assumption 1 holds and that the subgradient method (6) operated with a suitable step length rule attains dual convergence, i.e., \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}^\infty \in {\mathbb {R}}^m_+\), and let \(\overline{\mathbf{x}}^t\) be generated according to (8). If the step lengths \(\alpha _{t}\) and the convexity weights \(\mu _s^t\) satisfy Assumption 2, then

$$\begin{aligned} \mathbf{u}^\infty \in U^* \quad \text {and} \quad {{\mathrm{{\mathrm {dist}}}}}(\overline{\mathbf{x}}^t, X^*) \rightarrow 0. \end{aligned}$$

Proof

From Proposition 4 follows that \(\limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m\) and \(\overline{\mathbf{x}}^t\in X,\,t\ge 1\). In view of Proposition 2, it suffices to show that \(\{{{\mathrm{{\mathrm {dist}}}}}(\overline{\mathbf{x}}^t, X(\mathbf{u}^\infty ))\} \rightarrow 0\) and that \(\{\mathbf{h}(\overline{\mathbf{x}}^t)^T\mathbf{u}^\infty \}\rightarrow 0\) as \(t\rightarrow \infty \).

By the convexity and nonnegativity of the function \({{\mathrm{{\mathrm {dist}}}}}(\cdot , S)\), and the definition (8), the inequalities

$$\begin{aligned} 0\le {{\mathrm{{\mathrm {dist}}}}}\left( \overline{\mathbf{x}}^t, X(\mathbf{u}^\infty )\right) \le \sum _{s=0}^{t-1}\mu _s^t {{\mathrm{{\mathrm {dist}}}}}\left( \mathbf{x}^s, X(\mathbf{u}^\infty )\right) , \quad t=1, 2, \ldots , \end{aligned}$$

hold. By Lemma 1, \(\left\{ {{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}^s, X(\mathbf{u}^\infty ))\right\} \rightarrow 0\) as \(s\rightarrow \infty \). Utilizing Remark 1 and Lemma 3 with \(\mathbf{b}^s = {{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}^s, X(\mathbf{u}^\infty ))\) and \(\mathbf{b}=0\), it follows that \(\left\{ {{\mathrm{{\mathrm {dist}}}}}(\overline{\mathbf{x}}^t, X(\mathbf{u}^\infty ))\right\} \rightarrow 0\) as \(t\rightarrow \infty \).

Whenever \(\mathcal {I}(\mathbf{u}^\infty )=\emptyset \), the equation \(\mathbf{h}(\overline{\mathbf{x}}^t)^T\mathbf{u}^\infty = 0\) holds for all \(t\), so by Proposition 2, \(\mathbf{u}^\infty \in U^*\) and \({{\mathrm{{\mathrm {dist}}}}}(\overline{\mathbf{x}}^t, X^*) \rightarrow 0\). Now, assume that \(\mathcal {I}(\mathbf{u}^\infty )\ne \emptyset \), and consider an \(i\in \mathcal {I}(\mathbf{u}^\infty )\). Since \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}^\infty \), it follows, for some fixed \(\tau \ge 1\) that is large enough, that \(u_i^t>0\) for all \(t\ge \tau \). Therefore, by the iteration formula (6), it holds that

$$\begin{aligned} h_i(\mathbf{x}^s)= \frac{u_i^{s+1} - u_i^s}{\alpha _s}, \quad s\ge \tau . \end{aligned}$$

Assume that \({\mathrm {rint}}\,X(\mathbf{u}^\infty ) \ne \emptyset \) and let \(\overline{\mathbf{x}}\in {\mathrm {rint}}\,X(\mathbf{u}^\infty )\) and \(\varvec{\xi }_i\in \partial h_i(\overline{\mathbf{x}})\). Lemma 2 yields that

$$\begin{aligned} h_i(\mathbf{x}) = h_i\left( \overline{\mathbf{x}}\right) + \varvec{\xi }_i^T\left( \mathbf{x}-\overline{\mathbf{x}}\right) , \quad \mathbf{x}\in X(\mathbf{u}^\infty ). \end{aligned}$$

The function \(h_i\) is uniformly continuous over the compact set \(X\), so for every \(\delta >0\) there exists an \(\varepsilon >0\) such that for any \(\mathbf{x}\) with \({{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}, X(\mathbf{u}^\infty )) \le \varepsilon \), the inequality

$$\begin{aligned} h_i(\mathbf{x}) \le h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\mathbf{x}-\overline{\mathbf{x}}) + \frac{\delta }{3} \end{aligned}$$

holds. If \({\mathrm {rint}}\,X(\mathbf{u}^\infty ) = \emptyset \), i.e., the set \(X(\mathbf{u}^\infty )\) is a singleton, the same reasoning holds when \(\{\overline{\mathbf{x}}\}= X(\mathbf{u}^\infty )\). From Lemma 1, we know that \(\{{{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}^s, X(\mathbf{u}^\infty ))\}\rightarrow 0\) as \(s\rightarrow \infty \), and hence, for some fixed \(\kappa \ge \tau +1\), the inequality \({{\mathrm{{\mathrm {dist}}}}}(\mathbf{x}^s, X(\mathbf{u}^\infty ))\le \varepsilon \) holds for all \(s\ge \kappa \). Therefore, it holds that

$$\begin{aligned} h_i\left( \overline{\mathbf{x}}\right) + \varvec{\xi }_i^T\left( \mathbf{x}^s-\overline{\mathbf{x}}\right) \ge \frac{u_i^{s+1}-u_i^s}{\alpha _s} - \frac{\delta }{3}, \quad s\ge \kappa . \end{aligned}$$
(12)

Hence, for \(t\ge \kappa +1\), we have that

$$\begin{aligned} h_i(\overline{\mathbf{x}}^t)&\ge h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\overline{\mathbf{x}}^t-\overline{\mathbf{x}}) = \sum _{s=0}^{t-1}\mu _s^t\left( h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\mathbf{x}^s-\overline{\mathbf{x}})\right) \end{aligned}$$
(13a)
$$\begin{aligned}&\ge \sum _{s=0}^{\kappa -1}\mu _s^t\left( h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\mathbf{x}^s-\overline{\mathbf{x}})\right) + \sum _{s=\kappa }^{t-1}\mu _s^t\left( \frac{1}{\alpha _s}(u_i^{s+1}-u_i^s) - \frac{\delta }{3}\right) \end{aligned}$$
(13b)
$$\begin{aligned}&\ge \sum _{s=0}^{\kappa -1}\mu _s^t\left( h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\mathbf{x}^s-\overline{\mathbf{x}})\right) + \sum _{s=\kappa }^{t-1}\gamma _s^{\,t}(u_i^{s+1}-u_i^s) - \frac{\delta }{3}, \end{aligned}$$
(13c)

where the inequality in (13a) follows from the convexity of \(h_i\), the inequality (13b) follows from (12), and the inequality (13c) from the fact that \(\sum _{s=\kappa }^{t-1}\mu _s^t\le 1\). For \(t\ge \kappa \), under the conditions A1–A3 of Assumption 2, we have that

$$\begin{aligned} \sum _{s=0}^{\kappa -1} \gamma _s^{\,t} \le \sum _{s=0}^{\kappa -1}\left( \gamma _0^{\,t} + (\kappa -1)\varDelta \gamma _{\text {max}}^{\,t}\right) = \kappa \gamma _0^{\,t} + \kappa (\kappa -1)\varDelta \gamma _{\text {max}}^{\,t} \rightarrow 0 \quad \text {as }\; t\rightarrow \infty , \end{aligned}$$

which implies that \(\gamma _{s}^{\,t}\rightarrow 0\) as \(t\rightarrow \infty \) for \(s=0, \ldots , \kappa -1\). Since \(\alpha _s<\infty \) for \(s=0, \ldots , \kappa -1\), it follows that \(\mu _s^t\rightarrow 0\) as \(t\rightarrow \infty \) for \(s=0, \ldots , \kappa -1\). Therefore, for large enough values of \(t\ge \kappa \), the inequality \(\sum _{s=0}^{\kappa -1}\mu _s^t\left( h_i(\overline{\mathbf{x}}) + \varvec{\xi }_i^T(\mathbf{x}^s-\overline{\mathbf{x}})\right) \ge -\delta /3\) holds.

By the same reasoning as in the proof of Proposition 4 [the inequalities (11)], for large enough values of \(t\), the inequality \(\sum _{s=\kappa }^{t-1}\gamma _s^{\,t}(u_i^{s+1}-u_i^s)\ge -\delta /3\) holds. Hence,

$$\begin{aligned} h_i(\overline{\mathbf{x}}^t)\ge -\frac{\delta }{3} -\frac{\delta }{3} -\frac{\delta }{3} = -\delta \end{aligned}$$

for large enough values of \(t\ge \kappa + 1\). Therefore, \(\liminf _{t\rightarrow \infty } h_i(\overline{\mathbf{x}}^t) \ge 0\) holds. From Proposition 4 follows that \(\limsup _{t\rightarrow \infty } h_i(\overline{\mathbf{x}}^t) \le 0\). We deduce that \(\lim _{t\rightarrow \infty } h_i(\overline{\mathbf{x}}^t) = 0\). Since this result holds for all \(i\in \mathcal {I}(\mathbf{u}^\infty )\), and since \(u_i^\infty =0\) whenever \(i\in \mathcal {I}\setminus \mathcal {I}(\mathbf{u}^\infty )\), it follows that

$$\begin{aligned} \left\{ (\mathbf{u}^\infty )^T\mathbf{h}(\overline{\mathbf{x}}^t)\right\} \rightarrow 0 \quad \text { as }\quad t\rightarrow \infty . \end{aligned}$$

By Proposition 2, the theorem then follows. \(\square \)

For the case when (a) the functions \(f\) and \(h_i\), \(i\in \mathcal {I}\), are affine, and (b) the set \(X\) is a polytope, Theorem 1 reduces to the result of Sherali and Choi [42, Theorem 2].
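Theorem 1 can be made concrete with a self-contained toy run (hypothetical, not from the paper). For the LP of minimizing \(-x_1-x_2\) over \(X=[0,1]^2\) subject to \(x_1+x_2-1\le 0\), the dual function is nonsmooth at its optimum \(u^*=1\): the subproblem solutions jump between \((1,1)\) and \((0,0)\) and never converge, while their ergodic average approaches the primal optimum \((0.5, 0.5)\).

```python
import numpy as np

# Toy LP (illustrative only):
#   minimize  -x1 - x2   s.t.  x1 + x2 - 1 <= 0,  x in X = [0,1]^2.
# Relaxing the coupling constraint gives L(x,u) = (u-1)(x1+x2) - u, so the
# subproblem solution jumps between (1,1) (u < 1) and (0,0) (u >= 1):
# the dual is nonsmooth at u* = 1 and the raw iterates x^t never converge.
def solve_subproblem(u):
    return np.ones(2) if u < 1.0 else np.zeros(2)

h = lambda x: x[0] + x[1] - 1.0          # relaxed constraint function

u, xbar = 0.0, np.zeros(2)
T = 20000
for t in range(T):
    x = solve_subproblem(u)
    xbar = (t * xbar + x) / (t + 1)            # ergodic average, mu_s^t = 1/t
    u = max(u + (1.0 / (1 + t)) * h(x), 0.0)   # iteration (6), harmonic steps

# The dual iterate approaches u* = 1, and the ergodic average approaches the
# primal optimum (0.5, 0.5), which the raw iterates never approach.
assert abs(u - 1.0) < 0.05
assert np.allclose(xbar, [0.5, 0.5], atol=0.05)
assert h(xbar) < 0.05                    # near-feasibility (Proposition 4)
```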

3.3 A new rule for choosing the convexity weights when utilizing harmonic series step lengths

We now study the special case when the step lengths \(\alpha _{t}\) are chosen according to a harmonic series, i.e.,

$$\begin{aligned} \alpha _{t} := \frac{a}{b+ct}, \quad t=0, 1, \ldots , \quad \text { where } a>0,\; b>0,\; c>0. \end{aligned}$$
(14)

This choice of step lengths was used by Larsson and Liu [27] and by Larsson et al. [30], and it guarantees, according to Proposition 3, convergence to a dual optimum. We define

$$\begin{aligned} \varDelta \mu _{\text {max}}^t := \max _{s\in \{1, \ldots , t-1\}}\{\mu _s^t - \mu _{s-1}^t\}, \quad t=2, 3, \ldots , \end{aligned}$$

where \(\mu _s^t\) are the convexity weights employed in (8).
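As a quick numerical illustration (not part of the analysis), the harmonic step lengths (14) form a divergent series whose squares are summable: the partial sums \(\sum _{s<t}\alpha _s\) grow without bound while \(\sum _{s<t}\alpha _s^2\) stay bounded. A minimal Python sketch, with the arbitrary choice \(a=b=c=1\):

```python
# Partial sums of the harmonic step lengths alpha_t = a / (b + c t) from (14):
# A accumulates the step lengths (divergent), B their squares (bounded).
a, b, c = 1.0, 1.0, 1.0  # any positive constants exhibit the same behaviour

A = B = 0.0
snapshots = {}
for t in range(100_000):
    alpha = a / (b + c * t)
    A += alpha
    B += alpha ** 2
    if t + 1 in (1_000, 100_000):
        snapshots[t + 1] = (A, B)
```

Here \(A\) keeps growing like \(\log t\), whereas \(B\) is bounded above by \(\pi ^2/6\) for this choice of constants.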

Assumption 3

(convexity weights when employing harmonic step lengths) The convexity weights are chosen to satisfy

  • B1: \(\mu _s^t\ge \mu _{s-1}^t, \; s=1, \ldots , t-1, \,\,\, t=2, 3, \ldots , \)

  • B2: \(t\varDelta \mu _{\text {max}}^t\rightarrow 0, \; \text {as } t\rightarrow \infty , \text { and} \)

  • B3: \(t\mu _{t-1}^t\le M<\infty , \; t=1, 2, \ldots .\)

Condition B1 states that the convex combination \(\overline{\mathbf{x}}^t\), defined in (8), should put more weight on later observations (that is, on later primal subproblem solutions \(\mathbf{x}^s\)). Condition B2 states that, asymptotically, no single primal iterate should be favoured, in the sense that the difference between the weights of two consecutive iterates tends to zero faster than \(1/t\). Condition B3 states that the convexity weights \(\mu _{t-1}^t\) should decrease at least as fast as \(1/t\) as \(t\rightarrow \infty \).

Consider the following result.

Proposition 5

(convexity weights fulfilling Assumption 3 together with step lengths defined by (14) fulfill Assumption 2) If the step lengths, \(\alpha _{t}\), fulfill (14) and the convexity weights, \(\mu _s^t\), satisfy Assumption 3, then Assumption 2 is fulfilled.

Proof

From (14) it follows that the inequality \(\alpha _s\le \alpha _{s-1}\) holds for all \(s\ge 1\), which implies that \(\gamma _s^{\,t}-\gamma _{s-1}^{\,t} = \mu _s^t/\alpha _s - \mu _{s-1}^t/\alpha _{s-1} \ge (\mu _s^t-\mu _{s-1}^t)/\alpha _s \ge 0\). Hence, the condition A1 in Assumption 2 is fulfilled. Next, we have that

$$\begin{aligned} \gamma _{s}^t-\gamma _{s-1}^{\,t}&= \frac{b+cs}{a}\mu _s^t - \frac{b+c(s-1)}{a}\mu _{s-1}^t \\&= \frac{1}{a}\left( b(\mu _s^t-\mu _{s-1}^t) + c\mu _{s-1}^t + cs(\mu _s^t-\mu _{s-1}^t)\right) \!, \end{aligned}$$

which implies that

$$\begin{aligned} \varDelta \gamma _{\text {max}}^{\,t}&= \max _{s\in \{1, \ldots , t-1\}} \left\{ \gamma _s^{\,t}-\gamma _{s-1}^{\,t}\right\} \end{aligned}$$
(15a)
$$\begin{aligned}&= \frac{1}{a}\max _{s\in \{1, \ldots , t-1\}} \left\{ b\left( \mu _s^t-\mu _{s-1}^t\right) + c\mu _{s-1}^t + cs\left( \mu _s^t-\mu _{s-1}^t\right) \right\} \end{aligned}$$
(15b)
$$\begin{aligned}&\le \frac{b}{a}\varDelta \mu _{\text {max}}^t + \frac{c}{a}\mu _{t-2}^t + \frac{c}{a}(t-1)\varDelta \mu _{\text {max}}^t \rightarrow 0, \quad \text {as }\,\, t\rightarrow \infty , \end{aligned}$$
(15c)

where the inequality (15c) follows by maximizing each term in (15b) separately. The first term in (15c) tends to zero due to condition B2 in Assumption 3, the second term converges to zero due to the conditions B1 and B3 in Assumption 3 and the third term converges to zero due to the condition B2 in Assumption 3. Hence, the condition A2 in Assumption 2 is fulfilled. We also have that

$$\begin{aligned} \gamma _{t-1}^{\,t}&= \frac{\mu _{t-1}^t}{\alpha _{t-1}}=\frac{(b+c(t-1))\mu _{t-1}^t}{a}\nonumber \\&\le \frac{\mu _{t-1}^t(b-c) + cM}{a} \le \frac{M}{a}\left( \frac{|b-c|}{t} + c\right) < \infty , \end{aligned}$$

for any \(t\ge 1\), which implies that the condition A3 in Assumption 2 is satisfied. \(\square \)

Larsson et al. [30] show that by using the convexity weights \(\mu _s^t = 1/t\), primal convergence can always be guaranteed for the harmonic series step lengths (14). We here refer to this rule for creating an ergodic sequence as the \(1/t\)-rule; it was first analyzed by Larsson and Liu [27], who prove convergence for the case when (1) is a linear program. Clearly, the \(1/t\)-rule fulfills Assumption 3; hence the primal convergence of the \(1/t\)-rule is a special case of the analysis above.

One drawback of the \(1/t\)-rule is the fact that when creating the ergodic sequences of primal solutions, all previously found iterates are weighted equally. We expect that later subproblem solutions in the subgradient method are more likely to belong to the set of optimal solutions to the subproblem (2) at a dual optimal solution, \(\mathbf{u}^*\in U^*\). We therefore propose a more general set of rules for creating ergodic sequences of primal iterates which allows for later primal iterates to receive larger convexity weights.

Definition 1

(the \(s^k\)-rule) Let \(k\ge 0\). The \(s^k\)-rule creates an ergodic sequence by choosing convexity weights according to

$$\begin{aligned} \mu _s^t = \frac{(s+1)^k}{\sum _{l = 0}^{t-1}(l+1)^k}, \quad \mathrm{for} \quad s = 0, \ldots , t-1, \quad t\ge 1. \end{aligned}$$

Note that the \(s^0\)-rule is equivalent to the \(1/t\)-rule. For \(k> 0\), the \(s^k\)-rule results in an ergodic sequence in which the later iterates are assigned higher weights than the earlier ones. For larger values of \(k\), the weights are shifted towards later iterates. In Fig. 1, the convexity weights \(\mu _s^t\) are illustrated for \(t=10\) and \(k\in \{0, 1, 2, 10\}\). The following proposition establishes primal convergence for the ergodic sequence created by the \(s^k\)-rule, given that a harmonic series step length is utilized when solving the Lagrangian dual problem (4).

Fig. 1

Illustration of the convexity weights, \(\mu _s^t\), when \(t=10\), for the \(s^k\)-rule with \(k=0, 1, 2, 10\), respectively
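To make Definition 1 concrete, the convexity weights can be computed directly. The following Python sketch (an illustration, not from the paper) reproduces the qualitative behaviour shown in Fig. 1: the weights sum to one, are nondecreasing in \(s\), and for \(k=0\) reduce to the \(1/t\)-rule.

```python
def sk_weights(t, k):
    """Convexity weights mu_s^t of the s^k-rule (Definition 1), s = 0..t-1."""
    denom = sum((l + 1) ** k for l in range(t))
    return [(s + 1) ** k / denom for s in range(t)]

mu = sk_weights(10, 2)  # corresponds to the k = 2 curve of Fig. 1
```

For larger \(k\), almost all weight is shifted to the latest iterates, in agreement with the \(k=10\) curve of Fig. 1.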

Proposition 6

(the \(s^k\)-rule satisfies Assumption 3) The convexity weights \(\mu _s^t\), chosen according to Definition 1, fulfill Assumption 3.

Proof

The convexity weights \(\mu _s^t\) clearly fulfill the condition B1 of Assumption 3. For any \(t\ge 2\), it holds that

$$\begin{aligned} \varDelta \mu _{\text {max}}^t = \max _{s\in \{1, \ldots , t-1\}}\{\mu _{s}^t-\mu _{s-1}^t\} = \max _{s\in \{1, \ldots , t-1\}}\left\{ \frac{(s+1)^k-s^k}{\sum _{l=0}^{t-1}(l+1)^k}\right\} , \end{aligned}$$

where the maximum is obtained for \(s=1\) when \(0\le k < 1\), and for \(s=t-1\) when \(k\ge 1\). Noting that

$$\begin{aligned} \sum _{l=0}^{t-1}(l+1)^k = \sum _{l=0}^{t-1}\int \limits _{l}^{l+1}\lceil x\rceil ^k \,dx \ge \sum _{l=0}^{t-1}\int \limits _{l}^{l+1}x^k\,dx = \int _{0}^{t}x^k\,dx = \frac{t^{k+1}}{k+1}, \end{aligned}$$
(16)

it follows that, for \(0 \le k < 1\),

$$\begin{aligned} t\varDelta \mu _{\text {max}}^t = \frac{t\left( 2^k-1\right) }{\sum _{l=0}^{t-1}(l+1)^k} \le \frac{t}{\sum _{l=0}^{t-1}(l+1)^k} \le \frac{(k+1)}{t^{k}} \rightarrow 0, \quad \text {as} \quad t\rightarrow \infty . \end{aligned}$$

For \(k\ge 1\), it holds that

$$\begin{aligned} t\varDelta \mu _{\text {max}}^t&= t\frac{t^k - (t-1)^k}{\sum _{l=0}^{t-1}(l+1)^k} \le \frac{(k+1)t(t^k - (t-1)^k)}{t^{k+1}}\nonumber \\&= (k+1)\left( 1 - \left( \frac{t-1}{t}\right) ^k\right) \rightarrow 0, \end{aligned}$$

as \(t \rightarrow \infty \). Hence, condition B2 of Assumption 3 holds. Condition B3 in Assumption 3 holds, due to the fact that

$$\begin{aligned} t\mu _{t-1}^t = \frac{t^{k+1}}{\sum _{l=0}^{t-1} (l+1)^k} \le \frac{(k+1)t^{k+1}}{t^{k+1}} = k+1 < \infty , \quad t= 2, 3, \ldots , \end{aligned}$$

where the first inequality follows from (16). \(\square \)
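The bounds in this proof can also be checked numerically. The sketch below (illustrative only) confirms, for a few values of \(k\), that \(t\varDelta \mu _{\text {max}}^t\) decreases as \(t\) grows (condition B2) and that \(t\mu _{t-1}^t\le k+1\) (condition B3, via the inequality (16)):

```python
def sk_weights(t, k):
    # Convexity weights of the s^k-rule (Definition 1).
    denom = sum((l + 1) ** k for l in range(t))
    return [(s + 1) ** k / denom for s in range(t)]

results = {}
for k in (0.5, 1, 2, 10):
    vals = []
    for t in (10, 1_000):
        mu = sk_weights(t, k)
        d_max = max(mu[s] - mu[s - 1] for s in range(1, t))
        vals.append(t * d_max)  # the quantity controlled by condition B2
        # Condition B3: t * mu_{t-1}^t is bounded by k + 1, by inequality (16).
        assert t * mu[-1] <= k + 1 + 1e-9
    results[k] = vals
```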

One should note that when constructing the ergodic iterate, \(\overline{\mathbf{x}}^t\), defined by the \(s^k\)-rule, not all previously found iterates, \(\mathbf{x}^s,\,s = 0, \ldots , t-1\), have to be stored since the ergodic iterate can be updated by

$$\begin{aligned} \overline{\mathbf{x}}^0 = \mathbf{x}^0, \quad \overline{\mathbf{x}}^t = \frac{\sum _{s=0}^{t-2}(s+1)^k}{\sum _{s=0}^{t-1}(s+1)^k}\overline{\mathbf{x}}^{t-1} + \frac{t^k}{\sum _{s=0}^{t-1}(s+1)^k}\mathbf{x}^{t-1}, \quad t = 1, 2, \ldots . \end{aligned}$$
(17)

Hence, in iteration \(t\), only the previous ergodic iterate, \(\overline{\mathbf{x}}^{t-1}\), and the most recent primal iterate, \(\mathbf{x}^{t-1}\), are required for constructing the new ergodic iterate, \(\overline{\mathbf{x}}^t\).
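As a sanity check (illustrative, with scalar stand-ins for the primal iterates), the update (17) can be verified to reproduce the directly computed convex combination (8):

```python
import random

def sk_weights(t, k):
    # Convexity weights of the s^k-rule (Definition 1).
    denom = sum((l + 1) ** k for l in range(t))
    return [(s + 1) ** k / denom for s in range(t)]

k = 2
random.seed(0)
xs = [random.random() for _ in range(50)]  # scalar stand-ins for x^0, ..., x^49

# Recursive update (17): only the previous ergodic iterate is stored.
xbar = xs[0]                  # xbar^1 = x^0
S = 1.0                       # sum_{s=0}^{t-2} (s+1)^k, initialised for t = 2
for t in range(2, 51):
    S_new = S + t ** k        # sum_{s=0}^{t-1} (s+1)^k
    xbar = (S / S_new) * xbar + (t ** k / S_new) * xs[t - 1]
    S = S_new

# Direct evaluation of (8) for comparison.
direct = sum(m * x for m, x in zip(sk_weights(50, k), xs))
```

The two evaluations agree to within floating-point round-off, which is the point of the storage-saving update (17).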

We now summarize the results obtained in this section in the following theorem.

Theorem 2

(convergence of the \(s^k\)-rule) Suppose Assumption 1 holds, that \(\mathbf{u}^t\) is generated by the subgradient method (6) operated with harmonic series step lengths (14), and that \(\overline{\mathbf{x}}^t\) is generated according to (8), where the convexity weights are chosen according to the \(s^k\)-rule (Definition 1). Then,

$$\begin{aligned} \mathbf{u}^t \rightarrow \mathbf{u}^\infty \in U^* \quad \text {and} \quad {\mathrm {dist}}(\overline{\mathbf{x}}^t, X^*) \rightarrow 0. \end{aligned}$$

Proof

By Proposition 3, it follows that \(\mathbf{u}^t \rightarrow \mathbf{u}^\infty \in U^*\). Using Propositions 5 and 6 yields that the assumptions of Theorem 1 hold, which completes the proof. \(\square \)

3.4 Connection with previous results

In this section, we present some previous proposals for choosing step lengths and convexity weights. For simplicity, we define

$$\begin{aligned} A_t = \sum _{s=0}^{t-1}\alpha _s, \quad B_t = \sum _{s=0}^{t-1}\alpha _s^2, \quad t=1, 2, \ldots . \end{aligned}$$

In the volume algorithm developed by Barahona and Anbil [7], each ergodic iterate is constructed as a convex combination of the previous ergodic iterate and the new primal iterate, i.e., \(\overline{\mathbf{x}}^t = \beta \mathbf{x}^t + (1 - \beta )\overline{\mathbf{x}}^{t-1}\), where \(0 < \beta < 1\). Translating this into the framework of our analysis, this is equivalent to letting

$$\begin{aligned} \mu _0^t = (1 - \beta )^{t-1}, \quad \mu _s^t = \beta (1 - \beta )^{t-s-1}, \quad s = 1, \ldots , t-1. \end{aligned}$$
(18)
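Unrolling this recursion yields geometric convexity weights over the iterates \(\mathbf{x}^0, \ldots , \mathbf{x}^{t-1}\), as in the framework of (8). The following sketch (with scalar stand-ins for the iterates, and the index shift implied by that framework) verifies the unrolled weights against the recursion:

```python
beta = 0.1
T = 30
xs = [float(s) for s in range(T)]  # scalar stand-ins for the primal iterates

# Volume-algorithm recursion, shifted into the framework of (8):
# xbar^t combines x^0, ..., x^{t-1}.
xbar = xs[0]
for t in range(2, T + 1):
    xbar = beta * xs[t - 1] + (1.0 - beta) * xbar

# Unrolled geometric weights: the first iterate keeps weight (1-beta)^(T-1);
# iterate s >= 1 receives beta * (1-beta)^(T-1-s).
mu = [(1.0 - beta) ** (T - 1)] + [beta * (1.0 - beta) ** (T - 1 - s)
                                  for s in range(1, T)]
```

In contrast to the polynomially growing weights of the \(s^k\)-rule, these weights decay geometrically with the age of the iterate.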

In the taxonomy in Table 1, we present the following attributes of the previously developed algorithms which utilize the subgradient method to solve the dual problem:

  • Problem: The type of problem considered. For the case when problem (1) is a linear program, the assumptions are that \(f\) and \(h_i,\,i\in \mathcal {I}\), are affine functions and that \(X\) is a nonempty and bounded polyhedron; this is denoted in the table as LP. The case of a general convex optimization problem is denoted CP.

  • Step lengths: The step lengths \(\alpha _{t}\) employed in the subgradient method (6). Step lengths defined according to (14) are denoted Harmonic series. If the step lengths fulfill \(\alpha _{t}>0\), \(\lim _{t\rightarrow \infty } \alpha _{t} = 0\), and \(\lim _{t\rightarrow \infty } A_t = \infty \), we denote this by Divergent series; if also \(\lim _{t\rightarrow \infty } B_t < \infty \), we denote this by Divergent series, QB (quadratically bounded).

  • Conv. weights: The convexity weights, \(\mu _s^t\), defined in (8), which define the ergodic sequences of primal iterates.

  • Theorem 1: Whether or not Theorem 1 guarantees the convergence of the method.

  • Theorem 2: Whether or not Theorem 2 guarantees the convergence of the method.

Table 1 Taxonomy of dual subgradient algorithms

Since the work presented in this paper utilizes the traditional subgradient method to solve the dual problem, we only include algorithms which employ this method for the dual problem in Table 1. More sophisticated methods for solving the dual problem include deflected conditional subgradient methods (d’Antonio and Frangioni [12], Burachik and Kaya [10]), bundle methods (Lemaréchal et al. [31], Kiwiel [20]), augmented Lagrangian methods (Rockafellar [41], Bertsekas [9]), and ballstep subgradient methods (Kiwiel et al. [22, 23]). All of these methods utilize information from previously computed subgradients when updating the iterates in the subgradient scheme. In order to approximate the primal solutions, the convexity weights defining the primal iterates are then acquired from the information obtained in these dual schemes (e.g., Robinson [40]).

4 Applications to multicommodity network flows

We apply subgradient methods to a Lagrangian dual of the nonlinear multicommodity network flow problem (NMFP). Primal solutions are computed from ergodic sequences of subproblem solutions. For a more thorough description of multicommodity flow problems and solution methods for these, see [19, 35, 36].

4.1 The nonlinear multicommodity network flow problem

Consider a graph \(\mathcal {G} = (\mathcal {N}, \mathcal {A})\) with a node set \(\mathcal {N}\) and a set of directed arcs \(\mathcal {A}\). There is a set \(\mathcal {C}\subseteq \mathcal {N}\times \mathcal {N}\) of origin-destination pairs (OD-pairs). For each pair \(k\in \mathcal {C}\), there is a flow demand, \(d_k> 0\), associated with a specific commodity. We denote the set of simple routes from the origin to the destination of OD-pair \(k\) by \(\mathcal {R}_k\), and the flow on route \(r\in \mathcal {R}_k\) by \(h_{kr}\). Let \([\delta _{kra}]_{r\in \mathcal {R}_k, k\in \mathcal {C}, a\in \mathcal {A}}\) be an arc–route incidence matrix for \(\mathcal {G}\), with

$$\begin{aligned} \delta _{kra} = \left\{ \begin{array}{rl} 1, &{} \text{ if } \text{ route } r\in \mathcal {R}_k \text{ contains } \text{ arc } a\in \mathcal {A}\text{, } \\ 0, &{} \text{ otherwise. } \end{array} \right. \end{aligned}$$

The flow on arc \(a\in \mathcal {A}\) is denoted by \(f_a\) and is defined by the route flows \(h_{kr}\) through

$$\begin{aligned} f_a = \sum _{k\in \mathcal {C}}\sum _{r\in \mathcal {R}_k}\delta _{kra}h_{kr}, \quad a \in \mathcal {A}. \end{aligned}$$
(19)

To each arc \(a\in \mathcal {A}\), we associate a convex cost function \(g_a: \mathbb {R}_+\rightarrow \mathbb {R}_+\) of its flow \(f_a\). The NMFP then is the program

$$\begin{aligned} z^* = \text {minimum} \quad \, \sum _{a\in \mathcal {A}}g_a(f_a), \end{aligned}$$
(20a)
$$\begin{aligned} \text {subject to} \quad \,\,\,\,\,\quad \sum _{r\in \mathcal {R}_k}h_{kr}&= d_k, \quad k\in \mathcal {C},\end{aligned}$$
(20b)
$$\begin{aligned} h_{kr}&\ge 0, \,\,\,\quad r\in \mathcal {R}_k,\,\, k\in \mathcal {C}, \end{aligned}$$
(20c)
$$\begin{aligned} \sum _{k\in \mathcal {C}}\sum _{r\in \mathcal {R}_k}\delta _{kra}h_{kr}&=f_a, \quad a\in \mathcal {A}, \end{aligned}$$
(20d)
$$\begin{aligned} f_a&\ge 0, \,\,\,\quad a\in \mathcal {A}. \end{aligned}$$
(20e)

One should note that the constraints (20e) are implied by (20c) and (20d), and do not have to be incorporated explicitly in the model. We will consider two definitions of the arc cost functions, \(g_a(f_a),\,a\in \mathcal {A}\). The first, BPR (Bureau of Public Roads) nonlinear delay, is used in the field of transportation (e.g., [6, 29]) and is defined as

$$\begin{aligned} g_{a}(f_a) = r_af_a\left( 1 + \frac{p_a}{q_a + 1}\left( \frac{f_a}{c_a}\right) ^{q_a}\right) ,\quad f_a\in \mathbb {R}_+, \,\, a\in \mathcal {A}, \end{aligned}$$
(21)

where \(r_a\ge 0\) is the free-flow travel time and \(c_a> 0\) is the practical capacity of arc \(a\in \mathcal {A}\). The parameters \(p_a\ge 0\) and \(q_a\ge 0\) are arc-specific. The second, Kleinrock’s average delays, is used in the field of telecommunications (e.g., [24, 35]) and is defined as

$$\begin{aligned} g_{a}(f_a) = \frac{f_a}{c_a - f_a}, \quad f_a \in [0, c_a), \,\, a\in \mathcal {A}, \end{aligned}$$
(22)

where \(c_a\), \(a\in \mathcal {A}\), are the arc capacities.
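For concreteness, both arc cost functions can be sketched in a few lines of Python (the parameter values used below are only illustrative):

```python
def bpr_cost(f, r, c, p, q):
    """BPR travel cost (21) of the arc flow f >= 0."""
    return r * f * (1.0 + p / (q + 1.0) * (f / c) ** q)

def kleinrock_cost(f, c):
    """Kleinrock average delay (22); finite only for 0 <= f < c."""
    return f / (c - f)

cost_empty = bpr_cost(0.0, r=1.0, c=10.0, p=0.15, q=4.0)
cost_at_capacity = bpr_cost(10.0, r=1.0, c=10.0, p=0.15, q=4.0)
```

Note the qualitative difference: the BPR cost (21) is finite for all \(f\ge 0\), whereas the Kleinrock delay (22) blows up as the flow approaches the arc capacity.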

4.2 A Lagrangian dual formulation

For the NMFP, i.e., the program (20), we utilize a Lagrangian dual approach in which the arc flow defining constraints (20d) are relaxed. For a more thorough explanation of the Lagrangian reformulation, see [28]. The resulting Lagrangian subproblem essentially consists of solving one shortest path problem for each commodity \(k\in \mathcal {C}\).

Let \(\mathbf{u}= [u_a]_{a\in \mathcal {A}}\) be the multipliers associated with the constraints (20d). We define the Lagrangian dual objective function by

$$\begin{aligned} \theta (\mathbf{u}) = \sum _{k\in \mathcal {C}}\phi _k(\mathbf{u}) + \sum _{a\in \mathcal {A}}\xi _a(u_a), \end{aligned}$$
(23)

where, for each \(k\in \mathcal {C}\) and any \(\mathbf{u}\in \mathbb {R}^{|\mathcal {A}|},\,\phi _k(\mathbf{u})\) is the optimal value of the shortest simple route subproblem, with arc costs \(u_a\), \(a\in \mathcal {A}\), given by

$$\begin{aligned} \phi _k(\mathbf{u}) = \text {minimum}\quad \,\, \sum _{r\in \mathcal {R}_k}&\left( \sum _{a\in \mathcal {A}}u_a\delta _{kra}\right) h_{kr},\nonumber \\ \text {subject to}\quad \, \sum _{r\in \mathcal {R}_k}&h_{kr} = d_k, \\&h_{kr}\ge 0, \quad r\in \mathcal {R}_k,\nonumber \end{aligned}$$
(24)

and with solution set \(H_k(\mathbf{u})\subseteq {\mathbb {R}}_+^{|\mathcal {R}_k|}\). For each \(a\in \mathcal {A}\) and any \(u_a\in \mathbb {R},\,\xi _a(u_a)\) is the optimal value of the single-arc subproblem

$$\begin{aligned} \xi _a(u_a) := \underset{f_a\ge 0}{\text {minimum}}\,\,\left\{ g_a(f_a) - u_af_a\right\} \!, \end{aligned}$$
(25)

with solution \(f_a(u_a)\in {\mathbb {R}}_+\). For the cost functions (21) and (22), \(f_a(u_a)\) can be expressed in closed form as

$$\begin{aligned} f_a(u_a) = \left\{ \begin{array}{ll} (g_a')^{-1}(u_a),\; &{}u_a\ge g_a'(0), \\ 0, &{} u_a < g_a'(0), \end{array} \right. \end{aligned}$$
(26)

where \((g_a')^{-1}\) is the continuous inverse mapping of the derivative of \(g_a\) at \(u_a\). The function \(\theta : \mathbb {R}^{|\mathcal {A}|}\rightarrow \mathbb {R}\) is the sum of \(|\mathcal {C}|\) concave and piecewise linear functions \(\phi _k\), \({k\in \mathcal {C}}\), and \(|\mathcal {A}|\) concave and differentiable functions \(\xi _a,\,a\in \mathcal {A}\). It is thus finite, continuous, concave, and subdifferentiable on \(\mathbb {R}^{|\mathcal {A}|}\). Its subdifferential mapping at \(\mathbf{u}\in \mathbb {R}^{|\mathcal {A}|}\) equals the bounded polyhedron

$$\begin{aligned} \partial \theta (\mathbf{u}) = \Big \{\Big [\sum _{k\in \mathcal {C}}\sum _{r\in \mathcal {R}_k}\delta _{kra}h_{kr} - f_a(u_a)\Big ]_{a\in \mathcal {A}} \,\, \Big |\,\, [h_{kr}]_{r\in \mathcal {R}_k}\in H_{k}(\mathbf{u}), \,\, k\in \mathcal {C}\Big \}. \end{aligned}$$
(27)
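The closed form (26) can be made explicit for the two cost functions considered (a sketch, not from the paper; for BPR we assume \(p_a, q_a > 0\)). For the Kleinrock delay (22), \(g_a'(f) = c_a/(c_a-f)^2\) with \(g_a'(0)=1/c_a\); for the BPR cost (21), \(g_a'(f) = r_a\left( 1 + p_a(f/c_a)^{q_a}\right) \) with \(g_a'(0)=r_a\):

```python
import math

def f_kleinrock(u, c):
    # (26) for the Kleinrock delay (22): g'(f) = c / (c - f)^2, g'(0) = 1/c,
    # hence (g')^{-1}(u) = c - sqrt(c / u) for u >= 1/c.
    return c - math.sqrt(c / u) if u >= 1.0 / c else 0.0

def f_bpr(u, r, c, p, q):
    # (26) for the BPR cost (21) with p, q > 0: g'(f) = r (1 + p (f/c)^q),
    # g'(0) = r, hence (g')^{-1}(u) = c ((u/r - 1) / p)^(1/q) for u >= r.
    return c * ((u / r - 1.0) / p) ** (1.0 / q) if u >= r else 0.0
```

For multipliers below \(g_a'(0)\), the single-arc subproblem (25) is solved by \(f_a=0\), in agreement with (26).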

By weak duality, \(\theta (\mathbf{u}) \le z^*\) holds for all \(\mathbf{u}\in \mathbb {R}^{|\mathcal {A}|}\). Consider an arbitrary \(\mathbf{u}\in \mathbb {R}^{|\mathcal {A}|}\), and define \(\widehat{u}_a = \max \{u_a\,,\, g_a'(0)\},\,a\in \mathcal {A}\). Then, \(f_a(\widehat{u}_a) = f_a(u_a)\), implying that \(\xi _a(\widehat{u}_a) = \xi _{a}(u_a)\), \(a\in \mathcal {A}\). Further, \(\phi _{k}(\widehat{\mathbf{u}}) \ge \phi _k(\mathbf{u}),\,k\in \mathcal {C}\), since \(\widehat{\mathbf{u}}\ge \mathbf{u}\), and it follows that \(\theta (\widehat{\mathbf{u}})\ge \theta (\mathbf{u})\). Since the Lagrangian dual objective function \(\theta \), defined in (23), is to be maximized on \({\mathbb {R}}^{|\mathcal {A}|}\), one can, without loss of generality, impose the restriction \(u_a\ge g_a'(0),\,a\in \mathcal {A}\). The Lagrangian dual can thus be stated as

$$\begin{aligned} \begin{array}{rl} \theta ^* = \text {maximum} \,\,&{} \theta (\mathbf{u}),\\ \text {subject to} \;\,&{} u_a\ge g_a'(0), \quad a\in \mathcal {A}, \end{array} \end{aligned}$$
(28)

with solution set \(U^* \subseteq {\mathbb {R}}^{|\mathcal {A}|}\).

Proposition 7

(primal–dual optimality, [30, Proposition 6]) Let \(\mathbf{u}^*\in U^*\). Then, strong duality holds, that is, \(\theta ^* = \theta (\mathbf{u}^*) = z^*\). Further, \(f_a^* = f_a(u_a^*),\,a\in \mathcal {A}\), and

$$\begin{aligned} H_k^* = H_k(\mathbf{u}^*)\cap \left\{ [h_{kr}]_{r\in \mathcal {R}_k} \left| \sum _{l\in \mathcal {C}}\sum _{r\in \mathcal {R}_l}\delta _{lra}h_{lr} = f_a^*, \,\, a\in \mathcal {A}\right. \right\} , \quad k\in \mathcal {C}. \end{aligned}$$

Proposition 7 states that the optimal arc flow \([f_a^*]_{a\in \mathcal {A}}\) is obtained from the solutions to the subproblems \(\xi _a(u_a^*),\,a\in \mathcal {A}\). However, an optimal route flow pattern \({[h_{kr}^*]_{r\in \mathcal {R}_k}\in H_k^*}\) is, in general, not directly available from the solution to the subproblem (24). This is because the set \(\prod _{k\in \mathcal {C}}H_k(\mathbf{u}^*)\) is typically not a singleton, since the functions \(\phi _k,\,k\in \mathcal {C}\), are typically nonsmooth at \(\mathbf{u}^*\).

4.3 The algorithm

We solve the Lagrangian dual problem (28) by the subgradient method defined in (6). By aggregating the feasible shortest route flow pattern \([h_{kr}(\mathbf{u}^t)]_{r\in \mathcal {R}_k, k\in \mathcal {C}}\) into a feasible arc flow solution, defined by

$$\begin{aligned} y_{a}^{t} = \sum _{k\in \mathcal {C}}\sum _{r\in \mathcal {R}_k}\delta _{kra}h_{kr}(\mathbf{u}^t), \quad a\in \mathcal {A}, \end{aligned}$$
(29)

a subgradient of \(\theta \) at \(\mathbf{u}^t\) is given by the vector \([y_a^t - f_a(u_a^t)]_{a\in \mathcal {A}}\). The subgradient method (6) is then given by

$$\begin{aligned} u_a^{t+1} := \max \left\{ u_a^t + \alpha _{t}(y_a^t - f_a(u_a^t))\,, \, g_a'(0) \right\} , \quad a\in \mathcal {A},\,\, t=0, 1, \ldots , \end{aligned}$$

where \(\alpha _{t}\) is the step length used in iteration \(t\). We create ergodic sequences of arc flows according to

$$\begin{aligned} \hat{f}_a^t := \sum _{s=0}^{t-1}\mu _s^ty_a^s, \quad a\in \mathcal {A}, \quad t = 1, 2, \ldots , \end{aligned}$$

by choosing the convexity weights \(\mu _s^t\) according to the \(s^k\)-rule (see Definition 1). Since all arc flows \(y_a^s\), \(a\in \mathcal {A},\,s=0, \ldots , t-1\), are feasible, the ergodic iterate \(\hat{f}_a^t\) will also be feasible in (20) for \(t\ge 1\). The ergodic iterates \(\hat{f}^t_a\) are updated analogously to the update of \(\overline{\mathbf{x}}^t\) in (17). In each iteration \(t\ge 0\), we obtain a lower bound, \(\underline{z}^t\), and an upper bound, \(\overline{z}^{\,t}\), on the optimal objective value \(z^*\), according to

$$\begin{aligned} \underline{z}^t := \max _{s\in \{0, 1, \ldots , t\}}\{\theta (\mathbf{u}^s)\} \quad \text { and }\quad \overline{z}^{\,t} := \min _{s\in \{0, 1, \ldots , t\}}\left\{ \sum _{a\in \mathcal {A}}g_a(\hat{f}_a^{s})\right\} . \end{aligned}$$
(30)
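To illustrate how the pieces fit together, the following self-contained Python sketch runs the scheme of this section on a deliberately tiny instance that is not from the paper: one OD pair with demand \(d=5\) over two parallel arcs with Kleinrock delays (22) and capacities \(c_a=10\), for which the optimum splits the demand evenly and \(z^*=2/3\). The shortest-route subproblem (24) reduces to picking the cheaper arc, and the bounds (30) bracket \(z^*\):

```python
import math

caps = [10.0, 10.0]        # arc capacities c_a
d = 5.0                    # demand of the single OD pair
k = 4                      # exponent of the s^k-rule
a_hat = 1.0                # harmonic step lengths alpha_t = a_hat / (t + 1)

def g(f, c):               # Kleinrock arc cost (22)
    return f / (c - f)

def f_sub(u, c):           # closed-form single-arc solution (26) for (22)
    return c - math.sqrt(c / u) if u >= 1.0 / c else 0.0

u = [1.0 / c for c in caps]           # start at the bounds u_a >= g_a'(0)
fhat = [0.0, 0.0]                     # ergodic arc flows
S = 0.0                               # running sum of (s + 1)^k
best_low, best_up = -math.inf, math.inf
gaps = {}
for t in range(2000):
    # Shortest-route subproblem (24): send all demand along the cheaper arc.
    i = 0 if u[0] <= u[1] else 1
    y = [d if j == i else 0.0 for j in range(2)]
    # Dual objective (23) gives a lower bound on z*.
    theta = d * u[i] + sum(g(f_sub(u[j], caps[j]), caps[j])
                           - u[j] * f_sub(u[j], caps[j]) for j in range(2))
    best_low = max(best_low, theta)
    # Ergodic arc flows via the s^k-rule, updated as in (17).
    S_new = S + (t + 1) ** k
    fhat = [(S / S_new) * fh + ((t + 1) ** k / S_new) * yj
            for fh, yj in zip(fhat, y)]
    S = S_new
    # The ergodic flows are feasible, so their cost is an upper bound (30).
    best_up = min(best_up, sum(g(fh, c) for fh, c in zip(fhat, caps)))
    # Projected subgradient step in the dual.
    alpha = a_hat / (t + 1)
    u = [max(u[j] + alpha * (y[j] - f_sub(u[j], caps[j])), 1.0 / caps[j])
         for j in range(2)]
    if t in (10, 1999):
        gaps[t] = best_up - best_low
```

Although the individual subproblem solutions \(y^t\) oscillate between the two arcs, the ergodic flows average out towards the even split, and the best bounds can only improve over the iterations.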

5 Numerical tests and results

We now apply the subgradient approach described in Sect. 4.3 to a set of convex multicommodity flow problems in order to evaluate the performance of a number of different rules for choosing the convexity weights that define the ergodic sequences.

5.1 Implementation issues

The algorithm described in Sect. 4.3 has been implemented in Fortran95 on a Pentium Dual Core 2.50 GHz with 4 GB RAM under Linux Red Hat 2.16.0.

To solve the shortest-path subproblems defined in (24), we use Dijkstra's algorithm [13] as implemented in the subroutine L2QUE described in [18]. In every iteration, Dijkstra's algorithm is called \(|\mathcal {S}|\) times, where \(\mathcal {S}\subseteq \mathcal {N}\) is the set of all origin nodes of the OD set \(\mathcal {C}\). No comparisons have been made between this implementation and other shortest-path solvers.

In the dual subgradient method (6), we adopt a harmonic series step length \(\alpha _{t} = \widehat{\alpha }/(t+1),\,t=0, 1, \ldots ,\) where \(\widehat{\alpha }\) is chosen for each specific problem instance. The subgradient algorithm is terminated when the relative optimality gap is below a specified limit, \(\varepsilon _{\text {opt}}>0\), i.e., when

$$\begin{aligned} \frac{\overline{z}^{\,t}-\underline{z}^{t}}{\max \{\underline{z}^t\,,\, 1\}} < \varepsilon _{\text {opt}}, \end{aligned}$$
(31)

where \(\overline{z}^{\,t}\) and \(\underline{z}^t\) are the upper and lower bounds defined in (30).

5.2 Test problems

We evaluate our algorithm on three sets of test problems, which are also used in [2] and [21]. The first set, the planar problems, consists of ten instances in which the nodes have been randomly chosen as points in the plane and the arcs are such that the resulting graph is planar; the OD-pairs have been chosen at random. The second set, the grid problems, contains 15 networks with a grid structure, meaning that each node has four incoming and four outgoing arcs; the OD-pairs have been chosen at random. The third set consists of three telecommunication problems of various sizes. The arc cost functions have been generated as in [2, Section 8.1] for all the test problems.

In Table 2, the characteristics of the problems are given, where \(|\mathcal {N}|\) is the number of nodes, \(|\mathcal {A}|\) is the number of arcs, \(|\mathcal {C}|\) is the number of commodities, and \(|\mathcal {S}|\) is the number of calls to Dijkstra's algorithm needed in each iteration. Note that the characteristics are taken from [21], since some values in [2] are incorrect; see [3]. We also include in Table 2 some computational characteristics of the subgradient algorithm described in Sect. 4.3: the total CPU time and the time spent solving shortest path problems.

Table 2 Data for the test problems of Babonneau and Vial [2]

5.3 Convexity weight rules

We have chosen to analyze the \(1/t\)-rule [27, 30], the volume algorithm (VA) [7] and the proposed \(s^k\)-rule for \(k=1, 2, 4,\) and \(10\) on the problem instances listed in Table 2. For the VA, we update the ergodic iterates by \(\overline{\mathbf{x}}^t = \beta \mathbf{x}^{t} + (1-\beta )\overline{\mathbf{x}}^{t-1}\), where \(\beta =0.1\), as proposed in [7]. We decided not to include the rule described in [44, Chapter 4], since, for most of the problem instances, it did not reach the chosen optimality threshold within 10,000 iterations.

5.4 Results

Tables 3 and 4 present our results when defining the arc cost functions as the BPR congestion function (21) and the Kleinrock function (22), respectively. In these tables,

  • \(\widehat{\alpha }\) represents the initial step length used in the subgradient algorithm (6) which was chosen as the integer power of \(10\) that yielded the best performance for each problem instance, and

  • for the \(1/t\)-rule, the VA, and the \(s^k\)-rules, the number of subgradient iterations required to reach an optimality gap below the given threshold, \(\varepsilon _{\text {opt}}\), are listed.

We impose a limit of 10,000 iterations; a dash ('–') indicates that the optimality gap did not fall below \(\varepsilon _{\text {opt}}\) within this limit.

Table 3 Results for the BPR congestion function (21): the number of iterations until the relative optimality gap defined in (31) is below \(\varepsilon _{\text {opt}}= 10^{-4}\)
Table 4 Results for the Kleinrock delay function (22): the number of iterations until the relative optimality gap defined in (31) is below \(\varepsilon _{\text {opt}} = 10^{-2}\)

In Fig. 2, the performance profiles [14] for the methods are illustrated for the 56 test problems considered (28 using the BPR congestion function and 28 using the Kleinrock delay function). The graphs in the figure represent the portion of problems solved (that is, those that attained an optimality gap below the given threshold \(\varepsilon _{\text {opt}}\)) within a factor \(\tau \) times the number of iterations needed by the method that reached the threshold within the least number of iterations.

Fig. 2

Performance profiles of the methods for the 56 test instances (28 instances, each solved with the BPR congestion function (21) and with the Kleinrock delay function (22)). The graphs illustrate the proportion of the instances that each method solved within \(\tau \) times the number of iterations required by the method that solved the corresponding instance within the least number of iterations

The \(s^k\)-rule for \(k = 1, 2, 4,\) and \(10\) clearly outperforms both the \(1/t\)-rule and the VA on the test instances. The best performance was shown by the \(s^4\)-rule, which reached the required relative optimality gap [defined in (31)] within the least number of iterations for 37 out of the 56 instances. For the problem instance on which the \(s^4\)-rule performed the poorest, it still solved the problem within a factor \(\tau \approx 1.25\) times the number of iterations needed by the method that solved that instance within the least number of iterations. The VA (\(1/t\)-rule) failed to reach the given optimality threshold within 10,000 iterations for ten (five) problem instances, while the \(s^k\)-rules failed on only four of the instances.

6 Conclusions and future research

We generalize the convergence results in [42] to convex optimization problems and extend the analysis in [30] to include more general convex combinations for creating the ergodic sequences.

The proposed \(s^k\)-rule for choosing the convexity weights of the primal iterates allows more weight to be put on later iterates in the subgradient scheme. Computational results for three sets of NMFPs demonstrate that the \(s^k\)-rule performs convincingly, with a performance superior to that of previously proposed rules. Section 5 presents a comparison between different rules for choosing the convexity weights in the subgradient scheme; it should not be viewed as an attempt to provide a new, competitive solution method for the NMFP.

Since the convergence results are presented for general convex optimization problems, we have not analyzed the performance of the \(s^k\)-rule specifically for linear programs. Preliminary numerical tests indicate, however, a similar performance.

Interesting future research includes an analysis of the performance of the \(s^k\)-rule for other problem classes for which subgradient schemes have proven successful. Examples are found within the fields of discrete optimization (e.g., Ceria et al. [11], Fisher [16]), network design (e.g., Balakrishnan et al. [5], Frangioni and Gendron [17]), and traffic assignment (e.g., Patriksson [36]).

The \(s^k\)-rule is utilized together with harmonic series step lengths, and future research should also investigate convergence results and the practical performance of the rule when utilizing other step lengths, for example Polyak step lengths [39, Chapter 5.3]. We also aim at analyzing the convergence rate of the ergodic sequences in terms of infeasibility and sub-optimality depending on the convexity weight rules utilized. Another extension of the results presented here would be to analyze the convergence of the ergodic sequences when allowing inexact solutions of the subproblems; such solutions would provide \(\varepsilon \)-subgradients of the dual objective function (e.g., d’Antonio and Frangioni [12]).

We are currently investigating the feasibility and computational potential of using the \(s^k\)-rule when employing other methods for solving the dual problem, for example augmented Lagrangian methods (e.g., Rockafellar [41], Bertsekas [9]), bundle methods (e.g., Lemaréchal et al. [31]) and ballstep subgradient methods (e.g., Kiwiel et al. [22, 23]).