Primal convergence from dual subgradient methods for convex optimization
Abstract
When solving a convex optimization problem through a Lagrangian dual reformulation, subgradient optimization methods are favorably utilized, since they often find near-optimal dual solutions quickly. However, an optimal primal solution is generally not obtained directly through such a subgradient approach unless the Lagrangian dual function is differentiable at an optimal solution. We construct a sequence of convex combinations of primal subproblem solutions, a so-called ergodic sequence, which is shown to converge to an optimal primal solution when the convexity weights are appropriately chosen. We generalize previous convergence results from linear to convex optimization and present a new set of rules for constructing the convexity weights that define the ergodic sequence of primal solutions. In contrast to previously proposed rules, they exploit more information from later subproblem solutions than from earlier ones. We evaluate the proposed rules on a set of nonlinear multicommodity flow problems and demonstrate that they clearly outperform the ones previously proposed.
Keywords
Convex programming, Lagrangian duality, Subgradient optimization, Ergodic convergence, Primal recovery, Nonlinear multicommodity flow problem
Mathematics Subject Classification (2010)
90C25, 90C30, 90C46
1 Introduction and motivation
Lagrangian relaxation is a frequently utilized tool for solving large-scale convex minimization problems due to its simplicity and its property of systematically providing optimistic estimates on the optimal value. One popular tool for solving the dual problems of Lagrangian relaxation schemes is subgradient optimization. The advantage of subgradient methods is that they often find near-optimal dual solutions quickly, whilst a drawback is that near-optimal primal feasible solutions cannot, in general, be obtained directly from the subgradient scheme. As the dual iterates in a subgradient scheme converge towards an optimal dual solution, primal convergence towards a near-optimal primal solution is not in general achieved by simply using the subproblem solutions as primal iterates. Even with a dual optimal solution at hand, an optimal primal solution cannot easily be obtained. The reason for this inconvenience is that the dual objective function is typically nonsmooth at an optimal point, whence an optimal primal solution is a nontrivial convex combination of the extreme subproblem solutions.
This paper analyzes what are called ergodic sequences by Larsson et al. [30] or recovering primal solutions by Sherali and Choi [42]; we will use the notion ergodic sequences. To guarantee primal convergence for a linear program in a subgradient scheme, Shor [44, Chapter 4] and Larsson and Liu [27] (originally developed in [26]) utilize a strategy which, rather than using the subproblem solution as primal iterate, uses a convex combination of previously found subproblem solutions, denoted as an ergodic sequence. In [44, Chapter 4] the convex combinations are determined by the step lengths used in the subgradient scheme, while in [27, Theorem 3] the mean of the iterates previously found is used. These results are extended in [42] to a more general case of convex combinations and step lengths in the subgradient algorithm applied to linear programs. Larsson et al. [30] show that the convex combinations used in [44] and [27] yield primal convergence also for general convex optimization problems.
Several other methods for generating approximate primal solutions in a subgradient scheme have been studied. Barahona and Anbil [7] propose a method for approximating the solution to a linear program by utilizing a subgradient method in which a primal solution is created as a convex combination of the previous solution and the primal iterate obtained from the subgradient method. The method is denoted the volume algorithm and was revisited by Bahiense et al. [4] and Sherali and Lim [43], where they extended it to include more information in the dual scheme. Nesterov [34] analyzes a primal–dual subgradient method where a primal feasible approximation to the optimum is obtained by using control sequences in both the primal and dual space. Nedić and Ozdaglar [32, 33] study methods which utilize the average of all previously found iterates as primal solutions. The latter algorithms employ a constant step length due to its simplicity and practical significance. For a more thorough overview of the history of strategies for the construction of primal iterates in dual subgradient schemes, see Anstreicher and Wolsey [1].
This paper generalizes the results in [42] to the class of convex programs, and extends the results in [30] to include more general convex combinations in the definition of the ergodic sequences. We present a new set of rules for constructing the convexity weights defining the ergodic sequence of primal iterates. In contrast to rules previously utilized, they exploit more information from later subproblem solutions than from earlier ones. We evaluate the new rules on a set of nonlinear multicommodity flow problems (NMFPs) and show that they clearly outperform the previously utilized ones.
The remainder of this paper is organized as follows. In Sect. 2 we introduce some basic concepts regarding Lagrangian relaxation and subgradient methods. In Sect. 3, we describe the notion of primal ergodic sequences and present an important theorem regarding their convergence when considering general convex problems. Section 3 also includes a new set of rules for choosing convexity weights when defining the ergodic sequences. The final part of Sect. 3 includes a taxonomy of previous results and their connection to the results presented in this paper. In Sect. 4 we introduce the NMFP and describe a solution approach based on Lagrangian relaxation. Computational results for a set of NMFP test instances employing the new rules for choosing the convexity weights are presented in Sect. 5. Conclusions are then drawn in Sect. 6.
2 Background
Lemma 1
(\(X( \cdot )\) is a closed map [30, Lemma 1]) Let the sequence \(\{ \mathbf{u}^t\}\subset {\mathbb {R}}^{m}\), the map \(X(\cdot ) : {\mathbb {R}}^m\rightarrow 2^X\) be given by the definition (3), and the sequence \(\{\mathbf{x}^t\}\) be given by the inclusion \(\mathbf{x}^t\in X(\mathbf{u}^t)\). If \(\{\mathbf{u}^t\}\rightarrow \mathbf{u}\in {\mathbb {R}}^m\), then \(\mathrm{dist}(\mathbf{x}^t, X(\mathbf{u}))\rightarrow 0\). If, in addition, \(X(\mathbf{u}) = \{\mathbf{x}\}\), then \(\{\mathbf{x}^t\}\rightarrow \mathbf{x}\).
Lemma 2
(affineness of the Lagrange function [30, Lemma 2]) The functions \(f\) and \(h_i,\,i\in \mathcal {I}(\mathbf{u})\), are affine on \(X(\mathbf{u})\) for every \(\mathbf{u}\in {\mathbb {R}}^m_+\). Further, if the function \(f\) is (the functions \(h_i\), \(i\in \mathcal {I}(\mathbf{u})\), are) differentiable, then \(\nabla f\) is ( \(\nabla h_i\), \(i\in \mathcal {I}(\mathbf{u})\) are) constant on \(X(\mathbf{u})\).
Proposition 1
(subdifferential to the dual function [30, Proposition 1]) For each \(\mathbf{u}\in {\mathbb {R}}^m\), it holds that \(\partial \theta (\mathbf{u}) = \{\,\mathbf{h}(\mathbf{x}) \mid \mathbf{x}\in X(\mathbf{u})\,\}\). Further, \(\theta \) is differentiable at \(\mathbf{u}\) if and only if each \(h_i\) is constant on \(X(\mathbf{u})\), in which case \(\nabla \theta (\mathbf{u}) = \mathbf{h}(\mathbf{x})\) for all \(\mathbf{x}\in X(\mathbf{u})\).
To obtain primal–dual optimality relations, we assume Slater’s constraint qualification as stated in Assumption 1.
Assumption 1
(Slater constraint qualification) The set \(\{\,\mathbf{x}\in X \mid \mathbf{h}(\mathbf{x})<\mathbf{0}^m\,\}\) is nonempty.
Under Assumption 1, the solution set \(U^*\) is nonempty and compact and, by strong duality, the equality \(\theta (\mathbf{u}^*) = f(\mathbf{x}^*)\) holds for some pair of primal–dual solutions \((\mathbf{x}^*, \mathbf{u}^*)\) fulfilling \(\mathbf{u}^*\in {\mathbb {R}}^m_+,\,\mathbf{x}^*\in X\) and \(\mathbf{h}(\mathbf{x}^*)\le \mathbf{0}^m\) ([8, Theorem 6.2.5]).
Proposition 2
(optimality conditions, [8, Theorem 6.2.5]) Let Assumption 1 hold. Then, \(\mathbf{u}\in U^*\) and \(\mathbf{x}\in X^*\) if and only if \(\mathbf{u}\in {\mathbb {R}}^m_+,\,\mathbf{x}\in X(\mathbf{u}),\,\mathbf{h}(\mathbf{x})\le \mathbf{0}^m\) and \(\mathbf{u}^T\mathbf{h}(\mathbf{x}) = 0\).
2.1 Subgradient optimization
Proposition 3
3 Ergodic primal convergence
Assumption 2

A1: \(\gamma _s^{\,t}\ge \gamma _{s-1}^{\,t}, \; s=1, \ldots , t-1, \; t=2, 3, \ldots \),

A2: \(\varDelta \gamma _{\max }^{\,t}\rightarrow 0 \text { as } t\rightarrow \infty , \text { and }\)

A3: \(\gamma _{0}^{\,t} \rightarrow 0 \text { as } t\rightarrow \infty \text { and, for some } \varGamma >0, \gamma _{t-1}^{\,t}\le \varGamma \text { for all } t\).
Condition A1 requires that \(\mu _s^t/\mu _{s-1}^t \ge \alpha _{s}/\alpha _{s-1},\, s= 1, \ldots , t-1,\, t=1, 2, \ldots \). This can be interpreted as the requirement that whenever the step length at iteration \(s\) (\(\alpha _s\)) is larger than the previous one at iteration \(s-1\) (\(\alpha _{s-1}\)), the corresponding convexity weight (\(\mu _s^t\)) should be larger than the previous one (\(\mu _{s-1}^t\)). By condition A2, the difference between each pair of subsequent convexity weights tends to zero as \(t\) increases, meaning that no primal iterate should be completely neglected. Condition A3 assures that, for decreasing step lengths, the convexity weights decrease at a rate not slower than that of the step lengths.
Remark 1
For any fixed value of \(s\in \{0, \ldots , t-1\}\), it follows from Assumption 2 that \(\gamma _s^t \le \gamma _0^t + s\varDelta \gamma _{\max }^t \rightarrow 0\) as \(t\rightarrow \infty \). This implies that \(\gamma _s^t = \mu _s^t/\alpha _s \rightarrow 0\), which yields that \(\mu _s^t \rightarrow 0\) as \(t\rightarrow \infty \), since \(0< \alpha _s < \infty \). \(\square \)
One example of convexity weights and step lengths fulfilling Assumption 2 is when each ergodic iterate equals the average of all previously found subproblem solutions, i.e., \(\mu _s^t = 1/t,\,s = 0, \ldots , t-1,\,t =1, 2, \ldots \), and the step lengths are chosen according to a harmonic series, i.e., \(\alpha _t = a/(b + ct),\,t = 0, 1, \ldots \), where \(a, b, c > 0\). Note that in [42, Theorem 1], Assumption 2 is included in the hypothesis.
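As a quick check of this example, the three conditions of Assumption 2 can be verified by hand. The sketch below assumes that \(\gamma _s^t = \mu _s^t/\alpha _s\) (as used in Remark 1) and that \(\varDelta \gamma _{\max }^t\) denotes the largest difference \(\gamma _s^t - \gamma _{s-1}^t\) over \(s = 1, \ldots , t-1\), which is how Remark 1 uses it; the constant \(\varGamma = (b+c)/a\) is one admissible choice, not necessarily the one used by the authors.

```latex
\begin{align*}
\gamma_s^t &= \frac{\mu_s^t}{\alpha_s} = \frac{b + cs}{a\,t}, \qquad s = 0, \ldots, t-1,\\
\text{A1:}\quad \gamma_s^t - \gamma_{s-1}^t &= \frac{c}{a\,t} \ge 0, \qquad s = 1, \ldots, t-1,\\
\text{A2:}\quad \varDelta\gamma_{\max}^t &= \frac{c}{a\,t} \rightarrow 0 \quad \text{as } t \rightarrow \infty,\\
\text{A3:}\quad \gamma_0^t &= \frac{b}{a\,t} \rightarrow 0, \qquad
\gamma_{t-1}^t = \frac{b + c(t-1)}{a\,t} \le \frac{b + c}{a} =: \varGamma .
\end{align*}
```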
We now present a special case of a result of Silverman and Toeplitz (proven in [25]) which will be utilized in the analysis to follow.
Lemma 3
3.1 Feasibility in the limit
We here show that, assuming convergence towards a dual feasible point in the subgradient method (6), and that the step lengths, \(\alpha _{t}\), and convexity weights, \(\mu _s^t\), are chosen such that Assumption 2 is fulfilled, the ergodic sequence of iterates, \(\overline{\mathbf{x}}^t\), converges to the set of primal feasible solutions.
Proposition 4
Proof
Proposition 4 states that as long as the sequence \(\{\mathbf{u}^t\}\) of dual iterates converges to some feasible point in the Lagrangian dual problem (4), and the step lengths and convexity weights are appropriately chosen, the corresponding sequence of primal iterates defined by (8) will produce a primal feasible solution in the limit. If there is an accumulation point \(\overline{\mathbf{x}}\) such that \(\{\overline{\mathbf{x}}^t\}\rightarrow \overline{\mathbf{x}}\), then Proposition 4 states that \(\overline{\mathbf{x}}\) is feasible in the original problem (1). If the functions \(f\) and \(h_i\), \(i\in \mathcal {I}\), are affine, and the set \(X\) is a polytope, then Proposition 4 reduces to [42, Theorem 1].
Note that the conditions A2 and A3 of Assumption 2 are fulfilled if condition A1 in Assumption 2 holds together with the condition that \(\gamma _{t-1}^{\,t}\rightarrow 0\) as \(t\rightarrow \infty \). Below, we present a result for strengthened assumptions on the convexity weights and step lengths, but where the sequence \(\{\mathbf{u}^t\}\) is only assumed to be bounded.
Corollary 1
Proof
From the relations (10a)–(10b) and condition A1 of Assumption 2, it follows that \(\mathbf{h}(\overline{\mathbf{x}}^t)\le \gamma _{t-1}^{\,t}\mathbf{u}^t,\,t\ge 2\). Since \(\gamma _{t-1}^{\,t}\rightarrow 0\) and \(\{\mathbf{u}^t\}\) is bounded, \(\limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m\) holds. \(\square \)
Note that, under the assumptions of Corollary 1, any accumulation point \(\overline{\mathbf{x}}\) to the sequence \(\{\overline{\mathbf{x}}^t\}\) is feasible in (1).
3.2 Optimality in the limit
We next establish—assuming that Slater’s constraint qualification (Assumption 1) is fulfilled—primal convergence to the set of optimal solutions \(X^*\) of the problem (1) as long as the step lengths, \(\alpha _{t}\), and the convexity weights, \(\mu _s^t\), are chosen to satisfy Assumption 2.
Theorem 1
Proof
From Proposition 4 it follows that \(\limsup _{t\rightarrow \infty } \mathbf{h}(\overline{\mathbf{x}}^t) \le \mathbf{0}^m\) and \(\overline{\mathbf{x}}^t\in X,\,t\ge 1\). In view of Proposition 2, it suffices to show that \(\{\mathrm{dist}(\overline{\mathbf{x}}^t, X(\mathbf{u}^\infty ))\} \rightarrow 0\) and that \(\{\mathbf{h}(\overline{\mathbf{x}}^t)^T\mathbf{u}^\infty \}\rightarrow 0\) as \(t\rightarrow \infty \).
For the case when (a) the functions \(f\) and \(h_i\), \(i\in \mathcal {I}\), are affine, and (b) the set \(X\) is a polytope, Theorem 1 reduces to the result of Sherali and Choi [42, Theorem 2].
3.3 A new rule for choosing the convexity weights when utilizing harmonic series step lengths
Assumption 3

B1: \(\mu _s^t\ge \mu _{s-1}^t, \; s=1, \ldots , t-1, \; t=2, 3, \ldots , \)

B2: \(t\varDelta \mu _{\max }^t\rightarrow 0 \; \text {as } t\rightarrow \infty , \text { and} \)

B3: \(t\mu _{t-1}^t\le M<\infty , \; t=1, 2, \ldots .\)
Condition B1 states that the convex combinations \(\overline{\mathbf{x}}^t\), defined in (8), should put more weight on later observations (that is, primal subproblem solutions \(\mathbf{x}^t\)). Condition B2 states that no particular primal iterate should be favoured, meaning that the difference between the weights for two subsequent iterates should tend to zero. Condition B3 states that the convexity weights \(\mu _{t-1}^t\) should decrease at a rate not lower than \(1/t\) as \(t\rightarrow \infty \).
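The three conditions can be probed numerically for a given family of weights. The following is a minimal sketch, not the authors' code: it assumes that \(\varDelta \mu _{\max }^t\) is the largest difference between two subsequent weights, and the helper names are illustrative. `uniform_weights` is the \(1/t\) choice mentioned earlier; `polynomial_weights`, with weights proportional to \((s+1)^k\), is a hypothetical family used only to show a case with increasing weights.

```python
from fractions import Fraction


def uniform_weights(t):
    """Equal weights: mu_s^t = 1/t for s = 0, ..., t-1."""
    return [Fraction(1, t)] * t


def polynomial_weights(t, k):
    """Hypothetical family (an assumption): mu_s^t proportional to (s+1)^k."""
    raw = [(s + 1) ** k for s in range(t)]
    total = sum(raw)
    return [Fraction(r, total) for r in raw]


def assumption3_quantities(mu):
    """Return (B1 holds?, t * largest consecutive difference, t * last weight)."""
    t = len(mu)
    diffs = [mu[s] - mu[s - 1] for s in range(1, t)]
    b1_holds = all(d >= 0 for d in diffs)
    max_diff = max(diffs) if diffs else Fraction(0)
    return b1_holds, t * max_diff, t * mu[-1]


# B2 asks the second quantity to vanish as t grows; B3 asks the third to stay bounded.
for t in (10, 100, 1000):
    b1, b2, b3 = assumption3_quantities(polynomial_weights(t, 2))
    print(t, b1, float(b2), float(b3))
```

Exact rational arithmetic is used so that the monotonicity test in B1 is not affected by rounding. A finite check of this kind can only suggest, not prove, the limiting behaviour required by B2 and B3.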
Consider the following result.
Proposition 5
(convexity weights fulfilling Assumption 3 together with step lengths defined by (14) fulfill Assumption 2) If the step lengths, \(\alpha _{t}\), fulfill (14) and the convexity weights, \(\mu _s^t\), satisfy Assumption 3, then Assumption 2 is fulfilled.
Proof
Larsson et al. [30] show that by using the convexity weights \(\mu _s^t = 1/t\), primal convergence can always be guaranteed for the harmonic series step lengths (14). We here refer to this rule for creating an ergodic sequence as the \(1/t\)-rule; it was first analyzed by Larsson and Liu [27], who prove convergence for the case when (1) is a linear program. Clearly, the \(1/t\)-rule fulfills the conditions of Proposition 5; hence the primal convergence of the \(1/t\)-rule is a special case of the analysis above.
One drawback of the \(1/t\)-rule is the fact that when creating the ergodic sequences of primal solutions, all previously found iterates are weighted equally. We expect that later subproblem solutions in the subgradient method are more likely to belong to the set of optimal solutions to the subproblem (2) at a dual optimal solution, \(\mathbf{u}^*\in U^*\). We therefore propose a more general set of rules for creating ergodic sequences of primal iterates which allows for later primal iterates to receive larger convexity weights.
Definition 1
Proposition 6
(the \(s^k\)-rule satisfies Assumption 3) The convexity weights \(\mu _s^t\), chosen according to Definition 1, fulfill Assumption 3.
Proof
We now summarize the results obtained in this section in the following theorem.
Theorem 2
Proof
By Proposition 3, it follows that \(\mathbf{u}^t \rightarrow \mathbf{u}^\infty \in U^*\). Using Propositions 5 and 6 yields that the assumptions of Theorem 1 hold, which completes the proof. \(\square \)
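The behaviour described in this section can be observed on a one-dimensional toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: the toy problem (minimize \(x\) over \(X = [0, 1]\) subject to \(h(x) = 0.5 - x \le 0\), with optimal solution \(x^* = 0.5\) and optimal multiplier \(u^* = 1\)) is made up for this example; the dual update is taken to be the projected step \(u^{t+1} = \max \{0, u^t + \alpha _t h(x^t)\}\); the step lengths are harmonic, \(\alpha _t = a/(b+ct)\); and the convexity weights are assumed to be proportional to \((s+1)^k\), with \(k = 0\) giving the plain average.

```python
def dual_subgradient_ergodic(num_iterations, k=0, a=1.0, b=1.0, c=1.0):
    """Run a dual subgradient scheme on the toy problem and return (u, x_bar).

    The Lagrangian subproblem is min over x in [0, 1] of (1 - u) * x + 0.5 * u,
    whose solution is always an endpoint of the interval (0 or 1).
    """
    u = 0.0
    weighted_sum = 0.0   # running sum of w_s * x^s
    weight_total = 0.0   # running sum of w_s
    for t in range(num_iterations):
        # Subproblem solution at the current multiplier (ties resolved to x = 1).
        x = 0.0 if u < 1.0 else 1.0
        # Assumed weight form: proportional to (t + 1)^k, normalised at the end.
        w = float(t + 1) ** k
        weighted_sum += w * x
        weight_total += w
        # Subgradient of the dual function and projected harmonic step.
        h = 0.5 - x
        alpha = a / (b + c * t)
        u = max(0.0, u + alpha * h)
    return u, weighted_sum / weight_total


for k in (0, 4):
    u_final, x_bar = dual_subgradient_ergodic(5000, k=k)
    print(k, u_final, x_bar)
```

Every subproblem solution here is either 0 or 1, so no single primal iterate is ever optimal, while the weighted average settles near 0.5 as the multiplier settles near 1. Storing only the running weighted sum and the running total avoids keeping all earlier subproblem solutions in memory.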
3.4 Connection with previous results
Problem: Type of problem considered. For the case when problem (1) is a linear program, the assumptions are that \(f\) and \(h_i,\,i\in \mathcal {I}\), are affine functions and that \(X\) is a nonempty and bounded polyhedron. This is denoted in the table as LP. The case of a general convex optimization problem is denoted CP.
Step lengths: The step lengths \(\alpha _{t}\) employed in the subgradient method (6). Step lengths defined according to (14) are denoted Harmonic series. If the step lengths fulfill \(\alpha _{t}>0,\,\lim _{t\rightarrow \infty } \alpha _{t} = 0\) and \(\lim _{t\rightarrow \infty } A_t = \infty \), we denote this by Divergent series and if also \(\lim _{t\rightarrow \infty } B_t < \infty \), we denote this by Divergent series, QB (quadratically bounded).
Conv. weights: The convexity weights, \(\mu _s^t\), defined in (8), defining the ergodic sequences of primal iterates.
Theorem 1: Whether or not Theorem 1 guarantees the convergence of the method.
Theorem 2: Whether or not Theorem 2 guarantees the convergence of the method.
Table 1 Taxonomy of dual subgradient algorithms
Reference  Problem  Step lengths  Conv. weights  Theorem 1  Theorem 2

Shor [44, Chapter 4]  LP  Divergent series  \(\mu _s^t = \alpha _s/A_t\)  Yes  No
Larsson and Liu [26]  LP  Harmonic series  \(\mu _s^t = 1/t\)  Yes  Yes
Sherali and Choi [42]  LP  \(\alpha _{t} = (t+1)^{\kappa }\)  \(\mu _s^t =1/t\)  Yes  No
Barahona and Anbil [7]  LP  Polyak step size  (18)  No  No
Larsson et al. [30]  CP  Divergent series, QB  \(\mu _s^t =\alpha _s/A_t\)  Yes  No
Larsson et al. [30]  CP  Harmonic series  \(\mu _s^t =1/t\)  Yes  Yes
Nedić and Ozdaglar [32]  CP  Constant  \(\mu _s^t =1/t\)  No  No
Gustavsson et al. (this art.)  CP  Harmonic series  \(s^k\)-rule (Definition 1)  Yes  Yes
Since the work presented in this paper utilizes the traditional subgradient method to solve the dual problem, we only include algorithms which employ this method for the dual problem in Table 1. More sophisticated methods for solving the dual problem include deflected conditional subgradient methods (d’Antonio and Frangioni [12], Burachik and Kaya [10]), bundle methods (Lemaréchal et al. [31], Kiwiel [20]), augmented Lagrangian methods (Rockafellar [41], Bertsekas [9]), and ball-step subgradient methods (Kiwiel et al. [22, 23]). All of these methods utilize information from previously computed subgradients when updating the iterates in the subgradient scheme. In order to approximate the primal solutions, the convexity weights defining the primal iterates are then acquired from the information obtained in these dual schemes (e.g., Robinson [40]).
4 Applications to multicommodity network flows
We apply subgradient methods to a Lagrangian dual of the NMFP. Primal solutions are computed from ergodic sequences of subproblem solutions. For a more thorough description of multicommodity flow problems and solution methods for these, see [19, 35, 36].
4.1 The nonlinear multicommodity network flow problem
4.2 A Lagrangian dual formulation
For the NMFP, i.e., the program (20), we utilize a Lagrangian dual approach in which the arc flow defining constraints (20d) are relaxed. For a more thorough explanation of the Lagrangian reformulation, see [28]. The resulting Lagrangian subproblem essentially consists of solving one shortest path problem for each commodity \(k\in \mathcal {C}\).
Proposition 7
Proposition 7 states that the optimal arc flow \([f_a^*]_{a\in \mathcal {A}}\) is obtained from the solutions to the subproblems \(\xi _a(u_a^*),\,a\in \mathcal {A}\). However, an optimal route flow pattern \({[h_{kr}^*]_{r\in \mathcal {R}_k}\in H_k^*}\) is, in general, not directly available from the solution to the subproblem (24). This depends on the set \(\prod _{k\in \mathcal {C}}H_k(\mathbf{u}^*)\) typically not being a singleton, since the functions \(\phi _k,\,k\in \mathcal {C}\), typically are nonsmooth at \(\mathbf{u}^*\).
4.3 The algorithm
5 Numerical tests and results
We now utilize the subgradient approach described in Sect. 4.3 on a set of convex multicommodity flow problems to evaluate the performance of a number of different rules for choosing the convexity weights defining the ergodic sequences.
5.1 Implementation issues
The algorithm described in Sect. 4.3 has been implemented in Fortran95 on a Pentium Dual Core 2.50 GHz with 4 GB RAM under Linux Red Hat 2.16.0.
To solve the shortest-path subproblems defined in (24), we use Dijkstra’s algorithm [13] as implemented in the subroutine L2QUE described in [18]. In every iteration, Dijkstra’s algorithm is called \(|\mathcal {S}|\) times, where \(\mathcal {S}\subseteq \mathcal {N}\) is the set of all origin nodes in the OD set \(\mathcal {C}\). No comparisons have been made between this implementation and other shortest-path solvers.
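For readers unfamiliar with this step, one shortest-path tree per origin node can be sketched as follows. This is a generic heap-based Dijkstra written for illustration; it is not the L2QUE subroutine, and the adjacency-list format, the function names, and the three-node network are assumptions. Dijkstra's algorithm requires nonnegative arc costs.

```python
import heapq


def dijkstra(adjacency, source):
    """Shortest distances and predecessors from source.

    adjacency maps each node to a list of (neighbor, nonnegative arc cost) pairs.
    """
    dist = {source: 0.0}
    pred = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter label was already found
        for neighbor, cost in adjacency.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                pred[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, pred


def shortest_path_trees(adjacency, origins):
    """One Dijkstra call per origin node, i.e. len(origins) calls per iteration."""
    return {origin: dijkstra(adjacency, origin) for origin in origins}


# Hypothetical three-node network with made-up arc costs.
network = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
trees = shortest_path_trees(network, ["a", "b"])
print(trees["a"][0])
```

Grouping OD pairs by origin is what keeps the number of calls at the number of distinct origins rather than the number of commodities, since one tree serves every destination reachable from that origin.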
5.2 Test problems
We evaluate our algorithm on three sets of test problems, which are also used in [2] and [21]. The first set, the planar problems (see footnote 1), consists of ten instances, in which nodes have been randomly chosen as points in the plane, and the arcs are such that the resulting graph is planar; the OD-pairs have been chosen at random. The grid problems (see footnote 1) collection contains 15 networks with a grid structure, meaning that each node has four incoming and four outgoing arcs; the OD-pairs have been chosen at random. The third set consists of three telecommunication problems of various sizes. The arc cost functions have been generated as in [2, Section 8.1] for all the test problems.
Table 2 Data for the test problems of Babonneau and Vial [2]
Problem ID  \(|\mathcal {N}|\)  \(|\mathcal {A}|\)  \(|\mathcal {C}|\)  \(|\mathcal {S}|\)  CPU (ms)  %SP

Planar problems  
planar30  30  150  92  29  0.09  83.5 
planar50  50  250  267  50  0.19  85.8 
planar80  80  440  543  80  0.75  92.4 
planar100  100  532  1,085  100  0.92  89.8 
planar150  150  850  2,239  150  2.79  92.9 
planar300  300  1,680  3,584  300  9.55  94.1 
planar500  500  2,842  3,525  500  50.06  94.3 
planar800  800  4,388  12,756  800  185.88  95.1 
planar1000  1,000  5,200  20,026  1,000  130.19  88.3 
planar2500  2,500  12,990  81,430  2,500  1,382.11  91.8 
Grid problems  
grid1  25  80  50  23  0.06  78.1 
grid2  25  80  100  25  0.09  82.7 
grid3  100  360  50  40  0.17  77.9 
grid4  100  360  100  63  0.33  84.5 
grid5  225  840  100  83  0.73  80.8 
grid6  225  840  200  135  1.32  87.5 
grid7  400  1,520  400  247  5.39  76.0 
grid8  625  2,400  500  343  12.22  79.4 
grid9  625  2,400  1,000  495  21.03  78.6 
grid10  625  2,400  2,000  593  38.40  85.9 
grid11  625  2,400  4,000  625  74.58  90.0 
grid12  900  3,480  6,000  899  164.67  89.3 
grid13  900  3,480  12,000  900  317.34  91.9 
grid14  1,225  4,760  16,000  1,225  593.30  91.8 
grid15  1,225  4,750  32,000  1,225  1,180.56  92.5 
Telecommunication problems  
ndo22  14  22  23  5  0.00  30.5 
ndo148  61  148  122  61  0.17  66.5 
904  106  904  11,130  106  1.65  80.5 
5.3 Convexity weight rules
We have chosen to analyze the \(1/t\)-rule [27, 30], the volume algorithm (VA) [7] and the proposed \(s^k\)-rule for \(k=1, 2, 4,\) and \(10\) on the problem instances listed in Table 2. For the VA, we update the ergodic iterates by \(\overline{\mathbf{x}}^t = \beta \mathbf{x}^{t} + (1-\beta )\overline{\mathbf{x}}^{t-1}\), where \(\beta =0.1\), as proposed in [7]. We decided not to include the rule described in [44, Chapter 4], since for most of the problem instances, it did not reach the optimality threshold chosen within 10,000 iterations.
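The VA update is an exponential smoothing of the subproblem solutions, which a short sketch makes concrete. The code below is illustrative only: the function names are assumptions, and the starting point of the recursion (here called `xbar0`) is not specified in the text above. Unrolling the recursion shows that, after \(t\) updates, the solution \(\mathbf{x}^s\) carries weight \(\beta (1-\beta )^{t-s}\) and the starting point carries weight \((1-\beta )^t\).

```python
def volume_update(xbar_prev, x_new, beta=0.1):
    """One VA step: xbar^t = beta * x^t + (1 - beta) * xbar^(t-1), componentwise."""
    return [beta * xn + (1.0 - beta) * xp for xn, xp in zip(x_new, xbar_prev)]


def implied_weights(t, beta=0.1):
    """Weights after t updates: (weight on the start point, weights on x^1..x^t)."""
    start_weight = (1.0 - beta) ** t
    iterate_weights = [beta * (1.0 - beta) ** (t - s) for s in range(1, t + 1)]
    return start_weight, iterate_weights


# Made-up scalar subproblem solutions, used only to compare the two forms.
solutions = [[1.0], [0.0], [1.0], [1.0], [0.0]]
xbar0 = [0.5]

xbar = xbar0
for x in solutions:
    xbar = volume_update(xbar, x)

start_w, weights = implied_weights(len(solutions))
explicit = start_w * xbar0[0] + sum(w * x[0] for w, x in zip(weights, solutions))
print(xbar[0], explicit)
```

With this update the most recent subproblem solution always receives weight \(\beta \), independently of \(t\), whereas under the \(1/t\)-rule the weight of every iterate shrinks as \(t\) grows.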
5.4 Results

\(\widehat{\alpha }\) represents the initial step length used in the subgradient algorithm (6), which was chosen as the integer power of \(10\) that yielded the best performance for each problem instance, and

for the \(1/t\)-rule, the VA, and the \(s^k\)-rules, the number of subgradient iterations required to reach an optimality gap below the given threshold, \(\varepsilon _{\text {opt}}\), is listed.
Problem ID  \(\widehat{\alpha }\)  \(1/t\)-rule (\(s^0\)-rule)  VA  \(s^k\)-rule, \(k=1\)  \(k=2\)  \(k=4\)  \(k=10\)

Planar problems
planar30  \(10^2\)  9,965  4,650  4,986  4,916  4,814  4,773
planar50  \(10^2\)  1,846  350  273  252  252  273
planar80  \(10^2\)  6,317  –  1,353  1,353  1,353  1,353
planar100  \(10^1\)  1,497  477  271  266  266  266
planar150  \(10^1\)  6,371  –  1,613  1,568  1,568  1,568
planar300  \(10^0\)  1,122  493  338  332  314  332
planar500  \(10^0\)  3,323  162  189  162  142  148
planar800  \(10^{-1}\)  1,458  510  524  468  428  415
planar1000  \(10^{-1}\)  –  –  –  –  –  –
planar2500  \(10^{-1}\)  –  –  –  –  –  –
Grid problems
grid1  \(10^{-1}\)  434  164  162  162  162  163
grid2  \(10^{-1}\)  712  233  233  188  167  168
grid3  \(10^{-1}\)  648  143  143  136  114  136
grid4  \(10^{-1}\)  758  166  161  157  157  157
grid5  \(10^{-1}\)  755  156  155  137  137  140
grid6  \(10^{-1}\)  916  426  271  242  242  252
grid7  \(10^{-2}\)  277  138  138  126  119  132
grid8  \(10^{-2}\)  839  988  509  470  443  432
grid9  \(10^{-2}\)  400  232  159  149  149  175
grid10  \(10^{-2}\)  597  176  154  128  128  150
grid11  \(10^{-3}\)  720  534  470  436  413  406
grid12  \(10^{-3}\)  209  73  80  70  69  79
grid13  \(10^{-3}\)  374  77  96  78  74  78
grid14  \(10^{-3}\)  488  68  78  56  56  68
grid15  \(10^{-4}\)  338  210  214  195  185  185
Telecommunication problems
ndo22  \(10^0\)  2,279  68  56  21  15  15
ndo148  \(10^0\)  341  80  80  80  76  76
904  \(10^1\)  7,698  2,322  1,302  1,302  1,302  1,302
Problem ID  \(\widehat{\alpha }\)  \(1/t\)-rule (\(s^0\)-rule)  VA  \(s^k\)-rule, \(k=1\)  \(k=2\)  \(k=4\)  \(k=10\)

Planar problems
planar30  \(10^{-3}\)  116  49  54  46  46  62
planar50  \(10^{-3}\)  350  113  140  122  114  113
planar80  \(10^{-3}\)  360  132  146  132  129  131
planar100  \(10^{-4}\)  161  54  50  45  45  59
planar150  \(10^{-4}\)  1,732  7,365  736  599  564  456
planar300  \(10^{-5}\)  71  36  28  26  26  35
planar500  \(10^{-5}\)  112  40  36  28  26  27
planar800  \(10^{-6}\)  54  31  22  18  18  26
planar1000  \(10^{-6}\)  234  125  114  103  103  120
planar2500  \(10^{-7}\)  –  5,784  7,279  6,162  5,358  4,600
Grid problems
grid1  \(10^{-4}\)  830  503  435  420  418  418
grid2  \(10^{-4}\)  –  –  –  –  –  –
grid3  \(10^{-4}\)  150  64  63  63  63  81
grid4  \(10^{-4}\)  348  157  144  136  135  143
grid5  \(10^{-4}\)  219  100  96  85  85  102
grid6  \(10^{-4}\)  884  793  515  488  465  462
grid7  \(10^{-5}\)  120  82  64  60  67  95
grid8  \(10^{-5}\)  697  3,004  448  423  414  431
grid9  \(10^{-5}\)  665  –  436  404  397  436
grid10  \(10^{-6}\)  5,683  –  5,226  5,191  5,177  5,163
grid11  \(10^{-7}\)  1,089  –  956  948  956  986
grid12  \(10^{-7}\)  147  229  101  98  106  142
grid13  \(10^{-8}\)  921  –  810  807  810  843
grid14  \(10^{-8}\)  103  144  81  81  89  121
grid15  \(10^{-9}\)  147  349  118  114  121  156
Telecommunication problems
ndo22  \(10^{-1}\)  119  98  98  98  98  98
ndo148  \(10^{-2}\)  –  –  –  –  –  –
904  \(10^{-4}\)  627  456  451  444  444  444
The \(s^k\)-rule for \(k = 1, 2, 4,\) and \(10\) clearly outperforms both the \(1/t\)-rule and the VA for the test instances. The best performance was shown by the \(s^4\)-rule, which reached the required relative optimality gap [defined in (31)] in the least number of iterations for 37 out of the 56 instances. For the problem instance where the \(s^4\)-rule performed the poorest, it still solved the problem within a factor \(\tau \approx 1.25\) times the number of iterations needed by the method which solved that instance within the least number of iterations. The VA (\(1/t\)-rule) failed to obtain the given optimality threshold within 10,000 iterations for ten (five) problem instances, while the \(s^k\)-rules failed on only four of the problem instances.
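The factor \(\tau \) is a ratio of iteration counts: the count of a given rule divided by the smallest count obtained by any rule on the same instance, in the spirit of the performance ratios of Dolan and Moré [14]. A minimal sketch of that computation follows; the function name and the use of `None` for a "–" entry are illustrative assumptions, and the two rows are copied from the first results table above.

```python
RULES = ["1/t", "VA", "s^1", "s^2", "s^4", "s^10"]


def performance_ratios(iterations):
    """Iteration count of each rule divided by the best count on the instance.

    A None entry marks a run that did not reach the threshold; it stays None.
    """
    solved = [n for n in iterations if n is not None]
    if not solved:
        return [None] * len(iterations)
    best = min(solved)
    return [None if n is None else n / best for n in iterations]


# Rows taken from the first results table (columns ordered as in RULES).
planar80 = [6317, None, 1353, 1353, 1353, 1353]
planar500 = [3323, 162, 189, 162, 142, 148]

for name, row in (("planar80", planar80), ("planar500", planar500)):
    ratios = performance_ratios(row)
    print(name, dict(zip(RULES, ratios)))
```

A ratio of 1 marks the rule that needed the fewest iterations on that instance; the worst ratio of a rule over all instances corresponds to the factor \(\tau \) quoted for the \(s^4\)-rule.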
6 Conclusions and future research
We generalize the convergence results in [42] to convex optimization problems and extend the analysis in [30] to include more general convex combinations for creating the ergodic sequences.
The proposed \(s^k\)-rule for choosing convexity weights for the primal iterates allows putting more weight on later iterates in the subgradient scheme. Computational results for three sets of NMFPs demonstrate that the \(s^k\)-rule is convincing and shows a performance superior to that of previously proposed rules. Section 5 presents a comparison between different rules for choosing the convexity weights in the subgradient scheme, and should not be viewed as an attempt to provide a new, competitive solution method for the NMFP.
Since the convergence results are presented for general convex optimization problems, we have not analyzed the performance of the \(s^k\)-rule specifically for linear programs. Preliminary numerical tests indicate, however, a similar performance.
Future interesting research includes an analysis of the performance of the \(s^k\)-rule for other problems, for which subgradient schemes have proven to be successful. Examples are found within the fields of discrete optimization (e.g., Ceria et al. [11], Fisher [16]), network design (e.g., Balakrishnan et al. [5], Frangioni and Gendron [17]), and traffic assignment (e.g., Patriksson [36]).
The \(s^k\)-rule is utilized together with harmonic series step lengths, and future research should also investigate convergence results and the practical performance of the rule when utilizing other step lengths, for example Polyak step lengths [39, Chapter 5.3]. We also aim at analyzing the convergence rate of the ergodic sequences in terms of infeasibility and suboptimality depending on the convexity weight rules utilized. Another extension of the results presented here would be to analyze the convergence of the ergodic sequences when allowing inexact solutions of the subproblems; such solutions would provide \(\varepsilon \)-subgradients of the dual objective function (e.g., d’Antonio and Frangioni [12]).
We are currently investigating the feasibility and computational potential of using the \(s^k\)-rule when employing other methods for solving the dual problem, for example augmented Lagrangian methods (e.g., Rockafellar [41], Bertsekas [9]), bundle methods (e.g., Lemaréchal et al. [31]) and ball-step subgradient methods (e.g., Kiwiel et al. [22, 23]).
Footnotes
 1. Available at http://www.di.unipi.it/di/groups/optimize/Data/MMCF.html (accessed 2013-12-10).
References
 1. Anstreicher, K.M., Wolsey, L.A.: Two “well-known” properties of subgradient optimization. Math. Program. 120, 213–220 (2009)
 2. Babonneau, F., Vial, J.-P.: ACCPM with a nonlinear constraint and an active set strategy to solve nonlinear multicommodity flow problems. Math. Program. 120, 170–210 (2009)
 3. Babonneau, F., Vial, J.-P.: ACCPM with a nonlinear constraint and an active set strategy to solve nonlinear multicommodity flow problems: a corrigendum. Math. Program. 120, 211–212 (2009)
 4. Bahiense, L., Maculan, N., Sagastizábal, C.: The volume algorithm revisited: relation with bundle methods. Math. Program. 94, 41–69 (2002)
 5. Balakrishnan, A., Magnanti, T.L., Wong, R.T.: A dual-ascent procedure for large-scale uncapacitated network design. Oper. Res. 37, 716–740 (1989)
 6. Bar-Gera, H.: Origin-based algorithm for the traffic assignment problem. Transp. Sci. 36(4), 398–417 (2002)
 7. Barahona, F., Anbil, R.: The volume algorithm: producing primal solutions with a subgradient method. Math. Program. 87, 385–399 (2000)
 8. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming Theory and Applications, 2nd edn. Wiley, New York (1993)
 9. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, San Diego, CA (1982)
 10. Burachik, R.S., Kaya, C.Y.: A deflected subgradient method using a general augmented Lagrangian duality with implications on penalty methods. In: Burachik, R.S., Yao, J.-C. (eds.) Variational Analysis and Generalized Differentiation in Optimization and Control, Springer Optimization and Its Applications, vol. 47, pp. 109–132. Springer, New York (2010)
 11. Ceria, S., Nobili, P., Sassano, A.: A Lagrangian-based heuristic for large-scale set covering problems. Math. Program. 81, 215–228 (1998)
 12. d’Antonio, G., Frangioni, A.: Convergence analysis of deflected conditional approximate subgradient methods. SIAM J. Optim. 20, 357–386 (2009)
 13. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1, 269–271 (1959)
 14. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
 15. Ermol’ev, Y.M.: Methods of solution of nonlinear extremal problems. Cybernetics 2, 1–14 (1966)
 16. Fisher, M.L.: The Lagrangian relaxation method for solving integer programming problems. Manag. Sci. 27, 626–642 (1991)
 17. Frangioni, A., Gendron, B.: 0-1 reformulations of the multicommodity capacitated network design problem. Discret. Appl. Math. 157, 1229–1241 (2009)
 18. Gallo, G., Pallottino, S.: Shortest path algorithms. Ann. Oper. Res. 13, 1–79 (1988)
 19. Goffin, J.-L., Gondzio, J., Sarkissian, R., Vial, J.-P.: Solving nonlinear multicommodity flow problems by the analytic center cutting plane method. Math. Program. 76, 131–154 (1996)
 20. Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable minimization. Math. Program. 46, 105–122 (1990)
 21. Kiwiel, K.C.: An alternative linearization bundle method for convex optimization and nonlinear multicommodity flow problems. Math. Program. 130, 59–84 (2011)
 22. Kiwiel, K.C., Larsson, T., Lindberg, P.O.: The efficiency of ball-step subgradient level methods for convex optimization. Math. Oper. Res. 24, 237–254 (1999)
 23. Kiwiel, K.C., Larsson, T., Lindberg, P.O.: Lagrangian relaxation via ball-step subgradient methods. Math. Oper. Res. 32, 669–686 (2007)
 24. Kleinrock, L.: Communication Nets; Stochastic Message Flow and Delay. Dover, New York (1972)
 25. Knopp, K.: Infinite Sequences and Series. Dover Publications, New York, NY (1956)
 26. Larsson, T., Liu, Z.: A Primal Convergence Result for Dual Subgradient Optimization with Application to Multi-Commodity Network Flows. Technical Report. Department of Mathematics, Linköping Institute of Technology (1989)
 27. Larsson, T., Liu, Z.: A Lagrangean relaxation scheme for structured linear programs with application to multicommodity network flows. Optimization 40, 247–284 (1997)
 28. Larsson, T., Liu, Z., Patriksson, M.: A dual scheme for traffic assignment problems. Optimization 42, 323–358 (1997)
 29. Larsson, T., Patriksson, M.: An augmented Lagrangean dual algorithm for link capacity side constrained traffic assignment problems. Transp. Res. Part B Methodol. 29(6), 433–455 (1995)
 30. Larsson, T., Patriksson, M., Strömberg, A.-B.: Ergodic, primal convergence in dual subgradient schemes for convex programming. Math. Program. 86, 283–312 (1999)
 31. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69, 111–147 (1995)
 32. Nedić, A., Ozdaglar, A.: Approximate primal solutions and rate analysis for dual subgradient methods. SIAM J. Optim. 19, 1757–1780 (2009)
 33. Nedić, A., Ozdaglar, A.: Subgradient methods for saddle-point problems. J. Optim. Theory Appl. 142, 205–228 (2009)
 34. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. Ser. B 120, 221–259 (2009)
 35. Ouorou, A., Mahey, P., Vial, J.-P.: A survey of algorithms for convex multicommodity flow problems. Manag. Sci. 46, 126–147 (2000)
 36. Patriksson, M.: The Traffic Assignment Problem: Models and Methods. Topics in Transportation series, VSP, Utrecht, The Netherlands (1994)
 37. Polyak, B.T.: A general method of solving extremum problems. Sov. Math. Dokl. 8, 593–597 (1967)
 38. Polyak, B.T.: Minimization of unsmooth functionals. Comput. Math. Math. Phys. 9, 14–29 (1969)
 39. Polyak, B.T.: Introduction to Optimization. Optimization Software, Publications Division, NY (1987)
 40. Robinson, S.M.: Bundle-based decomposition: conditions for convergence. In: International Institute for Applied Systems Analysis (1987)
 41. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex optimization. Math. Oper. Res. 1, 97–116 (1976)
 42. Sherali, H.D., Choi, G.: Recovery of primal solutions when using subgradient optimization methods to solve Lagrangian duals of linear programs. Oper. Res. Lett. 19, 105–113 (1996)
 43. Sherali, H.D., Lim, C.: On embedding the volume algorithm in a variable target value method. Oper. Res. Lett. 32, 455–462 (2004)
 44. Shor, N.Z.: Minimization Methods for Non-Differentiable Functions. Springer, Berlin (1985)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.