Primal convergence from dual subgradient methods for convex optimization

When solving a convex optimization problem through a Lagrangian dual reformulation, subgradient optimization methods are favorably utilized, since they often find near-optimal dual solutions quickly. However, an optimal primal solution is generally not obtained directly through such a subgradient approach unless the Lagrangian dual function is differentiable at an optimal solution. We construct a sequence of convex combinations of primal subproblem solutions, a so-called ergodic sequence, which is shown to converge to an optimal primal solution when the convexity weights are appropriately chosen. We generalize previous convergence results from linear to convex optimization and present a new set of rules for constructing the convexity weights that define the ergodic sequence of primal solutions. In contrast to previously proposed rules, they exploit more information from later subproblem solutions than from earlier ones. We evaluate the proposed rules on a set of nonlinear multicommodity flow problems and demonstrate that they clearly outperform the rules previously proposed.


Introduction and motivation
Lagrangian relaxation is a frequently utilized tool for solving large-scale convex minimization problems, due to its simplicity and its property of systematically providing optimistic estimates of the optimal value. One popular tool for solving the dual problems of Lagrangian relaxation schemes is subgradient optimization. The advantage of subgradient methods is that they often find near-optimal dual solutions quickly, whilst a drawback is that near-optimal primal feasible solutions cannot, in general, be obtained directly from the subgradient scheme. As the dual iterates in a subgradient scheme converge towards an optimal dual solution, primal convergence towards a near-optimal primal solution is not in general achieved by simply using the subproblem solutions as primal iterates. Even with a dual optimal solution at hand, an optimal primal solution cannot easily be obtained. The reason for this inconvenience is that the dual objective function is typically nonsmooth at an optimal point, whence an optimal primal solution is a nontrivial convex combination of the extreme subproblem solutions.
This paper analyzes what is called ergodic sequences by Larsson et al. [30] or recovering primal solutions by Sherali and Choi [42]; we will use the term ergodic sequences. To guarantee primal convergence for a linear program in a subgradient scheme, Shor [44, Chapter 4] and Larsson and Liu [27] (originally developed in [26]) utilize a strategy which, rather than using the subproblem solution as primal iterate, uses a convex combination of previously found subproblem solutions, denoted an ergodic sequence. In [44, Chapter 4] the convex combinations are determined by the step lengths used in the subgradient scheme, while in [27, Theorem 3] the mean of the previously found iterates is used. These results are extended in [42] to a more general case of convex combinations and step lengths in the subgradient algorithm applied to linear programs. Larsson et al. [30] show that the convex combinations used in [44] and [27] yield primal convergence also for general convex optimization problems.
Several other methods for generating approximate primal solutions in a subgradient scheme have been studied. Barahona and Anbil [7] propose a method for approximating the solution to a linear program by utilizing a subgradient method in which a primal solution is created as a convex combination of the previous solution and the primal iterate obtained from the subgradient method. The method is denoted the volume algorithm and was revisited by Bahiense et al. [4] and Sherali and Lim [43], where they extended it to include more information in the dual scheme. Nesterov [34] analyzes a primal-dual subgradient method where a primal feasible approximation to the optimum is obtained by using control sequences in both the primal and dual space. Nedić and Ozdaglar [32,33] study methods which utilize the average of all previously found iterates as primal solutions. The latter algorithms employ a constant step length due to its simplicity and practical significance. For a more thorough overview of the history of strategies for the construction of primal iterates in dual subgradient schemes, see Anstreicher and Wolsey [1].
This paper generalizes the results in [42] to the class of convex programs, and extends the results in [30] to include more general convex combinations in the definition of the ergodic sequences. We present a new set of rules for constructing the convexity weights defining the ergodic sequence of primal iterates. In contrast to rules previously utilized, they exploit more information from later subproblem solutions than from earlier ones. We evaluate the new rules on a set of nonlinear multicommodity flow problems (NMFPs) and show that they clearly outperform the previously utilized ones.
The remainder of this paper is organized as follows. In Sect. 2 we introduce some basic concepts regarding Lagrangian relaxation and subgradient methods. In Sect. 3, we describe the notion of primal ergodic sequences and present an important theorem regarding their convergence when considering general convex problems. Section 3 also includes a new set of rules for choosing convexity weights when defining the ergodic sequences. The final part of Sect. 3 includes a taxonomy of previous results and their connection to the results presented in this paper. In Sect. 4 we introduce the NMFP and describe a solution approach based on Lagrangian relaxation. Computational results for a set of NMFP test instances employing the new rules for choosing the convexity weights are presented in Sect. 5. Conclusions are then drawn in Sect. 6.

Background
Let f : R^n → R and h_i : R^n → R, i ∈ I := {1, . . . , m}, be convex and (possibly) nonsmooth functions, and let the set X ⊂ R^n be convex and compact. Consider the program

minimize f(x), (1a)
subject to h_i(x) ≤ 0, i ∈ I, (1b)
x ∈ X, (1c)

with solution set X^* ⊂ R^n. We assume that the set X is simple and that the feasible set {x ∈ X | h_i(x) ≤ 0, i ∈ I} is nonempty, implying that the solution set X^* is also nonempty. Letting h := (h_1, . . . , h_m)^T, we define the Lagrange function L : R^n × R^m → R with respect to the relaxation of the constraints (1b) as L(x, u) := f(x) + u^T h(x), and the Lagrangian dual function θ : R^m → R by

θ(u) := min_{x ∈ X} L(x, u). (2)

The set X is compact, which implies that θ is continuous on R^m [8, Theorem 6.3.1]. The function θ is also concave on R^m, and the nonempty and compact solution set for the subproblem in (2) at u ∈ R^m is

X(u) := arg min_{x ∈ X} L(x, u). (3)

For u ∈ R^m_+, the set X(u) is also convex. The Lagrangian dual problem is defined as

θ^* := max_{u ∈ R^m_+} θ(u), (4)

whose solution set U^* ⊆ R^m_+ is convex. By weak duality, the inequality θ(u) ≤ f(x) holds for all u ∈ R^m_+ and all x ∈ X such that h(x) ≤ 0^m [8, Theorem 6.2.1]. Let S be a nonempty, closed, and convex set. We define the projection and distance operators by proj(x, S) := arg min_{y ∈ S} ||x − y|| and dist(x, S) := min_{y ∈ S} ||x − y|| = ||x − proj(x, S)||, respectively. Note that the function dist(·, S) is convex and continuous. A point-to-set map X : R^m → 2^X is said to be a closed map at u if, whenever {u^t} → u, x^t ∈ X(u^t) for all t, and {x^t} → x, this implies x ∈ X(u). The following result states that the point-to-set map X(·) defined in (3) is a closed map.
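To make these definitions concrete, the following sketch evaluates θ and an element of X(u) on a small hypothetical instance (minimize x² subject to 1 − x ≤ 0 over X = [−2, 2]; an illustration of ours, not an example from the paper), with the simple set X replaced by a fine grid:

```python
# Toy instance (hypothetical): min x^2  s.t.  1 - x <= 0,  x in X = [-2, 2].
def f(x): return x * x
def h(x): return 1.0 - x

# A fine grid standing in for the simple compact set X.
X = [-2.0 + 4.0 * i / 20000 for i in range(20001)]

def theta(u):
    """Lagrangian dual function theta(u) = min_{x in X} { f(x) + u * h(x) }."""
    return min(f(x) + u * h(x) for x in X)

def x_of_u(u):
    """A minimizer of the Lagrangian at u, i.e., an element of X(u)."""
    return min(X, key=lambda x: f(x) + u * h(x))

# Weak duality: theta(u) <= f(x) for the primal feasible point x = 1.
assert all(theta(u) <= f(1.0) + 1e-9 for u in (0.0, 1.0, 2.0, 5.0))
```

For this instance the dual optimum is u^* = 2, where X(u^*) = {1} coincides with the primal optimum.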
Lemma 1 (closedness of the map X(·)) Let the point-to-set map X : R^m → 2^X be given by the definition (3), let the sequence {u^t} ⊂ R^m fulfill {u^t} → u, and let the sequence {x^t} be given by x^t ∈ X(u^t), t = 1, 2, . . .. Then {dist(x^t, X(u))} → 0. For each u ∈ R^m, we define the set of indices corresponding to strictly positive multiplier values as I(u) := {i ∈ I | u_i > 0}. From Lemma 2 it follows that for all u ∈ R^m_+ and every i ∈ I(u), ∂h_i is constant on rint X(u); hence, for every x ∈ rint X(u), each subgradient ξ ∈ ∂h_i(x) defines a hyperplane that supports the function h_i at every x ∈ rint X(u). We define the subdifferential of θ at u ∈ R^m as the set ∂θ(u) := conv { h(x) | x ∈ X(u) }. To obtain primal-dual optimality relations, we assume Slater's constraint qualification, as stated in Assumption 1.

Subgradient optimization
We consider solving the Lagrangian dual program (4) by the subgradient optimization method. We start at some u^0 ∈ R^m_+ and compute iterates u^t according to

u^{t+1} := [ u^t + α_t h(x^t) ]_+ , t = 0, 1, . . . , (6)

where x^t ∈ X(u^t) solves the subproblem defined in (2) at u^t, implying that h(x^t) ∈ ∂θ(u^t), α_t > 0 is the step length chosen at iteration t, and [ · ]_+ denotes the Euclidean projection onto the nonnegative orthant R^m_+. For some early development of the theory of the subgradient optimization method, see Shor [44, Chapter 2], Polyak [37,38], and Ermol'ev [15]. The convergence of the method (6) for the special case of a divergent series step length rule is established in the following proposition.
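On a toy instance (a hypothetical one-dimensional problem of ours, not from the paper), the iteration (6) can be sketched as follows; here the subproblem solution x(u) = u/2 and the subgradient h(x) = 1 − x are available in closed form:

```python
# Projected subgradient ascent (6) on the dual of the hypothetical instance
# min x^2  s.t.  1 - x <= 0,  x in X = [-2, 2]:
# x(u) = u/2 solves the subproblem for u in [0, 4], and h(x(u)) = 1 - u/2
# is a subgradient of theta at u.

def subproblem(u):
    """Return an x in X(u); here the Lagrangian minimizer u/2, clipped to X."""
    return max(-2.0, min(2.0, u / 2.0))

u = 0.0                     # u^0
xs = []                     # store the subproblem solutions x^0, x^1, ...
for t in range(2000):
    x = subproblem(u)
    xs.append(x)
    alpha = 1.0 / (t + 1)   # harmonic (divergent series) step length
    u = max(0.0, u + alpha * (1.0 - x))   # u^{t+1} = [u^t + alpha_t h(x^t)]_+
# u approaches the dual optimum u* = 2, where x(u*) = 1 is primal optimal.
```

Note that here the subproblem solutions themselves happen to converge; in the nonsmooth case treated in this paper they in general do not, which is what motivates the ergodic sequences below.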

Ergodic primal convergence
In this section, we introduce the notion of an ergodic sequence and present two important results regarding the convergence of ergodic sequences, depending on the convexity weights and step lengths. Assume that the method (6) is applied to the problem (4). At each iteration t, an ergodic primal iterate x̄^t is composed according to

x̄^t := Σ_{s=0}^{t−1} μ^t_s x^s, where Σ_{s=0}^{t−1} μ^t_s = 1 and μ^t_s ≥ 0, s = 0, . . . , t − 1, (8)

where x^s is the primal solution found in iteration s, i.e., the solution to the subproblem defined in (2). The vector x̄^t thus is a convex combination of all previous subproblem solutions. We define

γ^t_s := μ^t_s / α_s, s = 0, . . . , t − 1, t = 1, 2, . . . , and Δγ^t_max := max_{s ∈ {1, ..., t−1}} { γ^t_s − γ^t_{s−1} }.

Assumption 2 (relations between convexity weights and step lengths) The step lengths α_t and the convexity weights μ^t_s are chosen such that the following conditions are satisfied:
A1: γ^t_s ≥ γ^t_{s−1}, s = 1, . . . , t − 1, t = 2, 3, . . .,
A2: Δγ^t_max → 0 as t → ∞, and
A3: γ^t_0 → 0 as t → ∞ and, for some Γ > 0, γ^t_{t−1} ≤ Γ for all t.
The condition A1 is equivalent to the requirement that μ^t_s / μ^t_{s−1} ≥ α_s / α_{s−1}, s = 1, . . . , t − 1, t = 2, 3, . . .. This can be interpreted as the requirement that whenever the step length α_s at iteration s is larger than the previous one, α_{s−1}, the corresponding convexity weight μ^t_s should be larger than the previous one, μ^t_{s−1}. By condition A2, the difference between each pair of subsequent step-length-scaled convexity weights tends to zero as t increases, meaning that no primal iterate should be completely neglected. Condition A3 ensures that, for decreasing step lengths, the convexity weights decrease at a rate not slower than that of the step lengths.
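As an illustration of these conditions, the 1/t-rule μ^t_s = 1/t combined with harmonic step lengths α_s = a/(s + 1) gives γ^t_s = (s + 1)/(a t), which satisfies A1-A3 with Γ = 1/a. A quick numerical check (with the arbitrary choice a = 1):

```python
# Numerical check (illustrative) of conditions A1-A3 for the 1/t-rule
# mu^t_s = 1/t combined with harmonic step lengths alpha_s = a/(s+1), a = 1.
a = 1.0

def gamma(t, s):
    """gamma^t_s = mu^t_s / alpha_s = (s+1) / (a*t)."""
    return (1.0 / t) / (a / (s + 1))

for t in (10, 100, 1000):
    g = [gamma(t, s) for s in range(t)]
    assert all(g[s] >= g[s - 1] for s in range(1, t))        # A1: nondecreasing
    dgmax = max(g[s] - g[s - 1] for s in range(1, t))
    assert abs(dgmax - 1.0 / (a * t)) < 1e-12                # A2: -> 0 as t grows
    assert g[0] <= 1.0 / (a * t) + 1e-15                     # A3: gamma^t_0 -> 0
    assert g[t - 1] <= 1.0 / a + 1e-15                       # A3: Gamma = 1/a
```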
We now present a special case of a result of Silverman and Toeplitz (proven in [25]) which will be utilized in the analysis to follow.

Feasibility in the limit
We here show that if the subgradient method (6) converges towards a dual feasible point, and the step lengths α_t and convexity weights μ^t_s are chosen such that Assumption 2 is fulfilled, then the ergodic sequence of iterates {x̄^t} converges to the set of primal feasible solutions.
Proposition 4 (feasibility of x̄^t in the limit) Suppose that the method (6), operated with a suitable step length rule, yields {u^t} → u^∞ ∈ R^m_+. If the step lengths α_t and convexity weights μ^t_s fulfill Assumption 2, then the sequence {x̄^t} generated according to (8) fulfills lim sup_{t→∞} h(x̄^t) ≤ 0^m.
Proof The claim follows by bounding h(x̄^t) from above through the chain (10a)-(10d), where the inequalities in (10a) follow from the convexity of the function h and the iteration formula (6), respectively. By the condition A3 in Assumption 2, the first term in (10d) tends to 0^m as t → ∞. From the hypothesis, {u^t} → u^∞, and by condition A3 in Assumption 2, γ^t_{t−1} ≤ Γ holds, implying that the second term in (10d) tends to 0^m as t → ∞.
It remains to show that σ^t → 0^m as t → ∞. Given any ε > 0, let κ ≥ 1 be large enough that ||u^∞ − u^s|| ≤ ε/(2Γ) holds for all s ≥ κ + 1. Then, for t ≥ κ + 2 and large enough that Δγ^t_max Σ_{s=1}^{κ} ||u^∞ − u^s|| ≤ ε/2 holds, the chain of inequalities (11a)-(11d) is valid: the inequality (11a) follows from the triangle inequality and condition A1 in Assumption 2, and the inequality (11b) from condition A2 in Assumption 2 and the assumption that κ ≥ 1 is large enough. The inequality (11c) follows from the assumption that t ≥ κ + 2 is large enough, and the inequality (11d) follows from the condition A3 in Assumption 2 and the fact that γ^t_κ ≥ 0. Since ε > 0 is arbitrary, we deduce that σ^t → 0^m as t → ∞, and the proposition follows.
Proposition 4 states that as long as the sequence {u^t} of dual iterates converges to some feasible point in the Lagrangian dual problem (4), and the step lengths and convexity weights are appropriately chosen, the corresponding sequence of primal iterates defined by (8) will produce a primal feasible solution in the limit. If there is an accumulation point x̄ such that {x̄^t} → x̄, then Proposition 4 states that x̄ is feasible in the original problem (1). If the functions f and h_i, i ∈ I, are affine, and the set X is a polytope, then Proposition 4 reduces to [42, Theorem 1].
Note that the conditions A2 and A3 of Assumption 2 are fulfilled if condition A1 of Assumption 2 holds together with the condition that γ^t_{t−1} → 0 as t → ∞. Below, we present a result under strengthened assumptions on the convexity weights and step lengths, but where the sequence {u^t} is only assumed to be bounded.

Corollary 1 (bounded dual sequence) Suppose that the sequence {u^t} generated by the formula (6) is bounded, and that condition A1 of Assumption 2 holds along with the condition that γ^t_{t−1} → 0 as t → ∞. Then the conclusion of Proposition 4 remains valid for the sequence {x̄^t} generated according to (8).

Proof From the relations (10a)-(10b) and the condition A1 of Assumption 2, the claim follows analogously to the proof of Proposition 4.

Note that, under the assumptions of Corollary 1, any accumulation point x̄ of the sequence {x̄^t} is feasible in (1).

Optimality in the limit
We next establish, assuming that Slater's constraint qualification (Assumption 1) is fulfilled, primal convergence to the set X^* of optimal solutions of the problem (1), as long as the step lengths α_t and the convexity weights μ^t_s are chosen to satisfy Assumption 2.
Theorem 1 (optimality of x̄^t in the limit) Suppose that Assumption 1 holds, that the subgradient method (6), operated with a suitable step length rule, attains dual convergence, i.e., {u^t} → u^∞ ∈ R^m_+, and let x̄^t be generated according to (8). If the step lengths α_t and the convexity weights μ^t_s satisfy Assumption 2, then {dist(x̄^t, X^*)} → 0 as t → ∞.
Proof From Proposition 4 it follows that lim sup_{t→∞} h(x̄^t) ≤ 0^m and x̄^t ∈ X, t ≥ 1.
In view of Proposition 2, it suffices to show that {dist(x̄^t, X(u^∞))} → 0 and that (u^∞)^T h(x̄^t) → 0 as t → ∞. By the convexity and nonnegativity of the function dist(·, S) and the definition (8), the inequalities 0 ≤ dist(x̄^t, X(u^∞)) ≤ Σ_{s=0}^{t−1} μ^t_s dist(x^s, X(u^∞)) hold; by Lemma 1 and the properties of the convexity weights, the right-hand side tends to zero, whence {dist(x̄^t, X(u^∞))} → 0. The function h_i is uniformly continuous over the compact set X, so for every δ > 0 there exists an ε > 0 such that the inequality (12) holds for any x with dist(x, X(u^∞)) ≤ ε. If rint X(u^∞) = ∅, i.e., the set X(u^∞) is a singleton, the same reasoning holds when {x} = X(u^∞). From Lemma 1, we know that {dist(x^s, X(u^∞))} → 0 as s → ∞, and hence, for some fixed κ ≥ τ + 1, the inequality dist(x^s, X(u^∞)) ≤ ε holds for all s ≥ κ. Hence, for t ≥ κ + 1, the chain of inequalities (13a)-(13c) holds, where the inequality (13a) follows from the convexity of h_i, the inequality (13b) follows from (12), and the inequality (13c) from the fact that μ^t_s ≤ 1 for s = 0, . . . , t − 1. For t ≥ κ, under the conditions A1-A3 of Assumption 2, and by the same reasoning as in the proof of Proposition 4 [the inequalities (11)], it follows for large enough values of t ≥ κ + 1 that lim inf_{t→∞} h_i(x̄^t) ≥ 0 holds. From Proposition 4 it follows that lim sup_{t→∞} h_i(x̄^t) ≤ 0. We deduce that lim_{t→∞} h_i(x̄^t) = 0. Since this result holds for all i ∈ I(u^∞), and since u^∞_i = 0 whenever i ∈ I \ I(u^∞), it follows that (u^∞)^T h(x̄^t) → 0 as t → ∞. By Proposition 2, the theorem then follows.

A new rule for choosing the convexity weights when utilizing harmonic series step lengths
We now study the special case when the step lengths α_t are chosen according to a harmonic series, i.e.,

α_t := a/(t + 1), t = 0, 1, . . . , for some a > 0. (14)

This choice of step lengths was used by Larsson and Liu [27] and Larsson et al. [30], and it guarantees, according to Proposition 3, convergence to a dual optimum. We define Δμ^t_max := max_{s ∈ {1, ..., t−1}} { μ^t_s − μ^t_{s−1} }, where μ^t_s are the convexity weights employed in (8).
Assumption 3 (convexity weights when employing harmonic step lengths) The convexity weights are chosen to satisfy
B1: μ^t_s ≥ μ^t_{s−1}, s = 1, . . . , t − 1, t = 2, 3, . . .,
B2: t Δμ^t_max → 0 as t → ∞, and
B3: for some M > 0, t μ^t_{t−1} ≤ M for all t.
Condition B1 states that the convex combinations x̄^t, defined in (8), should put more weight on later observations (that is, primal subproblem solutions x^s). Condition B2 states that no particular primal iterate should be favoured, meaning that the difference between the weights for two subsequent iterates should tend to zero. Condition B3 states that the convexity weights μ^t_{t−1} should decrease at a rate not lower than 1/t as t → ∞.
Consider the following result.

Proposition 5 (Assumption 3 implies Assumption 2) Suppose that the step lengths α_t are chosen according to the harmonic series (14) and that the convexity weights μ^t_s fulfill Assumption 3. Then the step lengths and convexity weights also fulfill Assumption 2.
Proof From (14) it follows that the inequality α_s ≤ α_{s−1} holds for all s ≥ 1, which, together with condition B1 of Assumption 3, implies that γ^t_s ≥ γ^t_{s−1}, s = 1, . . . , t − 1. Hence, the condition A1 in Assumption 2 is fulfilled. Next, Δγ^t_max is bounded from above by a sum of three terms, as in (15a)-(15c), where the inequality (15c) follows by maximizing each term in (15b) separately. The first term in (15c) tends to zero due to condition B2 in Assumption 3, the second term converges to zero due to the conditions B1 and B3 in Assumption 3, and the third term converges to zero due to the condition B2 in Assumption 3. Hence, the condition A2 in Assumption 2 is fulfilled. We also have that γ^t_{t−1} = t μ^t_{t−1}/a ≤ M/a for any t ≥ 1, and, by the conditions B1 and B3, that γ^t_0 = μ^t_0/a ≤ M/(a t) → 0 as t → ∞, which implies that the condition A3 in Assumption 2 is satisfied.
Larsson et al. [30] show that by using the convexity weights μ^t_s = 1/t, primal convergence can always be guaranteed for the harmonic series step lengths (14). We here refer to this rule for creating an ergodic sequence as the 1/t-rule; it was first analyzed by Larsson and Liu [27], who prove convergence for the case when (1) is a linear program. Clearly, the 1/t-rule fulfills the conditions of Proposition 5; hence the primal convergence of the 1/t-rule is a special case of the analysis above.
One drawback of the 1/t-rule is the fact that when creating the ergodic sequences of primal solutions, all previously found iterates are weighted equally. We expect that later subproblem solutions in the subgradient method are more likely to belong to the set of optimal solutions to the subproblem (2) at a dual optimal solution, u * ∈ U * . We therefore propose a more general set of rules for creating ergodic sequences of primal iterates which allows for later primal iterates to receive larger convexity weights.
Definition 1 (the s^k-rule) Let k ≥ 0. The s^k-rule creates an ergodic sequence by choosing convexity weights according to

μ^t_s := (s + 1)^k / Σ_{l=0}^{t−1} (l + 1)^k, s = 0, . . . , t − 1, t = 1, 2, . . .. (16)

Note that the s^0-rule is equivalent to the 1/t-rule. For k > 0, the s^k-rule results in an ergodic sequence in which the later iterates are assigned higher weights than the earlier ones. For larger values of k, the weights are shifted towards later iterates. In Fig. 1, the convexity weights μ^t_s are illustrated for t = 10 and k ∈ {0, 1, 2, 10}. The following proposition establishes primal convergence for the ergodic sequence created by the s^k-rule, given that a harmonic series step length is utilized when solving the Lagrangian dual problem (4).

Fig. 1 Illustration of the convexity weights μ^t_s when t = 10, for the s^k-rule with k = 0, 1, 2, 10, respectively
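A minimal sketch of the weights in Definition 1 (illustrative only), reproducing the cases shown in Fig. 1:

```python
# Convexity weights of the s^k-rule (Definition 1):
# mu^t_s = (s+1)^k / sum_{l=0}^{t-1} (l+1)^k,  s = 0, ..., t-1.
def sk_weights(t, k):
    total = sum((l + 1) ** k for l in range(t))
    return [(s + 1) ** k / total for s in range(t)]

for k in (0, 1, 2, 10):            # the cases illustrated in Fig. 1 (t = 10)
    mu = sk_weights(10, k)
    assert abs(sum(mu) - 1.0) < 1e-12                      # a convex combination
    assert all(mu[s] >= mu[s - 1] for s in range(1, 10))   # condition B1

# k = 0 recovers the 1/t-rule: all weights equal 1/t.
assert sk_weights(10, 0) == [0.1] * 10
```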

Proposition 6 (the s k -rule satisfies Assumption 3) The convexity weights μ t s , chosen according to Definition 1, fulfill Assumption 3.
Proof The convexity weights μ^t_s clearly fulfill the condition B1 of Assumption 3. For any t ≥ 2, it holds that Δμ^t_max = max_{s ∈ {1,...,t−1}} { (s + 1)^k − s^k } / Σ_{l=0}^{t−1} (l + 1)^k, where the maximum is attained for s = 1 when 0 ≤ k < 1, and for s = t − 1 when k ≥ 1. Noting that Σ_{l=0}^{t−1} (l + 1)^k ≥ t^{k+1}/(k + 1), it follows that, for 0 ≤ k < 1, t Δμ^t_max ≤ (2^k − 1)(k + 1)/t^k → 0 as t → ∞. For k ≥ 1, it holds that t Δμ^t_max ≤ (k + 1)(t^k − (t − 1)^k)/t^k ≤ k(k + 1)/t → 0 as t → ∞. Hence, condition B2 of Assumption 3 holds. Condition B3 in Assumption 3 holds, due to the fact that t μ^t_{t−1} = t^{k+1} / Σ_{l=0}^{t−1} (l + 1)^k ≤ k + 1, where the first identity follows from (16).
One should note that when constructing the ergodic iterate x̄^t defined by the s^k-rule, not all previously found iterates x^s, s = 0, . . . , t − 1, have to be stored, since the ergodic iterate can be updated incrementally: letting S_t := Σ_{l=0}^{t−1} (l + 1)^k, it holds that

x̄^t = (S_{t−1}/S_t) x̄^{t−1} + (t^k/S_t) x^{t−1}, t = 2, 3, . . .. (17)

Hence, in iteration t, only the previous ergodic iterate x̄^{t−1} and the new primal iterate x^{t−1} are required for constructing the new ergodic iterate x̄^t. We now summarize the results obtained in this section in the following theorem.
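The incremental update can be sketched and checked against the explicit convex combination (16) as follows (illustrative; the variable names are ours):

```python
# Incremental update of the ergodic iterate under the s^k-rule:
# with S_t = sum_{l=0}^{t-1} (l+1)^k, the iterate satisfies
#   xbar^t = (S_{t-1}/S_t) * xbar^{t-1} + (t^k/S_t) * x^{t-1},
# so only xbar^{t-1} and the newest subproblem solution need to be stored.
def ergodic_incremental(xs, k):
    """Ergodic iterate xbar^t from subproblem solutions xs = [x^0, ..., x^{t-1}]."""
    xbar, S = 0.0, 0.0
    for t, x in enumerate(xs, start=1):   # x plays the role of x^{t-1}
        w = t ** k                        # weight numerator ((t-1)+1)^k = t^k
        S_new = S + w
        xbar = (S / S_new) * xbar + (w / S_new) * x
        S = S_new
    return xbar

# Check against the explicit convex combination sum_s mu^t_s x^s.
xs = [3.0, -1.0, 2.0, 0.5, 4.0]
for k in (0, 1, 2, 10):
    total = sum((s + 1) ** k for s in range(len(xs)))
    direct = sum(((s + 1) ** k / total) * x for s, x in enumerate(xs))
    assert abs(ergodic_incremental(xs, k) - direct) < 1e-12
```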
Theorem 2 (convergence of the s^k-rule) Suppose that Assumption 1 holds, that u^t is generated by the subgradient method (6) operated with harmonic series step lengths (14), and that x̄^t is generated according to (8), where the convexity weights are chosen according to the s^k-rule (Definition 1). Then {u^t} → u^∞ ∈ U^* and {dist(x̄^t, X^*)} → 0 as t → ∞.
Proof By Proposition 3, it follows that {u^t} → u^∞ ∈ U^*. Propositions 5 and 6 yield that the assumptions of Theorem 1 hold, which completes the proof.

Connection with previous results
In this section, we present some previous proposals for choosing step lengths and convexity weights. For simplicity, we define A_t := Σ_{s=0}^{t} α_s and B_t := Σ_{s=0}^{t} α_s^2. In the volume algorithm developed by Barahona and Anbil [7], each ergodic iterate is constructed as a convex combination of the previous ergodic iterate and the new primal iterate, i.e., x̄^t = β x^{t−1} + (1 − β) x̄^{t−1}, where 0 < β < 1. Translating this into the framework of our analysis, this is equivalent to letting μ^t_s = β(1 − β)^{t−1−s}, s = 1, . . . , t − 1, and μ^t_0 = (1 − β)^{t−1}. In the taxonomy in Table 1, we present the following attributes of the previously developed algorithms which utilize the subgradient method to solve the dual problem:
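The translation of the volume algorithm into convexity weights can be verified numerically; the geometric weights below are the ones stated above (with our indexing convention):

```python
# Volume-algorithm weights (as translated above, with the newest iterate
# receiving weight beta): mu^t_s = beta*(1-beta)^(t-1-s) for s >= 1 and
# mu^t_0 = (1-beta)^(t-1); they sum to one and reproduce the recursion
#   xbar^t = beta*x^{t-1} + (1-beta)*xbar^{t-1}.
beta, t = 0.1, 6
mu = [(1 - beta) ** (t - 1)] + [beta * (1 - beta) ** (t - 1 - s)
                                for s in range(1, t)]
assert abs(sum(mu) - 1.0) < 1e-12      # a valid convex combination

xs = [2.0, -1.0, 0.5, 3.0, 1.0, -2.0]  # x^0, ..., x^{t-1}
direct = sum(m * x for m, x in zip(mu, xs))
xbar = xs[0]                           # xbar^1 = x^0
for x in xs[1:]:
    xbar = beta * x + (1 - beta) * xbar
assert abs(xbar - direct) < 1e-12      # recursion matches the weights
```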

Problem: The type of problem considered. For the case when problem (1) is a linear program, the assumptions are that f and h_i, i ∈ I, are affine functions and that X is a nonempty and bounded polyhedron; this is denoted in the table as LP. The case of a general convex optimization problem is denoted CP.
Step lengths: The step lengths α_t employed in the subgradient method (6). Step lengths defined according to (14) are denoted Harmonic series. If the step lengths fulfill α_t > 0, lim_{t→∞} α_t = 0, and lim_{t→∞} A_t = ∞, we denote this by Divergent series, and if also lim_{t→∞} B_t < ∞, we denote this by Divergent series, QB (quadratically bounded).
Conv. weights: The convexity weights μ^t_s, defined in (8), defining the ergodic sequences of primal iterates.
Theorem 1: Whether or not Theorem 1 guarantees the convergence of the method.
Theorem 2: Whether or not Theorem 2 guarantees the convergence of the method.

Since the work presented in this paper utilizes the traditional subgradient method to solve the dual problem, we only include in Table 1 algorithms which employ this method for the dual problem. More sophisticated methods for solving the dual problem include deflected conditional subgradient methods (d'Antonio and Frangioni [12], Burachik and Kaya [10]), bundle methods (Lemaréchal et al. [31], Kiwiel [20]), augmented Lagrangian methods (Rockafellar [41], Bertsekas [9]), and ballstep subgradient methods (Kiwiel et al. [22,23]). All of these methods utilize information from previously computed subgradients when updating the iterates in the subgradient scheme. In order to approximate the primal solutions, the convexity weights defining the primal iterates are then acquired from the information obtained in these dual schemes (e.g., Robinson [40]).

Applications to multicommodity network flows
We apply subgradient methods to a Lagrangian dual of the NMFP. Primal solutions are computed from ergodic sequences of subproblem solutions. For a more thorough description of multicommodity flow problems and solution methods for these, see [19,35,36].

The nonlinear multicommodity network flow problem
Consider a graph G = (N, A) with a node set N and a set of directed arcs A. There is a set C ⊆ N × N of origin-destination pairs (OD-pairs). For each pair k ∈ C, there is a flow demand d_k > 0 associated with a specific commodity. We denote the set of simple routes from the origin to the destination of OD-pair k by R_k, and the flow on route r ∈ R_k by h_kr. Let [δ_kra]_{r∈R_k, k∈C, a∈A} be an arc-route incidence matrix for G, with δ_kra = 1 if route r ∈ R_k contains arc a ∈ A, and δ_kra = 0 otherwise.
The flow on arc a ∈ A is denoted by f_a and is defined by the route flows h_kr through f_a = Σ_{k∈C} Σ_{r∈R_k} δ_kra h_kr. To each arc a ∈ A, we associate a convex cost function g_a : R_+ → R_+ of its flow f_a. The NMFP then is the program

z^* := minimize Σ_{a∈A} g_a(f_a), (20a)
subject to Σ_{r∈R_k} h_kr = d_k, k ∈ C, (20b)
h_kr ≥ 0, r ∈ R_k, k ∈ C, (20c)
Σ_{k∈C} Σ_{r∈R_k} δ_kra h_kr = f_a, a ∈ A, (20d)
f_a ≥ 0, a ∈ A. (20e)

One should note that the constraints (20e) are implied by (20c) and (20d), and do not have to be incorporated explicitly in the model. We will consider two definitions of the arc cost functions g_a(f_a), a ∈ A. The first, BPR (Bureau of Public Roads) nonlinear delay, is used in the field of transportation (e.g., [6,29]) and is defined as

g_a(f_a) := r_a f_a (1 + q_a (f_a/c_a)^{p_a}), (21)

where r_a ≥ 0 is the free-flow travel time and c_a > 0 is the practical capacity of arc a ∈ A. The parameters p_a ≥ 0 and q_a ≥ 0 are arc-specific. The second, Kleinrock's average delay, is used in the field of telecommunications (e.g., [24,35]) and is defined as

g_a(f_a) := f_a/(c_a − f_a), 0 ≤ f_a < c_a, (22)

where c_a, a ∈ A, are the arc capacities.
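A sketch of the two arc cost functions follows; the exact parameterization of the BPR function is assumed to be the standard one, with free-flow time r_a, practical capacity c_a, and arc-specific parameters p_a, q_a:

```python
# Arc cost functions (sketch; the BPR parameterization below is the standard
# one and is assumed here, not taken verbatim from the paper).
def bpr_cost(f, r, c, p, q):
    """BPR-type arc cost of flow f (convex on R_+ for p >= 0, q >= 0)."""
    return r * f * (1.0 + q * (f / c) ** p)

def kleinrock_cost(f, c):
    """Kleinrock average-delay cost g_a(f) = f/(c - f), defined for 0 <= f < c."""
    assert 0.0 <= f < c
    return f / (c - f)

# Both are zero at zero flow and increase with the flow.
assert bpr_cost(0.0, r=1.0, c=10.0, p=4.0, q=0.15) == 0.0
assert kleinrock_cost(0.0, c=10.0) == 0.0
assert kleinrock_cost(9.0, c=10.0) > kleinrock_cost(5.0, c=10.0)
```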

A Lagrangian dual formulation
For the NMFP, i.e., the program (20), we utilize a Lagrangian dual approach in which the arc flow defining constraints (20d) are relaxed. For a more thorough explanation of the Lagrangian reformulation, see [28]. The resulting Lagrangian subproblem essentially consists of solving one shortest path problem for each commodity k ∈ C. Let u = [u_a]_{a∈A} be the multipliers associated with the constraints (20d). We define the Lagrangian dual objective function by

θ(u) := Σ_{k∈C} φ_k(u) + Σ_{a∈A} ξ_a(u_a), (23)

where, for each k ∈ C and any u ∈ R^{|A|}, φ_k(u) is the optimal value of the shortest simple route subproblem, with arc costs u_a, a ∈ A, given by

φ_k(u) := min { Σ_{r∈R_k} (Σ_{a∈A} δ_kra u_a) h_kr | Σ_{r∈R_k} h_kr = d_k; h_kr ≥ 0, r ∈ R_k }, (24)

and with solution set H_k(u). For each a ∈ A and any u_a ∈ R, ξ_a(u_a) is the optimal value of the single-arc subproblem ξ_a(u_a) := min_{f_a ≥ 0} { g_a(f_a) − u_a f_a }, with solution set f_a(u_a) ⊆ R_+. For the cost functions (21) and (22), f_a(u_a) can be expressed in closed form as f_a(u_a) = (g'_a)^{−1}(u_a) for u_a ≥ g'_a(0), where (g'_a)^{−1} is the continuous inverse mapping of the derivative of g_a, evaluated at u_a. The function θ : R^{|A|} → R is the sum of |C| concave and piecewise linear functions φ_k, k ∈ C, and |A| concave and differentiable functions ξ_a, a ∈ A. It is thus finite, continuous, concave, and subdifferentiable on R^{|A|}. Its subdifferential mapping at u ∈ R^{|A|} equals the bounded polyhedron

∂θ(u) = { [ Σ_{k∈C} Σ_{r∈R_k} δ_kra h_kr − f_a(u_a) ]_{a∈A} | [h_kr]_{r∈R_k} ∈ H_k(u), k ∈ C }.

By weak duality, θ(u) ≤ z^* holds for all u ∈ R^{|A|}. Consider an arbitrary u ∈ R^{|A|}, and define ū_a := max{ u_a, g'_a(0) }, a ∈ A. Then ξ_a(ū_a) = ξ_a(u_a), a ∈ A, and φ_k(ū) ≥ φ_k(u), k ∈ C, since ū ≥ u, and it follows that θ(ū) ≥ θ(u). Since the Lagrangian dual objective function θ, defined in (23), is to be maximized on R^{|A|}, one can, without loss of generality, impose the restriction u_a ≥ g'_a(0), a ∈ A. The Lagrangian dual can thus be stated as

θ^* := max { θ(u) | u_a ≥ g'_a(0), a ∈ A }, (28)

with solution set U^* ⊆ R^{|A|}.
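For the Kleinrock cost (22), the closed-form single-arc solution can be made explicit: g_a'(f) = c_a/(c_a − f)², so g_a'(0) = 1/c_a and (g_a')^{−1}(u) = c_a − √(c_a/u) for u ≥ 1/c_a. A sketch (ours, with a numerical check that the derivative at the returned flow equals the restricted price):

```python
import math

# Closed-form single-arc subproblem solution for the Kleinrock cost
# g_a(f) = f/(c - f): since g_a'(f) = c/(c - f)^2, we get g_a'(0) = 1/c and
#   f_a(u) = (g_a')^{-1}(u) = c - sqrt(c/u)   for u >= 1/c.
def kleinrock_flow(u, c):
    u = max(u, 1.0 / c)          # impose the restriction u_a >= g_a'(0)
    return c - math.sqrt(c / u)

# Sanity check: the derivative at the returned flow equals the restricted price.
c = 10.0
for u in (0.05, 0.1, 0.5, 2.0):
    f = kleinrock_flow(u, c)
    assert 0.0 <= f < c
    deriv = c / (c - f) ** 2
    assert abs(deriv - max(u, 1.0 / c)) < 1e-9
```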
Proposition 7 (primal-dual optimality, [30, Proposition 6]) Let u^* ∈ U^*. Then strong duality holds, that is, θ^* = θ(u^*) = z^*. Further, f^*_a = f_a(u^*_a), a ∈ A, and [h^*_kr]_{r∈R_k} ∈ H_k(u^*), k ∈ C. Proposition 7 states that the optimal arc flows [f^*_a]_{a∈A} are obtained from the solutions to the subproblems ξ_a(u^*_a), a ∈ A. However, an optimal route flow pattern [h^*_kr]_{r∈R_k} ∈ H^*_k is, in general, not directly available from the solution to the subproblem (24). This is because the set Π_{k∈C} H_k(u^*) typically is not a singleton, since the functions φ_k, k ∈ C, typically are nonsmooth at u^*.

The algorithm
We solve the Lagrangian dual problem (28) by the subgradient method defined in (6). At iteration t, the feasible shortest route flow pattern [h_kr(u^t)]_{r∈R_k, k∈C} is aggregated into a feasible arc flow solution, defined by y^t_a := Σ_{k∈C} Σ_{r∈R_k} δ_kra h_kr(u^t), a ∈ A, and the ergodic arc flows are composed as f̄^t_a := Σ_{s=0}^{t−1} μ^t_s y^s_a, a ∈ A. Since all arc flows y^s_a, a ∈ A, s = 0, . . . , t − 1, are feasible, the ergodic iterate f̄^t_a will also be feasible in (20) for t ≥ 1. The ergodic iterates f̄^t_a are updated analogously to the update of x̄^t in (17). In each iteration t ≥ 0, we obtain a lower bound z̲_t and an upper bound z̄_t on the optimal objective value z^*, according to

z̲_t := max_{s∈{0,...,t}} θ(u^s) and z̄_t := min_{s∈{1,...,t}} Σ_{a∈A} g_a(f̄^s_a). (30)

Numerical tests and results
We now utilize the subgradient approach described in Sect. 4.3 on a set of convex multicommodity flow problems to evaluate the performance of a number of different rules for choosing the convexity weights defining the ergodic sequences.

Implementation issues
To solve the shortest route subproblems (24) arising in the algorithm described in Sect. 4, we use Dijkstra's algorithm [13], as implemented in the subroutine L2QUE described in [18]. In every iteration, Dijkstra's algorithm is called |S| times, where S ⊆ N is the union of all origin nodes of the OD set C. No comparisons have been made between this implementation and other shortest-path solvers.
In the dual subgradient method (6), we adopt a harmonic series step length α_t = α/(t + 1), t = 0, 1, . . ., where α is chosen for each specific problem instance. The subgradient algorithm is terminated when the relative optimality gap is below a specified limit ε_opt > 0, i.e., when

(z̄_t − z̲_t)/z̲_t ≤ ε_opt, (31)

where z̄_t and z̲_t are the upper and lower bounds defined in (30).

Test problems
We evaluate our algorithm on three sets of test problems, which are also used in [2] and [21]. The first set, the planar problems, consists of ten instances in which the nodes have been randomly chosen as points in the plane and the arcs are such that the resulting graph is planar; the OD-pairs have been chosen at random. The second set, the grid problems, contains 15 networks with a grid structure, meaning that each node has four incoming and four outgoing arcs; the OD-pairs have been chosen at random. The third set consists of three telecommunication problems of various sizes. The arc cost functions have been generated as in [2, Section 8.1] for all the test problems. In Table 2, the characteristics of the problems are given, where |N| is the number of nodes, |A| is the number of arcs, |C| is the number of commodities, and |S| is the number of calls to Dijkstra's algorithm needed in each iteration. Note that the characteristics are taken from [21], since some values in [2] are incorrect; see [3]. We also include in Table 2 some computational characteristics of the subgradient algorithm described in Sect. 4.3: the total CPU time and the time spent on shortest path problems.

Convexity weight rules
We have chosen to analyze the 1/t-rule [27,30], the volume algorithm (VA) [7], and the proposed s^k-rule for k = 1, 2, 4, and 10 on the problem instances listed in Table 2. For the VA, we update the ergodic iterates by x̄^t = β x^{t−1} + (1 − β) x̄^{t−1}, where β = 0.1, as proposed in [7]. We decided not to include the rule described in [44, Chapter 4], since for most of the problem instances it did not reach the chosen optimality threshold within 10,000 iterations.

Results
Tables 3 and 4 present our results when defining the arc cost functions as the BPR congestion function (21) and the Kleinrock function (22), respectively. In these tables,
- α represents the initial step length used in the subgradient algorithm (6), chosen as the integer power of 10 that yielded the best performance for each problem instance, and
- for the 1/t-rule, the VA, and the s^k-rules, the number of subgradient iterations required to reach an optimality gap below the given threshold ε_opt is listed.
We impose a limit of 10,000 iterations, and a dash ('-') denotes that the optimality gap did not reach below ε_opt within this limit. In Fig. 2, the performance profiles [14] for the methods are illustrated for the 56 test problems considered (28 using the BPR congestion function and 28 using the Kleinrock delay function). The graphs in the figure represent the portion of problems solved (that is, for which an optimality gap below the given threshold ε_opt was attained) within a factor τ times the number of iterations needed by the method that reached the threshold within the least number of iterations.
The s^k-rule for k = 1, 2, 4, and 10 clearly outperforms both the 1/t-rule and the VA on the test instances. The best performance was shown by the s^4-rule, which reached the required relative optimality gap [defined in (31)] within the least number of iterations for 37 out of the 56 instances. For the problem instance where the s^4-rule performed the poorest, it still solved the problem within a factor τ ≈ 1.25 times the number of iterations needed by the method that solved that instance within the least number of iterations. The VA (1/t-rule) failed to reach the given optimality threshold within 10,000 iterations for ten (five) problem instances, while the s^k-rules failed on only four of the problem instances.

Conclusions and future research
We generalize the convergence results in [42] to convex optimization problems and extend the analysis in [30] to include more general convex combinations for creating the ergodic sequences.
The proposed s^k-rule for choosing convexity weights for the primal iterates allows putting more weight on later iterates in the subgradient scheme. Computational results for three sets of NMFPs demonstrate that the s^k-rule performs convincingly and is superior to previously proposed rules. Section 5 presents a comparison between different rules for choosing the convexity weights in the subgradient scheme; it should not be viewed as an attempt to provide a new, competitive solution method for the NMFP.
Since the convergence results are presented for general convex optimization problems, we have not analyzed the performance of the s k -rule specifically for linear programs. Preliminary numerical tests indicate, however, a similar performance.
Interesting future research includes an analysis of the performance of the s^k-rule for other problems for which subgradient schemes have proven to be successful. The s^k-rule is utilized together with harmonic series step lengths, and future research should also investigate convergence results and the practical performance of the rule when utilizing other step lengths, for example Polyak step lengths [39, Chapter 5.3]. We also aim at analyzing the convergence rate of the ergodic sequences, in terms of infeasibility and suboptimality, depending on the convexity weight rules utilized. Another extension of the results presented here would be to analyze the convergence of the ergodic sequences when allowing inexact solutions of the subproblems; such solutions would provide ε-subgradients of the dual objective function (e.g., d'Antonio and Frangioni [12]).
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.