Benders decomposition with adaptive oracles for large scale optimization

This paper proposes an algorithm to efficiently solve large optimization problems which exhibit a column bounded block-diagonal structure, where subproblems differ in right-hand side and cost coefficients. Similar problems are often tackled using cutting-plane algorithms, which allow for an iterative and decomposed solution of the problem. When solving subproblems is computationally expensive and the set of subproblems is large, cutting-plane algorithms may slow down severely. In this context we propose two novel adaptive oracles that yield inexact information, potentially much faster than solving the subproblem. The first adaptive oracle is used to generate inexact but valid cutting planes, and the second adaptive oracle gives a valid upper bound of the true optimal objective. These two oracles progressively “adapt” towards the true exact oracle if provided with an increasing number of exact solutions, stored throughout the iterations. These adaptive oracles are embedded within a Benders-type algorithm able to handle inexact information. We compare the Benders with adaptive oracles against a standard Benders algorithm on a stochastic investment planning problem. The proposed algorithm shows the capability to substantially reduce the computational effort to obtain an ϵ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\epsilon $$\end{document}-optimal solution: an illustrative case is 31.9 times faster for a 1.00%\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.00\%$$\end{document} convergence tolerance and 15.4 times faster for a 0.01%\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$0.01\%$$\end{document} tolerance.


Introduction
In this paper we consider problems that can be expressed in the form where g(x i , c i ) is the optimal solution of the LP subproblem SP : The set of problems I is finite, the x i are (possibly overlapping) subvectors of x, Y is a convex polyhedron, and the π i are non-negative constants. The coefficient matrices A, B, and C are the same in every subproblem so the subproblems differ only through the value of their parameters, x i and c i . The c i are vectors of coefficients and the x i are vectors of variables for (1) and parameters for (2). However, elements of x i may be set to fixed values in x ∈ X , so become equivalent to parameters specific to subproblem i. The inclusion of the matrices B and C allow the dimensions of x i and c i to differ from (and in the case we shall present be much smaller than) the number of explicit constraints and variables y i in each subproblem. The function g(x i , c i ) has properties that can be exploited in the solution procedure. It is a saddle function, convex w.r.t. x i , and concave w.r.t. c i . Moreover, we focus on problems where g(x i , c i ) is also a decreasing function of x i and an increasing function of c i . Monotonicity can be a natural property of the problem, e.g. when C ≥ 0, y ≥ 0, and B ≥ 0, or the problem can be usually rearranged to have this property.
An example of a problem with the above structure is an investment planning problem. Here the x represents investment decisions with corresponding investment cost f (x). The investments affect a set I of situations, and x i is the subvector of x that represents the investments that affect the situation i, c i specifies the operational costs, y i defines the operational decisions, and g(x i , c i ) gives the optimal operating cost. The test case we present in this paper is a stochastic problem, where the situations are the different possible scenarios that might occur in the future, each of which is weighted by the probability π i of that situation occurring. Note that in a multistage planning problem the sum of the π i at each stage is equal to 1, so i∈I π i ≥ 1.

Literature review
In the standard Benders decomposition algorithms [2,16] for problem (1) a sequence of relaxations is solved. At each iteration j the relaxed master problem is where ( j−1) i is the set of cutting planes associated with subproblem i built up to iteration j − 1. To generate a new cutting plane at iteration j, we first solve the RMP and obtain the optimal solution x ( j) . Then, for each i ∈ I we call an oracle that, for given x ( j) i , returns the optimal objective value θ i . Such an oracle that generates an exact value of g and its subgradient will be referred to as exact. The newly generated cutting plane is added to The standard Benders algorithm requires calling the exact oracle |I| times at each iteration j. Consequently, when the number of subproblems is large and the exact oracle is hard to solve (e.g., the problem (2) is very large), the computational efficiency of these methods is badly affected. This motivates the investigation of Benders-type algorithms able to exploit the information of inexact oracles. An inexact oracle only provides an estimate of θ ( j) i and λ ( j) i , but possibly much faster than an exact oracle. Inexact oracles can have different characteristics. They may or may not guarantee to generate valid bounds and cutting planes. Additionally it may or may not be possible to control the accuracy of the oracle, and bounds on the approximation error may or may not be known. An example of an inexact oracle is the one proposed in [11]. The authors suggest solving only a subset I ex ⊂ I of subproblems and using fast techniques to evaluate approximate values of θ ( j) i and λ ( j) i for subproblems i ∈ I\I ex . Using a concept of collinearity the authors group similar subproblems, solve one subproblem per group, and derive an approximate solution for the remaining subproblems of each group. Similar inexact oracles that do not provide a guaranteed bound can be found in [6][7][8]14]. Zakeri et al. [18] propose solving each subproblem i ∈ I up to a predefined tolerance which is then tightened over time to ensure convergence. The generated cutting planes are all valid and asymptotically exact. Similar inexact but always valid oracles are proposed by [4,5]. The use of any of the above types of oracles leads to algorithms that are fast at the early iterations but are potentially almost as slow as exact oracles towards the end of the iterative procedure. As a matter of fact, in order to obtain high accuracy, the approach of [11] would need to solve almost all the set of subproblems (i.e., I ex ≈ I), and the inexact oracle of [18] would solve the subproblems up to a very tight tolerance.
An alternative approach to constructing inexact oracles is to exploit some known properties of the subproblems that allow valid cuts to be generated for subproblems from solution of different problems. This approach has been used in stochastic dual dynamic programming (Pereira et al. [12]), for example by exploiting convexity of the recourse function w.r.t. the uncertain parameters. Stochastic dual dynamic programming is also used to solve minimax problems, which can arise from the inclusion of risk-aversion within the model. This leads to saddle functions similar to g(x i , c i ), i.e., convex in some directions (x i ) and concave in others (c i ). Philpott et al. [13] exploit this property and construct an upper bound on the recourse function using an inner approximation. However, an inner approximation is infeasible if the point of interest is not in the convex hull of known points, so the authors propose to initially compute a solution for every extreme point of the uncertainty set. Work [1] proposes using penalties to deal with infeasibility and obtains valid lower and upper bounds to saddle functions without sampling all the vertices of the uncertainty set.

Approach and contributions
This paper proposes two new inexact oracles, which we refer to as adaptive oracles. The first adaptive oracle approximates g(x i , c i ) from below and it is called to generate inexact but valid cuts of those subproblems that are not solved at an iteration j. The second adaptive oracle approximates g(x i , c i ) from above and it is called to obtain a valid upper bound when a subproblem is not solved. The adaptive oracles exploit properties of g(x i , c i ) such as convexity w.r.t. x i and concavity w.r.t. c i . In addition, they require g(x i , c i ) to be a monotonic function of both x i and c i . The adaptive oracles use the knowledge of m solutions of g(x i , c i ), known by having solved some subproblems in the previous iterations. Increasing the number m of known solutions, makes both oracles progressively "adapt" towards the true function g(x i , c i ).
The proposed adaptive oracles are asymptotically exact oracles, since they provide valid upper and lower bounds of g(x i , c i ), a valid subgradient, and both bounds tend toward g(x i , c i ) as the number of iterations grows. However, they have properties that, combined, distinguish them from the inexact oracles available in the literature [5][6][7][8]10,11,14,15,17,18]. First, the computational effort required to solve the adaptive oracles is independent of the size and complexity of the exact oracle (2), and only depends on the number m of known solutions. Second, a Benders-type algorithm that uses the adaptive oracles converges to an -optimal solution ( ≥ 0) in a finite number of iterations even when only a single subproblem is solved at each iteration. As a consequence, when subproblems are expensive to solve and the set I of subproblems is large, each iteration is much faster than methods that solve every subproblem at each iteration, and we show this can lead to a significant reduction in total solution time. We test our new method on stochastic LP investment planning problems with up to 1.06 × 10 8 variables and 3.11 × 10 8 constraints against a standard Benders algorithm and the Benders algorithm with inexact cuts of [18] (used as a benchmark). Compared to the standard Benders decomposition, the decomposition algorithm of [18] is 1.6, 1.7, and 1.9 times faster at reaching an optimality tolerance of 1.00%, 0.10%, and 0.01%, while our proposed algorithm is 31.9, 28.5, and 15.4 times faster than standard Benders and 19.5, 16.5, and 8.1 times faster than the Benders algorithm of [18] to reach the same .

Paper structure
The remainder of the paper is organized as follows. Section 2 introduces assumptions on problems (1) and (2), needed to apply our adaptive oracles. Section 3 briefly presents a standard formulation of the Benders decomposition algorithm. Section 4 illustrates the intuitions behind the proposed adaptive oracles and derives the associ-ated mathematical formulation. The adaptive oracles are embedded within a Benders decomposition algorithm in Sect. 5. Section 6 tests the proposed Benders with adaptive oracles against a standard Benders algorithm on a stochastic investment planning problem. Finally, conclusions are drawn in Sect. 7.

Assumptions
We consider solving problems of the form of (1). The subproblem SP in (2) is assumed to be always feasible (any possible infeasibility having been dealt with by reformulating the subproblem using infeasibility penalties) and bounded for each decision x that belongs to X and for all c i . Accordingly, we say that the function g(x, c) can be computed for all x ∈ K x and c ∈ K c , where the region K x is obtained by collecting all the possible x i such that x ∈ X , and the region K c collecting all c i for all i ∈ I. Then, we assume that there exist two vectors x and c such that x ≤ x, ∀x ∈ K x , and c ≤ c, ∀c ∈ K c , and that subproblem SP is feasible and bounded at the special point (x, c).
Function g(x, c) is a convex function of x and a concave function of c (e.g., see [3]). In addition, we assume that g(x, c) is a non-increasing function of x, and a non-decreasing function of c. If the property of monotonicity does not hold naturally, problem (1) can be modified so that the rearranged subproblem is a monotonic function of both x and c (see "Appendix A").

Standard benders
This section briefly describes how a Benders algorithm can exploit the block structure of problem (1) allowing it to be solved in a decomposed fashion.

Algorithm
In a standard Benders algorithm applied to problem (1), we iteratively approximate the subproblem cost function through a set of cutting planes. The formulation of the relaxed master problem RMP is given in (3) and the standard Benders decomposition is summarized in Algorithm 1.

Lemma 1 At each iteration, either
Algorithm terminates with an -optimal solution, or the algorithm adds at least one locally-improving exact cut to the reduced master problem.
and given that the right side of (4) is equal to β ( j) i , it follows that the lower bound and the upper bound have converged exactly.

Lemma 2 There exists a finite number of locally-improving exact cutting planes that can be added to the relaxed master problem.
Proof Since each subproblem g(x i , c i ) is an LP, there exist a finite number of faces (of all dimensions from 0, i.e. vertices, edges, …, facets), and each exact cutting plane gives an exact representation of (at least) one of these faces. A new exact cut can be locally-improving if and only if it is associated with a face that has not been exactly represented yet. Then, given that the number |I| of subproblems is finite, it follows that there exists a finite number of locally-improving exact cuts that can be added to the relaxed master problem.
Theorem 1 For any ≥ 0, Algorithm 1 finds an -optimal solution to problem (1) in a finite number of iterations.
Proof Lemma 2 proves that there exists a finite number of locally-improving exact cuts that can be added to the reduced master problem, and Lemma 1 proves that Algorithm 1 adds at least one locally-improving exact cut at each iteration (or it has converged). It follows that Algorithm 1 finds an -optimal solution to problem (1) in a finite number of iterations.

Remark 1
Convergence proofs for Benders decomposition (e.g., see [18]) usually rely on the existence of a finite number of basis matrices for each subproblem. In contrast, Theorem 1 relies on the existence of a finite number of faces and does not require that each exact cut corresponds to a basis matrix.

Adaptive oracles
This section presents the adaptive oracles used within the proposed novel Benderstype decomposition algorithm. Figure 1 illustrates a saddle function g(x, c), convex and non-increasing w.r.t x, and concave and non-decreasing w.r.t. c. This function is used to illustrate the intuition behind the proposed adaptive oracles.

Graphical interpretation of the adaptive oracles
For a subproblem i that is not solved at iteration j, we call two adaptive oracles to retrieve the information needed to perform a Benders-type iteration. The first adaptive oracle provides θ i ) generates a valid cutting plane. The second adaptive oracle yields an upper bound θ , which is then used to compute a valid upper bound on the optimal solution of the relaxed master problem at x ( j) .
Before presenting the mathematical formulation of the two adaptive oracles, we give a graphic and explanatory example using the saddle function g(x, c) shown in  Each red dashed line in Fig. 2a shows the tangent at a blue point (x s , c s ) w.r.t. x (i.e., for fixed c s ). Since g(x, c s ) is convex w.r.t. x, the tangent lies below g(x, c s ). The red squares are the points (θ s , x r , c s ) on the tangent where x = x r . If the gradients are λ s , then θ s = θ s + λ s (x r − x s ). Since g(x r , c) is concave w.r.t. c and all the red square points lie in x = x r plane and below g(x r , c), the convex hull of these points also lies below g(x r , c). Figure 2b illustrates this convex hull as a red shaded area, and the gray shaded areas indicate values of c where a convex combination can not be found. Figure 2c shows that if the special point (x, c) is included among the known points (x s , c s ), the non-decreasing property of g(x, c) w.r.t. c can be exploited to extend the convex hull and obtain a valid lower bound for all c ≥ c. In particular, the upper envelope of the convex hull gives the tightest lower bound on g(x r , c).
The green dashed lines in Fig. 2d are the tangents (with gradient φ s ) at each blue The convex hull of these points lies above g(x, c r ), given that g(x, c r ) is convex w.r.t. x and the green square points lie above g(x, c r ). Figure 2e shows this convex hull as a green shaded area, likewise two gray shaded areas where a convex combination can not be found. If the special point (x, c) is included we extend the convex hull and obtain a valid upper bound for all x ≥ x by exploiting the non-increasing property of g(x, c) w.r.t. x (see Fig. 2f). Here, the lower envelope of the convex hull gives the tightest upper bound on g(x, c r ).
Note that the methodology for obtaining a valid upper bound is the counterpart of the methodology for obtaining a valid lower bound. As an example, one could use the lower bounding oracle to get a valid lower bound on g(x, c) and the same oracle to also obtain a valid upper bound. To do that, one can compute a valid lower bound onĝ(c, x) = −g(x, c) and use it (with a change in sign) as a valid upper bound on g(x, c).

Adaptive oracles
This section gives a mathematical formulation of the intuitions illustrated in Sect. 4.1. Consider the saddle function g(x, c), and suppose we have already found m optimal solutions at {(x s , c s ), s = 1, .., m}. At each point s, we know the true value θ s , a subgradient λ s w.r.t. x, and a subgradient φ s w.r.t. c. We are interested in building a valid cutting plane and obtaining a valid upper bound at a new point (x, c), without solving the associated subproblem.

Adaptive oracle for valid cutting plane
Proof (a) Set the variable μ s associated with (x, c) equal to 1, and all the others to 0. The first constraint of (5) becomes c ≤ c, which is true by definition, and the other constraints also hold.
The first inequality holds since θ s + λ s (x − x s ) is an underestimator of the convex function g(x, c s ) and μ(x, c) ≥ 0, the second inequality holds since the μ(x, c) define a convex combination and g(x, c) is concave w.r.t. c, and the third holds since m s=1 μ s (x, c) c s ≤ c and g(x, c) is non-decreasing w.r.t. c. (c) Setting μ r = 1 gives a feasible solution with objective value θ r + λ r (x r − x r ) = θ r . Since μ r is feasible for A LB m (x r , c r ), it follows that θ (x r , c r ) ≥ θ r = g(x r , c r ). This and b) imply θ(x r , c r ) = g(x r , c r ). (d) Since the weights μ(x, c) are feasible for A LB m (x, c), they are also feasible for A LB m (x, c). Hence, The first equality follows since and the second from the definition of θ(x, c) and λ(x, c).  (x, c). Then,

Adaptive oracle for valid upper bound
Proof Lemma 4 can be proved similarly to Lemma 3 (parts a, b, and c) since obtaining a valid upper bound is the exact counterpart than obtaining a valid lower bound.

Notes on adaptive oracles
Note that every feasible solution μ of A LB m (x, c) gives a valid cutting plane (x, θ, λ). Of these possible cuts, the one corresponding to the optimal solution, i.e., (x, θ, λ), is the tightest at x. If A LB m (x, c) and/or A UB m (x, c) are solved to a feasible but not optimal solution, the generated cutting plane and upper bound are still valid.
A graphical interpretation of θ and θ of A LB m (x, c) is shown in Fig. 2c, where the set (θ, c) is the red continuous curve, and the set (θ, c) is the red shaded area. Figure 2f gives an interpretation of θ and θ of A UB m (x, c), where the set (θ, c) is the green continuous curve, and the set (θ, c) is the green shaded area.

Benders decomposition with adaptive oracles
This section presents the Benders-type algorithm incorporating the inexact information of the adaptive oracles of Sect. 4.

Convergence of Algorithm 2
Lemma 5 At each iteration, either Algorithm 2 finds an -optimal solution, or the algorithm adds at least one locally-improving exact cut to the reduced master problem.
Proof At iteration j Algorithm 2 adds at least one locally-improving exact cut to the reduced master problem (ξ ( j) = 1) or all the subproblems are solved for the current solution (E ( j) = ∅). If none of the exact cuts (x j is locally-improving (ξ ( j) = 0), it follows that and given that the left-and right-hand sides of (7) are equal to θ Proof Lemma 2 proves that there exists a finite number of locally-improving exact cuts that can be added to the reduced master problem, and Lemma 5 proves that Algorithm 2 adds at least one locally-improving exact cut at each iteration (or it has converged). It follows that Algorithm 2 finds an -optimal solution to problem (1) in a finite number of iterations.

Case Study
We test the proposed Benders algorithm with adaptive oracles on power system stochastic investment planning problems. Serial and parallel versions of the algorithms are implemented in Julia 1.4. Two Linux Desktop computers Intel i7 6-core processor clocking at 2.40 GHz and 16 GB of RAM are used for running the code. The serial implementation is run on a single machine, and the parallel implementation uses both machines simultaneously. The optimization models are implemented in JuMP (Julia package) and solved with Gurobi 9.0. The Julia code implementing Algorithms 1 (Stand_Bend) and 2 (Adapt_Bend) for the proposed case study is provided at https://github.com/nimazzi/Stand_and_Adapt_Bend [9].

Investment planning model
We consider a power system investment planning problem with a time horizon of 15 years. The deterministic version of the problem has 3 decision nodes: one refers to decisions to be taken in the first stage, one to decisions in 5 years time, and one to decision in 10 years time. The stochastic version is obtained by modeling different possible scenarios for the future of the system in 5 and 10 years. At each node we also compute the cost of operating the system for the following 5 years for given installed capacity. We consider a construction time of 5 years, so new assets installed in the first stage will only be available in 5 and 10 years, and new capacity installed in 5 years will only be available in 10 years. We model a set P of technologies: 6 thermal units, 3 storage units, and 3 renewable generation units. The cost for operating the system is computed by solving an hourly economic dispatch for 5 years.
We formulate the stochastic investment planning problem as where I is the set of stochastic decision nodes, each associated with a probability π i . The function gives the cost of operating the system over 5 years. The vector of parameters x i is given by where x acc pi is the accumulated capacity of technology p at node i. Parameters ν D i and ν E i are the relative level of energy demand and the yearly CO 2 emission limit, respectively. Note that g(x i , c i ) is non-increasing w.r.t. x acc i and ν E i , and non-decreasing w.r.t. ν D i , so using −ν D i instead satisfies the non-increasing requirement. The vector of uncertain cost coefficients c i is where c nucl i is the uranium fuel price and c co 2 i the CO 2 emission price, and g(x i , c i ) is non-decreasing w.r.t. both c nucl i and c co 2 i . All the remaining parameters, e.g., A, B, and C, are the same for every node i ∈ I. Finally, the function f (x) yields the expected total investment and fixed cost, and it is computed as where the variable x inst pi is the newly installed capacity of technology p at node i. Parameters c inv pi and c f ix pi are the unitary investment and fixed costs of technology p at node i. The accumulated capacity x acc pi at node i is computed as the sum of the historical capacity x hist pi and the newly installed capacity x inst pi at nodes i ancestors to i. Finally, the initial special point (x, c) is set to We consider three possible sources of uncertainty, i.e., ν E i , c co 2 i , and c nucl i . Each uncertain parameter has 3 possible outcomes in 5 years, each of which is linked to 3 additional possible outcomes in 10 years. The result is 9 possible trajectories for each source of uncertainty, all with the same probability. We consider 4 different cases of the investment problem. Case 0 is the deterministic version, where ν E i , c co 2 i , and c nucl i are deterministic parameters (weighted average of the scenarios). Then, case 1 has 1 uncertain parameter (ν E i ), case 2 has 2 uncertain parameters (ν E i and c co 2 i ), and case 3 has 3 uncertain parameters (ν E i , c co 2 i , and c nucl i ). The number of decision nodes and the size of the deterministic equivalent for the 4 versions of the problem is summarized in Table 1. As a benchmark, we try to solve the deterministic equivalent problem for cases 0, 1, 2, and 3 on our Linux Desktop computer. Case 0 is successfully solved in 4 minutes, and case 1 in 58 minutes. The deterministic equivalent of cases 2 and 3 is not solved since the Julia instance is killed due to a memory overload.

Results
We use Algorithms 1 (Stand_Bend) and 2 (Adapt_Bend) to solve the stochastic investment planning problem (8), whose operational subproblems (9) are LP problems with 4.11 × 10 5 constraints and 1.40 × 10 5 variables. All the subproblems are solved with Gurobi barrier method ("Method"=2), while we do not impose a predefined solution method for the RMP and the oracles ("Method"=-1). In the serial implentation of the algorithms we set "Threads"=0 for the RMP, the subproblems, and the oracles, i.e., Gurobi decides the amount of threads to use. In the parallel implementation of the algorithms we impose "Threads"=1 for the subproblems and the oracles.
For Algorithm 2 at each iteration j we choose the set I ex = {i k , k = 1, .., w} of w subproblems that are solved exactly. Each element i k is the (or one of the) i ∈ E The idea is to group in each subset E ( j) k subproblems that are potentially similar to each others, to diversify the exact solutions that are added to the oracles. To select the w subsets of E ( j) we use the kmeans function of the Clustering package with , c co 2 i ) as the input parameters. As a benchmark, we also implement the Benders algorithm with inexact cuts as presented in [18], and we refer to it as Algorithm 3 or Zaker_Bend. The core idea is to solve subproblems up to primal-dual feasible solution with an optimality gap lower than δ ( j) which is large for early iterations and progressively tightened (δ ( j) → 0 for j → ∞) as the algorithm approaches the optimal solution. The dual solution and dual objective are used to build valid cutting planes, and the primal objective is used to build a valid upper bound. To obtain such primal-dual feasible solution, we solve subproblems with barrier method, we disable crossover ("Crossover"=0), and we impose the optimality gap to δ ( j) ("BarConvTol"=δ ( j) ), in accordance with [18]. The optimality gap is set to 1 (maximum value of "BarConvTol") for the first iteration, i.e., δ (1) = 1, then we use the update rule δ ( j) := 1 10 δ ( j−1) suggested by [18]. We set a minimum threshold for δ ( j) to 10 −8 which is the default value for "BarConvTol". Table 2 gives the optimal investment decisions to be taken in the first investment stage for cases 0-3. The solutions are obtained solving (8) with Algorithm 1 up to an 0.01%-optimal solution. Including a more accurate description of the stochastic processes involved in the decision-making process progressively modifies the optimal decisions of the system planner. Using the full stochastic model (case 3) the planner takes considerably different decisions compared to the ones yielded by its deterministic counterpart (case 0). In case 0 the system planner builds 13.7 GW of CCGT, 1.4 GW of diesel, and 19.0 GW of onshore wind. In case 3 the investment in CCGT (11.4 GW)  Note that other technologies (e.g., nuclear and off-shore wind) are chosen in many scenarios of the second investment stage (in 5 years) and that varying amounts of most of the technologies are explored during the iterative solution procedure, even if these are not used in the optimal solution.

Serial implementation
This section discusses the results of the serial implementation of the algorithms, where only one Julia instance is launched. For Algorithm 2 we set w = 1 so one single subproblem is solved exactly at each iteration. Table 3 shows the effort in terms of iterations and computation time (the timer starts after the Julia code has been precompiled) to reach an -optimal solution using Algorithms 1 (Stand_Bend), 2 (Adapt_Bend), and 3 (Zaker_Bend) for case studies 0-3. We report the results for values of the optimality tolerance of 1.00%, 0.10%, and 0.01%. For case 0 the computation efficiencies of Algorithms 1, 2, and 3 are similar. In this small case study Algorithms 1 and 3 solve 2 subproblems each iteration and Algorithm 2 solves 1 subproblem. However, Algorithm 2 needs more iterations to reach the target tolerance. As an example, to obtain a tolerance lower than 0.01% Algorithm 1 requires 26 iterations and Algorithm 2 needs 50. Increasing the size of the problem makes the comparison more interesting. For example, for case 2 Algorithm 1 needs 13 and 24 iterations to reach tolerance of 1.00% and 0.01%, respectively. To reach the same tolerances Algorithm 3 performs 12 and 21 iterations, respectively, and Algorithm 2 performs 67 and 190 iterations. However, an iteration of Algorithms 1 and 3 solves 90 subproblems while Algorithm 2 solves only 1. The results show that Algorithm 1 takes 64 minutes to yield an 0.01%-optimal solution compared to 38 minutes for Algorithm 3, i.e., 1.7 times faster. To reach the same tolerance Algorithm 2 needs less than 8 minutes, i.e., 8.1 and 4.8 times faster than Algorithm 1 and 3, respectively. For case 3 this difference is even larger as Algorithms 1 and 3 solve the subproblem 756 times at each iteration. Accordingly, even if they reach a 1.00% tolerance in only 12 iterations, this requires 4 hours and 15 minutes, and 2 hours and 36 minutes, respectively. On the other hand, Algorithm 2 reaches the same tolerance in 8 minutes, 31.9 times faster than Algorithm 1 and 19.5 times faster than Algorithm 3. If a tighter tolerance is needed the difference of performances progressively reduces but it is still significant for = 0.01%. Indeed, Algorithm 1 requires around 9 hours and 18 minutes, Algorithm 3 takes 4 hours and 53 minutes (1.9 times faster than Algorithm 1), and Algorithm 2 needs 36 minutes (15.4 and 8.1 times faster than Algorithm 1 and 3, respectively). Table 4 shows the (relative) computation time of the main steps of Algorithm 2, i.e., solving the relaxed master problem, solving the subproblems, and solving the adaptive oracles, to reach an -optimal solution for case studies 0-3. We report the results for value of the optimality tolerance of 1.00%, 0.10%, and 0.01%. In cases 0, 1, and 2 almost all the computation time is spent solving subproblems (one each iteration). However, in case 3 more than 10% of the computation time to obtain an 0.01%-optimal solution is spent solving the master problem, 27% solving the adaptive oracles, and only 65% solving subproblems. These results suggest that the proposed rule of solving one subproblem at each iteration may not be the best when the number |I| of subproblems grows significantly and that a selection rule that dynamically decides the number of subproblems to solve at each iteration may then achieve a better balance.

Parallel implementation
This section discusses the results of the parallel implementation of Algorithms 1 and 2, where together with the main Julia instance we also start up a pool of workers via function addprocs of the Distributed package. The RMP is solved in the main Julia instance, while the subproblems and the oracles are solved by the pool of workers via the function pmap. For Algorithm 2 we set w equal to the number of workers in the pool, so one single subproblem is solved exactly by each worker at each iteration. Table 5 compares the time and the number of iterations to reach an 0.01%-optimal solution of case 3 using Algorithms 1 and 2 as a function of the size of the workers pool. Algorithm 1 takes 23 iterations and 10 hours and 57 minutes with one worker in the pool, whereas Algorithm 2 takes 537 iterations and 1 hour, i.e., 10.8 times faster than Algorithm 1. The number of iterations for Algorithm 2 is slightly different from the values shown in Table 3, even though when the pool contains one worker only it should be equivalent to the serial implementation. The only difference from the serial implementation is the number of threads used to solve subproblems and oracles. Changing that value can make Gurobi converge to slightly different solutions (within optimality and feasibility tolerances) and alter the way the Benders algorithm finds an -optimal solution. Increasing the number of workers in the pool slightly changes the relative speed-up obtained with Algorithm 2, mainly due to some load balancing Table 3 Comparative results for Algorithm 1 (Stand_Bend), Algorithm 2 (Adapt_Bend), and Algorithm 3 (Zaker_Bend) (%)   Table 5 Comparative results of the parallel implementation for Algorithm 1 (Stand_Bend) and Algorithm 2 (Adapt_Bend) to solve case 3 up to = 0.01% especially if a large optimality gap is allowed. The largest problem we solve has 756 subproblems, each of which has 4.11 × 10 5 linear constraints and 1.40 × 10 5 linear variables. Our algorithm is 31.9 times faster than the standard Benders algorithm to reach a 1.00% optimal solution and 15.4 times faster if the optimality tolerance is tightened to 0.01%.

Stand_Bend
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.