Refined cut selection for Benders decomposition: applied to network capacity expansion problems

In this paper, we present a new perspective on cut generation in the context of Benders decomposition. The approach, which is based on the relation between the alternative polyhedron and the reverse polar set, helps us to improve established cut selection procedures for Benders cuts, like the one suggested by Fischetti et al. (Math Program Ser B 124(1–2):175–182, 2010). Our modified version of that criterion produces cuts which are always supporting and, except in rare special cases, facet-defining. We discuss our approach in relation to the state of the art in cut generation for Benders decomposition. In particular, we refer to Pareto-optimality and facet-defining cuts and observe that each of these criteria can be matched to a particular subset of parametrizations for our cut generation framework. As a consequence, our framework covers the method to generate facet-defining cuts proposed by Conforti and Wolsey (Math Program Ser A 178:1–20, 2018) as a special case. We conclude the paper with a computational evaluation of the proposed cut selection method. For this, we use different instances of a capacity expansion problem for the European power system.

of m linear inequalities. Such a problem can be written in the following form:

min { c x + d y : H x + Ay ≤ b, x ∈ S } (1.1)

where the vector d collects the objective coefficients of the y-variables. The interaction matrix H ∈ R m×n captures the influence of the x-variables on the y-subproblem: For fixed x* ∈ R n, (1.1) reduces to an ordinary linear program with constraints Ay ≤ b − H x*, where A ∈ R m×k, y ∈ R k, and b − H x* ∈ R m.
We are interested in cases where the size of the complete problem (1.1) leads to infeasibly high computation times (or memory demands), but both the problem over S and the problem resulting from fixing x can separately be solved much more efficiently due to their special structures. To deal with such problems, Benders (1962) introduced a method that works by iterating between these two "easier" problems: For a problem of the form (1.1), let the function z : R n → R ∪ {±∞} represent the value of the optimal y-part of the objective function for a fixed vector x:

z(x) := min { d y : Ay ≤ b − H x } (1.2)

The corresponding epigraph of z is

epi(z) := { (x, η) ∈ R n × R : ∃y ∈ R k : Ay ≤ b − H x, d y ≤ η } (1.3)

Writing epi S (z) := epi(z) ∩ (S × R), this provides us with an alternative representation of the optimization problem (1.1):

min { c x + η : (x, η) ∈ epi S (z) }

This representation suggests the following iterative algorithm: Start by finding a solution (x*, η*) ∈ S × R that minimizes c x + η without any additional constraints (adding a generous lower bound for η to make the problem bounded). If (x*, η*) ∈ epi(z), then (x*, η*) ∈ epi S (z) (since x* ∈ S) and the solution is optimal. Otherwise, we add constraints violated by (x*, η*) but satisfied by all (x, η) ∈ epi(z) and iterate. This is of course just an ordinary cutting plane algorithm and the crucial question is how to select a separating inequality in each iteration.
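The iterate-and-separate scheme just described can be sketched in a few lines of code. The following is a minimal illustration on a hypothetical toy instance (all data, function names, and the closed-form subproblem solution are our own constructions, not from the paper); a real implementation would call an LP/MIP solver for both the master and the subproblem.

```python
# Minimal sketch of the Benders cutting-plane loop described above, on a
# hypothetical toy instance (all data invented for illustration):
#   min 0.5*x + z(x),  x in S = {0, 1, 2, 3},
#   z(x) = min { -y : y <= 3 - x, y <= 2 },  i.e. z(x) = -min(3 - x, 2).
# A real implementation would call an LP/MIP solver for both problems.

def solve_master(cuts, S=range(4), eta_lb=-10.0):
    """Minimize 0.5*x + eta subject to the collected cuts eta >= a + g*x
    (plus a generous lower bound on eta), by enumeration over S."""
    best = None
    for x in S:
        eta = max([eta_lb] + [a + g * x for (a, g) in cuts])
        if best is None or 0.5 * x + eta < 0.5 * best[0] + best[1]:
            best = (x, eta)
    return best

def solve_subproblem(x):
    """Return z(x) and an optimality cut eta >= a + g*x read off the
    dual solution of min{-y : y <= 3 - x, y <= 2} (solved in closed
    form here because the toy subproblem is trivial)."""
    if 3 - x <= 2:                    # constraint y <= 3 - x is binding
        return x - 3.0, (-3.0, 1.0)   # cut: eta >= -3 + x
    else:                             # constraint y <= 2 is binding
        return -2.0, (-2.0, 0.0)      # cut: eta >= -2

def benders(max_iter=20, tol=1e-9):
    cuts = []
    for it in range(1, max_iter + 1):
        x, eta = solve_master(cuts)
        z, cut = solve_subproblem(x)
        if z <= eta + tol:            # (x, eta) lies in epi(z): optimal
            return x, 0.5 * x + eta, it
        cuts.append(cut)              # separate (x, eta) from epi(z)
    raise RuntimeError("no convergence")

x_opt, obj, iters = benders()   # x_opt = 0, obj = -2.0 after 2 iterations
```

Here the master is solved by brute-force enumeration over the small set S; the structure (master proposes (x*, η*), subproblem either certifies (x*, η*) ∈ epi(z) or returns a violated inequality) is exactly the loop described in the text.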
The original Benders algorithm uses feasibility cuts (cuts with coefficient 0 for the variable η) and optimality cuts (cuts with non-zero coefficient for the variable η), depending on whether or not the subproblem that results from fixing the x-variables is feasible (see, e. g., Vanderbeck and Wolsey 2010). Fischetti et al. (2010), on the other hand, present a unified perspective that covers both cases: They begin by observing that the subproblem can be seen as a pure feasibility problem, represented by the set

{ y ∈ R k : Ay ≤ b − H x*, d y ≤ η* } (1.4)

This polyhedron will be empty if and only if (x*, η*) ∉ epi(z), and any Farkas certificate for emptiness of (1.4) can be used to derive an additional valid inequality. The set of such certificates (up to positive scaling) is called the alternative polyhedron (1.5). Thus P(x*, η*) = ∅ if and only if (x*, η*) ∈ epi(z), and every point (γ, γ0) ∈ P(x*, η*) induces an inequality γ (b − H x) + γ0 η ≥ 0 that is valid for epi(z) but violated by (x*, η*). This characterization is very useful and has been demonstrated empirically to work well in Fischetti et al. (2010). However, it exposes some fundamental issues, which are demonstrated by the following example (cf. Fig. 1).
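To make the cut derivation concrete: a certificate (γ, γ0) turns into the cut γ (b − H x) + γ0 η ≥ 0, which can be rewritten as π x + π0 η ≤ α with π = H γ (i.e., the transpose product H^T γ), π0 = −γ0, and α = γ b. A small sketch with hypothetical toy data (H, b and the certificate are invented for illustration):

```python
# Turning a certificate (gamma, gamma0) into the Benders cut
# gamma^T (b - H x) + gamma0 * eta >= 0, rewritten as
# pi^T x + pi0 * eta <= alpha with pi = H^T gamma (written "H gamma"
# in the text), pi0 = -gamma0, alpha = gamma^T b.
# H, b and the certificate below are hypothetical toy data.

def benders_cut(H, b, gamma, gamma0):
    m, n = len(H), len(H[0])
    pi = [sum(H[i][j] * gamma[i] for i in range(m)) for j in range(n)]
    alpha = sum(b[i] * gamma[i] for i in range(m))
    return pi, -gamma0, alpha    # cut: pi^T x + pi0 * eta <= alpha

H = [[1.0], [0.0]]               # m = 2 subproblem rows, n = 1 x-variable
b = [3.0, 2.0]
pi, pi0, alpha = benders_cut(H, b, gamma=[0.0, 1.0], gamma0=1.0)
# pi = [0.0], pi0 = -1.0, alpha = 2.0, i.e. the cut eta >= -2
```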
Example 1.1 Consider the following optimization problem: Note that the constraint −4x − 4y ≤ −14 is redundant and does not support the feasible region. Suppose that we want to decompose the problem into its x-part and its y-part.
As Gleeson and Ryan (1990) showed, each of these points corresponds to a minimal infeasible subsystem of (1.6) with the objective function written in inequality form x + y ≤ 0. Consequently, each vertex yields one of the original inequalities as a cut. This notably includes the redundant inequality −4x − 4y ≤ −14, which does not support the feasible region but is derived from the vertex P3 of the alternative polyhedron (and, moreover, minimizes the all-ones objective 1).
A cut generated from a point in the alternative polyhedron may thus be very weak, not even supporting the set epi(z). This is true even if we use a vertex of the alternative polyhedron, and even if that vertex minimizes a given linear objective such as the all-ones vector 1 suggested in Fischetti et al. (2010).
In the following, we present an improved approach for cut generation in the context of Benders decomposition. Our method can be parametrized by the selection of an objective vector in primal space and produces facet cuts without any additional computational effort for all but a sub-dimensional set of parametrizations. In addition, our method is more robust with respect to the formulation of the problem than the original approach from Fischetti et al. (2010). In particular, it always generates supporting cuts, avoiding the problem pointed out in the context of Example 1.1.
Our method is based on the relation between the alternative polyhedron as introduced above, which is commonly used in the context of Benders cut generation, and the reverse polar set, originally introduced by Balas and Ivanescu (1964) in the context of transportation problems.
We show that the alternative polyhedron can be viewed as an extended formulation of the reverse polar set, providing us with a parametrizable method to generate cuts with different well-known desirable properties, most notably facet-defining cuts. As a special case, we obtain an (arguably simpler) alternative proof for the method to generate facet-defining cuts proposed by Conforti and Wolsey (2018), if applied to Benders decomposition. Our work links their approach more directly to previous work on cut selection, within the context of Benders decomposition (e. g., Fischetti et al. (2010)) as well as more generally for separation of convex sets (e. g., Cornuéjols and Lemaréchal (2006)).
Before we proceed by investigating different representations of the set of possible Benders cuts, it is useful to record a general characterization of the set of normal vectors for cuts separating a point from epi(z) as defined in (1.3). In the following, Theorem 1.2 characterizes these normal vectors via conditions (1.7) to (1.10).

Proof Let h epi(z) (π, π0) := sup { π x + π0 η : (x, η) ∈ epi(z) } be the support function of epi(z) evaluated at (π, π0). The vector (π, π0) is the normal vector of an (x*, η*)-separating halfspace for epi(z) if and only if h epi(z) (π, π0) < π x* + π0 η*. By the definition of epi(z) (which is closed and polyhedral) and then by strong LP duality, we obtain (1.13). Note that in order for the equality −γ0 = π0 to hold and (1.13) to be feasible (and hence (1.12) to be bounded), we need π0 ≤ 0. Thus the optimality of any pair (γ, γ0) for (1.13) is equivalent to the fulfillment of conditions (1.7) to (1.10).
As one can see from the proof above, any γ satisfying (1.8) to (1.10) yields an upper bound γ b on h epi(z). This means that given a certificate γ proving that (π, π0) is a normal vector of an (x*, η*)-separating halfspace H ≤ ((π,π0),α), we immediately obtain a corresponding right-hand side α := γ b. Furthermore, the definition of the support function h epi(z) immediately tells us when this right-hand side is actually optimal and the resulting halfspace supports epi(z): Remark 1.1 Let (x*, η*) ∈ R n × R and let (π, π0) be the normal vector of an (x*, η*)-separating halfspace for epi(z). If γ minimizes γ b among all possible certificates in Theorem 1.2, then the halfspace H ≤ ((π,π0),γ b) supports the set epi(z).

Benders cuts from the reverse polar set
While it would be sufficient in the context of Benders decomposition to obtain an arbitrary (x*, η*)-separating halfspace whenever the set in (1.4) is empty, the alternative polyhedron P(x*, η*) actually completely characterizes the set of all possible normal vectors of such halfspaces: Corollary 2.1 The alternative polyhedron (1.5) completely characterizes all normal vectors of (x*, η*)-separating halfspaces for epi(z). Observe, however, that in contrast to Remark 1.1, Corollary 2.1 does not guarantee that the cut generated from a point in the alternative polyhedron is supporting (as seen in Example 1.1, not even if that point is a vertex): A given vector (γ, γ0) ∈ P(x*, η*) might not minimize γ b among all points in P(x*, η*) which lead to the same cut normal.
Alternatively, as argued by Cornuéjols and Lemaréchal (2006), the reverse polar of a convex set characterizes the set of normals of cuts that separate the origin from the set. The reverse polar was originally introduced in Balas and Ivanescu (1964) and can be defined as follows:

Definition 2.1 Let C ⊆ R n be a convex set. Then the reverse polar set C− of C is defined as

C− := { π ∈ R n : π x ≤ −1 for all x ∈ C }.

It is thus a subset of the polar cone pos(C)• := { c ∈ R n : c x ≤ 0 for all x ∈ C }.

Fig. 2 The reverse polar set (epi(z) − (x*, η*))− and the corresponding polar cone (drawn in a coordinate system with (x*, η*) as the origin). It can be seen that (epi(z) − (x*, η*))− is contained in the polar cone pos(epi(z) − (x*, η*))• (indicated by the black solid lines) but offers a "richer" boundary from which we can choose cut normals. Specifically, for each vertex v of (epi(z) − (x*, η*))− there exists a facet of epi(z) with normal vector v and vice versa (see Theorem 3.2)
We can use Theorem 1.2 and an appropriate positive scaling to obtain the following description of the reverse polar set. Keeping in mind that we want to separate the point (x*, η*) rather than the origin, we must translate the set such that (x*, η*) becomes the origin (cf. Fig. 2).
Note furthermore that, as a consequence of Remark 1.1, we can compute for any given normal vector (π, π 0 ) a supporting inequality (if one exists) by solving problems (1.12) or (1.13).
We thus have at our disposal two alternative characterizations of the set of possible normal vectors of (x * , η * )-separating halfspaces: The alternative polyhedron and the reverse polar set. Despite their similarity, subtle differences exist between both representations that affect their usefulness for the generation of Benders cuts.
It should be noted at this point that we are not the first to notice the similarity between the approaches of Cornuéjols and Lemaréchal (2006) and Fischetti et al. (2010). Indeed, the work of Cornuéjols and Lemaréchal (2006) is explicitly cited in Fischetti et al. (2010), albeit only in a remark about the possibility of exchanging normalization and objective function in optimization problems over the alternative polyhedron (see Corollary 2.4 below).
Before we proceed, we introduce a variant of the alternative polyhedron, the relaxed alternative polyhedron, which is also used in Gleeson and Ryan (1990). We will see that it is equivalent to the original alternative polyhedron for almost all purposes, but can more easily be connected to the reverse polar set: Definition 2.2 Let a problem of the form (1.1) and a point (x*, η*) ∈ R n × R be given. The relaxed alternative polyhedron P ≤ (x*, η*) is defined as

To motivate the above definition, observe that optimization problems over the original and the relaxed alternative polyhedron are equivalent, provided that the optimization problem over the relaxed alternative polyhedron has a finite non-zero optimum: Remark 2.1 Let z be defined as in (1.2) and let (x*, η*) ∈ R n × R. Let (ω̄, ω̄ 0 ) ∈ R m × R be such that max { ω̄ γ + ω̄ 0 γ 0 : (γ, γ 0 ) ∈ P ≤ (x*, η*) } < 0. Then the sets of optimal solutions of ω̄ γ + ω̄ 0 γ 0 over P ≤ (x*, η*) and P(x*, η*) are identical. Furthermore, every vertex of P ≤ (x*, η*) is also a vertex of P(x*, η*).
The following key theorem now almost becomes a trivial observation. However, to our knowledge, the relation between the alternative polyhedron and the reverse polar set has not been made explicit in a similar fashion before (for a set S, we write …). Theorem 2.1 Let z be defined as in (1.2) and (x*, η*) ∈ R n × R. Then

One common scenario for Benders decomposition is that the master problem is significantly smaller than the subproblem. In this case, Theorem 2.1 implies that the relaxed alternative polyhedron is an extended formulation for the reverse polar set, which in particular is always polynomial in size.
We revisit Example 1.1 to illustrate this observation.

Cut-generating linear programs
One way to select a particular cut normal from the reverse polar set or the alternative polyhedron is by maximizing a linear objective function over these sets. Using Theorem 2.1, we can derive the precise relation between optimization problems over the reverse polar set and the alternative polyhedron.

Fig. 3 The set epi(z)− from Example 1.1. We can see that the point P3, which led to the non-supporting cut above, is mapped to the interior of the reverse polar set and will hence not appear as an extremal solution

Then (π, π0) is an optimal solution to the problem (2.3) if and only if there exists γ such that H γ = π and (γ, −π0) is an optimal solution to the problem (2.4)

Furthermore, the objective values of both optimization problems are identical.
Proof Let (π, π0) be an optimal solution to (2.3). By Theorem 2.1, there exists a vector γ which proves the optimality of (γ, −π0) for (2.4).
Note that the optimization problem stated in (2.4) is technically more general, since there is no reason to limit ourselves to objective functions of the form (2.2) a priori. If we choose a different objective function, we still obtain a valid cut. However, since there may be no objective function (ω, ω 0 ) such that the resulting cut normal is optimal for (2.3), we lose some of the properties associated with optimal solutions from the reverse polar set.
Indeed, this is the approach that Fischetti et al. (2010) take: They use the problem in (2.4) with ω̄ m = 0 for all m which correspond to rows of zeros in the interaction matrix H and ω̄ m = 1 for all other m, as well as ω̄ 0 = 1 (or ω̄ 0 = κ for some scaling factor κ > 0). In general, there exists no vector (ω, ω 0 ) such that this choice can be obtained by (2.2).
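The zero/one weight vector just described is easy to construct from the interaction matrix. A small sketch (the matrix H below is a hypothetical example):

```python
# Constructing the weight vector of Fischetti et al. (2010) as described
# above: weight 0 for rows of H that are all zeros (constraints that do
# not involve x), weight 1 otherwise, plus a positive weight kappa for
# the eta-component.  The matrix H below is hypothetical.

def mis_weights(H, kappa=1.0):
    w = [0.0 if all(h == 0 for h in row) else 1.0 for row in H]
    return w, kappa

H = [[1, 0], [0, 0], [2, -1]]
w, w0 = mis_weights(H)    # w = [1.0, 0.0, 1.0], w0 = 1.0
```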
We now take a closer look at the role of objective functions in the context of Example 1.1:

Example 1.1 (continuing from p. 3)
In the situation of the optimization problem (1.6), remember that the point P3 actually minimizes the 1-norm over P(0, 0) and is hence the unique result of the (unscaled) selection procedure from Fischetti et al. (2010). On the other hand, the transformation from Theorem 2.1 actually maps this point, which led to a non-supporting cut, to the interior of the reverse polar set. It will therefore never appear as an optimal solution of any linear optimization problem.
In order to obtain a supporting cut, we only have to make sure that the objective used can be written in the form (H ω, −ω 0 ). In our example, if we choose the objective function over the alternative polyhedron from the set of objectives of this form, then the point P3 ∈ P(0, 0) is never optimal.
To illustrate this further, we solve the optimization problem using (ω, ω 0 ) := (1, 1) as an example, which corresponds to the objective (H ω, −ω 0 ) over the alternative polyhedron. Assuming that we begin with 0 as an initial lower bound for both x and η to make the problem bounded, we thus obtain (x 1 , η 1 ) := (0, 0) as an initial tentative solution.
Solving the cut-generating problem yields the (linearly scaled) first inequality from (1.6) as a cut. Adding this inequality to the master problem, we obtain (x 2 , η 2 ) := (5/2, 0) as the next tentative solution, from which we obtain the next cut: the (linearly scaled) second inequality from (1.6).
We have thus solved the optimization problem in two iterations, whereas the selection procedure from Fischetti et al. (2010) would have selected the point P3 in the first iteration, leading to a cut that corresponds to the redundant inequality in the original problem. It would thus require at least one additional iteration to solve the problem.
On the other hand, we will see that the fact that our approach yields two facet-defining cuts is not a coincidence: The following corollary shows that the generated cuts are always at least supporting, and we will see in Sect. 3.2 that the generated cuts are actually almost always facet-defining.
One interesting difference between the alternative polyhedron and the reverse polar set, which can be verified using the above example, is their different behavior with respect to algebraic operations on the set of inequalities: If, for instance, we scale one of the inequalities by a positive factor, the reverse polar set remains unchanged (just as the feasible region defined by the set of inequalities). The alternative polyhedron, on the other hand, does change and, as a consequence, might yield a different optimal solution with respect to a given objective. In fact, by scaling the inequalities of (1.6) appropriately, we can actually prevent the point P3 from being optimal for the approach from Fischetti et al. (2010). If an objective function is used which does not take this scaling into account, such as the vector of zeros and ones proposed by Fischetti et al. (2010), then the selected cut might change depending on the scaling factor. Even selecting a suitable manual scaling factor κ as mentioned above cannot fix this, since it cannot scale individual constraints against each other.
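This scaling sensitivity can be reproduced on a tiny hypothetical example (not the system (1.6) from the paper, whose data is omitted here). We take an infeasible one-variable system and minimize 1^T γ over its normalized Farkas certificates { γ ≥ 0 : a^T γ = 0, b^T γ = −1 }; rescaling one inequality changes which certificate, and hence which minimal infeasible subsystem, is selected, even though the (empty) feasible region is unchanged:

```python
from itertools import combinations

# Hypothetical one-variable infeasible system (not the paper's (1.6)):
#   row 0:  y <= 0,   row 1:  -y <= -1,   row 2:  -2y <= -2
# (row 2 is row 1 scaled by 2).  We minimize 1^T gamma over the
# normalized certificates { gamma >= 0 : a^T gamma = 0, b^T gamma = -1 }
# by enumerating basic solutions with two nonzero entries, which
# suffices for this one-variable toy.

def best_certificate(a, b):
    best_sum, best_support = None, None
    for i, j in combinations(range(len(a)), 2):
        det = a[i] * b[j] - a[j] * b[i]
        if abs(det) < 1e-12:
            continue
        gi, gj = a[j] / det, -a[i] / det   # solves the 2x2 system
        if gi >= 0 and gj >= 0 and (best_sum is None or gi + gj < best_sum):
            best_sum, best_support = gi + gj, (i, j)
    return best_support

s1 = best_certificate([1, -1, -2], [0, -1, -2])      # rows {0, 2} win
s2 = best_certificate([1, -1, -0.5], [0, -1, -0.5])  # row 2 rescaled by
                                                     # 1/4: {0, 1} win
```

The feasible region stays empty in both variants, but the certificate minimizing the all-ones objective flips from the subsystem {0, 2} to {0, 1} after rescaling, which is exactly the formulation dependence discussed above.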
A critical requirement for Corollary 2.3 is that ω̄ γ + ω̄ 0 γ 0 < 0. Cornuéjols and Lemaréchal (2006, Theorem 2.3) establish criteria on the objective function under which optimization problems over the reverse polar set are bounded. We have simplified the notation for our purposes and rephrased the relevant parts of the theorem according to our terminology.

Theorem 2.2 (Cornuéjols and Lemaréchal 2006)
Note in particular that the last part of the above statement implies z * < 0 whenever (ω, ω 0 ) ∈ pos(epi(z) − (x * , η * )) \ {0}, which provides us with a large variety of objective functions for which ω γ + ω 0 γ 0 < 0 holds in the optimal solution. By Corollaries 2.2 and 2.3, this means that the cut which results from maximizing these objectives over the reverse polar set is guaranteed to be supporting.

Alternative representations
We now derive an alternative representation of the optimization problem in (2.4), which will turn out to be much more useful in practice. For instance, the structure of the resulting problem will be very similar to the original subproblem, which makes it easy to use existing solution algorithms for the subproblem in a cut-generating program. Cornuéjols and Lemaréchal (2006, Theorem 4.2) prove that linear optimization problems over the reverse polar set can be evaluated in terms of the support function of the original set (in our case epi(z) − (x*, η*)). This can also be applied to the alternative polyhedron, as mentioned (without proof) by Fischetti et al. (2010). The following lemma makes a statement similar to Cornuéjols and Lemaréchal (2006, Theorem 4.2), which is applicable to a wider range of settings. For the proof, we refer to Stursberg (2019, Theorem 3.20). Lemma 2.1 (cf. Cornuéjols and Lemaréchal 2006) Let K ⊆ R n be a cone and c 1 , c 2 ∈ R n. Consider the optimization problems
This lemma allows us to solve optimization problems of the form (2.4) by instead resorting to the optimization problem (2.7) to (2.9). Let (ω̄, ω̄ 0 ) ∈ R m × R and let (γ*, γ* 0 ) denote an optimal solution with value ξ > 0 for (2.7) to (2.9). Applying Lemma 2.1 (with a suitable choice of c 1 and c 2 ), we obtain that (1/ξ)(γ*, γ* 0 ) is an optimal solution with value −1/ξ for (2.4). The structural similarity of (2.7) to (2.9) and the original problem becomes more apparent when we consider the dual problem: Let (λ, x, y) be an optimal solution for the problem (2.10) to (2.12) (of the form min λ) with λ > 0, and denote the corresponding dual solution by (γ, γ 0 ). Then (1/λ)(γ, γ 0 ) is an optimal solution for (2.4). Note that, together with our observations in the context of the definition of the alternative polyhedron (1.5), this means in particular that (a) whenever (2.10) to (2.12) has objective value 0, the alternative polyhedron is empty and (x*, η*) ∈ epi(z), and (b) whenever (2.10) to (2.12) is feasible with (finite) objective value greater than 0, then (2.3) and (2.4) have objective values strictly less than 0, which means that the requirements of Remark 2.1 and Corollary 2.3 are satisfied.

Remark 2.2
If (ω̄, ω̄ 0 ) := (H ω, −ω 0 ), then the optimization problem (2.10) to (2.12) becomes the problem (2.13) to (2.15) (again of the form min λ). The difference between the formulations from Corollary 2.4 and Remark 2.2 lies in how they relax the original problem: In (2.10) to (2.12), the relaxation works on the level of individual inequalities by relaxing their right-hand sides, whereas in (2.13) to (2.15) it works on the level of the master solution (x*, η*), allowing us to choose a possibly more advantageous value for the vector x itself.

Cut selection
As we have seen in the previous section, Benders decomposition can be viewed as an instance of a classical cutting plane algorithm (Theorem 1.2). The Benders subproblem takes the role of the separation problem and the alternative polyhedron that is commonly used to select a Benders cut is a higher-dimensional representation of the reverse polar set, which characterizes all possible cut normals (Theorem 2.1).
Finally, Corollary 2.4 and Remark 2.2 show that selecting a cut normal by a linear objective over the reverse polar set or the alternative polyhedron can be interpreted as two different relaxations (2.10) to (2.12) and (2.13) to (2.15) of the original Benders feasibility subproblem (1.4). The former relaxation provides more flexibility with respect to the choice of parameters and coincides with the latter for a particular selection of the objective function.
Cut selection is one of four major areas of algorithmic improvements for Benders decomposition that recent work has focused on [see, e. g., the very extensive literature review by Rahmaniani et al. (2017)]. As a consequence, a number of selection criteria for Benders cuts have previously been explicitly proposed in the literature. Many of them also arise naturally from our discussion and analysis of the Benders decomposition algorithm above. We will first present these criteria in the way they typically appear in the literature and then link them to the reverse polar set and/or the alternative polyhedron.

Minimal infeasible subsystems
The work of Fischetti et al. (2010) is based on the premise that "one is interested in detecting a 'minimal source of infeasibility'" whenever the feasibility subproblem (1.4) is empty. They hence suggest generating Benders cuts from Farkas certificates that correspond to minimal infeasible subsystems (MIS) of (1.4). Fischetti et al. (2010) empirically study the performance of MIS-cuts on a set of multi-commodity network design instances. Their results suggest that MIS-based cut selection outperforms the standard implementation of Benders decomposition by a factor of at least 2–3. Furthermore, this advantage increases substantially when focusing on harder instances (e.g., those which could not be solved by the standard implementation within 10 hours).
We define this criterion as follows: Definition 3.1 Let z be defined as in (1.2) and let (π, π 0 ) ∈ R n × R. We say that (π, π 0 ) satisfies the MIS criterion if there exists (γ, γ 0 ) ≥ 0 such that (a) π = H γ, π 0 = −γ 0 , and (b) the inequalities of (1.4) corresponding to the rows of H which are multiplied by the non-zero components of (γ, γ 0 ) in the equations in (a) form a minimal infeasible subsystem of (1.4).
Note that we have defined the MIS criterion as a property of a normal vector, rather than a property of a cut. The reason for this is that the cut normal is the only relevant choice to make, given that an optimal right-hand side for each cut normal is provided by Corollary 2.3. Accordingly, we will call any cut with a normal vector that satisfies the MIS criterion an MIS-cut. Gleeson and Ryan (1990) show that the set of (γ, γ 0 ) that appear in the above definition is exactly (up to homogeneity) the set of vertices of the alternative polyhedron: Theorem 3.1 (Gleeson and Ryan 1990) Let (x*, η*) ∈ R n × R. For each vertex v of the (relaxed) alternative polyhedron (1.5), the set of constraints corresponding to the non-zero entries of v forms a minimal infeasible subsystem of (1.4).
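Theorem 3.1 connects vertices of the alternative polyhedron to minimal infeasible subsystems. A common way to compute one MIS directly, independent of the alternative polyhedron, is a deletion filter: tentatively drop each constraint and keep the drop whenever the remaining system stays infeasible. The sketch below uses hypothetical one-variable constraints a·y ≤ b so that the feasibility check reduces to a bound comparison; a real implementation would use an LP solver for the check:

```python
# A "deletion filter" for extracting one minimal infeasible subsystem:
# tentatively drop each constraint and make the drop permanent whenever
# the remaining system is still infeasible.  For illustration we use
# hypothetical one-variable constraints a*y <= b, where feasibility is
# a simple bound comparison; in practice an LP solver does this check.

def feasible(rows):
    lo, hi = float("-inf"), float("inf")
    for a, b in rows:
        if a > 0:
            hi = min(hi, b / a)
        elif a < 0:
            lo = max(lo, b / a)
        elif b < 0:              # 0*y <= b with b < 0 is infeasible
            return False
    return lo <= hi

def deletion_filter(rows):
    keep = list(range(len(rows)))
    for i in range(len(rows)):
        trial = [k for k in keep if k != i]
        if not feasible([rows[k] for k in trial]):
            keep = trial         # constraint i is not needed
    return keep

rows = [(1, 0), (-1, -1), (-2, -2)]   # y <= 0, y >= 1, y >= 1 (scaled)
mis = deletion_filter(rows)           # -> [0, 2]
```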

Corollary 3.1 Let z be defined as in (1.2).
Note that the reverse direction of the last sentence is generally not true, i.e. there might be minimal infeasible subsystems that do not correspond to vertices of the reverse polar set. As an example to illustrate this, as well as the above definition overall, consider Example 1.1 and specifically Fig. 3: Each of the vectors P̄ 1 , P̄ 2 , P̄ 3 satisfies the MIS criterion, since they can be related by the equations π = H γ, π 0 = −γ 0 to vertices of the alternative polyhedron, which can be identified with minimal infeasible subsystems of (1.4).

Facet-defining cuts
In cutting plane algorithms for polyhedra, facet-defining cuts are commonly considered to be very useful since they form the smallest family of inequalities which completely describes (the convex hull of) the feasible solutions. A cutting-plane algorithm that can separate (distinct) facet inequalities in each iteration is not necessarily computationally efficient, but it is at least automatically guaranteed to terminate after a finite number of iterations. In practical applications, too, facet cuts have turned out to be extremely useful, e.g. in the context of branch-and-cut algorithms for integer programs such as the Traveling Salesman Problem. This is why the description of facet-defining inequalities has been a large and very active area of research for decades (see Balas 1975; Nemhauser and Wolsey 1988; Cook et al. 1998; Korte and Vygen 2008 and, as mentioned before, Conforti and Wolsey 2018).
Note that, in deviation from the common definition of a facet-defining cut, the above definition requires that the halfspace supports at least dim(C) affinely independent points. In other words, in the case where epi(z) is not full-dimensional, we also allow that epi(z) is entirely contained in the hyperplane which forms the boundary of H ≤ ((π,π 0 ),α). In this situation, the comparison of different cut normals is inherently difficult: Since there is no clear way to tell whether a cut supporting a facet of epi(z) or one fully containing the set is the stronger cut, the Facet criterion captures arguably the strongest statement about a cut in relation to epi(z) that we can make in general: In no case would we want to select a cut that neither supports a facet of epi(z) nor fully contains it.
On the other hand, by this definition, the "trivial" cut normal (0, 0) would be facet-defining (since the hyperplane H ((0,0),0) contains all of R n × R and thus also epi(z)). Since this is not very useful, we have to exclude this choice explicitly and thus require (π, π 0 ) ∈ (R n × R) \ {0} in Definition 3.2.
For an example to illustrate the above definition, we refer to Fig. 4 where it is discussed together with the property of Pareto-optimality, which will be defined later.
The following result was proven by Cornuéjols and Lemaréchal (2006, Theorem 6.2), but their statement contains a minor error in the case where the set P is subdimensional. We therefore re-state a corrected version of the important parts below; a corresponding proof can be found in Stursberg (2019, Theorem 3.30). Theorem 3.2 (cf. Cornuéjols and Lemaréchal 2006) Let P ⊆ R n be a polyhedron, x* ∉ P, and r := dim(P) − 1 if x* ∈ aff(P), r := dim(P) if x* ∉ aff(P).
Most notably, for the case where P is full-dimensional (i. e., dim(P) = n) the above theorem implies that there exists an x * -separating halfspace with normal vector π supporting a facet of P if and only if there exists a vertex π * of (P − x * ) − and λ ≥ 0 such that λπ = π * .
In this case, every cut generated from a vertex of the reverse polar set defines a facet of epi(z). If an explicit H-representation of the reverse polar set is available, we can thus easily obtain a facet-defining cut, e. g., by linear programming.
Note that since P ≤ (x * , η * ) is line-free (i. e. its lineality space, the maximal linear subspace it contains, is {0}), Theorem 2.1 implies that for every vertex of the reverse polar set there exists a vertex of the relaxed alternative polyhedron (and hence of the original alternative polyhedron) that leads to the same cut normal. In other words, if the normal of an x * -separating halfspace satisfies the Facet criterion, then it also satisfies the MIS criterion.
On the other hand, Theorem 2.1 is not sufficient to guarantee that selecting a vertex of the alternative polyhedron yields a facet-defining cut: As Example 1.1 shows, a vertex of P ≤ (x*, η*) is not necessarily mapped to a vertex of the reverse polar set under the transformation from Theorem 2.1. This exposes a useful hierarchy of subsets of the alternative polyhedron according to the properties of the cut normals which they yield: Selecting a vertex of the alternative polyhedron already guarantees that the resulting cut normal satisfies the MIS criterion, and the points that lead to cut normals satisfying the Facet criterion constitute a subset of these vertices. The approach of selecting MIS-cuts may thus be viewed as a heuristic method to find Facet-cuts.
Although cuts satisfying the MIS criterion in general do not satisfy the Facet criterion, we can obtain some information on when this is the case in the situation of Corollary 2.2, i. e. if the objective function (ω̄, ω̄ 0 ) used to select the cut via problem (2.4) satisfies (ω̄, ω̄ 0 ) = (H ω, −ω 0 ) for some valid objective (ω, ω 0 ) for problem (2.3).
In this case it turns out that we actually obtain a Facet-cut for all objectives (ω, ω 0 ) except those from a lower-dimensional subspace. More precisely, we can prove the following characterization of the relationship between vertices of the alternative polyhedron and cut normals satisfying the Facet criterion. Like Conforti and Wolsey (2018, Proposition 6), the following theorem provides a method to generate facet-defining cuts using a single linear program. In contrast to Conforti and Wolsey (2018), however, our theorem uses a linear program over the (relaxed) alternative polyhedron and thus creates a link to well-established cut selection methods in the context of Benders decomposition, such as that proposed by Fischetti et al. (2010): Let z be defined as in (1.2), (x*, η*) ∈ (R n × R) \ epi(z), and (ω, ω 0 ) ∈ cl(pos(epi(z) − (x*, η*))). Then, there exists an optimal vertex (γ*, γ* 0 ) ∈ P ≤ (x*, η*) with respect to the objective function (H ω, −ω 0 ) such that the resulting cut normal (H γ*, −γ* 0 ) is (x*, η*)-separating and satisfies the Facet criterion.
We can summarize our results as follows: While any Facet-cut is also an MIS-cut, the reverse is not always true. However, if we optimize the objective (H ω, −ω 0 ) over the alternative polyhedron, then there exists only a subdimensional set of choices for the vector (ω, ω 0 ) for which the resulting cut might not satisfy the Facet criterion (those for which the optimum over the reverse polar set is non-unique).
This suggests that these cases should be "rare" in practice, especially if we choose (or perturb) (ω, ω 0 ) randomly from some full-dimensional set. This argument for why a cut obtained for a generic vector (ω, ω 0 ) can be expected to be facet-defining is identical to the concept of "almost surely" finding facet-defining cuts proposed by Conforti and Wolsey (2018).
Looking back at Remark 2.2, this similarity should not come as a surprise: With (ω, ω₀) = (x̄ − x*, η̄ − η*) for a point (x̄, η̄) ∈ relint(epi(z)), the resulting cut-generating LP is almost identical. In fact, the point (x̄, η̄) in this case takes the role of the point into which the origin is relocated in the approach of Conforti and Wolsey (2018). Observe, however, that while Conforti and Wolsey (2018) require that point to lie in the relative interior of epi(z), we can actually expect a cut satisfying the Facet criterion from any (ω, ω₀) for which the optimal objective value over the reverse polar set is strictly negative. By Theorem 2.2, one sufficient (but not necessary) criterion for this is to choose (ω, ω₀) = (x̄ − x*, η̄ − η*) for an arbitrary point (x̄, η̄) ∈ relint(epi(z)).
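The cut-generating machinery discussed here builds on the standard Benders subproblem dual. As a point of reference, the following minimal sketch (in Python with SciPy, on toy data that is not from the paper; the subproblem objective vector d is a name we introduce for illustration) shows how, for a problem of the form (1.1) and a fixed x*, an optimal dual vector u* of the subproblem yields a cut η ≥ (H⊤u*)⊤x − u*⊤b that supports epi(z):

```python
import numpy as np
from scipy.optimize import linprog

# Toy data for a problem of the form (1.1); d is an illustrative name for
# the subproblem objective vector (not defined in this excerpt).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # subproblem matrix
H = np.array([[-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]])   # interaction matrix
b = np.array([0.0, 0.0, -0.5])
d = np.array([1.0, 2.0])

def benders_cut(x_star):
    """Solve the subproblem for fixed x* and return (z(x*), pi, alpha),
    where the cut reads eta >= pi'x + alpha with pi = H'u*, alpha = -u*'b
    for an optimal dual vector u* of the constraints Ay <= b - Hx*."""
    res = linprog(d, A_ub=A, b_ub=b - H @ x_star, bounds=[(None, None)] * 2)
    assert res.success, "subproblem assumed feasible and bounded here"
    u = -res.ineqlin.marginals        # HiGHS marginals are <= 0; flip sign
    return res.fun, H.T @ u, -u @ b

x_star = np.array([0.3, 0.3])
z_val, pi, alpha = benders_cut(x_star)
# The cut is tight at x*, i.e. it supports epi(z) at (x*, z(x*)):
assert abs(pi @ x_star + alpha - z_val) < 1e-8
```

In the cutting plane algorithm of the introduction, this cut would be added to the master problem and the iteration repeated; the criteria above concern which of the (generally many) optimal dual vertices to select.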

Pareto-optimality
The first systematic work on the general selection of Benders cuts to our knowledge was undertaken by Magnanti and Wong (1981). The paper, which has proven very influential and is still referred to regularly, focuses on the property of Pareto-optimality. It can intuitively be described as follows: A cut is Pareto-optimal if there is no other cut valid for epi(z) which is clearly superior, i.e., which dominates the first cut.
In this setting, any cut that does not support epi(z) is obviously dominated. Between supporting cuts, there is no general criterion for domination. We can, however, compare cuts whose cut normal (π, π₀) satisfies π₀ < 0 (this is also the case covered by Magnanti and Wong (1981)):

Definition 3.3 For a problem of the form (1.1), we say that an inequality π⊤x + π₀η ≤ α with π₀ < 0 is dominated by another inequality π'⊤x + π'₀η ≤ α' if π'₀ < 0 and

(π'⊤x − α') / (−π'₀) ≥ (π⊤x − α) / (−π₀) for all x ∈ S,

with strict inequality for at least one x ∈ S. If π₀ < 0 and π⊤x + π₀η ≤ α is not dominated by any valid inequality for epi(z), then we call it Pareto-optimal.
Remember that the set S contains all points x ∈ Rⁿ that are feasible for an optimization problem of the form (1.1) if we ignore the linear constraints Hx + Ay ≤ b. By the above definition, a cut dominates another cut if the minimum value of η that it enforces is at least as good for all x ∈ S and strictly better for at least one x ∈ S (see Fig. 4).

Fig. 4 The dotted cut supports a facet of epi(z) and it supports epi_S(z), but it is still not Pareto-optimal. The solid cut supports a facet of epi_S(z) and is hence Pareto-optimal. The dashed cut is Pareto-optimal even though it does not support a facet of epi(z) (or epi_S(z))
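This enforced-bound view of domination is easy to check numerically. The following sketch (toy one-dimensional data of our own choosing, not from the paper) evaluates the lower bound on η that a cut with π₀ < 0 enforces and tests the domination condition of Definition 3.3 over a finite set S:

```python
# With pi0 < 0, the cut pi*x + pi0*eta <= alpha enforces
# eta >= (pi*x - alpha) / (-pi0); one cut dominates another if its enforced
# bound is at least as high for all x in S and strictly higher somewhere.
def eta_bound(pi, pi0, alpha, x):
    assert pi0 < 0
    return (pi * x - alpha) / (-pi0)

S = [0.0, 1.0, 2.0]          # toy discrete feasible set
cut_a = (1.0, -1.0, 0.5)     # enforces eta >= x - 0.5
cut_b = (1.0, -1.0, 0.0)     # enforces eta >= x

dominates = (all(eta_bound(*cut_b, x) >= eta_bound(*cut_a, x) for x in S)
             and any(eta_bound(*cut_b, x) > eta_bound(*cut_a, x) for x in S))
# dominates == True: cut_b is strictly tighter than cut_a on all of S
```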
Analogously to the previous criteria, we define the Pareto criterion for a cut normal: Definition 3.4 For a problem of the form (1.1) with z defined as in (1.2), let (π, π₀) ∈ Rⁿ × R. We say that (π, π₀) satisfies the Pareto criterion if there exists a scalar α ∈ R such that the inequality π⊤x + π₀η ≤ α is Pareto-optimal.
This criterion is very reasonable: If a cut is not Pareto-optimal, then there exists a different cut which is also valid for epi(z), but leads to a strictly tighter approximation. We would hence prefer to generate a stronger, Pareto-optimal cut right away.
The following theorem provides us with a characterization of Pareto-optimal cuts. It is based on the idea of Magnanti and Wong (1981, Theorem 1), which is formulated under the assumption that the subproblem is always feasible (which implies that π₀ < 0 for any cut normal (π, π₀)). While the original theorem is only concerned with sufficiency, we extend the result in a natural way to obtain a criterion that gives a complete characterization of Pareto-optimal cuts. We use the following separation lemma: Lemma 3.1 (Rockafellar (1970)) Let C ⊆ Rⁿ be a non-empty convex set and K ⊆ Rⁿ a non-empty polyhedron such that relint(C) ∩ K = ∅. Then, there exists a hyperplane separating C and K which does not contain C.
Finally, we claim that the inequality π⊤x + π₀η ≤ α is dominated by the inequality π'⊤x + π'₀η ≤ α': for all x ∈ S, the lower bound on η enforced by the latter is at least the one enforced by the former. Since the last inequality is strict for x* ∈ S, this proves the statement.
For the case where S is convex, the previous theorem immediately implies the following statement: Corollary 3.2 Let S be convex. Then, π⊤x + π₀η ≤ α is Pareto-optimal if and only if H≤((π, π₀), α) supports a face F of epi_S(z) such that F ⊄ relbd(S) × R.

Magnanti and Wong (1981) also propose an algorithm that computes a Pareto-optimal cut by solving the cut-generating problem twice. While their algorithm is defined for the original Benders optimality cuts, it can be adapted to work with other cut selection criteria as well. Sherali and Lunday (2013) present a method based on multiobjective optimization to obtain a cut that satisfies a weaker version of Pareto-optimality by solving only a single instance of the cut-generating LP. Papadakos (2008) notes that, given a point in the relative interior of conv(S), a Pareto-optimal cut can be generated using a single run of the cut-generating problem; under certain conditions on the problem, some points outside the relative interior allow this as well. However, the approach suggested by the author adds Pareto-optimal cuts independently of master or subproblem solutions, together with subproblem-generated cuts, which are generally not Pareto-optimal. This means that the Pareto-optimal cuts which are added may not even cut off the current tentative solution. The upcoming Theorem 3.5 will lead to an approach that reconciles both objectives, generating cuts that are both Pareto-optimal and cut off the current tentative solution.
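The Magnanti-Wong two-LP procedure mentioned above can be sketched directly on the subproblem dual (rather than in the paper's alternative-polyhedron framework). In this sketch, all data is toy data of our own choosing: x* is a degenerate master solution, so the subproblem has multiple optimal duals, and the second LP selects among them the dual whose cut is tightest at an assumed core point x⁰ ∈ relint(conv(S)):

```python
import numpy as np
from scipy.optimize import linprog

# Toy data for a problem of the form (1.1); d is an illustrative name for
# the subproblem objective vector.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
H = np.array([[-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]])
b = np.array([0.0, 0.0, -0.5])
d = np.array([1.0, 2.0])

x_star = np.array([0.2, 0.3])   # tentative master solution (degenerate case)
x_core = np.array([0.5, 0.5])   # core point, assumed in relint(conv(S))

# Stage 1: compute z(x*) = min d'y s.t. Ay <= b - Hx*.
stage1 = linprog(d, A_ub=A, b_ub=b - H @ x_star, bounds=[(None, None)] * 2)
z_star = stage1.fun

# Stage 2: among all optimal duals u >= 0 (A'u = -d, and the cut value at
# x* fixed to z* via (b - Hx*)'u = -z*), maximize the cut value at the
# core point, i.e. minimize u'(b - Hx0).
A_eq = np.vstack([A.T, b - H @ x_star])
b_eq = np.concatenate([-d, [-z_star]])
stage2 = linprog(b - H @ x_core, A_eq=A_eq, b_eq=b_eq)  # default bounds: u >= 0
u = stage2.x

pi, alpha = H.T @ u, -u @ b      # selected cut: eta >= pi'x + alpha
```

For this x*, the stage-1 dual optimal set is a whole ray; the stage-2 LP picks its extreme point, which is exactly the Magnanti-Wong selection among cuts that all cut off (x*, η*) equally deeply.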
We use a result by Cornuéjols and Lemaréchal (2006) on the set of points exposed by a cut normal (π, π 0 ) to derive a method that always obtains a Pareto-optimal cut. The following lemma has been slightly generalized and rewritten to match our setting and notation, but it follows the general idea of Cornuéjols and Lemaréchal (2006, Theorem 3.4).
The results from this section are summarized in Table 1.

Computational results
To validate the theoretical results presented in this paper, we have compared our refined cut selection approach to that presented by Fischetti et al. (2010) on a set of instances of the Capacity Expansion Problem for electrical power systems.
To map the range of potential instances, we use a total of 14 test instances that were investigated in the context of two joint research projects. They span from a small, closely connected model of the Bavarian power system to a large, realistic model of the (rather sparsely connected) European power system consisting of 102 demand regions with 587 (aggregated) generation units and 195 existing and potential transmission lines (Schaber et al. 2012).
For both models, we optimize capacity expansion and hourly dispatch based on demand data and data for the availability of renewable energy sources in hourly resolution for a period of one year. Due to their inherent structure, where subproblems for individual timesteps are loosely coupled by capacity expansion decisions and storage constraints, this type of problem is generally well-suited for Benders decomposition.
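To illustrate this structure, the following sketch (hypothetical single-node data, ignoring the storage coupling for simplicity; not the paper's model) solves independent hourly dispatch LPs for fixed first-stage capacities in parallel and aggregates their costs. In a Benders scheme, the duals of these per-timestep LPs would be aggregated into cuts in the same fashion:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy.optimize import linprog

cost = np.array([20.0, 50.0])        # cheap and expensive generator [cost/MWh]
capacity = np.array([30.0, 100.0])   # fixed first-stage capacities [MW]
demand = [25.0, 40.0, 70.0, 55.0]    # hourly demand [MWh]

def dispatch(dem):
    # Per-timestep subproblem: min cost'g  s.t.  sum(g) >= dem,
    # 0 <= g <= capacity. Independent across hours for fixed capacities.
    res = linprog(cost, A_ub=[[-1.0, -1.0]], b_ub=[-dem],
                  bounds=list(zip([0.0, 0.0], capacity)))
    return res.fun

# Solve all hourly subproblems in parallel and aggregate.
with ThreadPoolExecutor() as pool:
    hourly_costs = list(pool.map(dispatch, demand))

total_dispatch_cost = sum(hourly_costs)
```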
To give an indication of the size of the resulting optimization problems, instances 1-12 each consist of ≈ 800,000 variables, ≈ 1,200,000 constraints and ≈ 3,700,000 non-zero entries in the constraint matrix. Instances 13 and 14 consist of ≈ 1,500,000 variables, ≈ 2,600,000 constraints and ≈ 7,600,000 non-zero entries in the constraint matrix.
To demonstrate the benefits of our refined cut selection approach, we continuously update the weight vector (ω̃, ω̃₀) so that in each iteration it satisfies the conditions of Theorem 3.5 and thus the resulting cut meets the most advantageous of the criteria developed in this paper (corresponding to the last row in Table 1). We call this approach adaptive cuts.
As a benchmark, we use a version of the approach proposed by Fischetti et al. (2010) that strengthens the resulting cuts without additional computational effort using the information represented by the matrix H (thereby in particular making sure that the obtained cut is always supporting). We denote this approach by the term static cuts.
As mentioned above, the computational effort for the cut-generating LP in both approaches is almost identical: The only difference is that in the "adaptive cuts" approach, we use a different objective function, which can be obtained from the result of the previous iteration via a simple matrix-vector multiplication.
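A minimal sketch of this per-iteration update, assuming (as in Theorem 2.2) that the objective is derived from a feasible point (x̄, η̄) and the current master solution (x*, η*); the function name and example data are ours:

```python
import numpy as np

def adaptive_objective(H, x_bar, eta_bar, x_star, eta_star):
    """Objective (H*omega, -omega0) for the cut-generating LP over the
    alternative polyhedron, with (omega, omega0) = (x_bar - x*, eta_bar - eta*)."""
    omega = x_bar - x_star           # direction towards the feasible point
    omega0 = eta_bar - eta_star
    return H @ omega, -omega0        # one matrix-vector product per iteration

# Hypothetical iteration data:
H = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
obj, obj0 = adaptive_objective(H, np.array([1.0, 1.0]), 5.0,
                               np.array([0.0, 0.0]), 3.0)
```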
We have implemented both approaches in C++ using Gurobi 7.5. For our computations, we used ten CPU cores running at 2.4 GHz with 45 GB of main memory.
To solve the master problem and the subproblems, we use the dual simplex algorithm (i.e., the version of the simplex algorithm that maintains dual feasibility while pivoting between bases). In each iteration, we warm-start all problems using the optimal basis from the previous iteration and solve all subproblems in parallel on the available cores. Beyond this, we run the solution algorithm with default settings, i.e., we did not undertake any computational optimizations with respect to either the algorithm itself or the solution method of master and subproblems. In particular, also the specific update mechanism used for the adaptive cuts approach should be seen as an illustrative example rather than a performance-optimized prescription.

As a performance measure, we use the time to reach different thresholds for the relative duality gap, i.e., the gap between upper and lower bound relative to the optimal objective value. This takes into account that in practical applications, one is often satisfied with a solution that is guaranteed to be within a certain tolerance of the optimal solution (e.g., 0.1%), rather than a strictly optimal solution.

Our results show that for any desired gap, the adaptive cuts selection approach performs substantially better than the static cuts approach. The results can be inspected in detail in Table 2: The adaptive cuts approach generally reaches any given optimality threshold 2-3 times faster than the static cuts approach. While the general result is very consistent across all instances, differences in the magnitude of the advantage exist: The benefit of the adaptive cuts approach tends to be larger for more difficult instances which overall take longer to solve.
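The performance measure just described can be computed from a run's bound trajectory as follows (illustrative data and function names of our own choosing):

```python
# Given a timestamped trajectory of lower/upper bounds, report the first
# time at which the relative gap (ub - lb) / |opt| drops below each threshold.
def time_to_gap(trace, opt, thresholds):
    """trace: list of (time, lower_bound, upper_bound); opt: optimal value."""
    hit = {}
    best_lb, best_ub = float("-inf"), float("inf")
    for t, lb, ub in trace:
        best_lb, best_ub = max(best_lb, lb), min(best_ub, ub)  # monotone bounds
        gap = (best_ub - best_lb) / abs(opt)
        for tol in thresholds:
            if tol not in hit and gap <= tol:
                hit[tol] = t
    return hit

trace = [(10, 80.0, 130.0), (25, 96.0, 104.0),
         (60, 99.5, 100.4), (90, 99.95, 100.05)]
times = time_to_gap(trace, opt=100.0, thresholds=[0.10, 0.01, 0.001])
# times == {0.10: 25, 0.01: 60, 0.001: 90}
```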
To further visualize our results, Fig. 5 compares the two approaches with respect to the progression of the average duality gap over all instances based on the (larger) European power system model. The plot confirms our observation from Table 2: The adaptive cuts approach reduces the duality gap 2-3 times faster than the static cuts approach.

Outlook
We conclude with an outlook on interesting research questions raised by the results presented in this paper.
Our theoretical results clearly point to a computational advantage from improving the parametrization of cut-generating LPs in Benders decomposition, and we have demonstrated this advantage in the context of Capacity Expansion Problems for electrical power systems. This holds despite the fact that we have not performed any "fine tuning" of parameters beyond what is immediately implied by our theoretical results. A broader computational study of such optimizations (for which we point out some ideas below), as well as of general performance across other types of problem instances, would certainly be worthwhile.
In a generic implementation of Benders decomposition, feasible solutions are used primarily to decide when the algorithm has converged sufficiently close to the optimal solution. By Theorem 2.2, however, any such solution can furthermore be used to derive a subproblem objective which satisfies the prerequisites of both Corollary 2.3 and Theorem 3.3. Together with Theorem 3.5, they thus result in the generation of cuts which are always supporting, almost always support a facet and (if π₀ < 0) are also Pareto-optimal. In our computational experiments, these cuts proved to be very useful in improving the performance of a Benders decomposition algorithm. Since information from a feasible solution can thus be used within the cut generation, it makes sense to investigate more closely how such a solution can be obtained during the algorithm. This is likely to be very problem-specific, but some general ideas could be:

- How is the information from feasible solutions computed in different iterations best aggregated? Does it make sense to use e.g. a stabilization approach or a convex combination with some other choices for (ω, ω₀), e.g. from previous iterations? This corresponds to the method used by Papadakos (2008).
- Furthermore, if a feasible solution is not available as the basis for a subproblem objective, the cut-generating problem might be unbounded/infeasible. On the other hand, the approach from Fischetti et al. (2010) with ω̃ = 1 yields a cut-generating LP that is always feasible, but the resulting cut might be weaker. How can both approaches be combined in a best-possible way? For instance, is choosing ω̃ = Hω + ε · 1 as the relaxation term and letting ε go to zero a good choice?
Finally, our approach provides a clear geometric interpretation of the interaction between the parametrization of the cut-generating LP and the resulting cut normals. How can this be used to leverage a priori knowledge about the problem (or information obtained through a fast preprocessing algorithm) to improve the selection of a subproblem objective (ω, ω₀) from a set of cuts satisfying the same quality criteria (e.g. cuts that are all facet-defining)?