A loose Benders decomposition algorithm for approximating two-stage mixed-integer recourse models

We propose a new class of convex approximations for two-stage mixed-integer recourse models, the so-called generalized alpha-approximations. The advantage of these convex approximations over existing ones is that they are more suitable for efficient computations. Indeed, we construct a loose Benders decomposition algorithm that solves large problem instances in reasonable time. To guarantee the performance of the resulting solution, we derive corresponding error bounds that depend on the total variations of the probability density functions of the random variables in the model. The error bounds converge to zero if these total variations converge to zero. We empirically assess our solution method on several test instances, including the SIZES and SSLP instances from SIPLIB. We show that our method finds near-optimal solutions if the variability of the random parameters in the model is large. Moreover, our method outperforms existing methods in terms of computation time, especially for large problem instances.


Introduction
Consider the two-stage mixed-integer recourse model with random right-hand side

η* := min_x { cx + Q(x) : Ax = b, x ∈ X },   (1)

where the recourse function Q is defined as

Q(x) := E_ω[ v(ω, x) ],  x ∈ R^{n_1}.

This model represents a two-stage decision problem under uncertainty. In the first stage, a decision x has to be made here-and-now, subject to deterministic constraints Ax = b and random goal constraints T x = ω. Here, ω is a continuous or discrete random vector whose probability distribution is known. In the second stage, the realization of ω becomes known and any infeasibilities with respect to T x = ω have to be repaired. This is modelled by the second-stage problem

v(ω, x) := min_y { qy : W y = ω − T x, y ∈ Y }.

The objective in this two-stage recourse model is to minimize the sum of immediate costs cx and expected second-stage costs Q(x). Frequently, integrality restrictions are imposed on the first- and second-stage decisions. That is, X and Y are of the form X = Z_+^{p_1} × R_+^{n_1 − p_1} and Y = Z_+^{p_2} × R_+^{n_2 − p_2}. Such restrictions arise naturally when modelling real-life problems, for example to model on/off decisions or batch size restrictions. The resulting model is called a mixed-integer recourse (MIR) model. Such models have many practical applications in for example energy, telecommunication, production planning, and environmental control, see e.g. [10,37].
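To make the structure of the model concrete, the following toy sketch evaluates a second-stage value function and a sample-average estimate of the expected second-stage cost. All data are invented for illustration (a single goal constraint y1 − y2 = ω − x with q = (1, 2), y1 integer, y2 continuous); they do not come from the paper's instances, where each v(ω, x) is a MIP solved by a solver.

```python
import math
import random

# Illustrative toy instance (data invented, not from the paper): a single goal
# constraint y1 - y2 = omega - x with y1 in Z_+, y2 in R_+ and costs q = (1, 2).
# The cost 1*y1 + 2*y2 = 3*y1 - 2*(omega - x) is increasing in y1, so the
# cheapest repair uses the smallest feasible integer y1, giving a closed form.
def v(omega, x):
    s = omega - x                  # right-hand side omega - T x (here T = 1)
    y1 = max(0, math.ceil(s))      # smallest feasible integer decision
    y2 = y1 - s                    # continuous slack repairing the constraint
    return 1.0 * y1 + 2.0 * y2

def Q(x, sample):
    # Sample-average estimate of the expected second-stage cost E[v(omega, x)].
    return sum(v(w, x) for w in sample) / len(sample)

random.seed(1)
sample = [random.uniform(0.0, 10.0) for _ in range(1000)]
q_at_3 = Q(3.0, sample)            # Q is non-convex in x due to integrality of y1
```

Plotting Q over a grid of x values would exhibit the non-convex, kinked shape that motivates the convex approximations developed in this paper.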
While MIR models are highly relevant in practice, they are notoriously difficult to solve. The reason is that Q is in general non-convex if integrality restrictions are imposed on the second-stage decision variables y, see [20]. Therefore, standard techniques for convex optimization cannot be used to solve these models. In contrast, if Y = R_+^{n_2}, then Q is convex and efficient solution methods are available, most notably the L-shaped method in [32] and variants thereof.
Because of the non-convexity of Q, traditional solution methods for MIR models typically combine ideas from deterministic mixed-integer programming and stochastic continuous programming, see e.g. [4,8,9,15,17,19,29,31,38], and the survey papers [24,28,30]. We, however, use a fundamentally different approach to deal with the non-convex recourse function Q. Instead of solving the original MIR model in (1), we solve an approximating problem in which Q is replaced by a convex approximation Q̂ of Q. This convex approximation Q̂ does not have to be a lower bound of Q. The resulting approximating optimization problem is given by

η̂ := min_x { cx + Q̂(x) : Ax = b, x ∈ X }.   (4)

Because Q̂ is convex, we can use efficient techniques from convex optimization to solve the optimization problem in (4). Indeed, this optimization problem is convex if all first-stage variables x ∈ X are continuous, and it is a MIP with a convex objective if some of these first-stage variables are integer. Thus, compared to traditional solution methods, we are able to solve similar-sized problems much faster, and solve larger problem instances. In fact, we demonstrate this in our numerical experiments on problem instances available from [29] and SIPLIB [2], and on randomly generated problem instances. Obviously, the optimal solution x̂ of the approximating problem in (4) is not necessarily optimal for the original MIR model in (1). That is why we guarantee the quality of the approximating solution x̂ by deriving an error bound on ||Q − Q̂||_∞ := sup_x |Q(x) − Q̂(x)|. This error bound directly gives us an upper bound on the optimality gap of x̂; see [25].
Convex approximations and corresponding error bounds have been derived for many different classes of models. The idea to use convex approximations Q̂ for the non-convex mixed-integer recourse function Q dates back to [33], in which van der Vlerk proposes to use α-approximations for the special case of simple integer recourse (SIR) models. These α-approximations are obtained by perturbing the probability distribution of the random vector ω. Klein Haneveld et al. [14] derive an error bound for the α-approximations that depends on the total variations of the marginal density functions of the random variables in the SIR model.
More convex approximations have been described for more general classes of problems. For example, in [34,35] van der Vlerk extends the α-approximations to a class of MIR models with a single recourse constraint, and to integer recourse models with a totally unimodular (TU) recourse matrix, respectively. Furthermore, Romeijnders et al. [26] derive an error bound for the latter approximation, and they derive a tighter error bound for the shifted LP-relaxation approximation for the same class of problems. The quality of the convex approximations for TU integer recourse models is assessed empirically in [22], and it turns out that they perform well if the variability of the random parameters in the model is large enough.
Romeijnders et al. [23] generalize the shifted LP-relaxation to general two-stage MIR models, and they derive a corresponding asymptotic error bound. This error bound converges to zero if the variability of the random parameters in the model increases. Romeijnders and van der Laan [21] derive similar error bounds for convex approximations that fit into a specific framework. In this framework, convex approximations are defined using pseudo-valid cutting planes for the second-stage feasible regions. Their idea is to only use cutting planes which are affine in the first-stage decision variables, so that the corresponding expected value function is convex. In general, the approximations are not exact, since pseudo-valid cutting planes cut away feasible solutions and are allowed to be overly conservative. Nevertheless, a similar error bound as for the shifted LP-relaxation has been derived if the pseudo-valid cutting planes are tight. That is, if they are exact on an entire grid of first-stage solutions.
Both the shifted LP-relaxation of [23] and the cutting plane framework of [21], however, cannot be applied directly to efficiently solve MIR models in general. This is because the shifted LP-relaxation is very difficult to compute in general, as discussed in [23], and furthermore, the asymptotic error bound in the cutting plane framework of [21] only applies for tight pseudo-valid cutting planes, which are only available in special cases, e.g. for SIR models. That is why we propose an alternative class of convex approximations for general two-stage MIR models, the so-called generalized α-approximations. They are derived by exploiting properties of Gomory relaxations [11] of the second-stage mixed-integer programming problems.
Contrary to the shifted LP-relaxation, the generalized α-approximations, denoted by Q̂_α, can be solved efficiently. In fact, we develop a so-called loose Benders decomposition algorithm to solve the approximating model in (4) with Q̂ = Q̂_α. Our Benders decomposition is called loose, because we derive optimality cuts for Q̂_α which are in general not tight at the current solution. While these loose optimality cuts are in general not sufficient to find the optimal solution x̂ of the approximating model in (4), we prove that they are tight enough in the sense that a similar performance guarantee applies to the solution obtained by the loose Benders decomposition as to x̂.
Summarizing, our main contributions are as follows.
- We propose a new class of convex approximations for general two-stage MIR models, which are based on Gomory relaxations of the second-stage problems. These generalized α-approximations can be solved efficiently, and a similar error bound as for the shifted LP-relaxation applies to the generalized α-approximations.
- We derive a loose Benders decomposition algorithm to (approximately) solve the approximating model with the generalized α-approximations. This is the first efficient algorithm for solving non-trivial convex approximations of general two-stage MIR models.
- We prove that the solution obtained by the loose Benders decomposition algorithm has a similar performance guarantee as the exact solution to the generalized α-approximations.
- We carry out extensive numerical experiments on 41 test instances from the literature and on 240 randomly generated instances, and show that using our loose Benders decomposition algorithm we obtain good solutions within reasonable time, also for large problem instances, in particular when the variability of the random parameters in the model is large.
The remainder of this paper is organized as follows. In Sect. 2, we discuss the shifted LP-relaxation of [23] and its corresponding error bound. In Sect. 3, we present the generalized α-approximations and an efficient algorithm for solving the corresponding approximating problem. Section 4 contains the proof of the performance guarantee for the loose Benders decomposition algorithm. In Sect. 5, we report on numerical experiments to evaluate the performance of our algorithm. Finally, we conclude in Sect. 6.
Throughout, we make the following assumptions. Assumptions (A2)-(A4) guarantee that Q(x) is finite for all x ∈ X such that Ax = b.

(A1) The first-stage feasible region X := {x ∈ X : Ax = b} is bounded.
(A2) The recourse is relatively complete: for all ω ∈ R^m and x ∈ X, there exists a y ∈ Y such that W y = ω − T x, so that v(ω, x) < ∞.
(A3) The recourse is sufficiently expensive: v(ω, x) > −∞ for all ω ∈ R^m and x ∈ X, i.e., the dual feasible region {λ : λW ≤ q} of the LP-relaxation of the second-stage problem is non-empty.
(A4) The recourse matrix W is integer.

Existing convex approximations of mixed-integer recourse functions
In this section, we review the shifted LP-relaxation approximation of [23] and its corresponding error bound. First, however, we review results on asymptotic periodicity in mixed-integer linear programming in Sect. 2.1. We do so since these results are not only used to derive the shifted LP-relaxation in Sect. 2.2, but also to derive the generalized α-approximations in Sect. 3.1.

Asymptotic periodicity in mixed-integer programming
In order to derive convex approximations of the MIR function Q, we analyze the value function v of the second-stage problem, defined as

v(ω, x) := min_y { qy : W y = ω − T x, y ∈ Y }.   (5)

In particular, LP-duality implies that the LP-relaxation v_LP(ω, x) of v(ω, x) is polyhedral in the right-hand side vector ω − T x:

v_LP(ω, x) = max_{k=1,...,K} λ^k(ω − T x),

where λ^k, k = 1, . . . , K, are the extreme points of the dual feasible region {λ : λW ≤ q}. Romeijnders et al. [23] derive a similar characterization of v(ω, x) in terms of linear and periodic functions by exploiting so-called Gomory relaxations. We briefly discuss these Gomory relaxations, before stating the characterization of v(ω, x) in Lemma 1. The Gomory relaxation of v(ω, x) is defined for any dual feasible basis matrix of v_LP(ω, x). Let B denote such a matrix and let N be such that W ≡ [B N], meaning equality up to a permutation of the columns. Let y_B and y_N denote the second-stage variables corresponding to the columns in B and N, respectively, and q_B and q_N their corresponding cost parameters. The Gomory relaxation v_B is obtained by relaxing the non-negativity constraints of the basic variables y_B. Romeijnders et al. [23] derive the following expression for v_B:

v_B(ω, x) = q_B B^{−1}(ω − T x) + ψ_B(ω − T x),

and they show that v(ω, x) = v_B(ω, x) if ω − T x ∈ Λ := {t : B^{−1} t ≥ 0} and if the distance of ω − T x to the boundary of Λ is sufficiently large, see Definition 1. We say that v(ω, x) is asymptotically periodic, since ψ_B is a B-periodic function, i.e. ψ_B(s + Bl) = ψ_B(s) for every s ∈ R^m and l ∈ Z^m, see [23].
Lemma 1 Consider the value function v defined in (5), where W is an integer matrix, and v(ω, x) is finite for all ω ∈ R^m and x ∈ R^{n_1}. Then, there exist dual feasible basis matrices B^k of v_LP, k = 1, . . . , K, closed convex polyhedral cones Λ^k := {t ∈ R^m : (B^k)^{−1} t ≥ 0}, distances d^k, B^k-periodic functions ψ^k, and constants w^k such that, in particular,

v(ω, x) = v_LP(ω, x) + ψ^k(ω − T x)  if ω − T x ∈ Λ^k(d^k), k = 1, . . . , K.

Lemma 1 (iii) shows that if ω − T x ∈ Λ^k(d^k) for some k = 1, . . . , K, then v(ω, x) is equal to the sum of the LP-relaxation v_LP(ω, x) and ψ^k(ω − T x). Hence, ψ^k(ω − T x) can be interpreted as the additional costs resulting from the integrality restrictions on the decision variables y. For a discussion on how to obtain d^k or how to represent Λ^k(d^k) using a system of linear inequalities we refer to [23].
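The asymptotic periodicity can be checked numerically in one dimension. The sketch below uses an invented toy value function (chosen so that the dual extreme points are λ^1 = −2 and λ^2 = 1); it is not one of the paper's instances.

```python
import math

# A 1-D toy value function (invented data):
#   v(s)    = min { y1 + 2*y2 : y1 - y2 = s, y1 in Z_+, y2 in R_+ }
#   v_LP(s) = max( s, -2*s )   # dual extreme points lambda^1 = -2, lambda^2 = 1
def v(s):
    y1 = max(0, math.ceil(s))
    return y1 + 2.0 * (y1 - s)

def v_lp(s):
    return max(s, -2.0 * s)

def psi(s):
    # "Cost of integrality": the gap between the MIP and LP value functions.
    return v(s) - v_lp(s)

# On the cone Lambda = {s >= 0}, psi is periodic with period |det B| = 1,
# which is the asymptotic periodicity described in the text.
periodic_gap = max(abs(psi(s) - psi(s + 1.0)) for s in (0.3, 1.7, 5.25, 9.9))
```

Here `periodic_gap` is numerically zero, confirming that v equals its LP-relaxation plus a periodic correction away from the cone boundary.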

The shifted LP-relaxation approximation
Lemma 1 shows why the second-stage value function is not convex in x. On regions of its domain it is the sum of a linear function q_{B^k}(B^k)^{−1}(ω − T x) and a periodic function ψ^k(ω − T x). Clearly, the periodic part is causing v to be non-convex. That is why the shifted LP-relaxation is obtained by replacing this periodic part ψ^k(ω − T x) by a constant Γ^k for every k = 1, . . . , K, with Γ^k defined as

Γ^k := (p^k)^{−1} ∫_{B^k[0,1]^m} ψ^k(s) ds,   (7)

where p^k = |det B^k|. The K constants Γ^k can be interpreted as the averages of the periodic functions ψ^k. The shifted LP-relaxation approximation is obtained by taking the pointwise maximum over all dual feasible basis matrices B^k, k = 1, . . . , K.

Definition 2 Define the shifted LP-relaxation approximation Q̂ of the MIR function Q as

Q̂(x) := E_ω[ ṽ(ω, x) ],  x ∈ R^{n_1},

with

ṽ(ω, x) := max_{k=1,...,K} { q_{B^k}(B^k)^{−1}(ω − T x) + Γ^k },

where B^k, k = 1, . . . , K, are the dual feasible basis matrices of Lemma 1, and Γ^k is defined in (7).
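The averaging step in (7) can be illustrated numerically. The sketch below uses an invented 1-D toy value function v(s) = min{ y1 + 2y2 : y1 − y2 = s, y1 ∈ Z_+, y2 ∈ R_+ } with LP-relaxation v_LP(s) = max(s, −2s); it is not from the paper.

```python
import math

# Periodic part of the toy value function (invented data):
# psi(s) = v(s) - v_LP(s) = 3*(ceil(s) - s) for s > 0, and 0 for s <= 0.
def psi(s):
    y1 = max(0, math.ceil(s))
    return (y1 + 2.0 * (y1 - s)) - max(s, -2.0 * s)

# Gamma = average of psi over one period cell [0, 1), here |det B| = 1
# (midpoint rule; for this toy the exact average is 1.5).
n = 100_000
gamma = sum(psi((i + 0.5) / n) for i in range(n)) / n

def v_shifted(s):
    # Shifted LP-relaxation: pointwise max of the LP pieces with the periodic
    # part replaced by its average (the average is 0 for the piece where psi
    # vanishes).
    return max(-2.0 * s, s + gamma)
```

In higher dimensions the same average becomes a multi-dimensional integral of a mixed-integer value function, which is exactly the computational obstacle discussed later in the paper.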
Romeijnders et al. [23] derive a total variation error bound on the approximation error ||Q − Q̂||_∞ of the shifted LP-relaxation Q̂. This error bound is expressed in terms of the total variations of the one-dimensional conditional probability density functions (pdf) of the random vector ω.
Definition 3 Let f : R → R be a real-valued function, and let I ⊂ R be an interval. Let Π(I) denote the set of all finite ordered sets P = {x_1, . . . , x_{N+1}} with x_1 < · · · < x_{N+1} in I. Then, the total variation of f on I, denoted by |Δ| f (I), is defined as

|Δ| f (I) := sup_{P ∈ Π(I)} Σ_{i=1}^{N} | f(x_{i+1}) − f(x_i) |.

We write |Δ| f := |Δ| f (R).
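Definition 3 translates directly into code: the total variation can be approximated from below by summing absolute increments over a fine partition. The sketch below (invented helper names) also checks the well-known fact that for a unimodal density |Δ| f = 2 max f, so higher variability means smaller total variation.

```python
import math

# Definition 3 in code: approximate |Delta| f over [lo, hi] with a fine ordered
# partition; the supremum over all partitions is approached from below.
def total_variation(f, lo, hi, n=20000):
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return sum(abs(f(xs[i + 1]) - f(xs[i])) for i in range(n))

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# For a unimodal pdf, |Delta| f = 2 * max f; for N(mu, sigma^2) this equals
# 2 / (sigma * sqrt(2*pi)), so larger sigma gives smaller total variation,
# in line with the error bounds in this paper.
tv = total_variation(normal_pdf, -8.0, 8.0)
```

For the standard normal this evaluates to roughly 0.798, and doubling σ halves the total variation.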

Definition 4 For every i = 1, . . . , m and x_{−i} ∈ R^{m−1}, define the i-th conditional density function f_i(· | x_{−i}) of the m-dimensional joint pdf f as

f_i(x_i | x_{−i}) := f(x) / f_{−i}(x_{−i}),  x_i ∈ R,

where f_{−i} denotes the joint pdf of all components of ω except the i-th.

Theorem 1 There exists a constant C > 0 such that for every continuous random vector ω with joint pdf f ∈ H_m,

||Q − Q̂||_∞ ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].   (8)

In general, the error bound in Theorem 1 may be large, in particular since the constant C > 0 may be large. Nevertheless, the theorem shows that if the total variations of the one-dimensional conditional pdf are small, then Q̂ is a good approximation of Q. For example, if the components ω_i of ω follow independent normal distributions, with mean μ_i and variance σ_i^2, for i = 1, . . . , m, then the error bound in (8) can be evaluated explicitly; see [21, Example 2] for details. Observe that the error bound goes to zero if σ_i → ∞ for all i = 1, . . . , m. Thus, the error of using the shifted LP-relaxation approximation decreases if the variability of the random parameters in the model increases.

Generalized α-approximations
To derive the generalized α-approximations Q̂_α, we first derive a convex approximation v̂_α of the second-stage value function v defined in (5), and we define Q̂_α(x) := E_ω[ v̂_α(ω, x) ]. Similar as for the shifted LP-relaxation ṽ, we use Lemma 1. However, instead of replacing ψ^k(ω − T x) by its average Γ^k, as is done for the shifted LP-relaxation, we replace it by ψ^k(ω − α) for a fixed α ∈ R^m. The difference compared to the shifted LP-relaxation seems small: the constants Γ^k are replaced by ψ^k(ω − α). From a computational point of view, however, this difference is significant. This is because the constants Γ^k are the averages over ψ^k, and in general need to be obtained by computing a multi-dimensional integral of a mixed-integer value function. For a fixed ω and α, however, the value of ψ^k(ω − α) is obtained by solving a single mixed-integer programming problem of the same size as the second-stage problem. In fact, we need to solve the Gomory relaxation discussed in Sect. 2.1, which can be done in polynomial time if all second-stage variables are integer [11].
Definition 5 For α ∈ R^m, define the generalized α-approximation Q̂_α of the MIR function Q as

Q̂_α(x) := E_ω[ v̂_α(ω, x) ],  x ∈ R^{n_1},

with

v̂_α(ω, x) := max_{k=1,...,K} { q_{B^k}(B^k)^{−1}(ω − T x) + ψ^k(ω − α) },

where B^k, k = 1, . . . , K, are the dual feasible basis matrices of Lemma 1.
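A 1-D sketch makes the convexity claim tangible. The toy data below (v(ω, x) = min{ y1 + 2y2 : y1 − y2 = ω − x, y1 ∈ Z_+, y2 ∈ R_+ }) are invented; the point is that ψ is evaluated at ω − α, which does not depend on x, so v̂_α is a maximum of functions affine in x.

```python
import math

# Invented 1-D toy: LP pieces -2*(omega - x) and (omega - x), periodic part psi.
def psi(s):
    y1 = max(0, math.ceil(s))
    return (y1 + 2.0 * (y1 - s)) - max(s, -2.0 * s)

def v(omega, x):
    y1 = max(0, math.ceil(omega - x))
    return y1 + 2.0 * (y1 - (omega - x))

def v_alpha(omega, x, alpha):
    # psi(omega - alpha) is constant in x, so v_alpha is a max of affine
    # functions of x, hence convex in x (psi for the first piece is 0 here).
    s = omega - x
    return max(-2.0 * s, s + psi(omega - alpha))

# Convexity in x (midpoint inequality) and exactness when alpha matches x:
omega, x1, x2, alpha = 3.4, 0.5, 2.5, 1.0
mid_ok = (v_alpha(omega, 0.5 * (x1 + x2), alpha)
          <= 0.5 * (v_alpha(omega, x1, alpha) + v_alpha(omega, x2, alpha)) + 1e-12)
err_at_alpha = abs(v(omega, alpha) - v_alpha(omega, alpha, alpha))
```

Note that v̂_α(ω, x, α) reproduces v exactly at x = α in this toy, illustrating why choosing α close to T x* is attractive (as explored in the numerical section).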

Remark 1 The generalized α-approximations are a generalization of the α-approximations defined for TU integer recourse models [26], which arise if y ∈ Z_+^{p} and if the second-stage constraints are of the form W y ≥ ω − T x, where W is a TU matrix. Indeed, for these models, we have ψ^k(s) = λ^k(⌈s⌉ − s), see [23], and thus

v̂_α(ω, x) = max_{k=1,...,K} λ^k( ⌈ω − α⌉ + α − T x ),

which is the expression for the α-approximations for TU integer recourse models in [26].
It turns out that we can derive a similar error bound for the generalized α-approximations as for the shifted LP-relaxation.
Theorem 2 There exists a constant C > 0, such that for every α ∈ R^m and for every continuous random vector ω with joint pdf f ∈ H_m,

||Q − Q̂_α||_∞ ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

Theorem 2 states that the approximation error of Q̂_α goes to zero as the total variations E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ], i = 1, . . . , m, all go to zero. This error bound is independent of α, i.e. Q̂_α is a good approximation of the true MIR function Q for any α ∈ R^m.
An interesting difference between the generalized α-approximations and the shifted LP-relaxation of Sect. 2.2 is that the approximating value function v̂_α is not convex in ω for fixed x ∈ R^{n_1}. Indeed, v̂_α is only convex in x for every fixed ω, but this is sufficient to guarantee that the generalized α-approximation Q̂_α is convex. In contrast, the value function ṽ of the shifted LP-relaxation is convex in both ω and x. We illustrate these properties in Example 1. In this example, we also illustrate the difference between the generalized α-approximations and the pseudo-valid cutting plane approximation of [21].
We use this example, since it also appears in [23] and [21], allowing us to compare the generalized α-approximations to the shifted LP-relaxation and a pseudo-valid cutting plane approximation.
The LP-relaxation of v has two dual feasible basis matrices B^1 = [−1] and B^2 = [1]. Thus, K = 2, and straightforward computations yield λ^1 = −2, λ^2 = 1, ψ^1 ≡ 0, and a 1-periodic function ψ^2. Figure 1 shows v̂_α as a function of x and ω, respectively, and illustrates that v̂_α is convex in x for fixed ω, but not convex in ω for fixed x. Moreover, Fig. 1b illustrates a key difference between ṽ(ω, x) and v̂_α(ω, x): whereas ṽ(ω, x) is obtained by replacing the periodic part of v by its average, v̂_α(ω, x) replaces it by its value at ω − α. A similar observation holds for tight pseudo-valid cutting plane approximations.

Benders decomposition for the generalized α-approximations
We solve the approximating problem using an SAA of Q̂_α. Given a sample {ω^1, . . . , ω^S} of size S from the distribution of ω, the SAA Q̂^S_α of Q̂_α is defined as

Q̂^S_α(x) := S^{−1} Σ_{s=1}^{S} v̂_α(ω^s, x),  x ∈ R^{n_1},

and the corresponding SAA problem as

η̂^S_α := min_x { cx + Q̂^S_α(x) : Ax = b, x ∈ X }.   (11)

Since Q̂^S_α is convex, we can solve the SAA problem using Benders decomposition [6], i.e. using an L-shaped algorithm [32]. In iteration τ of this algorithm, we solve (11) with Q̂^S_α replaced by a convex polyhedral outer approximation Q̂^τ_out ≤ Q̂^S_α. This problem is called the master problem and its optimal solution x^τ is referred to as the current solution at iteration τ. If Q̂^τ_out(x^τ) = Q̂^S_α(x^τ), then the current solution x^τ is optimal for the original SAA problem in (11). If not, then we strengthen the outer approximation using an optimality cut for Q̂^S_α of the form Q̂^S_α(x) ≥ βx + δ. The outer approximation Q̂^τ_out in iteration τ is the pointwise maximum over all optimality cuts derived in previous iterations. The challenge for the generalized α-approximations, however, is to compute tight optimality cuts of Q̂^S_α. Since v̂_α is a pointwise maximum over k = 1, . . . , K, such tight optimality cuts can be obtained by identifying for each scenario s the maximizing index k^τ_s at x^τ, defined as

k^τ_s ∈ argmax_{k=1,...,K} { q_{B^k}(B^k)^{−1}(ω^s − T x^τ) + ψ^k(ω^s − α) }.   (13)

However, this is computationally too expensive, since we need to compute ψ^k(ω^s − α) for all k = 1, . . . , K, and K grows exponentially in the size of the second-stage problem. Indeed, K is at least as large as the number of dual vertices of the feasible region of the LP-relaxation v_LP of v, of which there are exponentially many.
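The single-cut L-shaped loop can be sketched end-to-end in one dimension. Everything below is invented toy data (the same kind of two-piece value function used elsewhere in these sketches): the master problem is a 1-D convex piecewise-linear minimization solved by ternary search, and tight cuts average each scenario's maximizing affine piece; the paper's algorithm instead solves LP/MIP subproblems per scenario.

```python
import math
import random

# Minimal single-cut L-shaped (Benders) loop for the SAA of a 1-D toy
# approximation (all data invented).
def psi(s):
    y1 = max(0, math.ceil(s))
    return (y1 + 2.0 * (y1 - s)) - max(s, -2.0 * s)

def pieces(w, x, alpha):
    # Affine-in-x pieces of the toy approximation for scenario w.
    return (2.0 * x - 2.0 * w, (w - x) + psi(w - alpha))

def saa_value(x, sample, alpha):
    return sum(max(pieces(w, x, alpha)) for w in sample) / len(sample)

def tight_cut(x, sample, alpha):
    # Average the maximizing affine piece per scenario: slope beta, intercept delta.
    beta = delta = 0.0
    for w in sample:
        p1, p2 = pieces(w, x, alpha)
        if p1 >= p2:
            beta += 2.0; delta += -2.0 * w
        else:
            beta += -1.0; delta += w + psi(w - alpha)
    return beta / len(sample), delta / len(sample)

def benders(sample, c=0.1, alpha=0.0, lo=0.0, hi=10.0, eps=1e-6, max_iter=300):
    cuts = [(0.0, -1e6)]                    # trivial lower bound L on the SAA
    x = lo
    for _ in range(max_iter):
        f = lambda z: c * z + max(b * z + d for b, d in cuts)  # master objective
        a, b_ = lo, hi
        for _ in range(200):                # ternary search: f is convex in 1-D
            m1, m2 = a + (b_ - a) / 3.0, b_ - (b_ - a) / 3.0
            if f(m1) <= f(m2): b_ = m2
            else: a = m1
        x = 0.5 * (a + b_)
        if saa_value(x, sample, alpha) - max(b * x + d for b, d in cuts) <= eps:
            break                           # outer approximation is tight at x
        cuts.append(tight_cut(x, sample, alpha))
    return x

random.seed(0)
sample = [random.uniform(0.0, 10.0) for _ in range(200)]
x_opt = benders(sample)
```

Because each added cut is tight at the current solution and the toy has finitely many cut patterns, the loop terminates with an ε-optimal solution of the SAA.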

Loose Benders decomposition for the generalized α-approximations
To overcome this computational challenge we propose to use approximate indices k̂^τ_s, defined for each scenario s as the index of an optimal basis matrix of the LP-relaxation v_LP(ω^s, x^τ) of the second-stage problem in (5).   (14)

These indices are computationally tractable, since they can be obtained by solving a single LP. Moreover, if the values of ψ^k(ω^s − α) are equal for all k = 1, . . . , K, then the indices k̂^τ_s are optimal in (13), i.e. k̂^τ_s = k^τ_s. In general, however, the indices k̂^τ_s are suboptimal in (13), leading to the following definition.
Definition 6 Let x^τ be given and let k̂^τ_s, s = 1, . . . , S, be as in (14). We define the loose optimality cut for Q̂^S_α at x^τ as

Q̂^S_α(x) ≥ β^{τ+1} x + δ^{τ+1} := S^{−1} Σ_{s=1}^{S} ( q_{B^{k̂^τ_s}} (B^{k̂^τ_s})^{−1} (ω^s − T x) + ψ^{k̂^τ_s}(ω^s − α) ).

We use these loose optimality cuts in our loose Benders decomposition algorithm, LBDA(α). In this algorithm the outer approximation Q̂^τ_out of Q̂^S_α is defined using our loose optimality cuts. The algorithm terminates with tolerance level ε > 0, and it is initialized with a lower bound L on Q̂^S_α. Such a lower bound L exists because of Assumptions (A1)-(A4).

[Algorithm listing for LBDA(α) omitted: in each iteration τ, the algorithm solves v_LP(ω^s, x^τ) for each scenario s, denotes the optimal basis matrix index by k̂^τ_s, solves the corresponding Gomory relaxation, and adds the resulting loose optimality cut to the master problem.]

The solution x̂_α of LBDA(α) is not necessarily ε-optimal for the SAA problem in (11), since LBDA(α) uses the loose optimality cuts of Definition 6. If the optimality cuts were tight, then Q̂^{τ+1}_out(x^τ) = Q̂^S_α(x^τ) for every iteration τ, and thus the algorithm would terminate if Q̂^τ_out(x^τ) ≥ Q̂^S_α(x^τ) − ε. We, however, show that our loose optimality cuts are asymptotically tight at x^τ: the difference Q̂^{τ+1}_out(x^τ) − Q̂^S_α(x^τ) converges to zero if the total variations of the one-dimensional conditional pdf of the random vector ω go to zero and as S → ∞, see Proposition 1. In this case, we are able to prove that the LBDA(α) solution x̂_α is near-optimal; see Theorem 3 below for a bound on the optimality gap cx̂_α + Q(x̂_α) − η*. This performance guarantee is independent of α, and thus it applies if we take e.g. α = 0. We further explore selection of α in Sect. 5. The proof of the performance guarantee is postponed to Sect. 4.2.

Theorem 3 Consider the two-stage mixed-integer recourse model in (1). Let x̂_α denote the solution obtained by LBDA(α) with tolerance ε and sample size S. Then, there exists a constant C > 0 such that for every continuous random vector ω with joint pdf f ∈ H_m, w.p. 1,

lim sup_{S→∞} ( cx̂_α + Q(x̂_α) − η* ) ≤ ε + C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

Theorem 3 implies that the optimality gap of x̂_α converges to the prespecified tolerance ε as the total variations of the underlying one-dimensional conditional pdf go to zero.

Implementation details of LBDA(α)
LBDA(α) can be implemented efficiently if the input size of the second-stage problem v(ω, x) is moderate. During each iteration τ, we have to solve the LP-relaxation v_LP(ω^s, x^τ) and the Gomory relaxation v_{B^{k̂^τ_s}}(ω^s − α, 0) for each scenario s = 1, . . . , S, in order to generate a loose optimality cut. If the input size of v(ω, x) is not too large, then v_LP(ω^s, x^τ) and v_{B^{k̂^τ_s}}(ω^s − α, 0) can be solved in reasonable time using standard LP and MIP solvers, respectively. Moreover, the master problem can be solved efficiently using a standard LP solver, or MIP solver if some of the first-stage decision variables are integer. Improved implementations of LBDA(α) using a multicut approach [7] and regularization techniques [27] are possible. Furthermore, the subproblems v_{B^{k̂^τ_s}}(ω^s − α, 0), s = 1, . . . , S, can be solved in parallel, and it may be beneficial to solve these subproblems inexactly in the first phase of the algorithm.
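The per-scenario subproblem solves are independent and therefore easy to parallelize. The sketch below uses a cheap invented stand-in for a scenario subproblem (a 1-D periodic toy value) instead of a real Gomory-relaxation MIP; for CPU-bound solver calls one would typically use processes or the solver's own thread controls rather than Python threads.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Stand-in for solving a scenario's Gomory-relaxation subproblem (invented
# 1-D toy data; a real implementation would call an LP/MIP solver here).
def solve_subproblem(omega_s, alpha=0.0):
    s = omega_s - alpha
    y1 = max(0, math.ceil(s))
    return (y1 + 2.0 * (y1 - s)) - max(s, -2.0 * s)

def subproblem_values(scenarios, alpha=0.0, workers=4):
    # Solve all per-scenario subproblems concurrently and keep scenario order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda w: solve_subproblem(w, alpha), scenarios))

vals = subproblem_values([0.3, 1.7, 2.5, 4.9])
```

The returned list can then be averaged into the coefficients of a loose optimality cut, exactly as in the sequential version.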
In Sect. 5, we exploit that LBDA(α) can be run multiple times, using different values of α. The performance guarantee of LBDA(α) is independent of α and thus applies to every candidate solution that we obtain in this way. Moreover, we use in- and out-of-sample evaluation to select the best candidate solution. In our numerical experiments, we investigate several schemes for selecting the values of α. Running LBDA(α) multiple times can be done very efficiently by using parallelization and a common warm start. That is, we first apply the L-shaped algorithm of [32] to the LP-relaxation of the original problem (1), in which the integer restrictions on the second-stage variables y are relaxed. Next, we run LBDA(α) for each value of α, and we keep the optimality cuts generated for the LP-relaxation Q_LP of Q, which are also valid for Q̂_α, for any value of α.

Convergence of sampling and loose optimality cuts
The performance guarantee of LBDA(α) does not follow directly from our error bound for the generalized α-approximations. The reason is that LBDA(α) uses sampling and the loose optimality cuts of Definition 6 to solve the corresponding approximating problem. We consider these aspects in Sects. 4.1.1 and 4.1.2, respectively. In particular, we prove consistency of the SAA and asymptotic tightness of our loose optimality cuts.

Consistency of the sample average approximation
Intuitively, the SAA becomes better as the sample size S increases. Indeed, we show in Lemma 2 that the SAA Q̂^S_α of Q̂_α converges uniformly to Q̂_α on X w.p. 1 as S → ∞.
Corollary 1 There exists a constant C > 0 such that for every continuous random vector ω with joint pdf f ∈ H_m, w.p. 1,

lim sup_{S→∞} sup_{x∈X} | Q(x) − Q̂^S_α(x) | ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

Proof The result follows directly by combining Theorem 2 and Lemma 2.
Corollary 1 implies a similar error bound on the difference between the optimal values η* and η̂^S_α of the original MIR problem (1) and the SAA problem (11), respectively, see Corollary 2.

Corollary 2
Consider the optimal values η* and η̂^S_α of the MIR problem (1) and the SAA problem (11). Then, w.p. 1,

lim sup_{S→∞} | η* − η̂^S_α | ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

Proof Since the optimal solution x* of (1) is feasible but not necessarily optimal in (11), we have

η̂^S_α ≤ cx* + Q̂^S_α(x*).

The result follows directly from Corollary 1.

Asymptotic tightness of loose optimality cuts
If we derive a loose optimality cut Q̂^S_α(x) ≥ β^{τ+1} x + δ^{τ+1} at x^τ, then the gap Q̂^S_α(x^τ) − (β^{τ+1} x^τ + δ^{τ+1}) may be positive, since the cut is not necessarily tight at x^τ. However, we derive a bound on this gap by considering the index function k̂(ω, x), defined as the index of the optimal basis matrix of v_LP(ω, x). Note that k̂(ω^s, x^τ) = k̂^τ_s, i.e. the index function k̂(ω, x) is used to derive our loose optimality cuts. Based on this index function, we define the lower bounding function L^S_α of Q̂^S_α, given by

L^S_α(x) := S^{−1} Σ_{s=1}^{S} ν_α(ω^s, x),   (15)

and we note that L^S_α(x^τ) is the value of the loose optimality cut at x^τ, i.e. L^S_α(x^τ) = β^{τ+1} x^τ + δ^{τ+1}. The function L^S_α is a lower bound of Q̂^S_α since its corresponding value function, defined as

ν_α(ω, x) := q_{B^{k̂(ω,x)}} (B^{k̂(ω,x)})^{−1} (ω − T x) + ψ^{k̂(ω,x)}(ω − α),   (16)

selects a single index instead of the maximizing one, and is thus a lower bound of v̂_α(ω, x) for all ω ∈ Ω and x ∈ X. Proposition 1 contains a uniform total variation bound on the difference between L^S_α and Q̂^S_α. In order to derive this result, we first analyze the difference between ν_α and v̂_α. It turns out that ν_α equals v̂_α on large parts of the domain, see Lemma 3.

Proposition 1
Consider the SAA Q̂^S_α of the generalized α-approximation and its lower bound L^S_α, defined in (15). Then there exists a constant C > 0 such that for every continuous random vector ω with joint pdf f ∈ H_m, w.p. 1,

lim sup_{S→∞} sup_{x∈X} ( Q̂^S_α(x) − L^S_α(x) ) ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

Proof We derive an upper bound on Δ(ω, x) := v̂_α(ω, x) − ν_α(ω, x), independent of x. We then apply the strong law of large numbers (SLLN) to obtain the desired result. If we define M := ∪_{k=1}^{K} (σ^k + Λ^k), where σ^k and Λ^k, k = 1, . . . , K, are the vectors and closed convex cones from Lemma 3, then

Δ(ω, x) ≤ ξ(ω, x) := R · 1{ω − T x ∉ M},

where R is the upper bound on v̂_α(ω, x) − ν_α(ω, x) from Lemma 3. Moreover, we can derive a bound on P[ω − T x ∉ M]. Unfortunately, the random variable ξ(ω, x) depends on x. Therefore, we cannot apply the SLLN to S^{−1} Σ_{s=1}^{S} ξ(ω^s, x) if X is infinite. To resolve this, we use that X is bounded. Let D denote the diameter of T X, i.e., ||T x − T x'|| ≤ D for all x, x' ∈ X. Define M' ⊂ M as M' := ∪_{k=1}^{K} (σ^k + Λ^k)(D). Fix an arbitrary x̄ ∈ X. Note that for all x ∈ X,

ξ(ω, x) ≤ ξ̄(ω) := R · 1{ω − T x̄ ∉ M'}.

Note that ξ̄(ω) only depends on the fixed x̄ ∈ X and is independent of x. By the SLLN,

S^{−1} Σ_{s=1}^{S} ξ̄(ω^s) → E_ω[ ξ̄(ω) ]

w.p. 1 as S → ∞. By [23, Lemma 3.9], R^m \ M' can be covered by finitely many hyperslices, that is,

R^m \ M' ⊆ ∪_{j=1}^{J} H_j,

where the hyperslices H_j are defined as H_j := {t ∈ R^m : δ_j ≤ a_j t ≤ δ_j + 1} for some a_j and δ_j. By [23, Theorem 4.6], there exists a constant β > 0 such that the probability of ω − T x̄ falling in any single hyperslice is at most β Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ]. The result now follows by combining these bounds.

Proposition 1 shows that our loose optimality cuts are asymptotically tight, since for every iteration τ in LBDA(α) the loose optimality cut for Q̂^S_α at x^τ is tight for L^S_α at x^τ. Hence, if the underlying total variations are small enough and if S is large enough, then the loose optimality cut is nearly tight for Q̂^S_α at x^τ.

Error bound on the optimality gap of LBDA(α)
We are now ready to prove the performance guarantee of LBDA(α) in Theorem 3. Intuitively, we are able to derive this bound since (i) our loose optimality cuts are asymptotically tight, and thus x̂_α is near-optimal for the SAA problem with Q̂^S_α, (ii) the SAA Q̂^S_α converges uniformly to Q̂_α, and (iii) Theorem 2 contains a uniform error bound on the difference between Q̂_α and Q.

Proof of Theorem 3
Let x̂_α denote the solution returned by LBDA(α). Since x̂_α := x^τ is the current solution in the final iteration τ of the algorithm, it follows that x̂_α is a minimizer of the final master problem, i.e. of cx + Q̂^τ_out(x) over all x ∈ X with Ax = b. Moreover, since Q̂^τ_out ≤ Q̂^S_α, it follows that cx̂_α + Q̂^τ_out(x̂_α) ≤ η̂^S_α, and thus, by rearranging terms and adding Q̂^S_α(x̂_α) on both sides,

cx̂_α + Q̂^S_α(x̂_α) − η̂^S_α ≤ Q̂^S_α(x̂_α) − Q̂^τ_out(x̂_α).   (18)

The right-hand side of (18) represents an upper bound on the optimality gap of x̂_α in the SAA problem (11). The termination criterion of LBDA(α) guarantees that this upper bound is not too large. Indeed, at termination it holds that Q̂^{τ+1}_out(x̂_α) − Q̂^τ_out(x̂_α) ≤ ε, and thus the upper bound in (18) reduces to

cx̂_α + Q̂^S_α(x̂_α) − η̂^S_α ≤ ε + Q̂^S_α(x̂_α) − Q̂^{τ+1}_out(x̂_α).   (19)

In the end, however, we are not interested in the optimality gap of x̂_α in the SAA problem (11) but in the optimality gap of x̂_α in the original MIR problem (1). Adding and subtracting both Q̂^S_α(x̂_α) and η̂^S_α, and using (19), yields

cx̂_α + Q(x̂_α) − η* ≤ ε + ( Q̂^S_α(x̂_α) − Q̂^{τ+1}_out(x̂_α) ) + ( Q(x̂_α) − Q̂^S_α(x̂_α) ) + ( η̂^S_α − η* ).

Applying Corollaries 1 and 2, and Proposition 1, we conclude that there exists a constant C > 0 such that w.p. 1,

lim sup_{S→∞} ( cx̂_α + Q(x̂_α) − η* ) ≤ ε + C Σ_{i=1}^{m} E_{ω_{−i}}[ |Δ| f_i(· | ω_{−i}) ].

The performance guarantee for LBDA(α) in Theorem 3 is a worst-case bound. For many problem instances, the actual performance may be much better. In Sect. 5, we assess the performance of LBDA(α) empirically on a wide range of test instances.

Numerical experiments
We test the performance of LBDA(α) on problem instances from the literature and on randomly generated instances, see Sects. 5.2 and 5.3, respectively. In particular, we consider (variations of) an investment planning problem in [29], and two classes of problem instances available from SIPLIB [2], namely the SIZES problem [13] and the stochastic server location problem (SSLP) [18]. First, however, we describe the set-up of our numerical experiments in Sect. 5.1.

Set-up of numerical experiments
In our numerical experiments, we compare LBDA(α) to several benchmark methods, in terms of costs, relative optimality gaps, and computation times. Since the performance of LBDA(α) depends on α, we investigate four different approaches to select α.
First, we take α equal to the zero vector. Second, we take α = α * := T x * , where x * is the optimal solution of the original problem. Obviously, for large problem instances x * is unknown, however, we expect that α * is a good choice of α since the generalized α-approximations are obtained by replacing T x by α in the Gomory relaxations, and thusQ α * is a good approximation of Q near the true optimal solution x * . We test this for the smaller problem instances for which x * is known.
Since α*, however, typically cannot be computed for larger problem instances, we also propose an iterative scheme, in which we first obtain x̂_{α^0} by running LBDA(α^0), where α^0 is the zero vector. Next, we run LBDA(α^1), where α^1 := T x̂_{α^0}. We extend this scheme to 100 iterations by recursively defining α^{k+1} := T x̂_{α^k}, k = 1, . . . , 100. We then select the best value of α in terms of expected costs, denoted by α^#. Finally, we apply LBDA(α) multiple times using 100 different values of α, drawn from a multivariate uniform distribution on [0, 100]^m, and we denote the best value of α in terms of the expected costs by α^+.
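The iterative scheme is a simple fixed-point loop around the solver. In the runnable sketch below, `solve_lbda` is an invented stand-in for a full LBDA(α) run returning T x̂_α (an affine map chosen purely so the loop terminates); the paper's scheme additionally keeps all iterates and selects the best one by expected cost.

```python
# Runnable sketch of the iterative alpha-selection scheme (1-D, invented
# placeholder dynamics; a real implementation calls LBDA(alpha) each step).
def solve_lbda(alpha):
    return 0.5 * alpha + 1.0      # placeholder for T x_hat_alpha

def iterate_alpha(alpha0=0.0, max_iter=100, tol=1e-8):
    alpha, history = alpha0, [alpha0]
    for _ in range(max_iter):
        new_alpha = solve_lbda(alpha)     # alpha_{k+1} := T x_hat_{alpha_k}
        history.append(new_alpha)
        if abs(new_alpha - alpha) < tol:  # stop once the scheme stabilizes
            return new_alpha, history
        alpha = new_alpha
    return alpha, history

alpha_final, history = iterate_alpha()
```

With the placeholder map the iterates converge to the fixed point α = 2; in practice the scheme need not converge, which is why the best iterate is selected by in- and out-of-sample evaluation.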
In order to compare these approaches, note that the expected costs cx + Q(x) of a candidate solution x can be computed exactly if the random vector ω has a finite number of realizations, as is the case for the SIZES, SSLP, and investment planning problems that we consider. Therefore, if the optimal value η* is known, then the relative optimality gap ρ(x), defined as

ρ(x) := ( cx + Q(x) − η* ) / |η*|,

can be computed exactly, and otherwise bounds on ρ(x) can be computed. In contrast, for our randomly generated instances, we assume that ω is continuously distributed. For these instances, we use the multiple replications procedure (MRP) [16] with Latin hypercube sampling [5] to obtain 95% confidence upper bounds on ρ(x). Moreover, we compare the performance of LBDA(α) to a range of benchmark solutions, using out-of-sample estimation of cx + Q(x), with a sample size of 10^5, which guarantees that the standard errors of our results are sufficiently small. The benchmark and LBDA(α) solutions are computed using a sample of size S = 1000.
The first benchmark solution x̂_S is obtained by solving the deterministic equivalent formulation (DEF) of the corresponding SAA of the original problem (1). The DEF is a large-scale MIP, which typically cannot be solved in reasonable time by standard MIP solvers. Hence, we also solve the DEF using a smaller sample of size S = 100, resulting in the second benchmark solution x̂_S.
We obtain three additional benchmark solutions by solving the generalized α-approximations exactly for α = 0, α = α*, and α = α+; that is, we find the optimal solution x*_α of the approximating problem (4) with Q̂ = Q̂^S_α. We do so by solving the approximating second-stage problems

max_{k=1,...,K} {λ_k(ω − T x) + ψ_k(ω − α)}

by enumeration over k = 1, . . . , K. For this reason, x*_α can only be computed in reasonable time for small problem instances.
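Evaluating the approximating second-stage value by enumeration over k is straightforward to sketch. The dual vectors λ_k and the functions ψ_k below are illustrative stand-ins, since their actual values come from the Gomory relaxations in the paper:

```python
# Enumeration over k of lambda_k (omega - T x) + psi_k(omega - alpha).
K = 3
lambdas = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # hypothetical dual vectors

def psi(k, s):
    # hypothetical stand-in for the value functions psi_k at s = omega - alpha
    return 0.1 * k * sum(s)

def approx_value(omega, Tx, alpha):
    best = float("-inf")
    for k in range(K):
        slack = [w - t for w, t in zip(omega, Tx)]     # omega - T x
        shift = [w - a for w, a in zip(omega, alpha)]  # omega - alpha
        val = sum(l * s for l, s in zip(lambdas[k], slack)) + psi(k, shift)
        best = max(best, val)
    return best
```

The cost of one evaluation is linear in K, but this evaluation must be embedded in the outer optimization over x, which is why x*_α is only tractable for small instances.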
Finally, we consider two trivial benchmark solutions, which we expect to outperform significantly. First, we relax the integrality restrictions on the second-stage variables in the SAA of the original problem (1), resulting in the benchmark solution x_LP. Second, we solve the Jensen approximation, which replaces the distribution of ω by a degenerate distribution at μ = E_ω[ω], and denote the optimal solution by x_μ.
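The Jensen approximation collapses all scenarios into a single deterministic one at the mean, as sketched below with illustrative scenario data:

```python
# Jensen approximation: replace the distribution of omega by a degenerate
# distribution at its mean mu, leaving a single deterministic second-stage
# problem. The scenarios (probability, realization) below are illustrative.
scenarios = [(0.25, [8.0, 12.0]), (0.50, [10.0, 10.0]), (0.25, [12.0, 8.0])]
m = len(scenarios[0][1])
mu = [sum(p * omega[i] for p, omega in scenarios) for i in range(m)]
# Solving (1) with omega fixed at mu then yields the benchmark solution x_mu.
```

By Jensen's inequality, this replacement yields a lower bound for convex recourse functions, which is exactly why we expect our method to outperform x_μ on these non-convex MIR instances.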
We run our experiments on a single Intel Xeon E5-2680v3 core at 2.5 GHz with Gurobi 7.0.2. To ensure a fair comparison between solutions, we use common random numbers where possible and we limit the computation time of each algorithm to two hours.

The SIZES problem
We first consider all instances of the SIZES test problem suite [13] from SIPLIB. These instances have mixed-binary variables in both stages, and differ in the number of scenarios, namely 3, 5, and 10. The DEF of the largest instance has 341 constraints and 825 variables, of which 110 are binary. We refer to [2] for further details. In Table 1, we report the outcomes of LBDA(α) and solving the DEF. We observe from Table 1 that LBDA(α) performs very well for all choices of α. Indeed, on every instance, the optimality gaps of all LBDA(α) solutions are below 0.5%, and below 0.2% for α = α + . Moreover, LBDA(α) runs very fast for all instances: for a single value of α, the computation time of LBDA(α) is always below one second.
Another observation is that LBDA(α # ) and LBDA(α + ) consistently outperform LBDA(0), at the expense of additional computation time. Nevertheless, the computation times of LBDA(α) for α # and α + are still moderate, and they scale much better to larger instances than solving the DEF. In particular, the time taken to solve the DEF grows exponentially as the sample size S increases, whereas the computation times of LBDA(α) are approximately linear in S. Finally, LBDA(α) performs very well if α = α * . However, since α * is not known in practice, it is useful to observe that LBDA(α + ) achieves similar performance.

The stochastic server location problem
The SSLP instances are more challenging in terms of input size than the SIZES instances. Indeed, the DEF of the largest instance has over 1,000,000 binary decision variables and 120,000 constraints. Their first- and second-stage problems are pure binary and mixed-binary, respectively, and ω follows a discrete distribution. A full problem description can be found in [18], in which the instances are solved using the D² algorithm of [31]. See also [1] and [12] for more recent computational experiments on these test instances using other exact approaches. In Table 2, we report the best known running time for each SSLP instance over all these exact approaches, along with the outcomes of LBDA(α).

Table 2 The stochastic server location problem: computation time (s) (optimality gap), per instance, for the best exact approach (see [12]) and for LBDA(α)

Strikingly, LBDA(α) was able to solve all instances to optimality for α = α+ and α = α*. Moreover, LBDA(0) and LBDA(α#) solved all instances except SSLP_15_45_5, on which both achieved an optimality gap of 0.45%. In terms of computation time, LBDA(0) is clearly preferred to LBDA(α#) and LBDA(α+), while achieving similar results. Finally, although directly comparing LBDA(α) to the other approaches is not completely fair, since the algorithms were run on different machines, it is clear that LBDA(α) is generally faster than the exact approaches. Indeed, LBDA(0) solved all instances in under one minute, and eight out of ten instances were solved in less than ten seconds, whereas the fastest exact approach required at least one minute for six out of ten instances, and over one hour for the largest instance.

An investment planning problem
We consider an investment planning problem by Schultz et al. [29]; see also [19]. For each of the resulting 28 instances, Table 3 shows the results of LBDA(α) and of solving the DEF. Note that if Gurobi could not solve an instance within two hours, then we report the gaps to x̂*, the best solution that Gurobi was able to find.
Overall, LBDA(α) performs well on the instances in Table 3. In particular, for α = α# and α = α+, LBDA(α) achieves gaps that are below 2% on 22 and 23 out of 28 instances, respectively. In general, LBDA(α) achieves better results if S is larger. For example, if Y = {0, 1}^4 and T = H, then the gaps are strictly decreasing in S. This is in line with the performance guarantee of LBDA(α) in Theorem 3: if S is larger, then the distributions of ω1 and ω2 more closely resemble a continuous uniform distribution on [5, 15], which has small total variation, i.e. the error bound in Theorem 3 is small. A similar pattern holds for the instances with Y = {0, 1}^4 and T = I. For these instances, α = α# and α = α+ outperform the other choices of α. However, similarly to the SIZES instances in Sect. 5.2.1, there is a trade-off between performance and computation time, since LBDA(α+) and LBDA(α#) require 100 LBDA(α) runs, which is computationally more demanding than LBDA(0). Nevertheless, on every instance, the computation times of LBDA(α) for α ∈ {α#, α+} are below 20 minutes, and below 3 minutes if S ≤ 1681.

Randomly generated test instances
We generate random MIR problems of the form (1), with X = R^{n1}_+ and integrality restrictions on the second-stage variables. In addition, we assume that the components of the random vector ω ∈ R^m follow independent normal distributions with mean 10 and standard deviation σ ∈ {0.1, 0.5, 1, 2, 4, 10}. The parameters c, q, T, and W are fixed, and their elements are drawn from discrete uniform distributions with supports contained in [1, 5], [5, 10], [1, 6], and [1, 6], respectively. The reason that we consider multiple values of the standard deviation σ is that Theorem 3 implies that LBDA(α) performs better as σ increases. This is because the total variations of the one-dimensional conditional pdfs are small if σ is large, and thus the error bound on the optimality gap achieved by LBDA(α) is also small.
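The instance generation above can be sketched as follows. Dimensions follow the "small" instances below; the number of second-stage variables is taken equal to p here purely for illustration, since the exact second-stage dimension is not restated in this section:

```python
import random

# Sketch of the random instance generation: c, q, T, W drawn once from
# discrete uniform distributions, omega with independent N(10, sigma^2)
# components. Dimensions and the second-stage size are illustrative.
random.seed(42)
n1, p, m = 10, 5, 5   # "small" instances
sigma = 2.0           # one of {0.1, 0.5, 1, 2, 4, 10}

c = [random.randint(1, 5) for _ in range(n1)]                   # support in [1, 5]
q = [random.randint(5, 10) for _ in range(p)]                   # support in [5, 10]
T = [[random.randint(1, 6) for _ in range(n1)] for _ in range(m)]  # support in [1, 6]
W = [[random.randint(1, 6) for _ in range(p)] for _ in range(m)]   # support in [1, 6]

def sample_omega():
    # independent N(10, sigma^2) components of the right-hand side
    return [random.gauss(10.0, sigma) for _ in range(m)]

omegas = [sample_omega() for _ in range(1000)]  # sample of size S = 1000 for the SAA
```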
To prevent noise in the outcomes of our experiments, we compute the average optimality gaps, costs, and computation times over 20 randomly generated test instances for each value of σ . We consider test instances of two different sizes, namely n 1 = 10, p = 5, m = 5 (small), and n 1 = 100, p = 40, m = 20 (large). Tables 4 and 5 display the results for the small and large versions, respectively.
From these results, we observe that LBDA(α) clearly outperforms the sampling solutions in terms of computation time and scalability to larger problem instances. In particular, we observe that the computation time of LBDA(α) is of the same order of magnitude as that of x_LP, while LBDA(α) performs significantly better in terms of optimality gaps and out-of-sample estimated expected costs. Overall, our results indicate that LBDA(α) can be implemented very efficiently and that it can handle large MIR problem instances.
In line with the performance guarantee in Theorem 3, LBDA(α) performs better for larger values of σ. For example, on the small instances, the optimality gaps achieved by LBDA(0) are strictly decreasing in σ, and for σ = 10.0, LBDA(α+) outperforms the sampling solution x̂_S. A similar observation holds for the optimal solution of the generalized α-approximations Q̂_α, as we would expect based on the error bound for Q̂_α in Theorem 2. Observe, however, that even for large values of σ, the optimality gaps reported in Table 4 for LBDA(α) are relatively large, i.e. around 3-4%, and that the optimality gaps achieved by e.g. LBDA(α+) are not strictly decreasing in σ. The reason is that the actual optimality gaps are likely much smaller than the upper bounds reported in Table 4. This is because these bounds are computed using the MRP, which relies on solving multiple SAAs of the original problem. Since, however, Gurobi has difficulties solving the DEFs of these SAAs in reasonable time, especially for large values of σ, the bounds obtained using the MRP are typically not sharp.
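The qualitative statement that larger σ yields smaller total variation can be made explicit for the normal distributions used here. Writing |Δ|f for the total variation of a density f (a notational assumption for this note), a differentiable unimodal density has total variation equal to twice its maximum value, so for the N(μ, σ²) density f,

```latex
|\Delta| f \;=\; 2\, f(\mu) \;=\; \frac{2}{\sigma\sqrt{2\pi}}
\;\longrightarrow\; 0 \qquad \text{as } \sigma \to \infty,
```

which is consistent with the error bounds of Theorems 2 and 3 becoming small as σ grows.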
Interestingly, x_LP also performs better as σ increases. An explanation is that our error bound for the generalized α-approximations implies that the MIR function Q becomes closer to a convex function as σ increases. Thus, since the LP-relaxation of Q is a convex lower bound of Q, its approximation error is expected to become smaller as σ increases. Note, however, that unlike that of the generalized α-approximations, the approximation error of the LP-relaxation does not go to zero. Indeed, based on our results, LBDA(α) is clearly preferred to x_LP, since LBDA(α) consistently outperforms x_LP at the expense of very little additional computation time.
Furthermore, the results in Tables 4 and 5 indicate that α = T x̂_S is a good choice for LBDA(α). Indeed, if α = T x̂_S, then LBDA(α) and x*_α perform similarly to x̂_S. For example, on the small instances, they achieve optimality gaps that are on average within 0.2% and 1.1% of x̂_S, respectively. However, since x̂_S is difficult to compute in practice, the use of LBDA(T x̂_S) is limited, whereas LBDA(α+) and LBDA(α#) can be applied directly. Similarly to the instances in Sects. 5.2.1 and 5.2.3, they achieve significantly better results than LBDA(0), at the expense of higher computation times. In particular, on the large instances, LBDA(α#) achieves optimality gaps that are two to five times smaller than those of LBDA(0), and on the small instances, LBDA(α+) achieves optimality gaps that are within 0.6% of x̂_S for σ ≥ 0.5. While the computation times of LBDA(α) increase for α = α+ and α = α# compared to α = 0, they remain manageable: even the large instances are solved within 26 minutes.
Finally, observe from Table 4 that LBDA(α) generally achieves performance similar to or better than that of x*_α. In other words, the fact that LBDA(α) uses loose optimality cuts to solve the generalized α-approximations has no negative effect on the solution quality.

Conclusion
We consider two-stage mixed-integer recourse models with random right-hand side. Due to non-convexity of the recourse function, such models are extremely difficult to solve. We develop a tractable approximating model by using convex approximations of the recourse function. In particular, we propose a new class of convex approximations, the so-called generalized α-approximations, and we derive a corresponding error bound on the difference between these approximations and the true recourse function. In addition, we show that this error bound is small if the variability of the random parameters in the model is large. More precisely, the error bound for the generalized α-approximations goes to zero as the total variations of the one-dimensional conditional probability density functions of the random right-hand side vector in the model go to zero.
The advantage of the generalized α-approximations over existing convex approximations is that they can be solved efficiently. In fact, we describe a loose Benders decomposition algorithm, LBDA(α), which efficiently solves the corresponding approximating model. The quality in the original model of the candidate solution x̂_α generated by LBDA(α) is guaranteed by Theorem 3, which states an upper bound on the optimality gap of x̂_α. This performance guarantee is similar to the error bound we prove for the generalized α-approximations. Indeed, we show that the optimality gap of x̂_α is small if the variability of the random parameters in the model is large.
In addition to this theoretical guarantee on the solution quality, we assess LBDA(α) empirically on a range of test instances. In particular, we consider the SIZES and SSLP instances from SIPLIB, an investment planning problem by [29], and randomly generated instances. We find that LBDA(α) performs well in terms of computation times, scalability to larger problem instances, and solution quality. In particular, LBDA(α) is able to solve larger instances than traditional sampling techniques, and its computation times scale more favourably in the input size of the instances. In terms of solution quality, LBDA(α) solves the SIZES and SSLP instances to near optimality and generally performs very well on the investment planning instances. Moreover, on the randomly generated instances, LBDA(α) performs similarly to traditional sampling techniques and achieves small optimality gaps if the variability of the random parameters in the model is medium to large.
One avenue for future research is to derive sharper theoretical error bounds for the generalized α-approximations. While Theorem 3 provides conditions under which our solution method performs well, the quantitative error bound cannot be computed, as it depends on an unknown and potentially large constant C. A sharp tractable error bound would be an improvement over our current results. Another avenue is the extension of our solution method to more general mixed-integer recourse models, for example by allowing for randomness in the second-stage cost coefficients q, technology matrix T , or recourse matrix W .
In order to prove (ii), we use a similar argument as in [23, Lemma 3.6] and [21, Proposition 2]. It suffices to show that there exists a constant R such that |v̂_α(ω, x) − v_LP(ω, x)| ≤ R. We can take R = max_k w_k, where the w_k are the upper bounds on ψ_k from Lemma 1, k = 1, . . . , K.

Proof of Lemma 2
Our line of proof is based on [3]. For any ν > 0, consider a finite set X_ν such that for all x ∈ X, there exists an x′ ∈ X_ν such that ||x − x′|| ≤ ν. Such a set X_ν exists due to Assumption 1. Let x ∈ X be given and let x′ ∈ X_ν be such that ||x − x′|| ≤ ν. Note that

|Q̂_α(x) − Q̂^S_α(x)| ≤ |Q̂_α(x) − Q̂_α(x′)| + |Q̂_α(x′) − Q̂^S_α(x′)| + |Q̂^S_α(x′) − Q̂^S_α(x)|.   (20)

The first and third terms on the right-hand side of (20) can be bounded by noting that both Q̂_α and Q̂^S_α are Lipschitz continuous. Denoting Lipschitz constants of Q̂_α and Q̂^S_α by L1 and L2, respectively, we obtain

|Q̂_α(x) − Q̂^S_α(x)| ≤ (L1 + L2)ν + sup_{x′ ∈ X_ν} |Q̂_α(x′) − Q̂^S_α(x′)|.

The first term (L1 + L2)ν can be made arbitrarily small by letting ν → 0. The result follows, because for fixed ν, the second term sup_{x′ ∈ X_ν} |Q̂_α(x′) − Q̂^S_α(x′)| goes to zero w.p. 1 as S → ∞. To see this, fix any x′ ∈ X_ν, and consider the random variable ξ := max_{k=1,...,K} {λ_k(ω^s − T x′) + ψ_k(ω^s − α)}.

Proof of Lemma 3
It follows from the definitions of v̂_α and ν_α that v̂_α ≥ ν_α. Moreover, we can take R = max_k w_k, where the w_k are the upper bounds on ψ_k, k = 1, . . . , K, from Lemma 1.