1 Introduction

Consider the two-stage mixed-integer recourse model with random right-hand side

$$\begin{aligned} \eta ^* := \min _{x}\left\{ cx + Q(x) :\quad Ax = b,\;x \in X\subseteq \mathbb {R}^{n_1}_+\right\} ,\end{aligned}$$
(1)

where the recourse function Q is defined as

$$\begin{aligned} Q(x):=\mathbb {E}_{\omega }\left[ \min _y \left\{ qy:\quad Wy = \omega - Tx,\; y \in Y \subseteq \mathbb R^{n_2 }_+\right\} \right] , \quad x\in {X}. \end{aligned}$$
(2)

This model represents a two-stage decision problem under uncertainty. In the first stage, a decision x has to be made here-and-now, subject to deterministic constraints \(Ax = b\) and random goal constraints \(Tx = \omega \). Here, \(\omega \) is a continuous or discrete random vector whose probability distribution is known. In the second stage, the realization of \(\omega \) becomes known and any infeasibilities with respect to \(Tx = \omega \) have to be repaired. This is modelled by the second-stage problem

$$\begin{aligned} v(\omega ,x) := \min _y \left\{ qy:\quad Wy = \omega - Tx,\; y \in Y \subseteq \mathbb {R} ^{n_2 }_+\right\} .\end{aligned}$$
(3)

The objective in this two-stage recourse model is to minimize the sum of immediate costs cx and expected second-stage costs \(Q(x) = \mathbb {E}_\omega [v(\omega ,x)]\), \({x \in X}\).

Frequently, integrality restrictions are imposed on the first- and second-stage decisions. That is, X and Y are of the form \(X = {\mathbb {Z}}^{p_1}_+ \times \mathbb {R}^{n_1 - p_1}_+\) and \(Y = {\mathbb {Z}}^{p_2}_+\times \mathbb {R}^{n_2 - p_2}_+\). Such restrictions arise naturally when modelling real-life problems, for example to model on/off decisions or batch size restrictions. The resulting model is called a mixed-integer recourse (MIR) model. Such models have many practical applications in for example energy, telecommunication, production planning, and environmental control, see e.g. [10, 37].
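To make this concrete, the following minimal Python sketch evaluates the second-stage value function in (3) for a toy mixed-integer second stage and estimates Q(x) by a sample average. It uses SciPy's milp solver; the data q, W, T and the sample are purely illustrative and not taken from any instance in this paper.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Illustrative data: one goal constraint (m = 1) and three second-stage
# variables, the first of which is integer.
q = np.array([1.0, 3.0, 2.0])           # second-stage cost vector
W = np.array([[2.0, 1.0, -1.0]])        # recourse matrix
T = np.array([[1.0]])                   # technology matrix
integrality = np.array([1, 0, 0])       # y_1 integer; y_2, y_3 continuous

def v(omega, x):
    """Second-stage value v(omega, x) = min{qy : Wy = omega - Tx, y in Y}."""
    rhs = omega - T @ x
    res = milp(c=q,
               constraints=LinearConstraint(W, rhs, rhs),  # Wy = omega - Tx
               integrality=integrality,
               bounds=Bounds(0, np.inf))                   # y >= 0
    return res.fun

def Q_saa(x, sample):
    """Sample-average estimate of Q(x) = E[v(omega, x)]."""
    return np.mean([v(w, x) for w in sample])

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=(1000, 1))    # draws of omega
print(Q_saa(np.array([0.5]), sample))
```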

While MIR models are highly relevant in practice, they are notoriously difficult to solve. The reason is that Q is in general non-convex if integrality restrictions are imposed on the second-stage decision variables y, see [20]. Therefore, standard techniques for convex optimization cannot be used to solve these models. In contrast, if \(Y = {\mathbb {R}}^{n_2}_+\), then Q is convex and efficient solution methods are available, most notably the L-shaped method in [32] and variants thereof.

Because of the non-convexity of Q, traditional solution methods for MIR models typically combine ideas from deterministic mixed-integer programming and stochastic continuous programming, see e.g. [4, 8, 9, 15, 17, 19, 29, 31, 38], and the survey papers [24, 28, 30]. We, however, use a fundamentally different approach to deal with the non-convex recourse function Q. Instead of solving the original MIR model in (1), we solve an approximating problem in which Q is replaced by a convex approximation \(\hat{Q}\) of Q. This convex approximation \({\hat{Q}}\) does not have to be a lower bound of Q. The resulting approximating optimization problem is given by

$$\begin{aligned} {\hat{\eta }} := \min _{x}\left\{ cx + {\hat{Q}}(x) :\quad Ax = b,\; x \in {X} \right\} .\end{aligned}$$
(4)

Because \(\hat{Q}\) is convex, we can use efficient techniques from convex optimization to solve the optimization problem in (4). Indeed, this optimization problem is convex if all first-stage variables \(x \in X\) are continuous, and it is a MIP with a convex objective if some of these first-stage variables are integer. Thus, compared to traditional solution methods, we are able to solve similar-sized problems much faster, and solve larger problem instances. In fact, we demonstrate this in our numerical experiments on problem instances available from [29] and SIPLIB [2], and on randomly generated problem instances.

Obviously, the optimal solution \({\hat{x}}\) of the approximating problem in (4) is not necessarily optimal for the original MIR model in (1). That is why we guarantee the quality of the approximating solution \({\hat{x}}\), by deriving an error bound on

$$\begin{aligned}|| Q - {\hat{Q}}||_{\infty } := \sup _x | Q(x) -{\hat{Q}}(x)| .\end{aligned}$$

This error bound directly gives us an upper bound on the optimality gap of \({\hat{x}}\):

$$\begin{aligned}c{\hat{x}} + Q({\hat{x}}) - \eta ^* \le 2||Q - {\hat{Q}}||_{\infty },\end{aligned}$$

see [25].
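The factor 2 follows from a one-line argument: since \({\hat{x}}\) minimizes \(cx + {\hat{Q}}(x)\) over the feasible set and an optimal solution \(x^*\) of (1) is feasible in (4),

$$\begin{aligned} c{\hat{x}} + Q({\hat{x}}) - \eta ^* = \big (Q({\hat{x}}) - {\hat{Q}}({\hat{x}})\big ) + \big (c{\hat{x}} + {\hat{Q}}({\hat{x}})\big ) - \big (cx^* + Q(x^*)\big ) \le \big (Q({\hat{x}}) - {\hat{Q}}({\hat{x}})\big ) + \big ({\hat{Q}}(x^*) - Q(x^*)\big ) \le 2||Q - {\hat{Q}}||_{\infty }. \end{aligned}$$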

Convex approximations and corresponding error bounds have been derived for many different classes of models. The idea to use convex approximations \({\hat{Q}}\) for the non-convex mixed-integer recourse function Q dates back to [33], in which van der Vlerk proposes to use \(\alpha \)-approximations for the special case of simple integer recourse (SIR) models. These \(\alpha \)-approximations are obtained by perturbing the probability distribution of the random vector \(\omega \). Klein Haneveld et al. [14] derive an error bound for the \(\alpha \)-approximations that depends on the total variations of the marginal density functions of the random variables in the SIR model.

Convex approximations have also been derived for more general classes of problems. For example, in [34, 35] van der Vlerk extends the \(\alpha \)-approximations to a class of MIR models with a single recourse constraint, and to integer recourse models with a totally unimodular (TU) recourse matrix, respectively. Furthermore, Romeijnders et al. [26] derive an error bound for the latter approximation, and they derive a tighter error bound for the shifted LP-relaxation approximation for the same class of problems. The quality of the convex approximations for TU integer recourse models is assessed empirically in [22], and it turns out that they perform well if the variability of the random parameters in the model is large enough.

Romeijnders et al. [23] generalize the shifted LP-relaxation to general two-stage MIR models, and they derive a corresponding asymptotic error bound. This error bound converges to zero if the variability of the random parameters in the model increases. Romeijnders and van der Laan [21] derive similar error bounds for convex approximations that fit into a specific framework. In this framework, convex approximations are defined using pseudo-valid cutting planes for the second-stage feasible regions. Their idea is to only use cutting planes which are affine in the first-stage decision variables, so that the corresponding expected value function is convex. In general, the approximations are not exact, since pseudo-valid cutting planes cut away feasible solutions and are allowed to be overly conservative. Nevertheless, a similar error bound as for the shifted LP-relaxation has been derived if the pseudo-valid cutting planes are tight. That is, if they are exact on an entire grid of first-stage solutions.

Both the shifted LP-relaxation of [23] and the cutting plane framework of [21], however, cannot be applied directly to efficiently solve MIR models in general. This is because the shifted LP-relaxation is very difficult to compute in general, as discussed in [23], and furthermore, the asymptotic error bound in the cutting plane framework of [21] only applies for tight pseudo-valid cutting planes, which are only available in special cases, e.g. for SIR models. That is why we propose an alternative class of convex approximations for general two-stage MIR models, the so-called generalized \(\alpha \)-approximations. They are derived by exploiting properties of Gomory relaxations [11] of the second-stage mixed-integer programming problems.

Contrary to the shifted LP-relaxation, the generalized \(\alpha \)-approximations, denoted by \({\hat{Q}_\alpha }\), can be solved efficiently. In fact, we develop a so-called loose Benders decomposition algorithm to solve the approximating model in (4) with \({\hat{Q}} = {\hat{Q}_\alpha }\). Our Benders decomposition is called loose, because we derive optimality cuts for \({\hat{Q}_\alpha }\) which are in general not tight at the current solution. While these loose optimality cuts are in general not sufficient to find the optimal solution \({\hat{x}}\) of the approximating model in (4), we prove that they are tight enough in the sense that a similar performance guarantee applies to the solution obtained by the loose Benders decomposition as to \({\hat{x}}\).

Summarizing, our main contributions are as follows.

  • We propose a new class of convex approximations for general two-stage MIR models, which are based on Gomory relaxations of the second-stage problems. These generalized \(\alpha \)-approximations can be solved efficiently, and a similar error bound as for the shifted LP-relaxation applies to the generalized \(\alpha \)-approximations.

  • We derive a loose Benders decomposition algorithm to (approximately) solve the approximating model with the generalized \(\alpha \)-approximations. This is the first efficient algorithm for solving non-trivial convex approximations of general two-stage MIR models.

  • We prove that the solution obtained by the loose Benders decomposition algorithm has a similar performance guarantee as the exact solution to the generalized \(\alpha \)-approximations.

  • We carry out extensive numerical experiments on 41 test instances from the literature and on 240 randomly generated instances, and show that using our loose Benders decomposition algorithm we obtain good solutions within reasonable time, also for large problem instances, in particular when the variability of the random parameters in the model is large.

The remainder of this paper is organized as follows. In Sect. 2, we discuss the shifted LP-relaxation of [23] and its corresponding error bound. In Sect. 3, we present the generalized \(\alpha \)-approximations and an efficient algorithm for solving the corresponding approximating problem. Section 4 contains the proof of the performance guarantee for the loose Benders decomposition algorithm. In Sect. 5, we report on numerical experiments to evaluate the performance of our algorithm. Finally, we conclude in Sect. 6.

Throughout, we make the following assumptions. Assumptions (A2)–(A4) guarantee that Q(x) is finite for all \(x \in X\) such that \(Ax = b\).

  1. (A1)

    The first-stage feasible region \({\mathcal {X}} := \{x\in X: Ax = b\}\) is bounded.

  2. (A2)

    The recourse is relatively complete: for all \(\omega \in \mathbb {R}^m\) and \(x \in {\mathcal {X}}\), there exists a \(y \in Y\) such that \(Wy = \omega - Tx\), so that \(v(\omega ,x) < \infty \).

  3. (A3)

    The recourse is sufficiently expensive: \(v(\omega ,x) > -\infty \) for all \(\omega \in \mathbb {R}^m\) and \(x \in {\mathcal {X}}\).

  4. (A4)

    \(\mathbb {E}[|\omega _i|]\) is finite for all \(i=1,\dots ,m\).

  5. (A5)

    The recourse matrix W is integer.

2 Existing convex approximations of mixed-integer recourse functions

In this section, we review the shifted LP-relaxation approximation of [23] and its corresponding error bound. First, however, we review results on asymptotic periodicity in mixed-integer linear programming in Sect. 2.1. We do so since these results are not only used to derive the shifted LP-relaxation in Sect. 2.2, but also to derive the generalized \(\alpha \)-approximations in Sect. 3.1.

2.1 Asymptotic periodicity in mixed-integer programming

In order to derive convex approximations of the MIR function Q we analyze the value function v of the second-stage problem, defined as

$$\begin{aligned} v(\omega ,x)=\min _y \left\{ qy:\quad Wy = \omega - Tx, \; y \in {\mathbb {Z}}^{p_2}_+\times \mathbb {R} ^{n_2 - p_2 }_+\right\} .\end{aligned}$$
(5)

In particular, LP-duality implies that the LP-relaxation \(v_\text {LP}(\omega ,x)\) of \(v(\omega ,x)\) is polyhedral in the right-hand side vector \(\omega - Tx\):

$$\begin{aligned}v_\text {LP}(\omega ,x) = \max _{k = 1,\dots ,K} \lambda ^k (\omega - Tx),\end{aligned}$$

where \(\lambda ^k\), \(k = 1,\dots ,K\), are the extreme points of the dual feasible region \(\{\lambda : \lambda W \le q\}\). Romeijnders et al. [23] derive a similar characterization of \(v(\omega , x)\) in terms of linear and periodic functions by exploiting so-called Gomory relaxations. We briefly discuss these Gomory relaxations before stating the characterization of \(v(\omega ,x)\) in Lemma 1.

The Gomory relaxation of \(v(\omega ,x)\) is defined for any dual feasible basis matrix of \(v_\text {LP}(\omega ,x)\). Let B denote such a matrix and let N be such that \(W \equiv \begin{pmatrix} B&N \end{pmatrix}\), meaning equality up to a permutation of the columns. Let \(y_B\) and \(y_N\) denote the second-stage variables corresponding to the columns in B and N, respectively, and \(q_B\) and \(q_N\) their corresponding cost parameters. The Gomory relaxation \(v_B\) is obtained by relaxing the non-negativity constraints of the basic variables \(y_B\). Romeijnders et al. [23] derive the following expression for \(v_B\):

$$\begin{aligned}v_B(\omega ,x) = q_BB^{-1}(\omega - Tx) + \psi _B(\omega - Tx),\end{aligned}$$

where

$$\begin{aligned} \psi _B(s) := \min _{y_N} \left\{ {\bar{q}}_N y_N:\quad B^{-1}(s - Ny_N) \in \mathbb {Z}^{p_B} \times \mathbb {R}^{n_B},\; y_N \in \mathbb {Z}_+^{p_N} \times \mathbb {R}_+^{n_N} \right\} ,\quad s \in {\mathbb {R}}^m, \end{aligned}$$
(6)

where \({\bar{q}}_N := q_N - q_BB^{-1}N\) is the vector of reduced costs of the non-basic variables, and \(p_B\) and \(n_B\) (\(p_N\) and \(n_N\)) denote the numbers of integer and continuous components of \(y_B\) (\(y_N\)), respectively. Moreover, they show that

$$\begin{aligned}v(\omega ,x) = v_B(\omega ,x) = q_BB^{-1}(\omega - Tx) + \psi _B(\omega - Tx)\end{aligned}$$

if \(\omega - Tx \in \varLambda := \{t:B^{-1}t \ge 0\}\), and if the distance of \(\omega - Tx\) to the boundary of \(\varLambda \) is sufficiently large, see Definition 1. We say that \(v(\omega , x)\) is asymptotically periodic, since \(\psi _B\) is a B-periodic function, i.e. \(\psi _B(s + Bl) = \psi _B(s)\) for every \(s \in {\mathbb {R}}^m\) and \(l \in {\mathbb {Z}}^m\), see [23].
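As an illustration of how \(v_B\) and \(\psi _B\) can be evaluated in practice, the following hedged Python sketch drops the non-negativity bounds of the basic variables (keeping their integrality) and solves the resulting mixed-integer program with SciPy's milp; it assumes the solver accepts free integer variables and is not the implementation used in this paper.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def gomory_relaxation(q, W, integrality, basic, s):
    """Gomory relaxation v_B(s): the second-stage problem with right-hand
    side s in which the non-negativity constraints of the basic variables
    are relaxed (their integrality restrictions are kept).
    `basic` lists the column indices of the dual feasible basis matrix B."""
    n = W.shape[1]
    lb = np.zeros(n)
    lb[list(basic)] = -np.inf                             # relax y_B >= 0
    res = milp(c=q,
               constraints=LinearConstraint(W, s, s),     # W y = s
               integrality=integrality,
               bounds=Bounds(lb, np.full(n, np.inf)))
    return res.fun

def psi_B(q, W, integrality, basic, s):
    """psi_B(s) = v_B(s) - q_B B^{-1} s, the B-periodic part of v_B."""
    B = W[:, list(basic)]
    return gomory_relaxation(q, W, integrality, basic, s) \
           - q[list(basic)] @ np.linalg.solve(B, s)
```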

Definition 1

Let \(\varLambda \subset {\mathbb {R}}^m\) be a closed convex set and let \(d \in {\mathbb {R}}\) with \(d > 0\) be given. Then, we define \(\varLambda (d)\) as

$$\begin{aligned}\varLambda (d) := \{s \in \varLambda : {\mathcal {B}}(s, d) \subset \varLambda \},\end{aligned}$$

where \({\mathcal {B}}(s, d) := \{t \in {\mathbb {R}}^m : ||t - s||_2 \le d\}\) is the closed ball centered at s with radius d. We can interpret \(\varLambda (d)\) as the set of points in \(\varLambda \) with at least Euclidean distance d to the boundary of \(\varLambda \).

Lemma 1

[23, Theorem 2.9] Consider the mixed-integer programming problem

$$\begin{aligned}v(\omega ,x):=\min _y\left\{ qy:\quad Wy = \omega - Tx, \; y \in {\mathbb {Z}}^{p_2}_+ \times {\mathbb {R}}^{n_2 - p_2}_+ \right\} ,\end{aligned}$$

where W is an integer matrix, and \(v(\omega ,x)\) is finite for all \(\omega \in {\mathbb {R}}^m\) and \(x \in {\mathbb {R}}^{n_1}\). Then, there exist dual feasible basis matrices \(B^k\) of \(v_{\text {LP}}\), \(k = 1,\dots ,K\), closed convex polyhedral cones \(\varLambda ^k := \{t\in {\mathbb {R}}^m: (B^k)^{-1}t \ge 0 \}\), distances \(d_k\), \(B^k\)-periodic functions \(\psi ^k\), and constants \(w_k\) such that we have the following:

  1. (i)

    \(\displaystyle \bigcup _{k = 1}^K \varLambda ^k = \mathbb {R}^m\).

  2. (ii)

    \((int \ \varLambda ^k) \cap (int \ \varLambda ^l) = \emptyset \) for every \(k,l \in \{1,\dots ,K\}\) with \(k \ne l\).

  3. (iii)

    If \(\omega - Tx \in \varLambda ^k(d_k)\), then

    $$\begin{aligned} v(\omega ,x) = v_\text {LP}(\omega ,x)+ \psi ^k(\omega - Tx) =q_{B^k} (B^k)^{-1}(\omega - Tx) + \psi ^k(\omega - Tx) ,\end{aligned}$$

    where \(\psi ^l \equiv \psi ^k\) if \(q_{B^k} (B^k)^{-1} = q_{B^l} (B^l)^{-1}\).

  4. (iv)

    \(0\le \psi ^{{k}}(\omega - Tx) \le w_k\) for all \(\omega \in {\mathbb {R}}^m\) and \(x \in {\mathbb {R}}^{n_1}, \quad k = 1,\dots ,K\).

Lemma 1 (iii) shows that if \(\omega - Tx \in \varLambda ^k(d_k)\) for some \(k = 1,\dots ,K\), then \(v(\omega ,x)\) is equal to the sum of the LP-relaxation \(v_\text {LP}(\omega ,x)\) and \(\psi ^k(\omega - Tx)\). Hence, \(\psi ^k(\omega - Tx)\) can be interpreted as the additional costs resulting from the integrality restrictions on the decision variables y. For a discussion on how to obtain \(d_k\) or how to represent \(\varLambda ^k(d_k)\) using a system of linear inequalities we refer to [23].

2.2 The shifted LP-relaxation approximation

Lemma 1 shows why the second-stage value function is not convex in x. On regions of its domain it is the sum of a linear function \(q_{B^k}(B^k)^{-1}(\omega - Tx)\) and a periodic function \(\psi ^k(\omega - Tx)\). Clearly the periodic part is causing v to be non-convex. That is why the shifted LP-relaxation is obtained by replacing this periodic part \(\psi ^k(\omega - Tx)\) by a constant \(\varGamma _k\) for every \(k = 1,\dots ,K\), with \(\varGamma _k\) defined as

$$\begin{aligned} \varGamma _k := p_k^{-m} \int _0^{p_k}\dots \int _0^{p_k} \psi ^k(s)\,ds_1\dots ds_m ,\end{aligned}$$
(7)

where \(p_k = |\det B^k|\). The K constants \(\varGamma _k\) can be interpreted as the averages of the periodic functions \(\psi ^k\). The shifted LP-relaxation approximation is obtained by taking the pointwise maximum over all dual feasible basis matrices \(B^k\), \(k = 1,\dots ,K\).
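The constants \(\varGamma _k\) are the main computational obstacle for the shifted LP-relaxation: each one is an m-dimensional integral of a mixed-integer value function. A simple (and in general expensive) Monte Carlo estimate is sketched below; psi_k is assumed to be a routine that evaluates \(\psi ^k\), for instance along the lines of the Gomory-relaxation sketch in Sect. 2.1.

```python
import numpy as np

def estimate_gamma(psi_k, p_k, m, n_samples=10_000, seed=0):
    """Monte Carlo estimate of Gamma_k, the average of the B^k-periodic
    function psi^k over the cube [0, p_k]^m with p_k = |det B^k|."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(0.0, p_k, size=(n_samples, m))
    return float(np.mean([psi_k(s) for s in points]))
```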

Definition 2

Define the shifted LP-relaxation approximation \({\tilde{Q}}\) of the MIR function Q as \({\tilde{Q}}(x) = {\mathbb {E}}_{\omega }[{\tilde{v}}(\omega ,x)]\), where

$$\begin{aligned}{\tilde{v}}(\omega ,x):= \max _{k = 1,\dots ,K}\{q_{B^k}(B^k)^{-1} (\omega - Tx) + \varGamma _k\},\end{aligned}$$

where \(B^k\), \(k =1,\dots ,K\), are the dual feasible basis matrices of Lemma 1, and \(\varGamma _k\) is defined in (7).

Romeijnders et al. [23] derive a total variation error bound on the approximation error \(||Q - {\tilde{Q}}||_\infty \) of the shifted LP-relaxation \({\tilde{Q}}\). This error bound is expressed in terms of the total variations of the one-dimensional conditional probability density functions (pdf) of the random vector \(\omega \).

Definition 3

Let \(f: \mathbb {R}\rightarrow \mathbb {R}\) be a real-valued function, and let \(I \subset \mathbb {R}\) be an interval. Let \(\varPi (I)\) denote the set of all finite ordered sets \(P = \{x_1,\dots ,x_{N+1}\}\) with \(x_{1}< \dots < x_{N + 1}\) in I. Then, the total variation of f on I, denoted by \(|\varDelta |f(I)\), is defined as

$$\begin{aligned}|\varDelta |f(I) = \sup _{P \in \varPi (I)}V_f(P),\end{aligned}$$

where

$$\begin{aligned}V_f(P) = \sum _{i = 1}^{N}|f(x_{i + 1}) - f(x_i)|.\end{aligned}$$

We write \(|\varDelta |f := |\varDelta |f(\mathbb {R})\).

Definition 4

For every \(i = 1,\dots ,m\) and \(x_{-i} \in {\mathbb {R}}^{m - 1}\), define the i-th conditional density function \(f_i(\cdot | x_{-i})\) of the m-dimensional joint pdf f as

$$\begin{aligned}f_i(x_i | x_{-i}) = \frac{f(x)}{f_{-i}(x_{-i})},\end{aligned}$$

where \(f_{-i}\) is the joint pdf of the \((m - 1)\)-dimensional random vector \(\omega _{-i}\), which is equal to \(\omega \) without its i-th component. Define \({\mathcal {H}}^m\) as the set of all m-dimensional joint pdf f such that \(f_i(\cdot | x_{-i})\) is of bounded variation for all \(i =1, \dots ,m\) and \(x_{-i} \in {\mathbb {R}}^{m-1}\).

Theorem 1

[23, Theorem 5.1] There exists a constant \(C > 0\) such that for every continuous random vector \(\omega \) with joint pdf \(f \in {\mathcal {H}}^m\),

$$\begin{aligned} \sup _x |Q(x) - {\tilde{Q}}(x)| \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] .\end{aligned}$$
(8)

In general, the error bound in Theorem 1 may be large, in particular since the constant \(C>0\) may be large. Nevertheless, the theorem shows that if the total variations of the one-dimensional conditional pdf are small, then \({\tilde{Q}}\) is a good approximation of Q. For example, if the components \(\omega _i\) of \(\omega \) follow independent normal distributions, with mean \(\mu _i\) and variance \(\sigma _i^2\), for \(i = 1,\dots ,m\), then the error bound in (8) simplifies to \(C' \sum _{ i =1}^m \sigma _i^{-1}\) for some \(C'>0\); see [21, Example 2] for details. Observe that the error bound goes to zero if \({\sigma _i \rightarrow \infty }\) for all \(i = 1,\dots ,m\). Thus, the error of using the shifted LP-relaxation approximation decreases if the variability of the random parameters in the model increases.
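For instance, the total variation of a unimodal density equals twice its maximum value, so that for independent \(\omega _i \sim N(\mu _i, \sigma _i^2)\) the conditional densities coincide with the marginals and

$$\begin{aligned}|\varDelta |f_i = 2 f_i(\mu _i) = \frac{2}{\sigma _i \sqrt{2\pi }} = \sqrt{\frac{2}{\pi }}\, \sigma _i^{-1},\end{aligned}$$

so that the right-hand side of (8) indeed equals \(C\sqrt{2/\pi }\, \sum _{i=1}^m \sigma _i^{-1}\).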

3 Loose Benders decomposition algorithm for two-stage mixed-integer recourse models

3.1 Generalized \(\alpha \)-approximations

To derive the generalized \(\alpha \)-approximations \({\hat{Q}_\alpha }\), we first derive a convex approximation \({\hat{v}_\alpha }\) of the second-stage value function v defined in (5), and we define \({\hat{Q}_\alpha }(x) := {\mathbb {E}}_\omega [{\hat{v}_\alpha }(\omega ,x)]\). As for the shifted LP-relaxation \({\tilde{v}}\), we use Lemma 1, i.e., for \(\omega - Tx \in \varLambda ^k(d_k)\),

$$\begin{aligned}v(\omega ,x) = {v_{B^k}(\omega ,x) =\;} q_{B^k}(B^k)^{-1}(\omega - Tx) + \psi ^k(\omega - Tx).\end{aligned}$$

However, instead of replacing \(\psi ^k(\omega - Tx)\) by its average \(\varGamma _k\), as is done for the shifted LP-relaxation, we replace \(\psi ^k(\omega - Tx)\) by \(\psi ^k(\omega - \alpha )\) for some \(\alpha \in {\mathbb {R}}^m\) to obtain the generalized \(\alpha \)-approximation \({\hat{v}_\alpha ^k}\) of \(v_{B^k}\) defined as

$$\begin{aligned}{\hat{v}_\alpha ^k}(\omega ,x) = q_{B^k}(B^k)^{-1}(\omega - Tx) + \psi ^k(\omega - \alpha ),\end{aligned}$$

and we define the generalized \(\alpha \)-approximation \({\hat{v}_\alpha }\) of v as

$$\begin{aligned}{\hat{v}_\alpha }(\omega ,x) = \max _k {\hat{v}_\alpha ^k}(\omega ,x).\end{aligned}$$

The difference compared to the shifted LP-relaxation seems small: the constants \(\varGamma _k\) are replaced by \(\psi ^k(\omega - \alpha )\). From a computational point of view, however, this difference is significant. This is because the constants \(\varGamma _k\) are the averages over \(\psi ^k\), and in general need to be obtained by computing a multi-dimensional integral of a mixed-integer value function. For a fixed \(\omega \) and \(\alpha \), however, the value of \(\psi ^k(\omega - \alpha )\) is obtained by solving a single mixed-integer programming problem of the same size as the second-stage problem. In fact, we need to solve the Gomory relaxation discussed in Sect. 2.1, which can be done in polynomial time if all second-stage variables are integer [11].

Definition 5

For \(\alpha \in {\mathbb {R}}^m\), we define the generalized \(\alpha \)-approximation \({\hat{Q}_\alpha }\) of Q as

$$\begin{aligned}{\hat{Q}_\alpha }(x) := {\mathbb {E}}_{\omega }\left[ \max _{k = 1,\dots ,K} \{\lambda ^k(\omega - Tx) + \psi ^k(\omega - \alpha )\}\right] ,\quad x\in {\mathbb {R}}^{n_1},\end{aligned}$$

with \(\lambda ^k := q_{B^k}(B^k)^{-1}\) and \(\psi ^k := \psi _{B^k}\), where \(B^k\), \(k = 1,\dots ,K\), are the dual feasible basis matrices of Lemma 1.

Remark 1

The generalized \(\alpha \)-approximations are a generalization of the \(\alpha \)-approximations defined for TU integer recourse models [26], which arise if \(y \in {\mathbb {Z}}^p_+\) and if the second-stage constraints are of the form \(Wy \ge \omega - Tx\), where W is a TU matrix. Indeed, for these models, we have \(\psi ^k(s) = \lambda ^k(\lceil s \rceil -~s)\), see [23], and thus

$$\begin{aligned}{\hat{Q}_\alpha }(x) = \mathbb E_{\omega }\left[ \max _{k = 1,\dots ,K} \lambda ^k(\lceil \omega - \alpha \rceil + \alpha - Tx) \right] ,\end{aligned}$$

which is the expression for the \(\alpha \)-approximations for TU integer recourse models in [26].

It turns out that we can derive a similar error bound for the generalized \(\alpha \)-approximations as for the shifted LP-relaxation.

Theorem 2

There exists a constant \(C > 0\), such that for every \(\alpha \in {\mathbb {R}}^m\) and for every continuous random vector \(\omega \) with joint pdf \(f \in {\mathcal {H}}^m\),

$$\begin{aligned}{\sup _{x}|Q(x) - {\hat{Q}_\alpha }(x)| \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] .}\end{aligned}$$

Proof

See “Appendix”. \(\square \)

Theorem 2 states that the approximation error of \({\hat{Q}_\alpha }\) goes to zero as the total variations \(\mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] \), \(i = 1,\dots ,m\), all go to zero. This error bound is independent of \(\alpha \), i.e. \({\hat{Q}_\alpha }\) is a good approximation of the true MIR function Q for any \(\alpha \in {\mathbb {R}}^m\).

An interesting difference between the generalized \(\alpha \)-approximations and the shifted LP-relaxation of Sect. 2.2 is that the approximating value function \({\hat{v}_\alpha }\) is not convex in \(\omega \) for fixed \(x\in {\mathbb {R}}^{n_1}\). Indeed, \({\hat{v}_\alpha }\) is only convex in x for every fixed \(\omega \), but this is sufficient to guarantee that the generalized \(\alpha \)-approximation \({\hat{Q}_\alpha }\) is convex. In contrast, the value function \({\tilde{v}}\) of the shifted LP-relaxation is convex in both \(\omega \) and x. We illustrate these properties in Example 1. In this example, we also illustrate the difference between the generalized \(\alpha \)-approximations and the pseudo-valid cutting plane approximation of [21].

Example 1

Consider the second-stage mixed-integer value function v defined as

$$\begin{aligned}v(\omega ,x) := \min \{y_1 + 2y_2 + 2y_3: \; y_1 + y_2 - y_3 = \omega -x, \; y_1\in {\mathbb {Z}}_+, \; y_2,\;y_3 \in {\mathbb {R}}_+\}.\end{aligned}$$

We use this example, since it also appears in [23] and [21], allowing us to compare the generalized \(\alpha \)-approximations to the shifted LP-relaxation and a pseudo-valid cutting plane approximation.

The LP-relaxation of v has two dual feasible basis matrices \(B^1 = [-1]\) and \(B^2 = [1]\). Thus, \(K = 2\), and straightforward computations yield \(\lambda ^1 = -2\), \(\lambda ^2 = 1\), \(\psi ^1 \equiv 0\), and for every \(s \in {\mathbb {R}}\),

$$\begin{aligned}\psi ^2(s) = \left\{ \begin{array}{ll} s - \lfloor s \rfloor , &{} \text{ if } s - \lfloor s \rfloor \le 3/4\text{, }\\ 3 - 3 (s- \lfloor s \rfloor ), &{} \text{ if } s - \lfloor s \rfloor \ge 3/ 4 \text{. }\end{array}\right. \end{aligned}$$

It follows that \(v_\text {LP}(\omega ,x) = \max \{-2(\omega - x), \omega - x\}\). Moreover, for every \(\alpha \in {\mathbb {R}}\), the approximating value function is defined as

$$\begin{aligned}{\hat{v}_\alpha }(\omega ,x) = \max \{ -2(\omega - x), \omega - x + \psi ^2(\omega - \alpha ) \}.\end{aligned}$$

In contrast, the value function \({\tilde{v}}\) of the shifted LP-relaxation equals

$$\begin{aligned}{\tilde{v}}(\omega ,x) = \max \{-2(\omega - x), \omega - x + 3 / 8\},\end{aligned}$$

since \(\varGamma ^2 := \int _0^1 \psi ^2(s)ds = 3/8\). Finally, the tight pseudo-valid cutting plane approximation \({\hat{v}}_c\) is given by

$$\begin{aligned}{\hat{v}}_c(\omega ,x) = \min _{y \in {\mathbb {R}}^3_+} \{y_1 + 2y_2 + 2y_3:&y_1 + y_2 - y_3 = \omega -x, \\&y_1 - \frac{1}{1 +\lfloor \omega \rfloor - \omega }y_3 \le \lfloor \omega \rfloor - x\}.\end{aligned}$$

Figure 1a, b show \(v(\omega ,x)\), \({\hat{v}_\alpha }(\omega ,x)\), \({\tilde{v}}(\omega ,x)\), and \({\hat{v}}_c(\omega ,x)\) as a function of x and of \(\omega \), respectively. They illustrate that \({\hat{v}_\alpha }\) is convex in x for every fixed \(\omega \), but not convex in \(\omega \) for fixed x. Moreover, Fig. 1b illustrates a key difference between \({\tilde{v}}(\omega , x)\) and \({\hat{v}_\alpha }(\omega ,x)\). Whereas \({\tilde{v}}(\omega ,x)\) is obtained by replacing the periodic part \(\psi ^2(\omega - x)\) of \(v(\omega ,x)\) by its average \(\varGamma ^2\), we obtain \({\hat{v}_\alpha }(\omega ,x)\) by shifting \(\psi ^2(\omega - x)\) towards \(\psi ^2(\omega - \alpha )\).

Fig. 1

The value function v and the approximating value functions \({\tilde{v}}\), \({\hat{v}}_c\), and \({\hat{v}_\alpha }\), where \(\alpha = 0\). The left figure shows \(v(\omega ,x)\), \({\hat{v}_\alpha }(\omega ,x)\), and \({\hat{v}}_c(\omega ,x)\) as a function of x, whereas the right figure shows \(v(\omega ,x)\), \({\hat{v}_\alpha }(\omega ,x)\), and \({\tilde{v}}(\omega ,x)\) as a function of \(\omega \)

Finally, Fig. 1a shows that \({\hat{v}_\alpha }(\omega ,x) = {\hat{v}}_c(\omega ,x)\) for a range of x values. However, the difference \({\hat{v}}_c(\omega , x) - {\hat{v}_\alpha }(\omega , x)\) is relatively large if \(x \in (-0.3,0.7)\). This is true in general for tight pseudo-valid cutting plane approximations: there always exist \({\hat{d}}_k\) such that \({\hat{v}_\alpha }(\omega ,x) = {\hat{v}}_c(\omega ,x)\) if \(\omega - Tx \in \varLambda ^k({\hat{d}}_k)\). However, the two approximations may differ if \(\displaystyle \omega - Tx \notin \bigcup _{k} \varLambda ^k({\hat{d}}_k).\) \(\lozenge \)
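A minimal Python sketch of the closed-form expressions in Example 1 (for illustration only; it hard-codes this particular instance):

```python
import numpy as np

def psi2(s):
    """The B^2-periodic function psi^2 of Example 1."""
    frac = s - np.floor(s)
    return frac if frac <= 0.75 else 3.0 - 3.0 * frac

def v(omega, x):
    """Exact second-stage value of Example 1: v_LP plus the periodic
    correction psi^2 for omega - x >= 0, and -2(omega - x) otherwise."""
    s = omega - x
    return -2.0 * s if s < 0 else s + psi2(s)

def v_alpha(omega, x, alpha=0.0):
    """Generalized alpha-approximation of v."""
    s = omega - x
    return max(-2.0 * s, s + psi2(omega - alpha))

def v_tilde(omega, x):
    """Shifted LP-relaxation approximation, with Gamma^2 = 3/8."""
    s = omega - x
    return max(-2.0 * s, s + 3.0 / 8.0)

for x in (0.0, 0.4, 0.8):                 # compare the three value functions
    print(x, v(2.3, x), v_alpha(2.3, x), v_tilde(2.3, x))
```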

3.2 Benders decomposition for the generalized \(\alpha \)-approximations

We solve the approximating problem

$$\begin{aligned} {\hat{\eta }}_\alpha := \min _x \{cx + {\hat{Q}_\alpha }(x):\quad Ax = b, \;x \in {X} \},\end{aligned}$$
(9)

using an SAA of \({\hat{Q}_\alpha }\). Given a sample \(\{\omega ^1,\dots ,\omega ^S\}\) of size S from the distribution of \(\omega \), the SAA \(\hat{Q}_\alpha ^S\) of \({\hat{Q}_\alpha }\) is defined as

$$\begin{aligned} \hat{Q}_\alpha ^S(x) := \frac{1}{S}\sum _{s = 1}^S{{\hat{v}_\alpha }(\omega ^s, x)}, \quad {x \in X},\end{aligned}$$
(10)

and the corresponding SAA problem as

$$\begin{aligned} {{\hat{\eta }}_\alpha ^S :=} \min _x \{cx + \hat{Q}_\alpha ^S(x):\quad Ax = b ,\; x\in X\}. \end{aligned}$$
(11)

Since \(\hat{Q}_\alpha ^S\) is convex, we can solve the SAA problem using Benders decomposition [6], i.e. using an L-shaped algorithm [32]. In iteration \(\tau \) of this algorithm, we solve (11) with \(\hat{Q}_\alpha ^S\) replaced by a convex polyhedral outer approximation \(\hat{Q}_{\text {out}}^{\tau }\le \hat{Q}_\alpha ^S\). This problem is called the master problem

$$\begin{aligned} {\hat{\eta }}_\tau := \min _{x}\{cx + \hat{Q}_{\text {out}}^{\tau }(x): \quad Ax = b,\; x\in X\},\end{aligned}$$
(12)

and its optimal solution \(x^{\tau }\) is referred to as the current solution at iteration \(\tau \). If \(\hat{Q}_{\text {out}}^{\tau }(x^\tau ) = \hat{Q}_\alpha ^S(x^{\tau })\), then the current solution \(x^{\tau }\) is optimal for the original SAA problem in (11). If not, then we strengthen the outer approximation using an optimality cut for \(\hat{Q}_\alpha ^S\) of the form

$$\begin{aligned}\hat{Q}_\alpha ^S(x) \ge \beta _{\tau + 1} x + \delta _{\tau + 1} ,\quad \forall x\in X,\end{aligned}$$

which is tight at \(x^{\tau }\), i.e. \(\hat{Q}_\alpha ^S(x^\tau ) = \beta _{\tau + 1} x^\tau + \delta _{\tau +1}\). The outer approximation \(\hat{Q}_{\text {out}}^{\tau }\) in iteration \(\tau \) is the pointwise maximum over all optimality cuts derived in previous iterations:

$$\begin{aligned}\hat{Q}_{\text {out}}^{\tau }(x) = \max _{t = 1,\dots ,\tau }\{ \beta _t x + \delta _t\}, \quad x\in X.\end{aligned}$$

The challenge for the generalized \(\alpha \)-approximations, however, is to compute tight optimality cuts of \(\hat{Q}_\alpha ^S\). Since

$$\begin{aligned}\hat{Q}_\alpha ^S(x) = \frac{1}{S} \sum _{s = 1}^S \max _{k = 1,\dots ,K}\{\lambda ^k(\omega ^s - Tx) + \psi ^k(\omega ^s - \alpha )\}, \quad x \in X,\end{aligned}$$

such tight optimality cuts can be obtained by identifying for each scenario s the maximizing index \(k_s^{\tau }\) at \(x^\tau \), defined as

$$\begin{aligned} k_s^{\tau }\in \underset{k = 1,\dots ,K}{\text {arg max}}\{\lambda ^k(\omega ^s- Tx^\tau ) + \psi ^k(\omega ^s- \alpha )\}, \quad s =1,\dots ,S .\end{aligned}$$
(13)

However, this is computationally too expensive, since we need to compute \(\psi ^k(\omega ^s - \alpha )\) for all \(k= 1,\dots ,K\), and K grows exponentially in the size of the second-stage problem. Indeed, K is at least as large as the number of dual vertices of the feasible region of the LP-relaxation \(v_\text {LP}\) of v, of which there are exponentially many.

3.3 Loose Benders decomposition for the generalized \(\alpha \)-approximations

To overcome this computational challenge we propose to use approximate indices \(\hat{k}_s^{\tau }\) defined as

$$\begin{aligned} \hat{k}_s^{\tau }\in \underset{k = 1,\dots ,K}{\text {arg max}}\; \lambda ^k(\omega ^s - Tx^\tau ), \quad s = 1,\dots , S.\end{aligned}$$
(14)

These indices are computationally tractable since they correspond to an optimal basis matrix of the LP-relaxation \(v_\text {LP}(\omega ^s, x^\tau )\) of the second-stage problem in (5). Hence, they can be obtained by solving a single LP. Moreover, if the values of \(\psi ^k(\omega ^s - \alpha )\) are equal for all \(k = 1,\dots ,K\), then the indices \(\hat{k}_s^{\tau }\) are optimal in (13), i.e. \(\hat{k}_s^{\tau }= k_s^{\tau }\). In general, however, the indices \(\hat{k}_s^{\tau }\) are suboptimal in (13), leading to the following definition.

Definition 6

Let \(x^\tau \) be given and let \(\hat{k}_s^{\tau }\), \(s = 1,\dots ,S\), be as in (14). We define the loose optimality cut for \(\hat{Q}_\alpha ^S\) at \(x^\tau \) as

$$\begin{aligned}{\hat{Q}_\alpha ^S(x) \ge \;} \frac{1}{S} \sum _{s = 1}^S\lambda ^{\hat{k}_s^{\tau }} (\omega ^s - T x) + \psi ^{\hat{k}_s^{\tau }}(\omega ^s-\alpha ),\quad x\in {X}.\end{aligned}$$

We use these loose optimality cuts in our loose Benders decomposition algorithm, LBDA(\(\alpha \)). In this algorithm the outer approximation \(\hat{Q}_{\text {out}}^{\tau }\) of \(\hat{Q}_\alpha ^S\) is defined using our loose optimality cuts. The algorithm terminates with tolerance level \(\varepsilon \ge 0\) if \(\hat{Q}_{\text {out}}^{\tau }(x^\tau ) \ge \hat{Q}_{\text {out}}^{\tau + 1}(x^\tau ) - \varepsilon \); then, LBDA(\(\alpha \)) reports \({\hat{x}_\alpha }= x^\tau \) as solution.

A full description of LBDA(\(\alpha \)) is given below. Note that the algorithm requires a lower bound L on \(\hat{Q}_\alpha ^S\) such that \(\hat{Q}_\alpha ^S(x) \ge L\) for all \(x \in {\mathcal {X}}\). Such a lower bound L exists because of Assumptions (A1)–(A4).

[Figure: pseudocode description of LBDA(\(\alpha \))]
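Because the pseudocode figure is not reproduced here, the following compact Python sketch outlines the main loop of LBDA(\(\alpha \)). The routines solve_master, lp_basis_index, dual_vertex, and psi are placeholders for the steps described above (solving the master problem (12), identifying \(\hat{k}_s^{\tau }\) via the LP-relaxation, returning \(\lambda ^k\), and evaluating \(\psi ^k\) via the Gomory relaxation); they are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def lbda(alpha, scenarios, T, n1, eps, L,
         solve_master, lp_basis_index, dual_vertex, psi):
    """Loose Benders decomposition LBDA(alpha) -- a sketch.
    L is a lower bound on the SAA of the generalized alpha-approximation."""
    S = len(scenarios)
    cuts = [(np.zeros(n1), L)]                        # trivial initial cut: theta >= L
    while True:
        x = solve_master(cuts)                        # current solution of (12)
        beta, delta = np.zeros(n1), 0.0
        for omega in scenarios:                       # build one loose optimality cut
            k = lp_basis_index(omega, x)              # approximate index (14)
            lam = dual_vertex(k)                      # lambda^k = q_{B^k} (B^k)^{-1}
            beta += -(lam @ T) / S                    # x-coefficients of the cut
            delta += (lam @ omega + psi(k, omega - alpha)) / S
        new_cut = beta @ x + delta                    # value of the new cut at x
        outer = max(b @ x + d for (b, d) in cuts)     # current outer approximation at x
        if outer >= new_cut - eps:                    # termination criterion of LBDA(alpha)
            return x
        cuts.append((beta, delta))
```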

The solution \({\hat{x}_\alpha }\) of LBDA(\(\alpha \)) is not necessarily \(\varepsilon \)-optimal for the SAA problem in (11), since LBDA(\(\alpha \)) uses the loose optimality cuts of Definition 6. If the optimality cuts were tight, then \(\hat{Q}_{\text {out}}^{\tau + 1}(x^\tau ) = \hat{Q}_\alpha ^S(x^\tau )\) for every iteration \(\tau \), and thus the algorithm would terminate if \(\hat{Q}_{\text {out}}^{\tau }(x^\tau ) \ge \hat{Q}_\alpha ^S(x^\tau ) - \varepsilon \). We, however, show that our loose optimality cuts are asymptotically tight at \(x^\tau \): the difference \(\hat{Q}_{\text {out}}^{\tau + 1}(x^\tau ) - \hat{Q}_\alpha ^S(x^\tau )\) converges to zero if the total variations of the one-dimensional conditional pdf of the random vector \(\omega \) go to zero and as \(S \rightarrow \infty \), see Proposition 1. In this case, we are able to prove that the LBDA(\(\alpha \)) solution \({\hat{x}}_\alpha \) is near-optimal; see Theorem 3 below for a bound on the optimality gap \(c{\hat{x}}_\alpha + Q({\hat{x}}_\alpha ) - \eta ^*\). This performance guarantee is independent of \(\alpha \), and thus it applies if we take e.g. \(\alpha = 0\). We further explore selection of \(\alpha \) in Sect. 5. The proof of the performance guarantee is postponed to Sect. 4.2.

Theorem 3

Consider the two-stage mixed-integer recourse model

$$\begin{aligned}\eta ^* = \min _x \{cx + Q(x):\quad Ax = b, \; x \in {X}\}.\end{aligned}$$

Let \({\hat{x}_\alpha }\) denote the solution by LBDA(\(\alpha \)) with tolerance \(\varepsilon \) and sample size S. Then, there exists a constant \(C > 0\) such that for every continuous random vector \(\omega \) with joint pdf \(f \in {\mathcal {H}}^m\)

$$\begin{aligned}c {\hat{x}_\alpha }+ Q({\hat{x}_\alpha }) - \eta ^*\le \varepsilon + C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] , \end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \).

Theorem 3 implies that the optimality gap of \({\hat{x}_\alpha }\) converges to the prespecified tolerance \(\varepsilon \) as the total variations of the underlying one-dimensional conditional pdf go to zero.

3.4 Implementation details of LBDA(\(\alpha \))

LBDA(\(\alpha \)) can be implemented efficiently if the input size of the second-stage problem \(v(\omega ,x)\) is moderate. During each iteration \(\tau \), we have to solve the LP-relaxation \(v_\text {LP}(\omega ^s,x^\tau )\) and the Gomory relaxation \(v_{B^{\hat{k}_s^{\tau }}}(\omega ^s - \alpha , 0)\) for each scenario \(s = 1,\dots , S\), in order to generate a loose optimality cut. If the input size of \(v(\omega ,x)\) is not too large, then \(v_\text {LP}(\omega ^s, x^\tau )\) and \(v_{B^{\hat{k}_s^{\tau }}}(\omega ^s - \alpha , 0)\) can be solved in reasonable time using standard LP and MIP solvers, respectively. Moreover, the master problem can be solved efficiently using a standard LP solver, or MIP solver if some of the first-stage decision variables are integer. Improved implementations of LBDA(\(\alpha \)) using a multicut approach [7] and regularization techniques [27] are possible. Furthermore, the subproblems \(v_{B^{\hat{k}_s^{\tau }}}(\omega ^s - \alpha , 0)\), \(s = 1,\dots ,S\) can be solved in parallel, and it may be beneficial to solve these subproblems inexactly in the first phase of the algorithm.

In Sect. 5, we exploit that LBDA(\(\alpha \)) can be run multiple times, using different values of \(\alpha \). The performance guarantee of LBDA(\(\alpha \)) is independent of \(\alpha \) and thus applies to every candidate solution that we obtain in this way. Moreover, we use in- and out-of-sample evaluation to select the best candidate solution. In our numerical experiments, we investigate several schemes for selecting the values of \(\alpha \). Running LBDA(\(\alpha \)) multiple times can be done very efficiently by using parallelization and a common warm start. That is, we first apply the L-shaped algorithm of [32] to the LP-relaxation of the original problem (1), in which the integer restrictions on the second-stage variables y are relaxed. Next, we run LBDA(\(\alpha \)) for each value of \(\alpha \), and we keep the optimality cuts generated for the LP-relaxation \(Q_\text {LP}\) of Q, which are also valid for \({\hat{Q}_\alpha }\), for any value of \(\alpha \).
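A hedged sketch of this multi-\(\alpha \) scheme with a common warm start; run_lshaped_lp, run_lbda, and evaluate are placeholders for the L-shaped method on the LP-relaxation, LBDA(\(\alpha \)) itself, and the in/out-of-sample cost evaluation, respectively.

```python
def lbda_multi_alpha(alphas, scenarios, run_lshaped_lp, run_lbda, evaluate):
    """Run LBDA(alpha) for several values of alpha with a common warm start.
    The cuts generated for the LP-relaxation Q_LP are kept, since they are
    valid lower bounds for the alpha-approximation for every alpha."""
    warm_cuts = run_lshaped_lp(scenarios)     # L-shaped on the LP-relaxation of (1)
    candidates = [run_lbda(a, scenarios, initial_cuts=warm_cuts)
                  for a in alphas]            # these runs can be parallelized
    return min(candidates, key=evaluate)      # candidate with lowest evaluated cost
```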

4 Performance guarantee of LBDA(\(\alpha \))

4.1 Convergence of sampling and loose optimality cuts

The performance guarantee of LBDA(\(\alpha \)) does not follow directly from our error bound for the generalized \(\alpha \)-approximations. The reason is that LBDA(\(\alpha \)) uses sampling and the loose optimality cuts of Definition 6 to solve the corresponding approximating problem. We consider these aspects in Sects. 4.1.1 and 4.1.2, respectively. In particular, we prove consistency of the SAA and asymptotic tightness of our loose optimality cuts.

4.1.1 Consistency of the sample average approximation

Intuitively, the SAA becomes better as the sample size S increases. Indeed, we show in Lemma 2 that the SAA \(\hat{Q}_\alpha ^S\) of \({\hat{Q}_\alpha }\) converges uniformly to \({\hat{Q}_\alpha }\) on \({\mathcal {X}}\) w.p. \(1\) as \(S \rightarrow \infty \).

Lemma 2

Consider the generalized \(\alpha \)-approximation \({\hat{Q}_\alpha }\) and its sample average approximation \(\hat{Q}_\alpha ^S\). Then,

$$\begin{aligned}\sup _{x \in {\mathcal {X}}} \left| {\hat{Q}_\alpha }(x) - \hat{Q}_\alpha ^S(x) \right| \rightarrow 0\end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \), where \({\mathcal {X}} = \{x \in X: Ax = b\}\) is the first-stage feasible region from Assumption (A1).

Proof

See “Appendix”. \(\square \)

Corollary 1

\(\displaystyle {\sup _{x \in {\mathcal {X}}} |Q(x) - \hat{Q}_\alpha ^S(x)| \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] }\) w.p. 1 as \({S \rightarrow \infty }\).

Proof

Since

$$\begin{aligned}\sup _{x \in {\mathcal {X}}} |Q(x) - \hat{Q}_\alpha ^S(x)| \le \sup _{x \in {\mathcal {X}}} |Q(x) - {\hat{Q}_\alpha }(x)| + \sup _{x \in {\mathcal {X}}} |{\hat{Q}_\alpha }(x) - \hat{Q}_\alpha ^S(x)|,\end{aligned}$$

the result follows directly by combining Theorem 2 and Lemma 2. \(\square \)

Corollary 1 implies a similar error bound on the difference between the optimal values \(\eta ^*\) and \({\hat{\eta }}^S_\alpha \) of the original MIR problem (1) and the SAA problem (11), respectively, see Corollary 2.

Corollary 2

Consider the optimal values \(\eta ^*\) and \({\hat{\eta }}_\alpha ^S\) of the MIR problem (1) and the SAA problem (11). Then,

$$\begin{aligned} {\hat{\eta }}_\alpha ^S - \eta ^* \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] \end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \).

Proof

Since the optimal solution \(x^*\) of (1) is feasible but not necessarily optimal in (11), we have

$$\begin{aligned} {\hat{\eta }}_\alpha ^S - \eta ^* \le c x^* + \hat{Q}_\alpha ^S(x^*) - \eta ^* = \hat{Q}_\alpha ^S(x^*) - Q(x^*) \le \sup _{x \in {\mathcal {X}}} |Q(x) - \hat{Q}_\alpha ^S(x)|. \end{aligned}$$

The result follows directly from Corollary 1. \(\square \)

4.1.2 Asymptotic tightness of loose optimality cuts

If we derive a loose optimality cut \(\hat{Q}_\alpha ^S(x) \ge \beta _{\tau + 1} x+ \delta _{\tau + 1}\) for \(\hat{Q}_\alpha ^S\) at \(x^\tau \) in LBDA(\(\alpha \)), then the gap \(\hat{Q}_\alpha ^S(x^\tau )-{(\beta _{\tau + 1} x^\tau + \delta _{\tau + 1})}\) may be positive, since the cut is not necessarily tight at \(x^\tau \). However, we derive a bound on this gap by considering the index function

$$\begin{aligned}{\hat{k}}(\omega , x) \in \underset{k = 1,\dots ,K}{\text {arg max}}\; \lambda ^k(\omega - Tx), \quad \omega \in \varOmega , \; x \in X.\end{aligned}$$

Note that \({\hat{k}}(\omega ^s, x^\tau ) = \hat{k}_s^{\tau }\), i.e. the index function \({\hat{k}}(\omega ,x)\) is used to derive our loose optimality cuts. Based on this index function, we define the lower bounding function \({L_{\alpha }^S}\) of \(\hat{Q}_\alpha ^S\), given by

$$\begin{aligned} {L_{\alpha }^S}(x):=\frac{1}{S} \sum _{s = 1}^S \left[ \lambda ^{{\hat{k}}(\omega ^s,x)} (\omega ^s - T x) + \psi ^{{\hat{k}}(\omega ^s,x)}(\omega ^s-\alpha )\right] , \quad x \in X, \end{aligned}$$
(15)

and we note that \({L_{\alpha }^S}(x^\tau )\) is the value of the loose optimality cut at \(x^\tau \), i.e. \({L_{\alpha }^S}(x^\tau ) = \beta _{\tau + 1} x^\tau + \delta _{\tau + 1}\). The function \({L_{\alpha }^S}\) is a lower bound of \(\hat{Q}_\alpha ^S\) since its corresponding value function, defined as

$$\begin{aligned} {\nu _\alpha }(\omega ,x) := \lambda ^{{\hat{k}}(\omega , x)} (\omega - Tx) + \psi ^{{\hat{k}}(\omega ,x)}(\omega - \alpha ),\quad x\in X, \end{aligned}$$
(16)

is a lower bound of \({\hat{v}_\alpha }(\omega ,x)\) for all \(\omega \in \varOmega \) and \(x \in X\).

Proposition 1 contains a uniform total variation bound on the difference between \({L_{\alpha }^S}\) and \(\hat{Q}_\alpha ^S\). In order to derive this result, we first analyze the difference between \({\nu _\alpha }\) and \({\hat{v}_\alpha }\). It turns out that \({\nu _\alpha }\) equals \({\hat{v}_\alpha }\) on large parts of the domain, see Lemma 3.

Lemma 3

Consider the value functions \({\nu _\alpha }\) and \({\hat{v}_\alpha }\), with \({\nu _\alpha }\) defined in (16) and \({\hat{v}_\alpha }\) defined as

$$\begin{aligned} {{\hat{v}_\alpha }(\omega ,x) = \max _{k = 1,\dots ,K} \left\{ \lambda ^k(\omega - Tx) + \psi ^k(\omega - \alpha ) \right\} .} \end{aligned}$$
(17)

Let \(\varLambda ^k\), \(k = 1,\dots ,K\), denote the closed convex cones from Lemma 1. Then, there exist vectors \(\sigma _k \in \varLambda ^k\) and a constant \(R > 0\) such that

  1. (i)

    \(0 \le {\hat{v}_\alpha }(\omega ,x) - {\nu _\alpha }(\omega ,x) \le R\) for all \(\omega \in {\mathbb {R}}^m\) and \(x \in {\mathbb {R}}^{n_1}\), and

  2. (ii)

    \(\omega - Tx \in \sigma _k + \varLambda ^k \implies {{\nu _\alpha }(\omega ,x) =\;} {\hat{v}_\alpha }(\omega ,x) = \lambda ^k(\omega - Tx) + \psi ^k(\omega -\alpha )\).

Proof

See “Appendix”. \(\square \)

Proposition 1

Consider the SAA of the generalized \(\alpha \)-approximation \(\hat{Q}_\alpha ^S\) and its lower bound \({L_{\alpha }^S}\), defined in (15). There exists a constant \(C > 0\) such that for every continuous random vector \(\omega \) with joint pdf \(f \in {\mathcal {H}}^m\),

$$\begin{aligned}\sup _{x \in {{\mathcal {X}}}}\left| \hat{Q}_\alpha ^S(x) - {L_{\alpha }^S}(x)\right| \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] \end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \).

Proof

Define \(\varDelta (\omega ,x) := {\hat{v}_\alpha }(\omega ,x) - {\nu _\alpha }(\omega ,x)\), so that

$$\begin{aligned}\hat{Q}_\alpha ^S(x) - {L_{\alpha }^S}(x) = \frac{1}{S}\sum _{s = 1}^S \varDelta (\omega ^s,x).\end{aligned}$$

We derive an upper bound on \(\varDelta (\omega ,x)\), independent of x. We then apply the strong law of large numbers (SLLN) to obtain the desired result.

If we define \({\mathcal {M}} = \cup _{k = 1}^K (\sigma _k + \varLambda ^k)\), where \(\sigma _k\) and \(\varLambda ^k\), \(k = 1,\dots ,K\), are the vectors and closed convex cones from Lemma 3, then

$$\begin{aligned}\varDelta (\omega ,x) \le \xi (\omega ,x):= \left\{ \begin{array}{ll} R, &{} \text{ if } \omega - Tx \notin {\mathcal {M}}, \\ 0, &{} \text{ if } \omega - Tx \in \mathcal M,\end{array} \right. \end{aligned}$$

where R is the upper bound on \({\hat{v}_\alpha }(\omega ,x) - {\nu _\alpha }(\omega ,x)\) from Lemma 3. Moreover, we can derive a bound on \({\mathbb {P}}[\omega - Tx \notin {\mathcal {M}}]\). Unfortunately, the random variable \(\xi (\omega ,x)\) depends on x. Therefore, we cannot apply the SLLN to

$$\begin{aligned}\sup _{x \in {{\mathcal {X}}}} \frac{1}{S}\sum _{s = 1}^S \xi (\omega ^s,x) \end{aligned}$$

if \({\mathcal {X}}\) is infinite. To resolve this, we use that \({\mathcal {X}}\) is bounded. Let D denote the diameter of \(T{{\mathcal {X}}}\), i.e., \(||Tx - Tx'|| \le D\) for all \(x, x' \in {{\mathcal {X}}}\). Define \({\mathcal {M}}' \subset {\mathcal {M}}\) as \({\mathcal {M}}':= \bigcup _{k =1}^K (\sigma _k + \varLambda ^k)(D)\). Fix an arbitrary \({{\bar{x}}} \in {{\mathcal {X}}}\). Note that for all \(x \in {{\mathcal {X}}}\),

$$\begin{aligned}\omega - T{{\bar{x}}} \in {\mathcal {M}}'&\implies \exists k: \omega - T{{\bar{x}}} \in (\sigma _k + \varLambda ^k )(D)\\ {}&\implies \exists k: \omega - Tx \in (\sigma _k + \varLambda ^k) \implies \omega - Tx \in {\mathcal {M}}.\end{aligned}$$

We obtain

$$\begin{aligned}\varDelta (\omega ,x)\le {\bar{\xi }}(\omega ):= \left\{ \begin{array}{ll} R, &{} \text{ if } \omega - T{{\bar{x}}} \notin {\mathcal {M}}', \\ 0, &{} \text{ if } \omega - T{{\bar{x}}} \in {\mathcal {M}}'.\end{array} \right. \end{aligned}$$

Note that \(\bar{\xi }(\omega )\) depends only on the fixed \({{\bar{x}}} \in {{\mathcal {X}}}\) and is independent of x. By the SLLN,

$$\begin{aligned}\frac{1}{S}\sum _{s = 1}^S {\bar{\xi }}(\omega ^s) \rightarrow R \mathbb P[\omega - T{{\bar{x}}} \notin \mathcal M'],\end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \).

By [23, Lemma 3.9], \({\mathbb {R}}^m\setminus \mathcal M'\) can be covered by finitely many hyperslices, that is,

$$\begin{aligned}{\mathbb {R}}^m\setminus {\mathcal {M}}' \subset \bigcup _{j =1}^J H_j,\end{aligned}$$

where the hyperslices \(H_j\) are defined as

$$\begin{aligned}H_j := \{x \in {\mathbb {R}}^m: 0\le a_j^T x \le \delta _j \},\end{aligned}$$

for some \(a_j \in {\mathbb {R}}^m\) and \(\delta _j \ge 0\). By [23, Theorem 4.6], there exists a constant \(\beta > 0\) such that

$$\begin{aligned}{\mathbb {P}}[\omega - T{{\bar{x}}} \notin {\mathcal {M}}'] \le \beta \sum _{i =1}^{m} {\mathbb {E}}_{\omega _{-i}}\left[ |\varDelta | f_i(\cdot | \omega _{-i})\right] .\end{aligned}$$

The result now follows from

$$\begin{aligned} \sup _{x \in {{\mathcal {X}}}}\left| \hat{Q}_\alpha ^S(x) - {L_{\alpha }^S}(x)\right|&= \sup _{x \in {{\mathcal {X}}}}\frac{1}{S}\sum _{s = 1}^S \varDelta (\omega ^s,x)\\&\le \frac{1}{S}\sum _{s = 1}^S \bar{\xi }(\omega ^s) \rightarrow R {\mathbb {P}}[\omega - T{{\bar{x}}} \notin {\mathcal {M}}'] \\&\le C \sum _{i =1}^{m} \mathbb E_{\omega _{-i}}\left[ |\varDelta | f_i(\cdot | \omega _{-i})\right] , \end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \), where \(C = R\beta \).\(\square \)

Proposition 1 shows that our loose optimality cuts are asymptotically tight, since for every iteration \(\tau \) in LBDA(\(\alpha \)) the loose optimality cut for \(\hat{Q}_\alpha ^S\) at \(x^\tau \) is tight for \({L_{\alpha }^S}\) at \(x^\tau \). Hence, if the underlying total variations are small enough and if S is large enough, then the loose optimality cut is nearly tight for \(\hat{Q}_\alpha ^S\) at \(x^\tau \).

4.2 Error bound on the optimality gap of LBDA(\(\alpha \))

We are now ready to prove the performance guarantee of LBDA(\(\alpha \)) in Theorem 3. Intuitively, we are able to derive this bound since (i) our loose optimality cuts are asymptotically tight, and thus \({\hat{x}_\alpha }\) is near-optimal for the SAA problem with \(\hat{Q}_\alpha ^S\), (ii) the SAA \(\hat{Q}_\alpha ^S\) converges uniformly to \({\hat{Q}_\alpha }\), and (iii) Theorem 2 provides a uniform error bound on the difference between \({\hat{Q}_\alpha }\) and Q.

Proof of Theorem 3

Let \({\hat{x}_\alpha }\) denote the solution returned by LBDA(\(\alpha \)). Since \({\hat{x}_\alpha }:= x^\tau \) is the current solution in the final iteration \(\tau \) of the algorithm, it follows that \({\hat{x}_\alpha }\) is a minimizer of

$$\begin{aligned}\min _{x} \{cx + \hat{Q}_{\text {out}}^{\tau }(x): \quad Ax = b, x \in X\}.\end{aligned}$$

Moreover, since \(\hat{Q}_{\text {out}}^{\tau }\le \hat{Q}_\alpha ^S\), it follows that \({c{\hat{x}_\alpha }+ \hat{Q}_{\text {out}}^{\tau }({\hat{x}_\alpha }) \le {\hat{\eta }}_\alpha ^S}\), and thus, by rearranging terms and adding \(\hat{Q}_\alpha ^S({\hat{x}_\alpha })\) to both sides,

$$\begin{aligned} c {\hat{x}_\alpha }+ \hat{Q}_\alpha ^S({\hat{x}_\alpha }) - {\hat{\eta }}_\alpha ^S \le \hat{Q}_\alpha ^S({\hat{x}_\alpha }) - \hat{Q}_{\text {out}}^{\tau }({\hat{x}_\alpha }).\end{aligned}$$
(18)

The right-hand side of (18) represents an upper bound on the optimality gap of \({\hat{x}_\alpha }\) in the SAA problem (11). The termination criterion of LBDA(\(\alpha \)) guarantees that this upper bound is not too large. Indeed, at termination it holds that

$$\begin{aligned}\hat{Q}_{\text {out}}^{\tau }({\hat{x}_\alpha }) \ge \hat{Q}_{\text {out}}^{\tau + 1}({\hat{x}_\alpha }) - \varepsilon \ge {L_{\alpha }^S}({\hat{x}_\alpha }) - \varepsilon ,\end{aligned}$$

and thus the upper bound in (18) reduces to

$$\begin{aligned} c {\hat{x}_\alpha }+ \hat{Q}_\alpha ^S({\hat{x}_\alpha }) - {\hat{\eta }}_\alpha ^S&\le \hat{Q}_\alpha ^S({\hat{x}_\alpha }) - {L_{\alpha }^S}({\hat{x}_\alpha })+ \varepsilon \nonumber \\ {}&\le \sup _{x \in {\mathcal {X}}}|\hat{Q}_\alpha ^S(x) - {L_{\alpha }^S}(x)| + \varepsilon .\end{aligned}$$
(19)

In the end, however, we are not interested in the optimality gap of \({\hat{x}_\alpha }\) in the SAA problem (11) but in the optimality gap

$$\begin{aligned}c{\hat{x}_\alpha }+ Q({\hat{x}_\alpha }) - \eta ^*\end{aligned}$$

of \({\hat{x}_\alpha }\) in the original MIR problem (1). Adding and subtracting both \(\hat{Q}_\alpha ^S({\hat{x}_\alpha })\) and \({\hat{\eta }}_\alpha ^S\), and using (19), yields

$$\begin{aligned} c{\hat{x}_\alpha }+ Q({\hat{x}_\alpha }) - \eta ^*&= (c{\hat{x}_\alpha }+ \hat{Q}_\alpha ^S({\hat{x}_\alpha }) - {\hat{\eta }}_\alpha ^S) + (Q({\hat{x}_\alpha }) - \hat{Q}_\alpha ^S({\hat{x}_\alpha })) + (\hat{\eta }_\alpha ^S - \eta ^*) \\&\le \sup _{x \in {\mathcal {X}}}|\hat{Q}_\alpha ^S(x) - {L_{\alpha }^S}(x)| + \varepsilon + \sup _{x \in {\mathcal {X}}}|Q(x) - \hat{Q}_\alpha ^S(x)| + {\hat{\eta }}_\alpha ^S - \eta ^*. \end{aligned}$$

Applying Corollaries 1 and 2, and Proposition 1, we conclude that there exists a constant \(C > 0\) such that

$$\begin{aligned}c{\hat{x}_\alpha }+ Q({\hat{x}_\alpha }) - \eta ^* \le C \sum _{i = 1}^m \mathbb {E}_{\omega _{-i}}\left[ |\varDelta |f_i(\cdot | \omega _{-i})\right] + \varepsilon ,\end{aligned}$$

w.p. \(1\) as \(S \rightarrow \infty \). \(\square \)

The performance guarantee for LBDA(\(\alpha \)) in Theorem 3 is a worst-case bound. For many problem instances, the actual performance may be much better. In Sect. 5, we assess the performance of LBDA(\(\alpha \)) empirically on a wide range of test instances.

5 Numerical experiments

We test the performance of LBDA(\(\alpha \)) on problem instances from the literature and on randomly generated instances, see Sects. 5.2 and 5.3, respectively. In particular, we consider (variations of) an investment planning problem in [29], and two classes of problem instances available from SIPLIB [2], namely the SIZES problem [13] and the stochastic server location problem (SSLP) [18]. First, however, we describe the set-up of our numerical experiments in Sect. 5.1.

5.1 Set-up of numerical experiments

In our numerical experiments, we compare LBDA(\(\alpha \)) to several benchmark methods in terms of costs, relative optimality gaps, and computation times. Since the performance of LBDA(\(\alpha \)) depends on \(\alpha \), we investigate four different approaches to select \(\alpha \).

First, we take \(\alpha \) equal to the zero vector. Second, we take \(\alpha = \alpha ^* := Tx^*\), where \(x^*\) is the optimal solution of the original problem. Obviously, for large problem instances \(x^*\) is unknown; however, we expect \(\alpha ^*\) to be a good choice of \(\alpha \), since the generalized \(\alpha \)-approximations are obtained by replacing Tx by \(\alpha \) in the Gomory relaxations, and thus \({\hat{Q}}_{\alpha ^*}\) is a good approximation of Q near the true optimal solution \(x^*\). We test this for the smaller problem instances for which \(x^*\) is known.

However, since \(\alpha ^*\) typically cannot be computed for larger problem instances, we also propose an iterative scheme, in which we first obtain \({\hat{x}}_{\alpha _0}\) by running LBDA\((\alpha _0)\), where \(\alpha _0\) is the zero vector. Next, we run LBDA\((\alpha _1)\), where \(\alpha _1 := T{\hat{x}}_{\alpha _0}\). We extend this scheme to 100 iterations by recursively defining \(\alpha _{k + 1} := T{\hat{x}}_{\alpha _k}\), \(k = 1,\dots , 100\). We then select the best value of \(\alpha \) in terms of expected costs, denoted by \(\alpha ^\#\). Finally, we apply LBDA(\(\alpha \)) multiple times using 100 different values of \(\alpha \), drawn from a multivariate uniform distribution on \([0,100]^m\), and we denote the best value of \(\alpha \) in terms of the expected costs by \(\alpha ^+\).
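A sketch of the iterative scheme for \(\alpha ^\#\), where run_lbda and expected_cost are placeholders for LBDA(\(\alpha \)) and the exact evaluation of \(cx + Q(x)\):

```python
import numpy as np

def iterative_alpha(T, scenarios, run_lbda, expected_cost, n_iter=100):
    """alpha_0 = 0 and alpha_{k+1} = T x_hat_{alpha_k}; returns the candidate
    alpha whose LBDA(alpha) solution has the lowest expected cost."""
    alpha = np.zeros(T.shape[0])
    best_alpha, best_cost = alpha, np.inf
    for _ in range(n_iter):
        x_hat = run_lbda(alpha, scenarios)
        cost = expected_cost(x_hat)
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
        alpha = T @ x_hat                     # next candidate value of alpha
    return best_alpha
```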

In order to compare these approaches, note that the expected costs \(cx + Q(x)\) of a candidate solution x can be computed exactly if the random vector \(\omega \) has a finite number of realizations, as is the case for the SIZES, SSLP, and investment planning problems that we consider. Therefore, if the optimal value \(\eta ^*\) is known, then the relative optimality gap \(\rho (x)\), defined as

$$\begin{aligned} \rho (x) = \frac{cx + Q(x) - \eta ^*}{|\eta ^*|} \times 100\%, \end{aligned}$$

can be computed exactly, and otherwise bounds on \(\rho (x)\) can be computed.

In contrast, for our randomly generated instances, we assume that \(\omega \) is continuously distributed. For these instances, we use the multiple replications procedure (MRP) [16] with Latin hypercube sampling [5] to obtain 95% confidence upper bounds on \(\rho (x)\). Moreover, we compare the performance of LBDA(\(\alpha \)) to a range of benchmark solutions, using out-of-sample estimation of \(cx + Q(x)\), with a sample size of \(10^5\), which guarantees that the standard errors of our results are sufficiently small. The benchmark and LBDA(\(\alpha \)) solutions are computed using a sample of size \(S = 1000\).
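As an aside, a minimal sketch of Latin hypercube sampling for such evaluations, assuming (purely for illustration) independent normally distributed components of \(\omega \):

```python
import numpy as np
from scipy.stats import qmc, norm

def lhs_normal_sample(mu, sigma, S, seed=0):
    """Latin hypercube sample of S realizations of omega with independent
    N(mu_i, sigma_i^2) components (mu and sigma are vectors of length m)."""
    m = len(mu)
    u = qmc.LatinHypercube(d=m, seed=seed).random(n=S)   # stratified uniforms in [0,1)^m
    return np.asarray(mu) + np.asarray(sigma) * norm.ppf(u)
```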

The first benchmark solution \({\bar{x}}_S\) is obtained by solving the deterministic equivalent formulation (DEF) of the corresponding SAA of the original problem (1). The DEF is a large-scale MIP, which, typically, cannot be solved in reasonable time by standard MIP solvers. Hence we also solve the DEF using a smaller sample size \(S' = 100\), resulting in the second benchmark solution \({\bar{x}}_{S'}\).

We obtain three additional benchmark solutions by solving the generalized \(\alpha \)-approximations exactly for \(\alpha = 0\), \(\alpha = \alpha ^*\) and \(\alpha = \alpha ^+\), that is, we find the optimal solution \(x_\alpha ^*\) of the approximating problem (4) with \({\hat{Q}} = \hat{Q}_\alpha ^S\). We do so by solving the approximating second-stage problems

$$\begin{aligned} \max _{k = 1,\dots ,K}\{\lambda ^k(\omega - Tx) + \psi ^k(\omega -\alpha )\} \end{aligned}$$

by enumeration over \(k = 1,\dots ,K\). For this reason, \(x_\alpha ^*\) can only be computed in reasonable time for small problem instances.
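A minimal sketch of this enumeration for a single scenario \(\omega \) and candidate x is given below, assuming the K coefficient pairs \((\lambda ^k, \psi ^k)\) have been collected as rows of two arrays; the names are illustrative.

```python
import numpy as np

def approx_second_stage_value(omega, x, alpha, T, lambdas, psis):
    """Evaluate max_k { lambda^k (omega - T x) + psi^k (omega - alpha) }
    by enumeration over k = 1, ..., K (sketch).

    lambdas, psis: arrays of shape (K, m) holding the collected coefficients.
    """
    values = lambdas @ (omega - T @ x) + psis @ (omega - alpha)
    return float(values.max())
```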

Finally, we consider two trivial benchmark solutions, which we expect LBDA(\(\alpha \)) to outperform significantly. First, we relax the integrality restrictions on the second-stage variables in the SAA of the original problem (1), resulting in the benchmark solution \(x_\text {LP}\). Second, we solve the Jensen approximation, which replaces the distribution of \(\omega \) by a degenerate distribution at \(\mu = {\mathbb {E}}_\omega [\omega ]\), and denote the optimal solution by \(x_\mu \).

We run our experiments on a single core of an Intel Xeon E5-2680 v3 processor at 2.5 GHz, using Gurobi 7.0.2. To ensure a fair comparison between solutions, we use common random numbers where possible and we limit the computation time of each algorithm to two hours.

5.2 Test instances from the literature

5.2.1 The SIZES problem

We first consider all instances of the SIZES test problem suite [13] from SIPLIB. These instances have mixed-binary variables in both stages, and differ in the number of scenarios, namely 3, 5, and 10. The DEF of the largest instance has 341 constraints and 825 variables, of which 110 are binary. We refer to [2] for further details. In Table 1, we report the outcomes of LBDA(\(\alpha \)) and solving the DEF.

Table 1 The SIZES problem

We observe from Table 1 that LBDA(\(\alpha \)) performs very well for all choices of \(\alpha \). Indeed, on every instance, the optimality gaps of all LBDA\((\alpha )\) solutions are below 0.5%, and below 0.2% for \(\alpha = \alpha ^+\). Moreover, LBDA\((\alpha )\) runs very fast for all instances: for a single value of \(\alpha \), the computation time of LBDA(\(\alpha \)) is always below one second.

Another observation is that LBDA\((\alpha ^\#)\) and LBDA\((\alpha ^+)\) consistently outperform LBDA(0), at the expense of additional computation time. Nevertheless, the computation times of LBDA(\(\alpha \)) for \(\alpha ^\#\) and \(\alpha ^+\) are still moderate, and they scale much better to larger instances than solving the DEF. In particular, the time taken to solve the DEF grows exponentially as the sample size S increases, whereas the computation times of LBDA(\(\alpha \)) are approximately linear in S. Finally, LBDA(\(\alpha \)) performs very well if \(\alpha = \alpha ^*\). However, since \(\alpha ^*\) is not known in practice, it is useful to observe that LBDA\((\alpha ^+)\) achieves similar performance.

5.2.2 The stochastic server location problem

The SSLP instances are more challenging in terms of input size than the SIZES instances. Indeed, the DEF of the largest instance has over 1,000,000 binary decision variables and 120,000 constraints. Their first- and second-stage problems are pure binary and mixed-binary, respectively, and \(\omega \) follows a discrete distribution. A full problem description can be found in [18], in which the instances are solved using the \(D^2\) algorithm of [31]. See also  [1] and [12] for more recent computational experiments on these test instances using other exact approaches. In Table 2 we report the best known running time for each SSLP instance over all these exact approaches, along with the outcomes of LBDA(\(\alpha \)).

Table 2 The stochastic server location problem

Strikingly, LBDA(\(\alpha \)) was able to solve all instances to optimality for \({\alpha = \alpha ^+}\) and \({\alpha = \alpha ^*}\). Moreover, LBDA(0) and LBDA\((\alpha ^\#)\) solved all instances except SSLP_15_45_5, on which both achieved an optimality gap of 0.45%. In terms of computation time, LBDA(0) is clearly preferred to LBDA\((\alpha ^\#)\) and LBDA\((\alpha ^+)\), while achieving similar results. Finally, although directly comparing LBDA(\(\alpha \)) to the other approaches is not completely fair since the algorithms were run on different machines, it is clear that LBDA(\(\alpha \)) is generally faster than exact approaches. Indeed, LBDA(0) solved all instances in under one minute, and eight out of ten instances were solved in less than ten seconds, whereas the fastest exact approach required at least one minute for six out of ten instances, and over one hour for the largest instance.

5.2.3 An investment planning problem

We consider the following problem by Schultz et al. [29],

$$\begin{aligned}\min _{x}\left\{ -\tfrac{3}{2}x_1 - 4x_2 + \mathbb {E}_{\omega }[v(\omega ,x)]:\quad x \in [0,5]^2\right\} ,\end{aligned}$$

where

$$\begin{aligned} v(\omega ,x) = \min _{y \in Y} \{ -16y_1 - 19y_2 - 23y_3 - 28y_4 :\quad&2y_1 + 3y_2 + 4y_3 + 5y_4 \le \omega _1 - x_1,\\&6y_1 + y_2 + 3y_3 + y_4 \le \omega _2 - x_2 \},\end{aligned}$$

and where the second-stage decision variables are binary, i.e. \(Y = \{0,1\}^4\), and the random vector \(\omega = (\omega _1, \omega _2)\) follows a discrete distribution which assigns equal probabilities to \(S = 441\) equidistant lattice points of \([5,15]^2\). Schultz et al. consider a second variant of this problem by choosing the technology matrix T as

$$\begin{aligned} T = H := \begin{pmatrix} 2/3 &{} 1/3\\ 1/3&{}2/3\end{pmatrix},\end{aligned}$$

whereas in the original formulation, T is the identity matrix \(I_2\). For both variants, we consider \(S \in \{ 4, 9, 36, 121, 441, 1681, 10201 \}\) and \(Y = {\mathbb {Z}}^4_+\), in addition to \(Y = \{0,1\}^4\), as is done in [19]. For each of the resulting 28 instances, Table 3 shows the results of LBDA(\(\alpha \)) and of solving the DEF. Note that if Gurobi could not solve an instance within two hours, then we report the gaps with respect to \({\hat{x}}^*\), the best solution that Gurobi was able to find.
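For illustration, every value of S used here is a perfect square, so the scenario sets can be generated as equidistant grids with \(\sqrt{S}\) points per dimension. The sketch below assumes that the endpoints 5 and 15 are included in the grid.

```python
import numpy as np

def lattice_scenarios(S, lo=5.0, hi=15.0):
    """Equidistant lattice of S = k*k points on [lo, hi]^2, equal probabilities.

    The values of S used above (4, 9, 36, 121, 441, 1681, 10201) are all
    perfect squares, so k = sqrt(S) grid points are placed per dimension.
    """
    k = int(round(np.sqrt(S)))
    assert k * k == S, "S must be a perfect square"
    grid = np.linspace(lo, hi, k)                 # k equidistant points
    omega = np.array([(w1, w2) for w1 in grid for w2 in grid])
    prob = np.full(S, 1.0 / S)                    # equal probabilities
    return omega, prob
```

For example, \(S = 441\) corresponds to a \(21 \times 21\) grid with spacing 0.5 on \([5,15]^2\).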

Table 3 An investment planning problem

Overall, LBDA(\(\alpha \)) performs well on the instances in Table 3. In particular, for \(\alpha = \alpha ^\#\) and \(\alpha = \alpha ^+\), LBDA(\(\alpha \)) achieves gaps below 2% on 22 and 23 out of 28 instances, respectively. In general, LBDA(\(\alpha \)) achieves better results if S is larger. For example, if \(Y = \{0,1\}^4\) and \(T = H\), then the gaps are strictly decreasing in S. This is in line with the performance guarantee of LBDA(\(\alpha \)) in Theorem 3: if S is larger, then the distributions of \(\omega _1\) and \(\omega _2\) more closely resemble a continuous uniform distribution on [5, 15], which has small total variation, so that the error bound in Theorem 3 is small.

On some instances, LBDA(\(\alpha \)) does not consistently perform well for all values of \(\alpha \). For example, if \(Y = {\mathbb {Z}}^4_+\), \(T = H\), and \(S = 9\), then LBDA\((\alpha ^+)\) and LBDA(0) achieve gaps of 0.1% and 12.9%, respectively. Furthermore, the instances with \(Y = \{0,1\}^4\), \(T = H\), and \(S \in \{4, 9\}\) turn out to be difficult instances for LBDA(\(\alpha \)): for all choices of \(\alpha \), the resulting gaps are \(8.8\%\) and \(5\%\), respectively.

In fact, for every choice of \(\alpha \), LBDA(\(\alpha \)) achieves identical gaps if \({Y = \{0,1\}^4}\) and \(T = H\), but the gaps are much smaller if \(S \ge 36\), e.g. if \(S = 10201\), then the gaps are 0.1%. In contrast, there are large differences between the different choices of \(\alpha \) if \(Y = \{0,1\}^4\) and \(T = I\). For these instances, \(\alpha = \alpha ^\#\) and \(\alpha = \alpha ^+\) outperform the other choices of \(\alpha \). However, as for the SIZES instances in Sect. 5.2.1, there is a trade-off between performance and computation time, since LBDA\((\alpha ^+)\) and LBDA\((\alpha ^\#)\) each require 100 LBDA(\(\alpha \)) runs, which is computationally more demanding than a single LBDA(0) run. Nevertheless, on every instance, the computation times of LBDA(\(\alpha \)) for \(\alpha \in \{\alpha ^\#,\alpha ^+\}\) are below 20 minutes, and below 3 minutes if \(S \le 1681\).

5.3 Randomly generated test instances

We generate random MIR problems of the form (1), with \(X = {\mathbb {R}}^{n_1}_+\) and

$$\begin{aligned}Q(x)=\mathbb {E}_{\omega }\left[ \min _y \left\{ qy:\quad Wy \ge \omega - Tx,\; y \in {\mathbb {Z}}^p_+\right\} \right] , \quad x\in X.\end{aligned}$$

In addition, we assume that the components of the random vector \(\omega \in {\mathbb {R}}^m\) follow independent normal distributions with mean 10 and standard deviation \(\sigma \in \{0.1,0.5,1,2,4,10\}\). The parameters c, q, T, and W are fixed, and their elements are drawn from discrete uniform distributions with supports contained in [1, 5], [5, 10], [1, 6], and [1, 6], respectively.
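A sketch of this instance generator is given below. The integer-valued draws and the fixed seed are illustrative assumptions consistent with the supports stated above.

```python
import numpy as np

def generate_instance(n1=10, p=5, m=5, sigma=1.0, seed=0):
    """Randomly generated MIR instance (illustrative sketch).

    The components of omega follow independent N(10, sigma^2) distributions
    and are sampled on demand; c, q, T, W are fixed integer-valued arrays
    drawn once, with supports contained in the ranges stated above.
    """
    rng = np.random.default_rng(seed)
    c = rng.integers(1, 5, size=n1, endpoint=True).astype(float)
    q = rng.integers(5, 10, size=p, endpoint=True).astype(float)
    T = rng.integers(1, 6, size=(m, n1), endpoint=True).astype(float)
    W = rng.integers(1, 6, size=(m, p), endpoint=True).astype(float)

    def sample_omega(n_samples):
        return rng.normal(loc=10.0, scale=sigma, size=(n_samples, m))

    return c, q, T, W, sample_omega
```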

The reason that we consider multiple values of the standard deviation \(\sigma \) is that Theorem 3 implies that LBDA(\(\alpha \)) performs better as \(\sigma \) increases. This is because the total variations of the one-dimensional conditional pdfs are small if \(\sigma \) is large, and thus the error bound on the optimality gap achieved by LBDA(\(\alpha \)) is small as well.
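To make this concrete, note that the total variation of a unimodal density equals twice its maximum value. Since the components of \(\omega \) are independent here, each one-dimensional conditional density is the normal density with standard deviation \(\sigma \), and its total variation equals

$$\begin{aligned} 2 \max _{t} \frac{1}{\sigma \sqrt{2\pi }} \exp \left( -\frac{(t - 10)^2}{2\sigma ^2}\right) = \frac{2}{\sigma \sqrt{2\pi }} = \sqrt{\frac{2}{\pi }}\, \frac{1}{\sigma }, \end{aligned}$$

which decreases proportionally to \(1/\sigma \), so that the error bound in Theorem 3 shrinks accordingly as \(\sigma \) grows.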

To reduce noise in the outcomes of our experiments, we compute the average optimality gaps, costs, and computation times over 20 randomly generated test instances for each value of \(\sigma \). We consider test instances of two different sizes, namely \(n_1 = 10\), \(p = 5\), \(m = 5\) (small), and \(n_1 = 100\), \(p = 40\), \(m = 20\) (large). Tables 4 and 5 display the results for the small and large instances, respectively.

Table 4 Randomly generated test instances (\(n_1 = 10\), \(p = 5\), \(m = 5\))
Table 5 Randomly generated test instances (\(n_1 = 100\), \(p = 40\), \(m = 20\))

From these results, we observe that LBDA(\(\alpha \)) clearly outperforms the sampling solutions in terms of computation time and scalability to larger problem instances. In particular, the computation time of LBDA(\(\alpha \)) is of the same order of magnitude as that of \(x_\text {LP}\), while LBDA(\(\alpha \)) performs significantly better in terms of optimality gaps and out-of-sample estimated expected costs. Overall, our results indicate that LBDA(\(\alpha \)) can be implemented very efficiently and that it can handle large MIR problem instances.

In line with the performance guarantee in Theorem 3, LBDA(\(\alpha \)) performs better for larger values of \(\sigma \). For example, on the small instances, the optimality gaps achieved by LBDA(0) are strictly decreasing in \(\sigma \), and for \(\sigma = 10.0\), LBDA\((\alpha ^+)\) outperforms the sampling solution \({\bar{x}}_S\). A similar observation holds for the optimal solutions of the generalized \(\alpha \)-approximations \({\hat{Q}}_\alpha \), as we would expect based on the error bound for \({\hat{Q}}_\alpha \) in Theorem 2. Observe, however, that even for large values of \(\sigma \), the optimality gaps reported in Table 4 for LBDA(\(\alpha \)) are relatively large, i.e. around 3–4%, and that the optimality gaps achieved by e.g. LBDA\((\alpha ^+)\) are not strictly decreasing in \(\sigma \). The actual optimality gaps, however, are likely much smaller than the upper bounds reported in Table 4, because these bounds are computed using the MRP, which relies on solving multiple SAAs of the original problem. Since Gurobi has difficulty solving the DEFs of these SAAs in reasonable time, especially for large values of \(\sigma \), the bounds obtained using the MRP are typically not sharp.

Interestingly, \(x_\text {LP}\) also performs better as \(\sigma \) increases. An explanation is that our error bound for the generalized \(\alpha \)-approximations implies that the MIR function Q becomes closer to a convex function as \(\sigma \) increases. Thus, since the LP-relaxation of Q is a convex lower bound of Q, its approximation error is expected to become smaller as \(\sigma \) increases. Note, however, that unlike for the generalized \(\alpha \)-approximations, the approximation error of the LP-relaxation does not go to zero. Indeed, based on our results, LBDA(\(\alpha \)) is clearly preferred to \(x_\text {LP}\): it consistently outperforms \(x_\text {LP}\) at the expense of very little additional computation time.

Furthermore, the results in Tables 4 and 5 indicate that \(\alpha = T{\bar{x}}_S\) is a good choice for LBDA(\(\alpha \)). Indeed, if \(\alpha = T{\bar{x}}_S\), then LBDA(\(\alpha \)) and \(x^*_\alpha \) perform similarly to \({\bar{x}}_S\). For example, on the small instances, they achieve optimality gaps that are on average within 0.2% and 1.1% of \({\bar{x}}_S\), respectively. However, since \({\bar{x}}_S\) is difficult to compute in practice, the use of LBDA\((T{\bar{x}}_S)\) is limited, whereas LBDA\((\alpha ^+)\) and LBDA\((\alpha ^\#)\) can be applied directly. As for the instances in Sects. 5.2.1 and 5.2.3, they achieve significantly better results than LBDA(0), at the expense of higher computation times. In particular, on the large instances, LBDA\((\alpha ^\#)\) performs two to five times better than LBDA(0), and on the small instances, LBDA\((\alpha ^+)\) achieves optimality gaps that are within 0.6% of \({\bar{x}}_S\) for \(\sigma \ge 0.5\). While the computation times of LBDA(\(\alpha \)) increase for \(\alpha = \alpha ^+\) and \(\alpha = \alpha ^\#\) compared to \(\alpha = 0\), they remain manageable: even the large instances are solved within 26 minutes.

Finally, observe from Table 4 that LBDA(\(\alpha \)) generally performs as well as or better than \(x_\alpha ^*\). In other words, the fact that LBDA(\(\alpha \)) uses loose optimality cuts to solve the generalized \(\alpha \)-approximations has no negative effect on the solution quality.

6 Conclusion

We consider two-stage mixed-integer recourse models with random right-hand side. Due to the non-convexity of the recourse function, such models are extremely difficult to solve. We develop a tractable approximating model by using convex approximations of the recourse function. In particular, we propose a new class of convex approximations, the so-called generalized \(\alpha \)-approximations, and we derive a corresponding error bound on the difference between these approximations and the true recourse function. In addition, we show that this error bound is small if the variability of the random parameters in the model is large. More precisely, the error bound for the generalized \(\alpha \)-approximations goes to zero as the total variations of the one-dimensional conditional probability density functions of the random right-hand side vector in the model go to zero.

The advantage of the generalized \(\alpha \)-approximations over existing convex approximations is that the corresponding approximating model can be solved efficiently. In fact, we describe a loose Benders decomposition algorithm, LBDA(\(\alpha \)), which solves this approximating model. The quality of the candidate solution \({\hat{x}}_\alpha \) generated by LBDA(\(\alpha \)) for the original model is guaranteed by Theorem 3, which provides an upper bound on the optimality gap of \({\hat{x}}_\alpha \). This performance guarantee is similar to the error bound we prove for the generalized \(\alpha \)-approximations. Indeed, we show that the optimality gap of \({\hat{x}}_\alpha \) is small if the variability of the random parameters in the model is large.

In addition to this theoretical guarantee on the solution quality, we assess LBDA(\(\alpha \)) empirically on a range of test instances. In particular, we consider the SIZES and SSLP instances from SIPLIB, an investment planning problem from [29], and randomly generated instances. We find that LBDA(\(\alpha \)) performs well in terms of computation times, scalability to larger problem instances, and solution quality. Specifically, LBDA(\(\alpha \)) is able to solve larger instances than traditional sampling techniques and its computation times scale more favourably in the input size of the instances. In terms of solution quality, LBDA(\(\alpha \)) solves the SIZES and SSLP instances to near optimality and generally performs very well on the investment planning instances. Moreover, on the randomly generated instances, LBDA(\(\alpha \)) performs similarly to traditional sampling techniques and achieves small optimality gaps if the variability of the random parameters in the model is medium to large.

One avenue for future research is to derive sharper theoretical error bounds for the generalized \(\alpha \)-approximations. While Theorem 3 provides conditions under which our solution method performs well, the quantitative error bound cannot be computed, as it depends on an unknown and potentially large constant C. A sharp tractable error bound would be an improvement over our current results. Another avenue is the extension of our solution method to more general mixed-integer recourse models, for example by allowing for randomness in the second-stage cost coefficients q, technology matrix T, or recourse matrix W.