1 Introduction

Recourse models are a class of models within the field of stochastic optimization that deal with decision-making problems under uncertainty (see, e.g., [1]). The explicit way in which these models incorporate uncertainty makes them suitable for a wide array of practical problems, ranging from energy optimization [2] to scheduling [3] and vehicle routing problems [4]. In this paper, we focus on (two-stage) risk-averse mixed-integer recourse (MIR) models. These are a class of recourse models with two decision stages that combine two difficulties: integer restrictions on decision variables and risk aversion. Integer restrictions are often necessary for accurate modeling of practical problem aspects, such as on/off decisions, indivisibilities, or fixed batch sizes. Furthermore, risk aversion is an important aspect of preferences of decision makers and is reflected in the way uncertainty is captured in the model. Compared with risk-neutral models (in which the expected total cost is minimized), risk-averse models put more emphasis on avoiding extremely undesirable outcomes.

Unfortunately, risk-averse MIR models are notoriously hard to solve. In particular, the inclusion of integer restrictions makes these models significantly harder to solve than their continuous counterparts. Indeed, for a wide class of risk measures used in practice, risk-averse continuous recourse models are convex, and efficient solution methods have been developed in the literature. These methods exploit convexity of the objective function and make use of solution methods based on convex optimization. See, e.g., Ahmed [5], Miller and Ruszczyński [6] and Noyan [7] for decomposition methods and Rockafellar [8] for a progressive hedging algorithm.

Risk-averse mixed-integer recourse models, however, are generally not convex. As a result, efficient methods from convex optimization cannot be applied and hence, alternative methods are required. In the literature, some authors have proposed solution methods based on large-scale mixed-integer linear programming (MILP) reformulations [9, 10] or heuristic methods for specific problem settings [11]. However, these methods apply only to problems with a specific structure (e.g., disaster relief planning [11] or location-allocation problems [10]) or a particular choice of risk measure (for instance, CVaR MIR models can be reformulated as risk-neutral MIR models with an additional auxiliary variable [9]), and only to problems of limited size (since they are based on computationally demanding non-convex optimization techniques [9]). To the best of our knowledge, efficient solution methods for general large-scale risk-averse MIR models are still lacking.

In this paper we take an alternative approach: rather than aiming for an exact solution, we propose to solve a convex approximation instead. That is, we approximate the objective function by a convex function. The resulting convex approximation model is amenable to convex optimization techniques and therefore much easier to solve. This idea has been developed over the past few decades for risk-neutral MIR models first [12,13,14,15,16,17,18,19,20,21], and recently also for risk-averse MIR models with a mean-CVaR risk measure [22]. In this paper we extend this literature to the more general class of risk-averse MIR models with an arbitrary law invariant coherent risk measure.

Our first contribution is the derivation of a performance guarantee for convex approximations of risk-averse MIR models. In particular, we derive an error bound: an upper bound on the approximation error. We do so by extending the error bound from [22] for the case with CVaR to our more general setting by using the Kusuoka representation [23] of law invariant coherent risk measures. Like the error bounds from the literature on risk-neutral MIR models [14, 17], this error bound is asymptotic in nature and depends on the distribution of the second-stage right-hand side vector. If the distribution is highly dispersed, then the error bound is small and hence, our convex approximation is good. In comparison with the error bounds from [14] and [17], our error bound contains an additional factor associated with the risk measure \(\rho \): the error bound is higher if the maximum probability distortion factor associated with \(\rho \) is higher. For the special case of risk-averse totally unimodular integer recourse models, we derive a specialized, tighter error bound whose dependence on the risk measure \(\rho \) is more favorable.

Our second contribution is the extension of our error bounds to a setting where the distribution of the second-stage right-hand side vector is discrete. Our original error bounds are only valid if this right-hand side vector has a continuous distribution. In practice, however, distributions are often discretized to avoid the numerically demanding task of computing multi-dimensional integrals. A particularly popular method for discretization is sample average approximation (SAA) [24]. We show that our error bounds translate to statistical error bounds if the distribution is discretized using SAA. This result not only holds for the error bounds in our paper, but carries over to many error bounds for convex approximations of risk-neutral MIR models as well (e.g., those in [12, 14,15,16,17, 22]).

Our third contribution is the construction of an algorithm based on our convex approximations that finds a near-optimal solution for risk-averse MIR models. Such an algorithm is warranted, because solving the convex approximation model is not a trivial task due to the complicated way in which it is defined. We formulate an algorithm for our risk-averse setting by adapting the loose Benders decomposition algorithm (LBDA) by van der Laan and Romeijnders [17], which was originally defined for a risk-neutral setting. This algorithm uses a Benders’ decomposition approach with so-called “loose” optimality cuts in an SAA framework. In our extension to the risk-averse case we pay particular attention to a linearization of the risk-averse objective function. To guarantee performance of the resulting algorithmic solution, we provide an asymptotic performance guarantee, based on the error bound for our convex approximation.

Our fourth contribution consists of a numerical analysis of the performance of our convex approximation approach. In the experiments, we focus on the relation between the choice of risk measure and the performance of our convex approximations and our algorithm. We show that in some cases, both the approximations themselves and the algorithm perform better than our performance guarantees suggest, in particular as a function of the risk measure \(\rho \). This highlights the fact that our performance guarantees are worst-case error bounds. Moreover, we find evidence that our approach performs particularly well if the risk measure has an associated risk spectrum that is “smooth”.

The remainder of this paper is organized as follows. In Sect. 2, we mathematically define risk-averse MIR models and their convex approximations. In Sect. 3, we derive a corresponding error bound. In Sect. 4, we derive a specialized bound for risk-averse totally unimodular integer models and we analyze an example with simple integer recourse. In Sect. 5, we show that the error bounds from Sects. 3 and 4 constitute statistical error bounds for the corresponding SAA models. In Sect. 6, we extend the LBDA to our risk-averse setting and we derive a performance guarantee on the resulting solution. Section 7 contains our numerical experiments. Finally, in Sect. 8, we conclude and discuss directions for future research.

2 Problem definition

In this section we mathematically define risk-averse MIR models and we construct a convex approximation.

2.1 Risk-averse MIR model

We consider risk-averse MIR models of the form

$$\begin{aligned} \min _{x \in X} \{ c^T x + \underbrace{\rho \big (v(\xi ,x)\big )}_{Q(x)} \}. \end{aligned}$$
(1)

Here, x is the first-stage decision vector with unit cost c, to be chosen from a feasible set of the form \(X:= \{ x \in \mathbb {R}_+^{n_1} \ \vert \ Ax = b\}\) before learning the realization of the random vector \(\xi \), defined on a probability space \((\Omega , \mathcal {F}, \mathbb {P})\) with support \(\Xi \). The value function \(v(\xi ,x)\), which we define below, represents the second-stage cost as a function of the first-stage decision x and the realization of the random vector \(\xi \). Since the second-stage cost \(v(\xi ,x)\) is a random variable, a risk measure \(\rho \) is employed that maps this random variable to a real number for every \(x \in X\); the resulting function Q is called the recourse function.

Formally, we define a risk measure as a function \(\rho : \mathcal {Z} \rightarrow \mathbb {R}\), on the space \(\mathcal {Z}:= L_1(\Omega ,\mathcal {F},\mathbb {P})\) of random variables Z for which \(\mathbb {E}[\vert Z \vert ] < +\infty \). We make a number of assumptions ensuring that \(\rho \) “reasonably” quantifies risk. Specifically, we assume that the risk measure \(\rho \) is law invariant [23] and coherent [25]. Moreover, in some cases we assume that \(\rho \) is also comonotone additive [23], in which case we refer to \(\rho \) as a spectral risk measure [26].

Definition 1

A risk measure \(\rho : \mathcal {Z} \rightarrow \mathbb {R}\) is said to be

  1. (1)

    Law invariant if \(\rho (Z) = \rho (Z')\) whenever \(Z, Z' \in \mathcal {Z}\) have the same cumulative distribution function;

  2. (2)

    Coherent if it satisfies the following axioms:

    1. (i)

      Convexity: \(\rho (tZ + (1-t)Z') \le t\rho (Z) + (1 - t)\rho (Z')\), for all \(Z,Z' \in \mathcal {Z}\),

    2. (ii)

      Monotonicity: \(\rho (Z) \ge \rho (Z')\), for all \(Z,Z' \in \mathcal {Z}\) satisfying \(Z \ge Z'\) a.s.,

    3. (iii)

      Translation equivariance: \(\rho (Z + a) = \rho (Z) + a\), for every \(Z \in \mathcal {Z}\) and \(a \in \mathbb {R}\),

    4. (iv)

      Positive homogeneity: \(\rho (tZ) = t \rho (Z)\), for all \(Z \in \mathcal {Z}\) and \(t > 0\);

  3. (3)

    Comonotone additive if \(\rho (Z + Z') = \rho (Z) + \rho (Z')\) whenever \(Z,Z' \in \mathcal {Z}\) are comonotone random variables, i.e., whenever \((Z,Z')\) has the same distribution as \((F_{Z}^{-1}(U), F_{Z'}^{-1}(U))\), where \(F_Z\) and \(F_{Z'}\) are the cumulative distribution functions of Z and \(Z'\), respectively, and U is a uniformly distributed random variable on (0, 1);

  4. (4)

    Spectral if it is law invariant, coherent, and comonotone additive.
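As a concrete illustration of these axioms (a sketch of our own, not part of the paper), the following checks numerically that CVaR on an equiprobable sample satisfies monotonicity, translation equivariance, positive homogeneity, and convexity; the sample data and the helper `cvar` are our own choices.

```python
def cvar(z, beta):
    """CVaR_beta of an equiprobable sample: mean of the worst (1-beta) fraction."""
    k = round((1 - beta) * len(z))
    return sum(sorted(z)[-k:]) / k

beta = 0.5
z1 = [1.0, 2.0, 3.0, 4.0]   # costs on four equiprobable scenarios
z2 = [0.0, 2.0, 2.0, 3.0]   # z2 <= z1 scenario-wise

# monotonicity: Z >= Z' a.s. implies rho(Z) >= rho(Z')
assert cvar(z1, beta) >= cvar(z2, beta)

# translation equivariance: rho(Z + a) = rho(Z) + a
a = 5.0
assert abs(cvar([v + a for v in z1], beta) - (cvar(z1, beta) + a)) < 1e-12

# positive homogeneity: rho(t Z) = t rho(Z) for t > 0
t = 3.0
assert abs(cvar([t * v for v in z1], beta) - t * cvar(z1, beta)) < 1e-12

# convexity: rho(t Z + (1-t) Z') <= t rho(Z) + (1-t) rho(Z')
t = 0.5
mix = [t * u + (1 - t) * w for u, w in zip(z1, z2)]
assert cvar(mix, beta) <= t * cvar(z1, beta) + (1 - t) * cvar(z2, beta) + 1e-12
```

Comonotone additivity can be checked similarly by applying `cvar` to two samples sorted in the same order.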

The second-stage value function v in (1) is defined by

$$\begin{aligned} v(\xi , x) := \min _{y \in Y} \big \{ q(\xi )^Ty \ \vert \ T(\xi )x + Wy = h(\xi ) \big \}, \quad \xi \in \Xi , \ x \in X. \end{aligned}$$
(2)

Here, y is a decision vector representing the second-stage recourse actions, to be decided after deciding on the first-stage decision vector x and observing the realization of the random vector \(\xi \) with range \(\Xi \). The set \(Y:= \mathbb {Z}_+^{n_2} \times \mathbb {R}_+^{\bar{n}_2}\) models integer restrictions on some of the elements of y, hence the term mixed-integer recourse. The recourse actions come at a unit cost of \(q(\xi )\) and need to satisfy the constraint \(T(\xi )x + Wy = h(\xi )\). Note that q, T, and h are allowed to depend on \(\xi \). In the remainder of the paper, we write \(\xi = (q, T, h)\) with range \(\Xi = \Xi ^q \times \Xi ^T \times \Xi ^h\), and we drop the explicit dependence of \(q(\xi )\), \(T(\xi )\), and \(h(\xi )\) on \(\xi \), i.e., we simply write q, T, and h.

Throughout this paper we make the following assumptions.

Assumption 1

We assume that

  1. (a)

    the recourse is relatively complete and sufficiently expensive, i.e., \(-\infty< v(\xi ,x) < \infty \), for all \(\xi \in \Xi \) and \(x \in \mathbb {R}^{n_1}\),

  2. (b)

    the expectations of the \(\ell _1\) norm of h, q, and T are finite, i.e., \(\mathbb {E}^{\mathbb {P}}\big [ \Vert h \Vert _1 \big ] < \infty \), \(\mathbb {E}^{\mathbb {P}}\big [ \Vert q \Vert _1 \big ] < \infty \), and \(\mathbb {E}^{\mathbb {P}}\big [ \Vert T \Vert _1 \big ] < \infty \),

  3. (c)

    the recourse matrix W is integer,

  4. (d)

    h is continuously distributed with joint pdf f,

  5. (e)

    (qT) and h are pairwise independent.

Assumptions 1(a)–(b) ensure that the recourse function Q(x) is finite for every \(x \in \mathbb {R}^{n_1}\). Assumption 1(c) is required for the derivation of our error bounds in Sects. 3 and 4. However, this assumption is not very restrictive, since any rational matrix can be transformed into an integer matrix by appropriate scaling. Assumption 1(d) restricts the right-hand side vector h to be continuously distributed. This is essential for the derivation of error bounds, which depend on the total variations of the pdf f of h (although recent numerical and theoretical results in [17, 27] suggest that the interpretation of total variation error bounds extends to settings where h is discrete, too). Finally, Assumption 1(e) is for ease of presentation; similar results can be derived under a less restrictive assumption, resulting in more elaborate expressions for the error bounds in terms of conditional density functions of h, given q and T.

2.2 Convex approximation

In this subsection we define our convex approximation of the risk-averse MIR model from Sect. 2.1. In particular, we construct a so-called \(\alpha \)-approximation \(\tilde{Q}_\alpha \) of the recourse function Q. This approximation is based on a corresponding \(\alpha \)-approximation \(\tilde{v}_\alpha \) of the value function v from van der Laan and Romeijnders [17].

As a starting point, let \(q \in \Xi ^q\) be fixed and consider the dual representation of the LP-relaxation \(v_{\text {LP}}\) of the value function v:

$$\begin{aligned} v_{\text {LP}}(\xi ,x)&= \max _{\lambda \in \mathbb {R}^m}\{ \lambda ^T(h - Tx) \ \vert \ \lambda ^T W \le q \} = \max _{k \in K^q} (\lambda ^k)^T (h - Tx), \end{aligned}$$

where \(\lambda ^k:= q_{B^k}(B^k)^{-1}\) is the vertex of the dual feasible region corresponding to the dual feasible basis matrix \(B^k\), \(k \in K^q\). Identifying the optimal vertex for each value of \(h - Tx\) yields the following result from the literature.

Lemma 1

Consider the LP-relaxation \(v_\text {LP}\) of the value function v and let \(q \in \Xi ^q\) be fixed. Then, there exist dual feasible basis matrices \(B^k\), with associated cones \(\Lambda ^k:= \{ s \in \mathbb {R}^m \ \vert \ (B^k)^{-1} s \ge 0 \}\), \(k \in K^q\), with \(\cup _{k \in K^q} \Lambda ^k = \mathbb {R}^m\), and whose interiors are pairwise disjoint, such that \(v_{\text {LP}}(\xi ,x) = (\lambda ^k)^T (h - Tx)\) if \(h - Tx \in \Lambda ^k\).

Proof

See Theorem 2.9 in [14]. \(\square \)
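To make Lemma 1 concrete, consider a toy instance of our own (it reappears as the simple integer recourse setting of Example 2): \(W = [1 \ -1]\) and \(q = (1, 0)\), so the dual feasible region \(\{\lambda : \lambda W \le q\} = [0,1]\) has vertices \(\lambda ^1 = 0\) and \(\lambda ^2 = 1\) with cones \(\Lambda ^1 = (-\infty , 0]\) and \(\Lambda ^2 = [0, \infty )\). The sketch below checks that the maximum over dual vertices reproduces the LP-relaxation value \((h - Tx)^+\).

```python
def v_lp_primal(h, x):
    # LP relaxation of min{ y1 : y1 - y2 = h - x, y1, y2 >= 0 }:
    # set y1 = (h - x)^+ and let y2 absorb any negative part.
    return max(h - x, 0.0)

dual_vertices = [0.0, 1.0]   # vertices of {lam : lam*W <= q} = [0, 1]

def v_lp_dual(h, x):
    # maximum over dual vertices, as in Lemma 1
    return max(lam * (h - x) for lam in dual_vertices)

for h in [-2.0, -0.3, 0.0, 0.7, 2.5]:
    for x in [-1.0, 0.0, 1.5]:
        assert abs(v_lp_primal(h, x) - v_lp_dual(h, x)) < 1e-12
```

The active vertex switches from \(\lambda ^1\) to \(\lambda ^2\) exactly where \(h - Tx\) crosses between the two cones.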

Hence, for every fixed \(q \in \Xi ^q\), we can partition \(\mathbb {R}^m\) into the cones \(\Lambda ^k\), \(k \in K^q\), each associated with a single dual optimal basis matrix \(B^k\) and vertex \(\lambda ^k\). Romeijnders et al. [14] prove that a similar partition also exists for mixed-integer linear programs. In particular, they show that there exists \(\sigma ^k \in \Lambda ^k\) such that for all \(h-Tx \in \sigma ^k + \Lambda ^k\),

$$\begin{aligned} v(\xi ,x) = v_{LP}(\xi ,x) + \psi ^k(\xi ,x) \end{aligned}$$

where \(\psi ^k, k \in K^q\), represents the “penalty” incurred as a result of the integer restrictions on y. Specifically, \(\psi ^k\) is defined as

$$\begin{aligned} \psi ^k(\xi ,x)&= \min _{\begin{array}{c} y_{B^k} \\ y_{N^k} \end{array}} \{q_{B^k}^T y_{B^k} + q_{N^k}^Ty_{N^k}: B^ky_{B^k} + N^ky_{N^k} = h-Tx, (y_{B^k},y_{N^k}) \in \hat{Y}\} \\&\quad - q_{B^k}^T(B^k)^{-1}(h-Tx), \end{aligned}$$

where \(\hat{Y}\) imposes the same integrality and non-negativity restrictions on the variables but with the non-negativity constraints of the basic variables \(y_{B^k}\) relaxed. Romeijnders et al. [14] prove that for q fixed, \(\psi ^k\) is a “\(B^k\)-periodic” function (see Definition 2.2 in [14]).
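For the toy instance with \(y_1\) integer (our own illustration; it is the simple integer recourse setting of Example 2 below), on the cone \(s = h - Tx \ge 0\) the basis is \(B = [1]\) and the penalty reduces to \(\psi (s) = \lceil s \rceil - s\), so \(B\)-periodicity simply means \(\psi (s + z) = \psi (s)\) for every integer z:

```python
import math

def psi(s):
    # penalty on the cone s >= 0: integer second-stage value minus LP value
    return math.ceil(s) - s

def v_int(s):
    # brute force: min y1 over integers y1 >= 0 with y1 - y2 = s, y2 >= 0
    return min(y1 for y1 in range(0, 20) if y1 >= s)

# B-periodicity with B = [1]: shifting by any integer leaves psi unchanged
for s in [0.0, 0.25, 0.5, 0.99, 3.7]:
    for z in [-3, -1, 0, 1, 5]:
        assert abs(psi(s + z) - psi(s)) < 1e-9

# psi is exactly the gap between the integer and LP values on s >= 0
for s in [0.3, 1.6, 2.0]:
    assert abs(v_int(s) - (s + psi(s))) < 1e-9
```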

Lemma 2

Consider the setting of Lemma 1 and let \(q \in \Xi ^q\) be fixed. Then, there exist vectors \(\sigma ^k \in \Lambda ^k\), \(k \in K^q\), such that

$$\begin{aligned} v(\xi ,x) = (\lambda ^k)^T(h - Tx) + \psi ^k(h - Tx), \quad \text {if } h - Tx \in \sigma ^k + \Lambda ^k, \end{aligned}$$

where \(\psi ^k\) is a \(B^k\)-periodic function.

Proof

See Theorem 2.9 in [14]. \(\square \)

From this lemma it is clear that for \(h - Tx \in \sigma ^k + \Lambda ^k\), the non-convexity of \(v(\xi ,x)\) as a function of x is a result of the periodicity of \(\psi ^k\). To eliminate this non-convexity, van der Laan and Romeijnders [17] propose to replace \(\psi ^k(h - Tx)\) by \(\psi ^k(h - \alpha )\), for some constant vector \(\alpha \in \mathbb {R}^m\). Taking the maximum over \(k \in K^q\), and performing this procedure for every \(q \in \Xi ^q\) then yields the \(\alpha \)-approximation of v.

Definition 2

Consider the value function v and let \(\lambda ^k\), and \(\psi ^k\), \(k \in K^q\), denote the dual vertices and \(B^k\)-periodic functions from Lemma 2. Then, for any \(\alpha \in \mathbb {R}^m\), we define the \(\alpha \)-approximation of v as

$$\begin{aligned} \tilde{v}_\alpha (\xi ,x) := \max _{k \in K^q} \{ (\lambda ^k)^T(h - Tx) + \psi ^k(h - \alpha ) \}, \qquad \xi \in \Xi , \ x \in \mathbb {R}^{n_1}. \end{aligned}$$

It is not hard to see that \(\tilde{v}_\alpha (\xi ,x)\) is indeed convex in x, as it is the pointwise maximum of a finite number of affine functions. Applying the risk measure \(\rho \) to \(\tilde{v}_\alpha (\xi ,x)\) yields the \(\alpha \)-approximation of the recourse function Q.

Definition 3

Consider the recourse function Q from (1). Then, its \(\alpha \)-approximation is defined by

$$\begin{aligned} \tilde{Q}_\alpha (x) := \rho \big (\tilde{v}_\alpha (\xi ,x)\big ), \quad x \in \mathbb {R}^{n_1}. \end{aligned}$$

Since \(\rho \) is a coherent risk measure, it is convexity-preserving (in particular, this is a consequence of the axioms monotonicity and convexity; see [25]). Hence, the approximating recourse function \(\tilde{Q}_\alpha \) is indeed a convex function.

3 Asymptotic error bound

In this section we derive a performance guarantee for our \(\alpha \)-approximation model. We do this by deriving an asymptotic error bound for the \(\alpha \)-approximation \(\tilde{Q}_\alpha \) from Definition 3. That is, we derive an upper bound on the maximum approximation error

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty := \sup _{x \in \mathbb {R}^{n_1}} \vert Q(x) - \tilde{Q}_\alpha (x) \vert . \end{aligned}$$

As in the literature on risk-neutral models, we aim for an asymptotic error bound, in the sense that the bound converges to zero as the total variations of the conditional density functions \(f_i(\cdot \vert h_{-i})\), \(i=1,\ldots ,m\), of the right-hand-side vector h go to zero.

Definition 4

Let \(f: D \rightarrow \mathbb {R}\) be a real-valued function on an interval \(D \subseteq \mathbb {R}\) and let \(I \subseteq D\) be a given subinterval. Let \(\Pi (I)\) denote the set of all finite ordered sets \(P = \{z_1, \ldots , z_{N+1} \}\) with \(z_1< \cdots < z_{N+1}\) in I. Then, the total variation, total increase, and total decrease of f on I, denoted by \(\vert \Delta \vert f(I)\), \(\Delta ^+ f(I)\), and \(\Delta ^- f(I)\), respectively, are defined by

$$\begin{aligned} \vert \Delta \vert f(I) := \sup _{P \in \Pi (I)} V_f(P), \quad \Delta ^+ f(I) := \sup _{P \in \Pi (I)} V^+_f(P), \quad \Delta ^- f(I) := \sup _{P \in \Pi (I)} V^-_f(P), \end{aligned}$$

where \(V_f(P):= \sum _{i = 1}^{N} \vert f(z_{i+1}) - f(z_i) \vert \), \(V^+_f(P):= \sum _{i = 1}^{N} (f(z_{i+1}) - f(z_i) )^+\), and \(V^-_f(P):= \sum _{i = 1}^{N} ( f(z_{i+1}) - f(z_i) )^-\). We use short-hand notation \(\vert \Delta \vert f:= \vert \Delta \vert f(D)\) and \(\vert \Delta \vert f(\cup _{i=1,\ldots ,n} I_i):= \sum _{i=1}^n \vert \Delta \vert f(I_i)\), if \(I_1, \ldots , I_n\) are disjoint intervals in D, with straightforward analogues for \(\Delta ^+\) and \(\Delta ^-\). Finally, we say that f is of bounded variation if \(\vert \Delta \vert f < +\infty \).

Observe that for one-dimensional unimodal density functions f, the total variation \(\vert \Delta \vert f\) is equal to two times the value at the mode. For example, for a normal distribution with arbitrary mean \(\mu \) and standard deviation \(\sigma \), we have \(\vert \Delta \vert f = \sigma ^{-1} \sqrt{2/\pi }\). We restrict ourselves to joint density functions f for which the corresponding conditional density functions are of bounded variation.
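The normal-distribution claim can be checked numerically. The pure-Python sketch below (our own, not from the paper) approximates \(\vert \Delta \vert f\) by summing \(\vert f(z_{i+1}) - f(z_i) \vert \) over a fine equidistant partition, as in Definition 4, and compares the result with \(\sigma ^{-1}\sqrt{2/\pi }\):

```python
import math

def normal_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def total_variation(f, lo, hi, n=100000):
    # V_f(P) on an equidistant partition P of [lo, hi] (Definition 4);
    # for a unimodal f this converges to |Delta|f as n grows and [lo, hi] widens
    zs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return sum(abs(f(zs[i + 1]) - f(zs[i])) for i in range(n))

mu, sigma = 1.0, 2.0
tv = total_variation(lambda t: normal_pdf(t, mu, sigma),
                     mu - 10 * sigma, mu + 10 * sigma)
# two times the value at the mode = sigma^{-1} * sqrt(2/pi)
assert abs(tv - math.sqrt(2.0 / math.pi) / sigma) < 1e-4
```

As \(\sigma \) grows, `tv` shrinks proportionally to \(\sigma ^{-1}\), matching the intuition that a more dispersed distribution has a flatter, lower-variation density.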

Definition 5

We denote by \(\mathcal {H}^m\) the set of all m-dimensional joint pdfs f whose conditional density functions \(f_i(\cdot \vert t_{-i})\) are of bounded variation for all \(t_{-i} \in \mathbb {R}^{m-1}\), \(i=1,\ldots ,m\).

To derive an asymptotic error bound, we take the error bound from [22] for the special case \(\rho = {{\,\textrm{CVaR}\,}}_\beta \) as a starting point. We generalize this bound to our risk-averse MIR setting, where \(\rho \) can be any law invariant, coherent risk measure. For this purpose, we use the so-called Kusuoka representation [28] of law invariant coherent risk measures.

Lemma 3

(Kusuoka representation) Let \(\rho \) be a law invariant coherent risk measure. Then, there exists a set \(\mathcal {M}\) of probability measures on [0, 1) such that

$$\begin{aligned} \rho (Z) = \sup _{\mu \in \mathcal {M}} \int _0^1 {{\,\textrm{CVaR}\,}}_\beta (Z) d\mu (\beta ), \quad Z \in \mathcal {Z}, \end{aligned}$$

where \({{\,\textrm{CVaR}\,}}_\beta \) represents the conditional value-at-risk with parameter \(\beta \in [0,1)\), defined as

$$\begin{aligned} {{\,\textrm{CVaR}\,}}_\beta (Z) = \min _{\zeta \in \mathbb {R}} \big \{ \zeta + \frac{1}{1 - \beta } \mathbb {E}\big [ (Z - \zeta )^+ \big ] \big \}, \quad Z \in \mathcal {Z}. \end{aligned}$$

If, in addition, \(\rho \) is comonotone additive (i.e., if \(\rho \) is a spectral risk measure), then \(\mathcal {M}\) reduces to a singleton.

Proof

See, e.g., Shapiro [23]. (The original result by Kusuoka [28] holds in a more restrictive setting.) \(\square \)

Lemma 3 shows that every law invariant coherent risk measure can be represented as a supremum over convex combinations of CVaRs. Combining this fact with the error bound for CVaR from [22], we can derive our error bound.
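The minimization formula for CVaR in Lemma 3 can be verified directly on a finite sample (a sketch of our own): when \((1-\beta )N\) is integral, the minimizing \(\zeta \) may be taken among the sample points, and the optimal value equals the average of the worst \((1-\beta )N\) outcomes.

```python
def cvar_min(z, beta):
    # Rockafellar-Uryasev formula; for a finite equiprobable sample the
    # minimum over zeta is attained at a sample point
    n = len(z)
    return min(zeta + sum(max(v - zeta, 0.0) for v in z) / ((1 - beta) * n)
               for zeta in z)

def cvar_tail(z, beta):
    # mean of the worst (1-beta)*N outcomes (assumes (1-beta)*N is integral)
    k = round((1 - beta) * len(z))
    return sum(sorted(z)[-k:]) / k

z = [float(v) for v in range(1, 101)]   # 100 equiprobable outcomes
for beta in [0.0, 0.5, 0.9, 0.95]:
    assert abs(cvar_min(z, beta) - cvar_tail(z, beta)) < 1e-9
```

Note that \(\beta = 0\) recovers the expectation, consistent with \({{\,\textrm{CVaR}\,}}_0 = \mathbb {E}\).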

Theorem 1

Consider the recourse function Q from (1) and its \(\alpha \)-approximation \(\tilde{Q}_\alpha \) from Definition 3 in a setting with an underlying law invariant, coherent risk measure \(\rho \). Let \(\mathcal {M}\) be the set of probability measures on [0, 1) from the Kusuoka representation of \(\rho \). For every \(\mu \in \mathcal {M}\), define the function \(\phi _\mu : [0,1] \rightarrow \mathbb {R}\) as

$$\begin{aligned} \phi _\mu (p) := \int _{[0,p]} (1 - \beta )^{-1} d\mu (\beta ), \quad p \in [0,1]. \end{aligned}$$
(3)

Then, there exists a constant \(C > 0\) such that for every \(f \in \mathcal {H}^m\), we have

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty \le \sup _{\mu \in \mathcal {M}} \phi _\mu (1) \cdot C \cdot \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ]. \end{aligned}$$

Proof

Let \(x \in \mathbb {R}^{n_1}\) be given. Then, applying Lemma 3 to both Q(x) and \(\tilde{Q}_\alpha (x)\), we have

$$\begin{aligned} \vert Q(x) - \tilde{Q}_\alpha (x) \vert&= \Big \vert \sup _{\mu \in \mathcal {M}} \Big \{ \int _0^1 {{\,\textrm{CVaR}\,}}_\beta (v(\xi , x)) d\mu (\beta ) \Big \} \nonumber \\&\quad - \sup _{\mu \in \mathcal {M}} \Big \{ \int _0^1 {{\,\textrm{CVaR}\,}}_\beta (\tilde{v}_\alpha (\xi , x)) d\mu (\beta ) \Big \} \Big \vert \nonumber \\&\le \sup _{\mu \in \mathcal {M}} \Big \{ \int _0^1 \big \vert {{\,\textrm{CVaR}\,}}_\beta (v(\xi , x)) - {{\,\textrm{CVaR}\,}}_\beta (\tilde{v}_\alpha (\xi , x)) \big \vert d\mu (\beta ) \Big \}. \end{aligned}$$
(4)

By Theorem 1 in [22], there exists a constant \(C>0\), not depending on x, such that

$$\begin{aligned} \big \vert {{\,\textrm{CVaR}\,}}_\beta (v(\xi , x)) - {{\,\textrm{CVaR}\,}}_\beta (\tilde{v}_\alpha (\xi , x)) \big \vert&\le (1-\beta )^{-1} C \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ]. \end{aligned}$$

Here, C is finite by Theorem 2.1 in [29]. Substituting this inequality into the integral in (4), we obtain

$$\begin{aligned} \vert Q(x) - \tilde{Q}_\alpha (x) \vert&\le \sup _{\mu \in \mathcal {M}} \Big \{ \int _0^1 \Big ( (1-\beta )^{-1} C \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] \Big ) d\mu (\beta ) \Big \} \\&= \sup _{\mu \in \mathcal {M}} \Big \{ \int _0^1 (1-\beta )^{-1} d\mu (\beta ) \Big \} \cdot C \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] \\&= \sup _{\mu \in \mathcal {M}} \phi _\mu (1) \cdot C \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ], \end{aligned}$$

where the last step follows from the definition of the risk spectrum \(\phi _\mu \) in (3). The result now follows from the observation that the upper bound above does not depend on the value of x. \(\square \)

Similar to the error bounds from the literature, the error bound from Theorem 1 depends on the total variations of the one-dimensional conditional density functions of the right-hand side vector h. As these total variations go to zero, the error bound converges to zero. Intuitively, this can be interpreted as follows. Note that a density function with a small total variation is relatively “flat”. As the total variation decreases further, the distribution becomes ever flatter. This means that the distribution of h becomes more dispersed. So on an intuitive level, the error bound above shows that, if the distribution of h is unimodal, then our convex approximations are good if the distribution of h is relatively dispersed. A final interesting interpretation of Theorem 1 is that Q is “almost convex” (i.e., close to a convex function) if the distribution of h is highly dispersed (and unimodal).

Comparing the error bound from Theorem 1 with the error bound for the risk-neutral case in [22], we see that we have an additional factor \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\). This expression depends on the specific risk measure \(\rho \). To interpret this expression, consider the spectral case, where \(\mathcal {M}\) is a singleton \(\{ \mu \}\). Then, the function \(\phi _\mu \) from Theorem 1 is called the risk spectrum [26] associated with \(\rho \) and \(\rho (Z)\), \(Z \in \mathcal {Z}\), can be represented as an expected value under a distorted distribution, where the risk spectrum \(\phi _\mu (p)\) represents the distortion (i.e., the factor by which probabilities are multiplied) at the p-quantile of the distribution of Z. Since \(\phi _\mu \) is non-decreasing, \(\phi _\mu (1)\) represents the maximum distortion of probabilities associated with the risk measure \(\rho _\mu \). It follows that for general law invariant coherent risk measures \(\rho \), the value \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) is the maximum possible distortion associated with the risk measure \(\rho \).

Since for every \(\mu \in \mathcal {M}\), \(\phi _\mu \) is a non-negative non-decreasing function on [0, 1] that integrates to one, it follows that \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1) \ge 1\). Hence, our error bound is at least as large as its risk-neutral analogue, with the minimum value attained for the expected value: \(\rho = \mathbb {E}\). As long as \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) is finite, our error bound has the asymptotic interpretation described above: the approximation error goes to zero as the distribution of h becomes more dispersed. For instance, this will hold if \(\mathcal {M}\) consists of a single element \(\mu \) (i.e., if \(\rho \) is a spectral risk measure) and \(\phi _\mu (1)\) is finite. Note that \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) can be finite even if \(\mathcal {M}\) contains uncountably many elements, as we illustrate in the following example.
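Two concrete instances of the factor \(\phi _\mu (1)\) (illustrations of our own): for \(\rho = {{\,\textrm{CVaR}\,}}_\beta \), \(\mu \) is the point mass at \(\beta \) and \(\phi _\mu (1) = (1-\beta )^{-1}\); for a spectral risk measure whose measure \(\mu \) is, say, uniform on [0, 1/2], \(\phi _\mu (1) = \int _0^{1/2} 2(1-\beta )^{-1} d\beta = 2 \ln 2\).

```python
import math

def phi_one_discrete(measure):
    # phi_mu(1) = int (1 - beta)^{-1} dmu for a discrete measure [(beta, prob)]
    return sum(p / (1.0 - b) for b, p in measure)

# CVaR_{0.9}: point mass at beta = 0.9 -> maximum distortion 1/(1-0.9) = 10
assert abs(phi_one_discrete([(0.9, 1.0)]) - 10.0) < 1e-12

# mu uniform on [0, 1/2] (density 2): midpoint-rule quadrature of the integral
n = 100000
phi1 = sum(2.0 / (1.0 - (i + 0.5) * 0.5 / n) * (0.5 / n) for i in range(n))
assert abs(phi1 - 2.0 * math.log(2.0)) < 1e-6
```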

Example 1

Consider the law invariant risk measure mean-upper semideviation (mean-USD) with parameter \(c \ge 0\), defined as

$$\begin{aligned} \rho (Z) := \mathbb {E}[Z] + c \cdot \mathbb {E}\Big [ \big ( Z - \mathbb {E}[Z] \big )^{+} \Big ], \quad Z \in \mathcal {Z}. \end{aligned}$$

By Example 2.46 in [30], this is a coherent risk measure for \(0 \le c \le 1\), and the set \(\mathcal {M}\) of probability measures on [0, 1) from its Kusuoka representation is given by \(\mathcal {M} = \{ \mu \in \mathcal {P}([0,1)) \ \vert \ \phi _\mu (1) \le 1 + c\}\). It follows that \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1) = 1 + c\).
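As a quick numerical companion to this example (a sketch of our own), the following computes mean-USD on an equiprobable sample and checks that it reduces to the expectation for \(c = 0\) and satisfies translation equivariance:

```python
def mean_usd(z, c):
    # mean-upper semideviation: E[Z] + c * E[(Z - E[Z])^+] on an equiprobable sample
    n = len(z)
    m = sum(z) / n
    return m + c * sum(max(v - m, 0.0) for v in z) / n

z = [0.0, 0.0, 0.0, 4.0]
assert abs(mean_usd(z, 0.0) - 1.0) < 1e-12    # c = 0: plain expectation
assert abs(mean_usd(z, 1.0) - 1.75) < 1e-12   # E[Z] = 1, E[(Z - 1)^+] = 3/4

# translation equivariance (an axiom of coherence): rho(Z + a) = rho(Z) + a
a = 5.0
assert abs(mean_usd([v + a for v in z], 0.5) - (mean_usd(z, 0.5) + a)) < 1e-12
```

The risk-averse adjustment \(c \cdot \mathbb {E}[(Z - \mathbb {E}[Z])^+]\) is consistent with the maximum distortion \(1 + c\) derived above.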

Finally, we provide an example showing that the error bound from Theorem 1 can be tight.

Example 2

Consider a so-called simple integer recourse setting. That is, consider the model from (1), with the assumptions that \(n_1 = m = 1\), \(q = (1, 0)\) and \(T = [1]\) are fixed, \(Y = \mathbb {Z}_+ \times \mathbb {R}_+\), and \(W = [1 \ -1]\). In other words, we assume that the value function can be written as

$$\begin{aligned} v(h,x) = \min _{y_1 \in \mathbb {Z}_+} \big \{ y_1 \ \vert \ y_1 \ge h - x \big \} = \lceil h - x\rceil ^+, \quad h \in \mathbb {R}^m, \ x \in \mathbb {R}, \end{aligned}$$

where \(\lceil s\rceil ^+:= \max \{ \lceil s\rceil , 0\}\), \(s \in \mathbb {R}\). Moreover, assume that \(\rho = \text {CVaR}_\beta \) with \(\beta = 3/4\), and let the density function f of h be given by \(f(t) = 1/4\) if \(0 \le t < 3\), \(f(t) = 1/2\) if \(3 \le t < 3.5\), and \(f(t) = 0\) otherwise. We show that the error bound from Theorem 1 is tight.

By a result in Romeijnders et al. [16], the relevant constant is \(C = 1/8\). Moreover, it is not hard to show that \(\sup _{\mu \in \mathcal {M}} \phi _{\mu }(1) = (1 - \beta )^{-1} = 4\) and that \(|\Delta |f = 1\). Thus, the error bound reduces to \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \le 4 \cdot \frac{1}{8} \cdot 1 = 1/2\). Next, we show that there exist \(x \in \mathbb {R}\) and \(\alpha \in [0,1)\) such that \(|Q(x) - \tilde{Q}_\alpha (x) |\) attains the error bound. Since v and \(\tilde{v}_\alpha \) are monotonically non-decreasing in h, we have that \(Q(x) = \int _{-\infty }^\infty v(t,x) g(t) dt\) and \(\tilde{Q}_\alpha (x) = \int _{-\infty }^\infty \tilde{v}_\alpha (t,x) g(t) dt\), where the pdf g represents the distribution of h, restricted to the fraction \(1 - \beta = 1/4\) of its highest outcomes. That is, \(g(t) = 2\) if \(3 \le t < 3.5\), and \(g(t) = 0\) otherwise. For \(x = 0\) and \(\alpha = 0.5\) we have \(|Q(x) - \tilde{Q}_\alpha (x) |= \left|\int _{-\infty }^\infty \left( v(t,x) - \tilde{v}_\alpha (t,x) \right) g(t) \, dt \right|= \left|\int _{3}^{3.5} \left( 4 - 3.5 \right) \cdot 2 \, dt \right|= 1/2\). This proves that the error bound is tight.
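The computations in this example can be reproduced numerically. In the sketch below (our own), `v_tilde` is our rendering of the \(\alpha \)-approximation for this instance, \(\tilde{v}_\alpha (h,x) = \max \{0, (h-x) + \lceil h-\alpha \rceil - (h-\alpha )\}\), i.e., the periodic penalty is evaluated at \(h - \alpha \) instead of \(h - Tx\):

```python
import math

def v(t, x):
    # value function of the example: ceil(t - x)^+
    return max(math.ceil(t - x), 0)

def v_tilde(t, x, alpha):
    # alpha-approximation: periodic penalty frozen at h - alpha
    return max(0.0, (t - x) + (math.ceil(t - alpha) - (t - alpha)))

# CVaR_{3/4}: both functions are non-decreasing in h, so integrate against
# g, the density of the worst quarter of outcomes: g(t) = 2 on [3, 3.5)
x, alpha, n = 0.0, 0.5, 10000
ts = [3.0 + (i + 0.5) * 0.5 / n for i in range(n)]     # midpoint rule
Q = sum(v(t, x) * 2.0 * (0.5 / n) for t in ts)
Q_tilde = sum(v_tilde(t, x, alpha) * 2.0 * (0.5 / n) for t in ts)

assert abs(Q - 4.0) < 1e-6 and abs(Q_tilde - 3.5) < 1e-6
assert abs(abs(Q - Q_tilde) - 0.5) < 1e-6   # attains the error bound 1/2
```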

We conclude this section with an analogue of Theorem 1 for the law invariant, non-coherent risk measure expected excess (EE). This risk measure represents the expected amount by which a random variable exceeds a predefined threshold \(\tau \). Even though EE is not coherent, it has relevant practical applications in settings where we are interested in the expected amount by which a budget \(\tau \) is exceeded. The result below illustrates that our approach is not necessarily limited to coherent risk measures. Moreover, we will use it in our proof of Theorem 3 in Sect. 5.

Definition 6

Let \(Z \in \mathcal {Z}\) be a random variable. Then, its expected excess (EE) with parameter \(\tau \in \mathbb {R}\) is given by

$$\begin{aligned} {{\,\textrm{EE}\,}}_\tau (Z) := \mathbb {E}\big [ (Z - \tau )^+ \big ]. \end{aligned}$$
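On an equiprobable sample, \({{\,\textrm{EE}\,}}_\tau \) is a one-line computation (our own sketch); the shift identity \({{\,\textrm{EE}\,}}_\tau (Z + a) = {{\,\textrm{EE}\,}}_{\tau - a}(Z)\), which follows directly from the definition, is also checked:

```python
def expected_excess(z, tau):
    # EE_tau(Z) = E[(Z - tau)^+] on an equiprobable sample
    return sum(max(v - tau, 0.0) for v in z) / len(z)

z = [1.0, 2.0, 3.0, 4.0]
assert abs(expected_excess(z, 2.5) - 0.5) < 1e-12   # (0 + 0 + 0.5 + 1.5)/4

# shift identity: EE_tau(Z + a) = EE_{tau - a}(Z)
a = 1.25
assert abs(expected_excess([v + a for v in z], 3.0)
           - expected_excess(z, 3.0 - a)) < 1e-12
```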

Proposition 1

Consider the recourse function Q from (1) and its \(\alpha \)-approximation \(\tilde{Q}_\alpha \) from Definition 3. Assume that the underlying risk measure satisfies \(\rho = {{\,\textrm{EE}\,}}_\tau \) for some \(\tau \in \mathbb {R}\). Then, there exists a constant \(C > 0\) such that for every \(f \in \mathcal {H}^m\), we have

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty \le C \cdot \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ]. \end{aligned}$$

Proof

Let \(x \in \mathbb {R}^{n_1}\) be given. Then,

$$\begin{aligned} \big \vert Q(x) - \tilde{Q}_\alpha (x) \big \vert&= \big \vert \mathbb {E}\big [ (v(\xi ,x) - \tau )^+ \big ] - \mathbb {E}\big [ (\tilde{v}_\alpha (\xi ,x) - \tau )^+ \big ] \big \vert , \end{aligned}$$

which equals \(\big \vert \mathbb {E}\big [ v^\tau (\xi ,x) \big ] - \mathbb {E}\big [ \tilde{v}_\alpha ^\tau (\xi ,x) \big ] \big \vert = \vert R^*(x,\tau ) - \tilde{Q}^*_\alpha (x,\tau )\vert \) in the notation of [22]. By the second part of the proof of Theorem 1 in that paper, it follows that there exists a constant \(C>0\), not depending on \(\tau \), such that for every \(f \in \mathcal {H}^m\),

$$\begin{aligned} \big \vert Q(x) - \tilde{Q}_\alpha (x) \big \vert&\le C \cdot \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ]. \end{aligned}$$

The result follows from the observation that the right-hand side here does not depend on the value of x or \(\tau \). \(\square \)

4 Non-asymptotic error bounds for special cases

Theorem 1 provides a worst-case error bound that holds for the general risk-averse MIR model from (1). A downside of the generality of this result is that it may be very conservative in specific settings: in particular, the constant C in Theorem 1 might be larger than necessary. In Sect. 4.1 we tackle this issue for the special case of totally unimodular integer recourse by deriving a specialized, tighter error bound. This bound also turns out to behave more favorably as a function of \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) than the bound from Theorem 1. Inspired by this observation, in Sect. 4.2 we provide a simple integer recourse example for which we derive an error bound that does not depend on the choice of risk measure \(\rho \) at all. Both results show that our convex approximation approach might perform better in practice than the asymptotic error bound from Theorem 1 suggests.

4.1 Totally unimodular integer recourse

We consider a risk-averse totally unimodular (TU) integer recourse model. That is, we consider the model (1) and we assume that the second-stage problem can be written as

$$\begin{aligned} v(\xi ,x) = \min _{y \in Y} \{ q^T y \ \vert \ W y + Tx \ge h \}, \end{aligned}$$
(5)

where W is a totally unimodular matrix and where all second-stage variables are restricted to integers, i.e., \(Y = \mathbb {Z}_+^{n_2}\). We derive a non-asymptotic error bound for this setting that behaves more favorably as a function of \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) than the error bound from Theorem 1.

Our analysis consists of three steps. First, we use the Kusuoka representation of \(\rho \) and the distortion interpretation of spectral risk measures to write \(\rho \) as a supremum over expected values under distorted probability distributions. Second, we apply an error bound from [22] for risk-neutral TU integer recourse models on this expected value expression. The resulting error bound is written in terms of the distorted probability distribution. Finally, we rewrite the result in terms of the original distribution. For this purpose, we use the following steps. First, we observe that in this TU integer setting, the value function \(v(\xi ,x)\) is non-decreasing as a function of \(h_i\). Second, we show that as a result, the distortion factor in the pdf of \(h_i\) must be non-decreasing as well. Finally, we use the lemma below to find an upper bound on the total variation of the distorted pdf in terms of the total variation of the original pdf f.

Lemma 4

Let \(f: \mathbb {R}\rightarrow \mathbb {R}\) be a pdf of bounded variation and let \(\zeta : \mathbb {R}\rightarrow \mathbb {R}\) be a non-negative non-decreasing bounded function satisfying \(\zeta (t) \le \zeta ^*\), for all \(t \in \mathbb {R}\). Define \(g:\mathbb {R}\rightarrow \mathbb {R}\) by \(g(t) = \zeta (t) f(t)\), \(t \in \mathbb {R}\). Then,

$$\begin{aligned} \vert \Delta \vert g \le \zeta ^* \vert \Delta \vert f. \end{aligned}$$

Proof

Follows directly from the proof of Proposition 3 in [16]. \(\square \)
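The inequality of Lemma 4 is easy to check numerically on a grid. The instance below is a sketch only: the triangular pdf f and the logistic-shaped distortion \(\zeta\) with \(\zeta^* = 3\) are our own illustrative choices, not taken from the text.

```python
import math

def total_variation(vals):
    """Discrete total variation of a function sampled on a grid."""
    return sum(abs(vals[i + 1] - vals[i]) for i in range(len(vals) - 1))

n = 4000
t = [-1.0 + 4.0 * i / n for i in range(n + 1)]
f = [max(0.0, 1.0 - abs(s - 1.0)) for s in t]    # triangular pdf on [0, 2]
zeta = [3.0 / (1.0 + math.exp(-s)) for s in t]   # non-decreasing, <= zeta* = 3
g = [z * fv for z, fv in zip(zeta, f)]

print(total_variation(g) <= 3.0 * total_variation(f))   # True, as Lemma 4 states
```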

Theorem 2

Consider the risk-averse TU integer recourse function Q and its convex approximation \(\tilde{Q}_\alpha \). Assume that the underlying risk measure \(\rho \) is law invariant and coherent and let \(\mathcal {M}\) be the set of probability measures on [0, 1) from its Kusuoka representation. Then, for every \(f \in \mathcal {H}^m\), we have

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty \le \sum _{i=1}^{m} \bar{\lambda }_i^* H\bigg ( \sup _{\mu \in \mathcal {M}} \big \{ \phi _\mu (1) \big \} \mathbb {E}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] \bigg ), \end{aligned}$$

where for every \(i=1,\ldots ,m\), we have \(\bar{\lambda }^*_i:= \mathbb {E}^{q}\big [ \max _{k \in K^q} \{ \lambda ^k_i \} \big ]\), where \(\lambda ^k\), \(k \in K^q\), \(q \in \Xi ^q\), are the dual vertices from Lemma 1; and where the function \(H: \mathbb {R}_+ \rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} H(t) = {\left\{ \begin{array}{ll} t/8, &{}\text {if } 0 \le t \le 4, \\ 1 - 2/t, &{}\text {if } t > 4. \end{array}\right. } \end{aligned}$$
(6)
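For reference, a direct Python transcription of H (an illustration; note that H is continuous at \(t = 4\) and approaches, but never exceeds, one):

```python
def H(t):
    """The function H from (6): t/8 on [0, 4], and 1 - 2/t for t > 4."""
    assert t >= 0, "H is defined on t >= 0"
    return t / 8 if t <= 4 else 1 - 2 / t

print(H(2), H(4), H(8))   # 0.25 0.5 0.75
```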

Proof

Let \(x \in \mathbb {R}^{n_1}\) be given. By the Kusuoka representation of \( \rho \) we have \(Q(x) = \sup _{\mu \in \mathcal {M}} \rho _\mu (v(\xi ,x))\), where \(\rho _\mu \) denotes the spectral risk measure corresponding to \(\mu \in \mathcal {M}\). Let \(\mu ^x \in \mathcal {M}\) be optimal in the supremum above (if the maximum is not attained, we can instead consider a sequence \(\{\mu _n\}_{n=1}^\infty \) converging to the optimum). We know that \(\rho _{\mu ^x}(v(\xi ,x))\) can be represented as an expected value under a distorted probability distribution \(\mathbb {P}^x\), where \(\phi _{\mu ^x}(p)\) reflects the distortion of the probabilities at the p-quantile of the distribution of \(v(\xi ,x)\). Recall that \(\phi _\mu \) is non-decreasing. Let \(f^x\) denote the joint pdf of h under \(\mathbb {P}^x\). Let \(i = 1,\ldots ,m\), be given and let \(f^x_i(\cdot ;t_{-i})\) denote \(f^x(t)\) as a function of \(t_i\). Due to the TU structure of the recourse matrix W, we know that \(v(\xi ,x)\) is non-decreasing as a function of \(h_i\). Hence, we can assume without loss of generality that there exists a non-negative, non-decreasing function \(\zeta ^x_i: \mathbb {R}\rightarrow \mathbb {R}\), satisfying \(\zeta ^x_i(t_i) \le \phi _{\mu ^x}(1)\) for all \(t_i \in \mathbb {R}\), such that \(f^x_i(t_i;t_{-i}) = \zeta ^x_i(t_i) f_i(t_i;t_{-i})\), \(t_i \in \mathbb {R}\).

Now, we can write \(Q(x) = \rho _{\mu ^x}(v(\xi ,x)) = \mathbb {E}^{\mathbb {P}^x}\big [ v(\xi ,x) \big ]\) and by feasibility of \(\mu ^x\) and \(\mathbb {P}^x\), \(\tilde{Q}_\alpha (x) \ge \rho _{\mu ^x}(\tilde{v}_\alpha (\xi ,x)) \ge \mathbb {E}^{\mathbb {P}^x}\big [ \tilde{v}_\alpha (\xi ,x) \big ]\). It follows that

$$\begin{aligned} Q(x) - \tilde{Q}_\alpha (x)&\le \mathbb {E}^{\mathbb {P}^x}\big [ v(\xi ,x) \big ] - \mathbb {E}^{\mathbb {P}^x}\big [ \tilde{v}_\alpha (\xi ,x) \big ] \nonumber \\&\le \sum _{i=1}^m \bar{\lambda }^*_i H\Big ( \mathbb {E}^{\mathbb {P}^x}\big [ \vert \Delta \vert f^x_i(\cdot \vert h_{-i})\big ] \Big ), \end{aligned}$$
(7)

where the second inequality follows from applying the error bound for risk-neutral TU integer recourse models from Proposition 4 in [22].

It remains to write the expected value in the error bound above in terms of the original distribution \(\mathbb {P}\) and pdf f. Define \(\mathcal {T}_{-i}:= \{ t_{-i} \in \mathbb {R}^{m-1} \ \vert \ f_{-i}(t_{-i}) > 0 \}\) as the set on which \(f_{-i}\) is positive. Similarly, define \(\mathcal {T}^x_{-i}:= \{ t_{-i} \in \mathbb {R}^{m-1} \ \vert \ f^x_{-i}(t_{-i}) > 0 \}\) and observe that \(\mathcal {T}^x_{-i} \subseteq \mathcal {T}_{-i}\). Using the definition of the conditional density function \(f_i(\cdot \vert h_{-i})\) we have \(\mathbb {E}^{\mathbb {P}}\big [\vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] = \int _{\mathcal {T}_{-i}} \vert \Delta \vert \Big (\frac{f_i(\cdot ;t_{-i})}{f_{-i}(t_{-i})}\Big ) f_{-i}(t_{-i}) dt_{-i} = \int _{\mathcal {T}_{-i}} \vert \Delta \vert f_i(\cdot ;t_{-i}) dt_{-i}\). Similarly, we find \(\mathbb {E}^{\mathbb {P}^x}\big [\vert \Delta \vert f^x_i(\cdot \vert h_{-i}) \big ] = \int _{\mathcal {T}^x_{-i}} \vert \Delta \vert f^x_i(\cdot ;t_{-i}) dt_{-i}\). Using these equalities and the fact that \(\vert \Delta \vert f^x_i(\cdot ;t_{-i}) \le \phi _{\mu ^x}(1) \vert \Delta \vert f_i(\cdot ;t_{-i})\) by Lemma 4, we obtain

$$\begin{aligned} \mathbb {E}^{\mathbb {P}^x}\big [\vert \Delta \vert f^x_i(\cdot \vert h_{-i}) \big ]&= \int _{\mathcal {T}^x_{-i}} \vert \Delta \vert f^x_i(\cdot ;t_{-i}) dt_{-i} \le \int _{\mathcal {T}_{-i}} \vert \Delta \vert f^x_i(\cdot ;t_{-i}) dt_{-i} \\&\le \int _{\mathcal {T}_{-i}} \phi _{\mu ^x}(1) \vert \Delta \vert f_i(\cdot ;t_{-i}) dt_{-i} \le \sup _{\mu \in \mathcal {M}} \phi _\mu (1) \mathbb {E}^{\mathbb {P}}\big [\vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ]. \end{aligned}$$

Substituting this inequality into (7) and using an analogous analysis for an upper bound on the reverse difference \(\tilde{Q}_\alpha (x) - Q(x)\) yields the result. \(\square \)

Comparing the TU integer error bound from Theorem 2 with the general error bound from Theorem 1, we see two differences. First, instead of merely proving the existence of a constant C, we derive the constants \(\bar{\lambda }^*_i\), \(i=1,\ldots ,m\), explicitly. Hence, the error bound from Theorem 2 is not merely asymptotic in nature, but can be used to derive an actual performance guarantee in finite cases. Second, the error bound from Theorem 2 depends on the risk measure \(\rho \) in a more favorable way, as the factor \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) enters inside the function H, which is bounded from above by one. Hence, even if \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) is very large, the error bound remains relatively small. This provides evidence for our conjecture that the dependence of the error bound from Theorem 1 on the risk measure \(\rho \) might be (much) too conservative in some cases.

4.2 Simple integer recourse with an exponential distribution

We now consider the special case of risk-averse one-dimensional simple integer recourse (SIR) models [31] with an exponentially distributed right-hand side variable h. We use this very special case as an example for which we can derive an error bound that does not depend on the risk measure \(\rho \) at all. This provides further evidence that in some settings the approximation error \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) may depend on the risk measure \(\rho \) more favorably than the error bound from Theorem 1 suggests.

Specifically, we consider the model from (1), with the assumptions that \(n_1 = m = 1\); \(q = (1, 0)\) and \(T = [1]\) are fixed; \(Y = \mathbb {Z}_+ \times \mathbb {R}_+\); and \(W = [1 \ -1]\). In other words, we assume that the value function can be written as

$$\begin{aligned} v(h,x) = \min _{y_1 \in \mathbb {Z}_+} \big \{ y_1 \ \vert \ y_1 \ge h - x \big \} = \lceil h - x\rceil ^+, \quad h \in \mathbb {R}, \ x \in \mathbb {R}, \end{aligned}$$

where \(\lceil s\rceil ^+:= \max \{ \lceil s\rceil , 0\}\), \(s \in \mathbb {R}\). Moreover, we assume that h follows an exponential distribution with rate parameter \(\lambda \). We show that we can derive a uniform error bound that holds for all spectral risk measures \(\rho \) that satisfy some regularity conditions.
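The SIR value function has the simple closed form above; a direct Python transcription (illustrative only):

```python
import math

def v(h, x):
    """SIR second-stage value: v(h, x) = max(ceil(h - x), 0)."""
    return max(math.ceil(h - x), 0)

print(v(3.2, 1.0), v(2.0, 1.0), v(0.5, 1.0))   # 3 1 0
```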

Consider the recourse function \(Q(x) = \rho \big ( v(\xi ,x) \big )\). Let \(\phi _\mu \) be the risk spectrum associated with \(\rho \) by its Kusuoka representation. Recall that \(\phi _\mu (\beta )\) represents the distortion factor of the \(\beta \)-quantile of the distribution of \(v(h,x)\). We make the important observation that the value function \(v(h,x) = \lceil h - x\rceil ^+\) is non-decreasing in h. Hence, the \(\beta \)-quantile of the distribution of h corresponds to the \(\beta \)-quantile of the distribution of \(v(h,x)\). As a result, we can write Q(x) as an expected value under a distorted distribution that can be expressed explicitly. We have \(Q(x) = \mathbb {E}^{f_\mu }\big [v(h,x)\big ]\), \(x \in \mathbb {R}\), where \(f_\mu \) denotes the distorted pdf defined by

$$\begin{aligned} f_\mu (t) = \phi _\mu (F(t))f(t), \quad t \in \mathbb {R}, \end{aligned}$$
(8)

where F and f are the cdf and pdf, respectively, of h. Note that \(f_\mu \) does not depend on the value of x. Similarly, for the \(\alpha \)-approximation we obtain \(\tilde{Q}_\alpha (x) = \mathbb {E}^{f_\mu }\big [ \tilde{v}_\alpha (\xi ,x) \big ]\).
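To make (8) concrete, consider the special case \(\rho = {{\,\textrm{CVaR}\,}}_\beta \), whose risk spectrum is \(\phi _\mu (p) = (1-\beta )^{-1}\) for \(p \ge \beta \) and 0 otherwise. The sketch below (with assumed values \(\beta = 0.75\) and \(\lambda = 2\), our own choices) checks numerically that the distorted pdf \(f_\mu \) is again a pdf, i.e., that it integrates to one:

```python
import math

beta, lam = 0.75, 2.0   # assumed CVaR level and exponential rate

F = lambda t: 1.0 - math.exp(-lam * t)     # cdf of h ~ Exp(lam), t >= 0
f = lambda t: lam * math.exp(-lam * t)     # pdf of h
phi = lambda p: 1.0 / (1.0 - beta) if p >= beta else 0.0   # CVaR risk spectrum

f_mu = lambda t: phi(F(t)) * f(t)          # distorted pdf from (8)

# f_mu rescales the upper (1 - beta)-tail of f by 1/(1 - beta), so its
# total mass is again one; check by midpoint-rule integration on [0, 15]:
n, hi = 100_000, 15.0
mass = sum(f_mu((i + 0.5) * hi / n) * (hi / n) for i in range(n))
print(round(mass, 2))   # 1.0
```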

Using these representations, we can write Q and \(\tilde{Q}_\alpha \) as a recourse function and its \(\alpha \)-approximation, respectively, of a risk-neutral SIR model under the distorted distribution represented by \(f_\mu \). Hence, we can apply an error bound from the literature on convex approximations of risk-neutral MIR models (Theorem 5 in [16]). This yields

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty \le H \big ( \vert \Delta \vert f_\mu \big ), \end{aligned}$$
(9)

where H is the function from (6). Interestingly, under the assumption that h is exponentially distributed, we can derive a uniform upper bound on the total variation \(\vert \Delta \vert f_\mu \), and hence, on \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) that holds for all spectral risk measures that satisfy some regularity conditions.

Proposition 2

Consider the recourse function Q and its \(\alpha \)-approximation in the SIR setting described in Sect. 4.2, where h follows an exponential distribution with rate parameter \(\lambda \). Suppose that \(\rho \) is a spectral risk measure with associated probability measure \(\mu \) on [0, 1) and risk spectrum \(\phi _\mu \), and that it satisfies the following regularity conditions:

  1. (i)

    \(\phi _\mu (1) < + \infty \);

  2. (ii)

    The set M of points on which \(\mu \) has positive probability mass is finite. Moreover, \(\mu \) has a density g on \([0,1)\setminus M\);

  3. (iii)

    The set \(D:= \{ \beta \in (0,1] \ \vert \ (1-\beta )\phi _\mu (\beta ) \text { has a negative derivative at } \beta \}\) can be written as the union of a finite number of intervals.

Then, we have

$$\begin{aligned} \Vert Q - \tilde{Q}_\alpha \Vert _\infty \le H(2 \lambda ), \end{aligned}$$
(10)

where H is the function from (6).

Proof

By (9), it suffices to prove that \(\vert \Delta \vert f_\mu \le 2 \lambda \). Consider the function \(f_\mu \) as defined in (8). Using the change of variables \(\beta = F(t)\), we can express \(f_\mu \) as a function of \(\beta \). We obtain the function \(\bar{f}_\mu : [0,1] \rightarrow \mathbb {R}\), defined by

$$\begin{aligned} \bar{f}_\mu (\beta ) := f_\mu (F^{-1}(\beta )) = \lambda (1-\beta ) \phi _\mu (\beta ), \quad \beta \in [0,1], \end{aligned}$$

where we use that the inverse cdf \(F^{-1}\) of h is given by \(F^{-1}(\beta ) = -\frac{1}{\lambda } \log (1 - \beta )\), \(\beta \in [0,1]\). Since F is a continuous, invertible cdf, and \(F^{-1}(0) = 0\) and \(F^{-1}(1) = +\infty \), it follows that \(\vert \Delta \vert f_\mu = \vert \Delta \vert f_\mu ((-\infty ,0]) + \vert \Delta \vert f_\mu \big ( [0,+\infty ) \big ) = \bar{f}_\mu (0) + \vert \Delta \vert \bar{f}_\mu = \bar{f}_\mu (0) + \Delta ^+ \bar{f}_\mu + \Delta ^- \bar{f}_\mu \), where we use the fact that \(\vert \Delta \vert \bar{f}_\mu = \Delta ^+ \bar{f}_\mu + \Delta ^- \bar{f}_\mu \). By definition of \(\bar{f}_\mu \) and condition (i), we have \(\bar{f}_\mu (0) \ge 0 = \bar{f}_\mu (1)\) and hence, \(\Delta ^- \bar{f}_\mu = \Delta ^+ \bar{f}_\mu + \bar{f}_\mu (0)\). It follows that \(\vert \Delta \vert f_\mu = 2 \Delta ^- \bar{f}_\mu \). Hence, it suffices to find an upper bound on the total decrease \(\Delta ^- \bar{f}_\mu \) only.

We will show that \(\bar{f}_\mu \) is decreasing only on the set D defined in condition (iii), so that we can write \(\Delta ^- \bar{f}_\mu = \Delta ^- \bar{f}_\mu (D)\). To prove this, first note that \(\bar{f}_\mu \) is continuous for all \(\beta \in (0,1] {\setminus } M\), has an upward discontinuity at all \(\beta \in M\), and since \(\bar{f}_\mu \) is non-negative and \(\bar{f}_\mu (0) = 0\), it is non-decreasing at 0, in the sense that \(\bar{f}_\mu (0) \le \lim _{s \downarrow 0} \bar{f}_\mu (s)\). It follows that \(\Delta ^- \bar{f}_\mu := \Delta ^- \bar{f}_\mu \big ( [0,1] \big ) = \Delta ^- \bar{f}_\mu \big ( (0,1] {\setminus } M \big )\). Next, we show that the derivative \(\bar{f}_\mu ^\prime (\beta )\) exists at all \(\beta \in (0,1] {\setminus } M\). Let \(\beta \in (0,1] {\setminus } M\) be given. From the definition of \(\bar{f}_\mu \) it is clear that the derivative \(\bar{f}_\mu ^\prime (\beta )\) exists if and only if the derivative \(\phi _\mu ^\prime (\beta )\) exists, which is the case for all \(\beta \in (0,1] {\setminus } M\) by condition (ii). For such \(\beta \), it is given by \(\phi _\mu ^\prime (\beta ) = (1 - \beta )^{-1} g(\beta )\). It follows that \(\bar{f}_\mu ^\prime (\beta )\) exists for all \(\beta \in (0,1] {\setminus } M\) and is given by

$$\begin{aligned} \bar{f}_\mu ^\prime (\beta )&= \lambda (1-\beta ) \phi _\mu ^\prime (\beta ) - \lambda \phi _\mu (\beta ) = \lambda g(\beta ) - \lambda \phi _\mu (\beta ), \quad \beta \in (0,1] \setminus M. \end{aligned}$$
(11)

Consider the set D from condition (iii) and observe that \(D \cap M = \emptyset \) since \(\phi _\mu ^\prime (\beta )\) does not exist for \(\beta \in M\). Hence, the set \(D \subseteq (0,1] {\setminus } M\), consisting of finitely many intervals by condition (iii), represents the subset of [0, 1] on which \(\bar{f}_\mu \) is decreasing. We conclude that \(\Delta ^- \bar{f}_\mu = \Delta ^- \bar{f}_\mu (D)\), where the right-hand side is well-defined by condition (iii).

Now, to obtain our upper bound, we express \(\Delta ^- \bar{f}_\mu (D)\) in terms of an integral over the derivative of \(\bar{f}_\mu \). It is not hard to prove that \(\bar{f}_\mu \) is absolutely continuous on D. Hence, by, e.g., Theorem 14 in [32] it follows that \(\Delta ^- \bar{f}_\mu (D) = \int _{D} \vert \bar{f}_\mu ^\prime (\beta ) \vert d\beta \). Using (11), we obtain

$$\begin{aligned} \Delta ^- \bar{f}_\mu (D)&= \int _{D} \vert \bar{f}_\mu ^\prime (\beta ) \vert d\beta = \int _{D} \big (\lambda \phi _\mu (\beta ) - \lambda g(\beta ) \big ) d\beta \\&\le \lambda \int _{D} \phi _\mu (\beta ) d\beta \le \lambda \int _0^1 \phi _\mu (\beta ) d\beta = \lambda , \end{aligned}$$

where the Riemann integrals are well-defined by condition (iii) and where we use the fact that the risk spectrum \(\phi _\mu \) integrates to one by definition. It follows that \(\vert \Delta \vert f_\mu = \vert \Delta \vert \bar{f}_\mu = 2 \Delta ^- \bar{f}_\mu \le 2 \lambda \), which proves the result. \(\square \)
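The bound \(\vert \Delta \vert f_\mu \le 2\lambda \) is in fact attained for \(\rho = {{\,\textrm{CVaR}\,}}_{\beta _0}\): there, \(\bar{f}_\mu \) jumps to \(\lambda \) at \(\beta _0\) and decreases linearly to zero at one. The sketch below (assumed values \(\lambda = 2\) and \(\beta _0 = 0.75\), our own choices) computes the total decrease of \(\bar{f}_\mu \) on a grid:

```python
lam, beta0 = 2.0, 0.75   # assumed exponential rate and CVaR level

def f_bar(b):
    """f_bar_mu(b) = lam * (1 - b) * phi_mu(b) for rho = CVaR_{beta0}."""
    phi = 1.0 / (1.0 - beta0) if b >= beta0 else 0.0
    return lam * (1.0 - b) * phi

n = 100_000
vals = [f_bar(i / n) for i in range(n + 1)]
decrease = sum(max(vals[i] - vals[i + 1], 0.0) for i in range(n))

# |Delta| f_mu = 2 * (total decrease) = 2 * lam, so the bound H(2 * lam)
# of Proposition 2 is attained in this CVaR instance.
print(round(2 * decrease, 6))   # 4.0
```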

5 Statistical error bound for SAA

The error bounds from Sects. 3 and 4 depend on the joint density function f of the right-hand side vector h. Hence, they are only valid under the assumption that h is continuously distributed. However, solving recourse models under continuous distributions is generally a very hard task, even if the model is convex. The reason for this is that evaluating the objective function involves computing (multi-dimensional) integrals, which can be numerically challenging.

To overcome this difficulty one often resorts to discretizing the distribution of h. In particular, a common approach is sample average approximation (SAA) [24], in which we draw a sample from the distribution of \(\xi \) (and hence, of h) and then solve the model under the empirical distribution corresponding to the sample.

Definition 7

Consider the recourse function Q from (1) and its \(\alpha \)-approximation \(\tilde{Q}_\alpha \) from Definition 3, and let \(\Xi ^N:= (\xi ^1,\ldots ,\xi ^N)\) be an i.i.d. sample from the distribution of \(\xi \). Then, we define the sample average approximation (SAA) of Q and \(\tilde{Q}_\alpha \) by

$$\begin{aligned} Q^N(x) := \rho ^N\big (v(\xi ,x)\big ), \qquad \text {and} \qquad \tilde{Q}_\alpha ^N(x) := \rho ^N\big (\tilde{v}_\alpha (\xi ,x)\big ), \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$

respectively, where \(\rho ^N(\cdot )\) denotes the value of the risk measure \(\rho (\cdot )\) evaluated under the empirical distribution corresponding to \(\Xi ^N\).

Clearly, the total variation-based error bounds from Sect. 3 and 4 do not apply to the model with the (discrete) SAA distribution. Nevertheless, we show that the error bound based on the original continuous distribution persists in expectation under the SAA distribution. Only the standard bias of the empirical risk measure \(\rho ^N(\cdot )\) needs to be added to obtain a statistical performance guarantee on the quality of the convex approximation under the SAA distribution.

Theorem 3

Consider the setting of Theorem 1, and let \(Q^N\) and \(\tilde{Q}_\alpha ^N\) denote the sample average approximations from Definition 7. If \(\rho \) is a spectral risk measure, then there exists a constant \(C > 0\) such that for every \(f \in \mathcal {H}^m\) and \(x \in \mathbb {R}^{n_1}\),

$$\begin{aligned} \Big \vert \mathbb {E}\Big [ Q^N(x) - \tilde{Q}_\alpha ^N(x) \Big ] \Big \vert&\le \sup _{\mu \in \mathcal {M}} \phi _\mu (1) \cdot C \cdot \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] + B^N(x), \end{aligned}$$
(12)

where \(B^N(x):= \max \Big \{Q(x) - \mathbb {E}[Q^N(x)], \tilde{Q}_\alpha (x) - \mathbb {E}[\tilde{Q}^N_\alpha (x)]\Big \}\).

Proof

We know from [33] that \({{\,\textrm{CVaR}\,}}_\beta ^N\) is negatively biased, i.e., that \(\mathbb {E}\left[ {{\,\textrm{CVaR}\,}}_\beta ^N\left( v(\xi ,x) \right) \right] \le {{\,\textrm{CVaR}\,}}_\beta \left( v(\xi ,x)\right) \). Combining this with the Kusuoka representation of the spectral risk measure \(\rho \), it immediately follows that \(\rho ^N\) is negatively biased, too. Hence, \(\mathbb {E}[\rho ^N(\tilde{v}_\alpha (\xi ,x))] \le \rho (\tilde{v}_\alpha (\xi ,x))\) and \(\mathbb {E}[\rho ^N(v(\xi ,x))] \le \rho (v(\xi ,x))\), and thus \(B^N(x) \ge 0\) for every \(x \in \mathbb {R}^{n_1}\). Using this, it follows that for every \(x \in \mathbb {R}^{n_1}\),

$$\begin{aligned} \mathbb {E}\left[ Q^N(x) - \tilde{Q}_\alpha ^N(x) \right]&= \mathbb {E}\left[ \rho ^N\left( v(\xi ,x) \right) \right] - \mathbb {E}\left[ \rho ^N\left( \tilde{v}_\alpha (\xi ,x)\right) \right] \\&\le \rho \left( v(\xi ,x) \right) - \mathbb {E}\left[ \rho ^N\left( \tilde{v}_\alpha (\xi ,x)\right) \right] \\&= \rho \left( v(\xi ,x) \right) - \rho \left( \tilde{v}_\alpha (\xi ,x)\right) \\&\quad +\Big (\rho \left( \tilde{v}_\alpha (\xi ,x)\right) - \mathbb {E}\left[ \rho ^N\left( \tilde{v}_\alpha (\xi ,x)\right) \right] \Big ) \\&\le \Big |\rho \left( v(\xi ,x) \right) - \rho \left( \tilde{v}_\alpha (\xi ,x)\right) \Big |+B^N(x), \end{aligned}$$

and analogously,

$$\begin{aligned} \mathbb {E}\left[ \tilde{Q}_\alpha ^N(x) - Q^N(x) \right] \le \Big |\rho \left( v(\xi ,x) \right) - \rho \left( \tilde{v}_\alpha (\xi ,x)\right) \Big |+B^N(x). \end{aligned}$$

The desired result follows from applying Theorem 1 to \(|\rho \left( v(\xi ,x) \right) - \rho \left( \tilde{v}_\alpha (\xi ,x)\right) |\). \(\square \)

Theorem 3 provides a theoretical justification for using convex approximations in an SAA framework. This extends our error bound from the more theoretical setting with continuous distributions to the more practical setting of SAA. Compared with Theorem 1, Theorem 3 contains an additional term \(B^N(x)\), which represents the bias of the empirical risk measure \(\rho ^N\); this bias typically decreases in the sample size N. Moreover, the bias \(B^N(x)\) equals zero in the risk-neutral case, i.e., when \(\rho = \mathbb {E}\).
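The negative bias of the empirical risk measure can be observed in a small Monte Carlo experiment. The sketch below is our own illustration: \(\beta = 0.9\), \(h \sim \text{Exp}(1)\), \(N = 50\) are assumed values, and the empirical CVaR is taken as the average of the worst \((1-\beta )N\) outcomes. It compares an estimate of \(\mathbb {E}[{{\,\textrm{CVaR}\,}}^N_\beta ]\) with the exact \({{\,\textrm{CVaR}\,}}_\beta \) of the exponential distribution:

```python
import math
import random

def cvar_emp(sample, beta):
    """Empirical CVaR_beta: average of the worst (1 - beta) fraction."""
    k = max(1, round((1 - beta) * len(sample)))
    return sum(sorted(sample)[-k:]) / k

random.seed(42)
beta, lam, N, reps = 0.9, 1.0, 50, 2000
true_cvar = -math.log(1 - beta) / lam + 1 / lam   # VaR + 1/lam for Exp(lam)

est = sum(cvar_emp([random.expovariate(lam) for _ in range(N)], beta)
          for _ in range(reps)) / reps
print(est, true_cvar)   # the estimate falls below the true CVaR
```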

Interestingly, the proof of Theorem 3 carries over to settings far beyond the one considered in this paper. In particular, many error bounds from the literature on convex approximations of MIR models can be extended to statistical error bounds for the corresponding SAA model in a similar way. Even though the setting in Theorem 3 is the most general one considered in the literature, this is still an important observation, because many error bounds from the literature have been tailored to special cases, leading to (much) tighter error bounds than the more general bounds in this paper. For instance, the line of proof of Theorem 3 may be applied to the error bounds from [18] to obtain tight statistical error bounds based on information about higher-order derivatives of the density function f. In short, Theorem 3 can be seen as an extension of the entire literature on error bounds for convex approximations of MIR models to the setting of SAA.

6 Loose Benders decomposition algorithm

In this section we construct an algorithm based on the \(\alpha \)-approximation from Definition 3. Such an algorithm is necessary since solving the \(\alpha \)-approximation model directly is typically computationally intractable. The reason for this is the fact that the \(\alpha \)-approximation \(\tilde{Q}_\alpha \) is defined through an enumeration of all periodic functions \(\psi ^k\) corresponding to the dual feasible basis matrices \(B^k\) of the LP-relaxation \(v_{\text {LP}}\), of which there are exponentially many.

We propose an extension of the Loose Benders’ decomposition algorithm (LBDA) from [17] to the risk-averse case, which is based on Benders’ decomposition in an SAA-framework. The algorithm overcomes the computational issue regarding the exponential number of periodic functions \(\psi ^k\) by iteratively computing only a few function values for a single function \(\psi ^k\) corresponding to an index k that is close to optimal at the current value of x. The potential suboptimality of k yields so-called “loose cuts”, which is reflected in the name of the algorithm. Thus, the algorithm iteratively approximates the \(\alpha \)-approximation, while avoiding computation of all functions \(\psi ^k\). Nevertheless, we will derive an error bound that guarantees that the quality of the approximation remains good.

6.1 Algorithm construction

The LBDA finds an approximate solution to the original risk-averse MIR model (1) by considering the \(\alpha \)-approximation of the SAA model, defined by

$$\begin{aligned} \tilde{\eta }_\alpha := \min _{x \in X} \{ c^T x + \tilde{Q}_\alpha ^S(x) \}. \end{aligned}$$
(13)

The LBDA uses a Benders’ decomposition framework to find a solution to this approximation model. In fact, it finds an approximate solution to this model by using so-called loose optimality cuts rather than exact cuts. We extend this framework by incorporating risk aversion in the LBDA.

6.1.1 Linearizing the risk-averse objective

The main difficulty in extending the LBDA to the risk-averse case is the non-linearity of the objective function in (13) as a result of the risk measure \(\rho \). To solve this issue, we develop a method to construct a piecewise linear outer approximation of the objective, using a single sorting procedure. The main building block is the result below, which shows that we can compute the (spectral) risk associated with a discrete uniform random variable by sorting its support.

Lemma 5

Let \(\rho \) be a spectral risk measure and let Z be a discrete uniform random variable on \(\{z^1, \ldots , z^S\}\). Define

$$\begin{aligned} w^s := \int _{\tfrac{s-1}{S}}^{\tfrac{s}{S}} \phi _\mu (p) dp, \quad s = 1,\ldots ,S. \end{aligned}$$

where \(\mu \) is the probability measure on [0, 1) from the Kusuoka representation of \(\rho \). Moreover, let \(\Pi \) be the collection of all permutation functions of the set \(\{1,\ldots ,S\}\), i.e., \(\Pi \) is the collection of all one-to-one functions \(\pi : \{1,\ldots ,S\} \rightarrow \{1,\ldots ,S\}\). Then,

$$\begin{aligned} \rho (Z) = \max _{\pi \in \Pi } \sum _{s=1}^{S} w^s z^{\pi (s)}, \end{aligned}$$

Moreover, any \(\pi ^* \in \Pi \) for which \(z^{\pi ^*(1)} \le z^{\pi ^*(2)} \le \cdots \le z^{\pi ^*(S)}\) holds (i.e., that sorts the set \(\{z^1,\ldots ,z^S\}\)) is optimal.

Proof

By [23], we can represent \(\rho \) in terms of its associated risk spectrum \(\phi _\mu \):

$$\begin{aligned} \rho (Z) = \int _0^1 \phi _\mu (p) F_Z^{-1}(p) dp, \end{aligned}$$

where \(F_Z^{-1}(p):= \inf \{z \in \mathbb {R}\ \vert \ F_Z(z) \ge p \}\). Using the fact that Z is uniformly distributed on \(\{z^1, \ldots , z^S\}\), we have

$$\begin{aligned} F_Z^{-1}(p) = {\left\{ \begin{array}{ll} -\infty &{} \text {if } p = 0, \\ z^{\pi ^*(s)} &{} \text {if } \frac{s-1}{S} < p \le \frac{s}{S}, \quad s=1,\ldots ,S, \end{array}\right. } \end{aligned}$$

where \(\pi ^* \in \Pi \) is such that \(z^{\pi ^*(1)} \le z^{\pi ^*(2)} \le \cdots \le z^{\pi ^*(S)}\), i.e., \(\pi ^*\) sorts the set \(\{z^1,\ldots ,z^S\}\). It follows that

$$\begin{aligned} \rho (Z)&= \sum _{s=1}^{S} \int _{\tfrac{s-1}{S}}^{\tfrac{s}{S}} \phi _\mu (p) z^{\pi ^*(s)} dp = \sum _{s=1}^{S} w^s z^{\pi ^*(s)}. \end{aligned}$$

Since \(\phi _\mu \) is non-negative and non-decreasing, we have \(0 \le w^1 \le w^2 \le \cdots \le w^S\). This implies that \(\rho (Z) = \sum _{s=1}^{S} w^s z^{\pi ^*(s)} \ge \sum _{s=1}^{S} w^s z^{\pi (s)}\) for all \(\pi \in \Pi \). It follows that

$$\begin{aligned} \rho (Z)&= \max _{\pi \in \Pi } \Big \{ \sum _{s=1}^S w^s z^{\pi (s)} \Big \}, \end{aligned}$$

with an optimal solution \(\pi ^*\). \(\square \)
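Lemma 5 translates directly into a sorting procedure. Below is an illustrative Python sketch that computes \(\rho (Z)\) from the sorted outcomes and the weights \(w^s\) (obtained here by midpoint-rule integration of \(\phi _\mu \)), checked against \(\rho = {{\,\textrm{CVaR}\,}}_{0.5}\), whose spectrum is \(\phi (p) = 2\) on [0.5, 1]:

```python
def spectral_risk_sorted(z, phi, n_grid=10_000):
    """rho(Z) for Z uniform on {z^1, ..., z^S}, via Lemma 5: sort the
    outcomes and pair the s-th smallest with w^s = int over ((s-1)/S, s/S]."""
    z = sorted(z)
    S = len(z)
    w = []
    for s in range(S):
        # midpoint rule for w^{s+1} over the interval (s/S, (s+1)/S]
        mids = ((s + (i + 0.5) / n_grid) / S for i in range(n_grid))
        w.append(sum(phi(p) for p in mids) / (n_grid * S))
    return sum(ws * zs for ws, zs in zip(w, z))

phi_cvar = lambda p: 2.0 if p >= 0.5 else 0.0   # risk spectrum of CVaR_0.5
print(spectral_risk_sorted([4.0, 1.0, 3.0, 2.0], phi_cvar))   # 3.5
```

Note that the sorting does all the work: the largest weights are paired with the largest outcomes, exactly as the optimal permutation \(\pi ^*\) in Lemma 5 prescribes.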

Applying Lemma 5 to our approximating recourse function \(\tilde{Q}_\alpha ^S(\bar{x})\) for some fixed \(\bar{x}\), we can write

$$\begin{aligned} \tilde{Q}_\alpha ^S(\bar{x})&= \max _{\pi \in \Pi } \sum _{s=1}^S w^s \tilde{v}_\alpha (\xi ^{\pi (s)}, \bar{x}), \end{aligned}$$

and an optimal solution is found by any \(\pi ^* \in \Pi \) that sorts the values \(\tilde{v}_\alpha (\xi ^s, \bar{x})\), \(s=1,\ldots ,S\). Writing \(\tilde{\pi }\) for the inverse of \(\pi ^*\), we obtain a piecewise linear outer approximation of \(\tilde{Q}_\alpha ^S\):

$$\begin{aligned} \tilde{Q}_\alpha ^S(x) \ge \sum _{s=1}^S w^{\tilde{\pi }(s)} \tilde{v}_\alpha (\xi ^s, x), \end{aligned}$$
(14)

where the inequality follows since \(\pi ^*\) remains feasible, but not necessarily optimal, for values of x other than \(\bar{x}\). So Lemma 5 provides a procedure to derive a piecewise linear outer approximation of the approximating recourse function \(\tilde{Q}_\alpha ^S(x)\). We use this sorting-based linearization to extend the LBDA to the risk-averse case.

6.1.2 Algorithm description

We now present our LBDA in the risk-averse setting. In Algorithm 1 we provide pseudo-code for the algorithm. We explain each step below; for more elaborate explanations of some steps we refer to [17].

Algorithm 1

Risk-averse loose Benders’ decomposition algorithm (LBDA)

Besides a description of the underlying risk-averse MIR model, the risk-averse LBDA needs a few other inputs. First, it needs a shift parameter \(\alpha \) for the \(\alpha \)-approximation used in the algorithm. Next, it needs a uniform lower bound L on \(\tilde{Q}^S_\alpha \) (which always exists since we assume X is bounded). Finally, it needs a sample size S for the SAA and a tolerance level \(\varepsilon \) for the stopping criterion. The algorithm will output a near-optimal solution \(\tilde{x}_\alpha \) for (13).

In the initialization phase, we set the iteration counter \(\tau \) to zero and initialize an outer approximation function \(\tilde{Q}^\tau _{\text {OUT}}\) to the uniform lower bound L. Moreover, we draw a sample from the distribution of h and we compute the values \(w^1,\ldots ,w^S\) corresponding to \(\rho \). Note that these values only need to be computed once.

Every iteration of the algorithm consists of two parts. In the first part, we solve a master problem of the form \(\min _{x} \{ c^T x + \tilde{Q}^\tau _{\text {OUT}}(x) \ \vert \ Ax = b, \ x \in \mathbb {R}_+^{n_1} \}\), yielding a candidate solution \(x^\tau \). In the second part, we generate a so-called loose optimality cut at \(x^\tau \). This cut generation procedure consists of two steps, which we will discuss subsequently.

In the first step we derive a loose optimality cut for every scenario \(s=1,\ldots ,S\). Note that a tight optimality cut is given by \(\tilde{v}_\alpha (\xi ^s,x) \ge \lambda ^{k^\tau _s}(h^s - T^s x) + \psi ^{k^\tau _s}(h^s - \alpha )\), where \(k^\tau _s\) is an optimal index in the maximization problem defining \(\tilde{v}_\alpha (\xi ^s,x^\tau )\). It turns out, however, that finding \(k^\tau _s\) exactly is computationally challenging. Therefore, van der Laan and Romeijnders [17] propose to approximate \(k^\tau _s\) by \(\tilde{k}^\tau _s\), the optimal index in the dual representation of the LP-relaxation; we follow this procedure and refer to [17] for more details. Moreover, note that computing \(\psi ^{k^\tau _s}(h^s - \alpha )\) involves solving the Gomory relaxation \(v_{B^k}(h^s - \alpha )\), as explained in Lemma 2, which is a mixed-integer program.

In the second step we combine the scenario cuts into a single loose optimality cut for the master problem. We use the procedure from Sect. 6.1.1 to combine the scenarios in a linear way, despite the inherent nonlinearity of our risk measure \(\rho \). In particular, we find \(\pi ^\tau \in \Pi \) that sorts the values \(u^\tau _1, \ldots , u^\tau _S\). Defining \(U^\tau \) as the random variable with a uniform distribution on \(u^\tau _1, \ldots , u^\tau _S\) and writing \(\tilde{\pi }^\tau \) for the inverse of \(\pi ^\tau \), this means that \(\rho (U^\tau ) = \sum _{s=1}^S w^{\tilde{\pi }^\tau (s)} \big ( \lambda ^{\tilde{k}^\tau _s} (h^s - T^s x^\tau ) + \psi ^{\tilde{k}^\tau _s}(h^s - \alpha ) \big )\). Now, we define the optimality cut \(\tilde{Q}^S_\alpha (x) \ge \sum _{s=1}^S w^{\tilde{\pi }^\tau (s)} \big ( \lambda ^{\tilde{k}^\tau _s} (h^s - T^s x) + \psi ^{\tilde{k}^\tau _s}(h^s - \alpha ) \big )\) and add it to our outer approximation, yielding \(\tilde{Q}^{\tau +1}_{\text {OUT}}\). This cut is indeed valid for all x, since \(\pi ^\tau \) is optimal at \(x^\tau \) and feasible in the corresponding maximization problem for every other x.
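
As an illustration of this aggregation step, the following sketch combines per-scenario cut data into the intercept and slope of a single master cut, in a one-dimensional setting (\(m = n = 1\)). The helper name `aggregate_cuts` and the array layout are our own assumptions, not the authors' implementation.

```python
import numpy as np

def aggregate_cuts(lam, h, T, psi, w, x_tau):
    """Combine scenario cuts v_s(x) >= lam[s] * (h[s] - T[s] * x) + psi[s]
    into one master cut Q(x) >= a + g * x (one-dimensional sketch).
    The weights w must be sorted ascending."""
    u = lam * (h - T * x_tau) + psi   # scenario cut values at x_tau
    weight = np.empty_like(w)
    weight[np.argsort(u)] = w         # largest weight paired with largest value
    a = float(weight @ (lam * h + psi))   # intercept of aggregated cut
    g = float(-(weight * lam) @ T)        # slope of aggregated cut
    return a, g
```

At \(x = x^\tau\) the aggregated cut is tight with respect to the scenario cut values, i.e., it evaluates to \(\sum_s w^{\tilde{\pi }^\tau (s)} u^\tau _s\).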

The algorithm stops whenever the improvement in the outer approximation as a result of the most recently added optimality cut is less than the predefined tolerance level \(\varepsilon \). The algorithm outputs the most recent solution \(x^\tau \) as the LBDA solution \(\tilde{x}_\alpha \).

6.2 Performance guarantee

We now present a performance guarantee for our risk-averse LBDA. That is, we derive an upper bound on the optimality gap

$$\begin{aligned} c^T\tilde{x}_\alpha + Q(\tilde{x}_\alpha ) - \eta ^*, \end{aligned}$$

where \(\tilde{x}_\alpha \) is the algorithm solution. Our performance guarantee is based on the performance guarantee in [17] for the risk-neutral LBDA. The proof for our risk-averse setting is similar, with a few necessary adjustments to account for risk aversion.

Theorem 4

Let \(\tilde{x}_\alpha \) be the solution from the LBDA with tolerance level \(\varepsilon > 0\). Then, there exists a constant \(C > 0\) such that for every \(f \in \mathcal {H}^m\),

$$\begin{aligned} c^T\tilde{x}_\alpha + Q(\tilde{x}_\alpha ) - \eta ^* \le \varepsilon + \phi _\mu (1) \cdot C \cdot \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ \vert \Delta \vert f_i(\cdot \vert h_{-i}) \big ] \end{aligned}$$

with probability one as \(S \rightarrow +\infty \).

Proof

See the appendix. \(\square \)

The performance guarantee from Theorem 4 is equal to the error bound from Theorem 1, up to the tolerance level \(\varepsilon \). This shows that, as the sample size goes to \(\infty \), the risk-averse LBDA produces a solution that is as good as the solution to the \(\alpha \)-approximation model, in terms of the corresponding performance guarantee. In particular, this shows that the loose cuts used in the algorithm do not come at a cost in terms of (provable) solution quality. This is in line with the results for the risk-neutral case in [17].

7 Numerical experiments

In this section we perform a series of numerical experiments to gain more insight into the performance of our convex approximation approach in risk-averse MIR models. Since the main difference between our paper and the existing results in the literature on convex approximations of risk-neutral MIR models is the introduction of a risk measure \(\rho \), we will focus primarily on the effect of the risk measure \(\rho \) on different performance measures. For a numerical analysis of convex approximations and the LBDA in a risk-neutral setting, we refer to [13] and [17], respectively.

We will explore our convex approximations approach in a risk-averse one-dimensional SIR setting. We consider this setting for two reasons. First, it allows for exact computation of approximation errors, optimal solutions, and optimality gaps. Hence, we avoid having to resort to estimation of performance guarantees, which would make our results more noisy and which would hinder interpretation. Second, the one-dimensional setting makes it easier to plot certain results, again aiding interpretation of the results.

7.1 Asymptotic behavior

Before zooming in on the risk measure \(\rho \) we first test whether the asymptotic behavior of the error bounds from Theorems 1 and 2 is similar to the behavior of the actual approximation error \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \). We consider a one-dimensional SIR model with \(q^- = 0\) and \(q^+ = 1\), where the right-hand side variable h follows a normal distribution with mean zero and standard deviation \(\sigma > 0\). We consider three different risk measures: the expected value \(\rho = \mathbb {E}\) (denoted “mean”), \(\rho = {{\,\textrm{CVaR}\,}}_{0.8}\) (denoted “CVaR”), and the spectral risk measure defined by the risk spectrum \(\phi _\mu (p) = 2p\), \(p \in [0,1]\) (denoted “spectral”).

For each risk measure, we compute the maximum approximation error \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) for \(\sigma \in \{0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 5\}\). The results are presented in Fig. 1. For comparison, we also plot the error bound from Theorem 2 corresponding to \(\rho = \mathbb {E}\). Note that the error bounds for the other risk measures can be obtained by multiplying this bound by the factor \(\phi _\mu (1)\) corresponding to the respective risk measure. This factor equals 5 for CVaR and 2 for spectral.

Fig. 1

Maximum approximation error for a one-dimensional SIR model with \(q^- = 0\) and \(q^+ = 1\), for three different risk measures, where the underlying distribution is a normal distribution truncated at its mean \(\mu =0\) with varying values of the standard deviation \(\sigma \). Also plotted is the error bound from Theorem 5 in [16] corresponding to the risk-neutral case \(\rho = \mathbb {E}\)

From Fig. 1 we observe that for all risk measures the maximum approximation error \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) converges to zero as \(\sigma \) increases. The rate of convergence is comparable to that of the error bound. This confirms that in this setting our asymptotic interpretation of the error bound translates to the actual approximation error.

Interestingly, in contrast with the approximation error in the risk-neutral case, the approximation error for CVaR exhibits an erratic pattern; compare the solid and dashed graphs in Fig. 1. To understand this behavior, recall that we can represent \({{\,\textrm{CVaR}\,}}_{0.8}\) as an expected value under an adjusted pdf, which has a sharp peak at the 80th percentile of the original distribution. This peak may be located at a point where the approximation error \(v(h,x) - \tilde{v}_\alpha (h,x)\) is particularly large. Depending on the shape of the rest of the adjusted pdf (which depends on \(\sigma \)), the effect of this peak may or may not be mitigated by errors in the other direction. If so, the corresponding value of \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) will be relatively small; if not, it will be large. This unpredictable behavior can be seen as the reason why the error bound for CVaR needs to be much higher than the one for mean, even though in most cases the corresponding approximation errors are comparable, as we can see from the graph.
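
To illustrate the reweighting behind this adjusted-pdf representation: for a finite equiprobable sample, \({{\,\textrm{CVaR}\,}}_{\beta }\) is simply the average of the worst \(1-\beta \) fraction of outcomes. The sketch below is our own illustration of this discrete analogue, not tied to the paper's exact construction.

```python
import numpy as np

def cvar(sample, beta):
    """CVaR_beta of an equiprobable sample: the average of the worst
    (1 - beta) fraction of outcomes (larger = worse). For simplicity,
    assumes (1 - beta) * len(sample) is an integer."""
    k = round((1.0 - beta) * len(sample))
    return float(np.sort(sample)[-k:].mean())
```

All weight is concentrated on the upper tail, which is the discrete counterpart of the sharp peak in the adjusted pdf.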

In case of the risk measure “spectral”, we have a similar effect as for CVaR, but now there are infinitely many peaks, one for each value of \(\beta \) at which the risk spectrum \(\phi _\mu (\beta )\) is increasing. It is possible that these peaks line up in such a way that they magnify each other’s effect on \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \), which is reflected in the corresponding error bound. However, in many situations some of the peaks will cancel out the effect of other peaks, resulting in a low value for \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \). This is indeed what we observe in Fig. 1: the graph for spectral is even below that of mean, while its corresponding error bound is more than twice as large. This result suggests that combining multiple CVaRs into a spectral risk measure can lead to small approximation errors. We analyze this topic in more detail in Sect. 7.2.

7.2 Smoothness of the risk spectrum

Recall that the error bounds from Theorems 1 and 2 depend on the risk measure \(\rho \) only through the associated value \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\). Since this uses only a very limited amount of information about the risk measure \(\rho \), we conjecture that the resulting bound may be overly conservative for some risk measures. In particular, we are interested in how, for a given \(\mu \in \mathcal {M}\), the shape of \(\phi _\mu \) (not only its maximum value) affects the approximation error.

To investigate this matter, we consider two spectral risk measures with the same associated value \(\phi _\mu (1) = 2\). These risk measures differ in the “smoothness” of the associated risk spectrum \(\phi _\mu \). On the one hand we have \({{\,\textrm{CVaR}\,}}_{0.5}\), whose risk spectrum \(\phi _\mu \) is a step function with one jump at 0.5 (see Fig. 2). In this subsection we will refer to this risk measure simply as “CVaR”. On the other hand we have the spectral risk measure associated with the risk spectrum

$$\begin{aligned} \phi (p)&= {\left\{ \begin{array}{ll} 0 &{}\text {if } p < 0.25, \\ 4(p-0.25) &{}\text {if } 0.25 \le p \le 0.75, \\ 2 &{}\text {if } p > 0.75. \end{array}\right. } \end{aligned}$$

This risk spectrum can be seen as a “smoothed” transformation of the risk spectrum for CVaR. Hence, we refer to the corresponding risk measure as “smoothed CVaR”.

Observe that smoothed CVaR is a convex combination of infinitely many CVaRs. For numerical reasons, we will not work with smoothed CVaR itself, but approximate it with convex combinations of a finite number of CVaRs. For a given \(n \in \mathbb {N}\), we can approximate smoothed CVaR by the risk measure \(\rho = \sum _{k = 1}^n \mu _k {{\,\textrm{CVaR}\,}}_{\beta _k}\), where \(\beta _k = \frac{k - 1/2}{n}\) and \(\mu _k = \frac{2(1 - \beta _k)}{n}\), \(k=1,\ldots ,n\). See Fig. 2 for an illustration of the corresponding risk spectrum. In our experiments we will use \(n = 5\).
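
A sketch of this finite-mixture construction, computing \(\beta _k\), \(\mu _k\), and the resulting step spectrum (the helper names are ours):

```python
import numpy as np

def cvar_mixture(n):
    """beta_k = (k - 1/2)/n and mu_k = 2(1 - beta_k)/n, k = 1..n."""
    k = np.arange(1, n + 1)
    beta = (k - 0.5) / n
    mu = 2.0 * (1.0 - beta) / n
    return beta, mu

def mixture_spectrum(beta, mu, p):
    """Risk spectrum of rho = sum_k mu_k CVaR_{beta_k}:
    phi(p) = sum_k mu_k / (1 - beta_k) * 1{p >= beta_k}."""
    return sum(m / (1.0 - b) * (p >= b) for b, m in zip(beta, mu))

beta, mu = cvar_mixture(5)
```

One can check that the weights \(\mu _k\) sum to one and that the mixture spectrum satisfies \(\phi (1) = 2\), matching the value for smoothed CVaR.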

Fig. 2

A plot of the risk spectra corresponding to CVaR, smoothed CVaR, and the approximation of the latter (with \(n = 4\)), as discussed in Sect. 7.2

For each of the two risk measures, we compute the maximum approximation error \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) in a one-dimensional SIR setting with \(q^- = 0\) and \(q^+ = 1\), where h follows a Weibull distribution with shape parameter k and scale parameter \(\lambda \). The reason for choosing the Weibull family is that it exhibits a wide variety of different shapes of the pdf, depending on the parameters, thereby increasing our confidence that the results are representative for more general cases. In Table 1 we present the difference between the maximum approximation errors for these two risk measures.

Table 1 Difference between \(\Vert Q - \tilde{Q}_\alpha \Vert _\infty \) for CVaR and for smoothed CVaR, for a one-dim. SIR model with \(q^+ = 1\), \(q^- = 0\), and a Weibull distribution with various settings for the shape parameter k and scale parameter \(\lambda \). Also presented are the averages for each row and column

From Table 1 we see that CVaR has a larger corresponding maximum approximation error than smoothed CVaR in all settings, except for \(k = 0.6\), \(\lambda = 1.25\). These results suggest that, generally speaking, spectral risk measures with a smoother associated risk spectrum lead to lower approximation errors. To understand these results, we provide plots of the maximum approximation errors and the corresponding adjusted pdfs for \(k=1\) and varying values of \(\lambda \) in Fig. 3.

Fig. 3

Plots related to a one-dimensional SIR model with \(q^+ = 1\), \(q^- = 0\), under a Weibull distribution with \(k=1\) and varying values for \(\lambda \), for two different underlying risk measures: CVaR and smoothed CVaR. Top: the maximum approximation error as a function of \(\lambda \). The corresponding error bound from Theorem 2 is also plotted. Bottom: the adjusted pdf for each of the two risk measures, for \(k = 1\) and \(\lambda = 1\)

In Fig. 3a we see that the graphs of the maximum approximation error for both CVaR and smoothed CVaR decrease as \(\lambda \) increases, similar to the associated error bound. This is in line with the error bounds from Sects. 3 and 4, as the distribution of h becomes more dispersed as \(\lambda \) grows. Also, we see more erratic behavior for CVaR than for smoothed CVaR, which is in line with our results in Sect. 7.1.

More importantly, the graph of CVaR is uniformly above that of smoothed CVaR, as we already learned from Table 1. Figure 3b provides an explanation for this finding. The adjusted pdf corresponding to CVaR has a sharp peak at the 50th percentile of the original distribution. For smoothed CVaR, which is a convex combination of a multitude of CVaRs, many such peaks are aggregated, resulting in a lower, less sharp peak. As a result, the total variation of the adjusted pdf of CVaR is significantly higher than for smoothed CVaR. Hence, based on risk-neutral error bounds, we expect the maximum approximation error for CVaR to be larger, which is indeed what we find.

These results suggest that spectral risk measures with relatively smooth risk spectra lead to relatively low approximation errors. So interestingly, this finding suggests that finding good solutions for models with a more complicated mixed CVaR risk measure might in fact be easier than for models with a simple CVaR risk measure. This is especially interesting from a modeling point of view, given the fact that risk preferences might be more accurately reflected by these more complicated mixed CVaR risk measures than by a single CVaR.

7.3 Risk-averse LBDA performance

Finally, we investigate the performance of our risk-averse LBDA. A comparison of the performance of the risk-neutral LBDA with alternative solution methods has already been performed in [17], showing that the LBDA is indeed competitive in the risk-neutral case. Here, we focus on investigating the effect of the choice of risk measure \(\rho \) on the performance of the algorithm.

For this purpose, we consider the stochastic server location problem (SSLP) from the stochastic integer programming library SIPLIB [34]. The first- and second-stage problems of the SSLP are pure binary and mixed-binary, respectively, and the largest instance has over 1,000,000 binary decision variables and 120,000 constraints. A full description can be found in [34]. We solve instances with 10 server locations and 50 potential clients, which are denoted by sslp_10_50_xxxx, where xxxx represents the number of scenarios, which is taken from the set \(\{ 50, 100, 500, 1000, 2000 \}\). This allows us to test the effect of an increase in the number of scenarios on the performance of the algorithm. We use four different risk measures: (1) the expectation, (2) a CVaR risk measure with parameter \(\beta = 0.8\), (3) a mean-CVaR risk measure which is an equally weighted average of the former two risk measures, and (4) a spectral risk measure defined by the risk spectrum \(\phi _\mu (p) = 2p\), \(p \in [0,1]\). For every combination of problem instance and risk measure, we run the LBDA, and we solve a deterministic equivalent formulation, both with a computation time limit of 1 h. The instances are run using Gurobi 10.0.0 on an 11th Gen Intel(R) Core(TM) i5-1145G7 processor running at 1.50 GHz.
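
All four risk measures above are spectral risk measures. As a reference point for the discussion of the results, the following sketch (our own; the spectra are assumed forms consistent with the definitions in the text) computes the distortion factor \(\phi _\mu (1)\) that enters the performance guarantee of Theorem 4 for each of them.

```python
import numpy as np

# Assumed risk spectra for the four measures (each integrates to 1):
spectra = {
    "expectation": lambda p: np.ones_like(p),
    "CVaR_0.8":    lambda p: np.where(p >= 0.8, 5.0, 0.0),
    "mean-CVaR":   lambda p: 0.5 * np.ones_like(p) + np.where(p >= 0.8, 2.5, 0.0),
    "spectral":    lambda p: 2.0 * p,
}

# Distortion factor phi(1) appearing in the performance guarantee:
factors = {name: float(phi(np.array([1.0]))[0]) for name, phi in spectra.items()}
```

This gives factors 1, 5, 3, and 2, respectively, so expectation and spectral have the smallest factors, followed by mean-CVaR and then CVaR.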

Table 2 Solution times for the numerical experiments in Sect. 7.3 in seconds, rounded to the nearest integer
Table 3 Objective function values for the numerical experiments in Sect. 7.3 (rounded to the nearest integer). For each combination of problem instance and risk measure, we present the objective function value of the solution found by the deterministic equivalent formulation (DEF) and by the LBDA. For instances in which the DEF does not find a solution within the time limit, we report the optimality gap between brackets

The results of the experiments are given in Tables 2 and 3. First, Table 2 shows the computation times of the LBDA and the deterministic equivalent formulation. We observe that the LBDA is able to find a solution well within the 1-h time limit in all instances. The computation time grows roughly linearly in the number of scenarios. The deterministic equivalent, on the other hand, reaches the 1-h cutoff in almost all instances. Moreover, the deterministic equivalent formulation of the instances with 500–2000 scenarios and the spectral risk measure breaks down due to a memory overflow, as this naive implementation requires several additional variables for every scenario in order to deal with the spectral risk measure. The LBDA, on the other hand, does not have this problem: it finds a solution within a computation time comparable to that for the other risk measures. This illustrates one advantage of our risk-averse LBDA over methods based on risk-neutral reformulations: it can easily deal with arbitrary spectral risk measures.

Next, Table 3 shows the objective function values of the solutions obtained by the LBDA and the deterministic equivalent formulation. We observe that for smaller instances, the LBDA finds the optimal solution for the risk measures expectation and spectral, whereas there is a positive optimality gap for the risk measures mean-CVaR and especially CVaR, due to the fact that the LBDA only finds an approximate solution. Observe that the latter two have higher values for \(\sup _{\mu \in \mathcal {M}}\phi _\mu (1)\) than the former two. Hence, this result is in line with our performance guarantee from Theorem 4. For larger instances, we observe that the performance of the deterministic equivalent formulation deteriorates as the number of scenarios increases, while that of the LBDA remains consistent. This illustrates that the LBDA is well-equipped to solve large-scale problems.

We conclude that the numerical performance of the LBDA is in line with what can be expected from the performance guarantee in Theorem 4 and is particularly prominent in larger problem instances.

8 Conclusion

In this paper we develop convex approximations of risk-averse MIR models. We extend existing performance guarantees for these convex approximations to a general setting with an arbitrary law invariant coherent risk measure. The performance guarantee turns out to depend on the maximum distortion factor \(\sup _{\mu \in \mathcal {M}} \phi _\mu (1)\) corresponding to the risk measure \(\rho \). We also show how our performance guarantee extends to an SAA setting and we extend an SAA-based Benders’ decomposition algorithm from the literature to our risk-averse setting. Finally, we show in numerical experiments that the performance of the LBDA is in line with our theoretical performance guarantees. Moreover, it is capable of dealing with arbitrary spectral risk measures and is particularly suitable for solving large problem instances.

Future research may be aimed at deriving tighter error bounds than the ones in this paper. We present worst-case error bounds that hold under very general conditions, but our numerical experiments suggest that, at least in some cases, there might be room for improvement. For example, it would be interesting to see whether tighter error bounds can be derived by using more information about the risk measure \(\rho \) than our error bounds use. Another interesting avenue for future research is to investigate whether our convex approximation approach extends to risk-averse settings beyond the law invariant coherent case considered in this paper. Our error bound for the non-coherent risk measure expected excess can be seen as a first step in this direction.