1 Introduction

Stochastic programming is a methodology for modeling optimization problems under uncertainty. Traditionally, this uncertainty is accounted for by minimizing the expected total costs, and thus implicitly, a neutral stance toward risk is assumed. For recurring problems that have to be solved many times, this approach is justified by the law of large numbers. However, in many other applications we face a single-shot problem in which avoiding risk is desired.

In this paper, we focus on a class of models from stochastic programming that explicitly incorporate this aversion toward risk: mean-risk models. In these models, a weighted average of the expected total costs and a measure of risk is minimized. Thus, a balance is struck between minimizing the cost on average and avoiding high levels of risk. In particular, we will consider mean-risk models with two time stages, integer decision variables, and conditional value-at-risk (CVaR) as the risk measure. The random parameters in our model are the second-stage right-hand side and cost vector, and the technology matrix. Moreover, a key assumption is that the random right-hand side vector is continuously distributed. We refer to these models as two-stage mixed-integer mean-CVaR recourse models.

Integer decision variables are often required for realistic modeling of, e.g., indivisibilities or on/off decisions. However, including them in mean-CVaR recourse models makes these models significantly harder to solve than their continuous counterparts. Indeed, for continuous mean-CVaR recourse models, efficient solution methods are available from the literature. These methods exploit the convexity of the objective function. See, e.g., Ahmed [2], Miller and Ruszczyński [31], and Noyan [32] for decomposition algorithms based on the L-shaped algorithm by Van Slyke and Wets [52] and Rockafellar [37] for a progressive hedging algorithm.

Mixed-integer mean-CVaR recourse models, however, are generally not convex so that the aforementioned convex optimization-based methods cannot be applied. Thus, alternative solution methods are required for these models. Schultz and Tiedemann [44] show that the problem can be reformulated as a large-scale mixed-integer linear program (MILP) if the probability distributions of the random variables in the model are discrete and finite. Based on this reformulation they propose a decomposition algorithm using Lagrangean relaxation of the nonanticipativity constraints. Other authors solve the large-scale MILP reformulation using standard MILP solvers (e.g., [47]) or develop heuristics for specific problem settings [5]. However, these solution methods can only solve problems of limited size.

We will take a fundamentally different approach to deal with integer decision variables in mean-CVaR recourse models. Instead of aiming for an exact optimal solution, we will construct approximation models with a convex objective function. The rationale of doing so is that these convex approximation models can be solved efficiently using techniques from convex optimization, similar as continuous mean-CVaR recourse models. To guarantee the performance of the resulting approximating solutions we derive error bounds on the convex approximations. Such convex approximations and corresponding error bounds have been derived for risk-neutral mixed-integer stochastic programming problems; see Sect. 2.3 for a review of them. However, to our knowledge, this is the first paper that considers convex approximations for mixed-integer stochastic programs in a risk-averse setting.

The main contribution of this paper is that we construct convex approximations and derive corresponding error bounds for two-stage mixed-integer mean-CVaR recourse models. These error bounds converge to zero if the total variations of the probability density functions of the random right-hand side variables in the model converge to zero. Intuitively, this means that any mixed-integer mean-CVaR recourse model can be approximated arbitrarily well by a convex approximation if the variability of the random right-hand side variables in the model is sufficiently large. For the special cases of totally unimodular (TU) and simple integer mean-CVaR recourse models we perform a specialized analysis to derive tighter bounds. For the latter type of models, it turns out that the bound is particularly small if the random right-hand side variable in the model has a decreasing hazard rate.

The remainder of the paper is organized as follows. In Sect. 2 we formulate the mathematical model and review the relevant literature. Next, in Sect. 3 we consider the general setting of two-stage mixed-integer mean-CVaR recourse models and derive convex approximations with asymptotically converging error bounds. Section 4 deals with the special cases of TU and simple integer mean-CVaR recourse models. Section 5 provides a discussion of the results and directions for further research. Finally, Appendix A contains a generalization of existing risk-neutral results that we use in this paper and Appendix B contains proofs of several lemmas, propositions, and theorems.

2 Problem formulation and literature review

2.1 Problem formulation

We consider the two-stage mixed-integer mean-CVaR recourse model

$$\begin{aligned} \min _{x \in X} \big \{ cx + \mathscr {Q}^{\beta }_\rho (x) \big \}, \end{aligned}$$
(1)

where \(X = \{ x \in \mathbb {R}^{n_1} \ | \ Ax = b \}\) represents the set of feasible first-stage decisions that have to be made before some random parameters \(\xi \) are known, and \(\mathscr {Q}^{\beta }_\rho \) is the mean-CVaR recourse function

$$\begin{aligned} \mathscr {Q}^{\beta }_\rho (x)&:= (1-\rho )Q(x) + \rho R^{\beta }(x), \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$
(2)

with weight parameter \(\rho \in [0,1]\). Here, the mean recourse function Q and the CVaR recourse function \(R^{\beta }\) are defined by

$$\begin{aligned} Q(x)&:= \mathbb {E}_{\xi }\left[ v(\xi , x) \right] , \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$
(3)
$$\begin{aligned} R^{\beta }(x)&:= {{\,\mathrm{CVaR}\,}}_\beta \left[ v(\xi , x)\right] , \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$
(4)

where \({{\,\mathrm{CVaR}\,}}_\beta \) is the \(\beta \)-conditional value-at-risk (\(\beta \in (0,1)\)) defined in Definition 1, and v is the second-stage value function, defined by

$$\begin{aligned} v(\xi , x) := \min _{y} \left\{ qy \ | \ Wy = h - Tx, \ y \in \mathbb {Z}^{n_2}_+ \times \mathbb {R}^{n_3}_+ \right\} . \end{aligned}$$
(5)

The second-stage decision variables y represent the recourse actions that can be taken after the realization of \(\xi := (q, T, h)\) is known, in order to compensate for infeasibilities in the goal constraint \(Tx = h\). For ease of exposition, we assume that the first-stage decision variables x are continuous. However, all results in this paper still hold when some or all of these variables are restricted to be integer.

As an example of an application of our model, we discuss a stylized version of the disaster relief planning problem of Alem et al. [5] in Example 1 below.

Example 1

Consider the problem of distributing relief goods (water, food, medicine, etc.) after a natural disaster. A priori, the location and size of the disaster are naturally uncertain. However, where to store the relief goods needs to be determined before the disaster takes place. The goal is both to minimize the financial cost and to avoid shortages of relief goods at locations of need. We can model this problem using a two-stage mixed-integer mean-CVaR model.

In the first stage (before the disaster) we have to decide how many relief goods to store at each available storage location. The first-stage costs are the cost of acquiring these goods. When the disaster strikes, the required amount of relief goods in every area becomes known. In the second-stage, we need to allocate vehicles to transport goods from the different storage locations to the affected areas. The second-stage costs consist of the cost of using these vehicles plus a penalty on any unsatisfied demand (shortages) of relief goods. Since high shortages should be avoided, this problem is naturally modeled using a risk-averse approach. Furthermore, note that integer variables are needed to model the number of allocated vehicles in the second stage. \(\triangle \)

Our goal is to construct convex approximations \(\tilde{\mathscr {Q}}^{\beta }_\rho \) of the form \(\tilde{\mathscr {Q}}^{\beta }_\rho = (1-\rho )\tilde{Q} + \rho \tilde{R}^{\beta }\) for the mean-CVaR recourse function \(\mathscr {Q}^{\beta }_\rho \). Since convex approximations \(\tilde{Q}\) of Q are available in the literature (see Sect. 2.3), we focus on constructing convex approximations \(\tilde{R}^{\beta }\) of \(R^{\beta }\). As a performance guarantee, we will derive an upper bound on

$$\begin{aligned} \Vert \mathscr {Q}^{\beta }_\rho - \tilde{\mathscr {Q}}^{\beta }_\rho \Vert _\infty := \sup _{x \in X} | \mathscr {Q}^{\beta }_\rho (x) -\tilde{\mathscr {Q}}^{\beta }_\rho (x)|. \end{aligned}$$

Since

$$\begin{aligned} \Vert \mathscr {Q}^{\beta }_\rho - \tilde{\mathscr {Q}}^{\beta }_\rho \Vert _\infty \le (1-\rho ) \Vert Q - \tilde{Q} \Vert _\infty + \rho \Vert R^{\beta } -\tilde{R}^{\beta } \Vert _\infty , \end{aligned}$$
(6)

we will focus on deriving an upper bound on \(\Vert R^{\beta } -\tilde{R}^{\beta }\Vert _\infty \). Bounds on \(\Vert Q - \tilde{Q}\Vert _\infty \) are known from the literature. However, since these existing bounds only apply to recourse models with randomness in the right-hand side vector h only, we generalize them to our setting in Appendix A, where we allow q and T to be random as well.

Throughout this paper, we make the following assumptions.

Assumption 1

We assume that

  1. (a)

    the recourse is complete and sufficiently expensive, i.e., \( -\infty< v(\xi , x) < \infty \), for all \(\xi \in \varXi \) and \(x \in \mathbb {R}^{n_1}\), where \(\varXi \) denotes the support of \(\xi \).

  2. (b)

    the expectation of the \(\ell _1\) norm of \(\xi \) is finite, i.e., \(\mathbb {E}_\xi \Vert \xi \Vert _1 < \infty \), where \(\Vert \xi \Vert _1 :=\sum _{j=1}^n |q_j| + \sum _{i=1}^{m}\sum _{j=1}^{n_1} |T_{ij}| +\sum _{i=1}^{m} |h_i| \),

  3. (c)

    the recourse matrix W is integer,

  4. (d)

    the support \(\varXi \) of \(\xi \) can be written as \(\varXi = \varXi ^q \times \varXi ^T \times \varXi ^h\), where \(\varXi ^q\) is finite. Moreover, h is continuously distributed on \(\varXi ^h\) with joint pdf f,

  5. (e)

    (qT) and h are pairwise independent.

Assumption 1(a)–(b) ensure that Q(x) and \(R^{\beta }(x)\) are finite for every \(x \in \mathbb {R}^{n_1}\). Next, Assumption 1(c) is required for the proof of Theorem 1. However, this assumption is not very restrictive, since any rational matrix can be transformed into an integer one by appropriate scaling. Assumption 1(d)–(e) restrict the random right-hand side vector h to be continuously distributed. This is the key assumption on the random parameters \(\xi \) in our paper. The remaining assumptions in Assumption 1(d)–(e) are for ease of presentation; similar results as in this paper can be obtained for relaxed versions of these assumptions. Finally, we note that we assume that the probability distribution of \(\xi \) is known or can be accurately estimated, based on, e.g., historical data or expert opinions.

2.2 Conditional value-at-risk

In our risk-averse stochastic programming approach, we use conditional value-at-risk (CVaR) as the measure of risk. For probability parameter \(\beta \in (0,1)\), the \(\beta \)-CVaR of a random variable \(\theta \), written as \({{\,\mathrm{CVaR}\,}}_\beta [\theta ]\), has the interpretation of the conditional expectation of \(\theta \), given that \(\theta \) is at least as large as its \(\beta \)-quantile. Thus, intuitively, \({{\,\mathrm{CVaR}\,}}_\beta [\theta ]\) represents the average of the \(100(1-\beta )\%\) worst values of \(\theta \). We use the minimization representation of CVaR by Rockafellar and Uryasev [38] as our definition.

Definition 1

Let \(\theta \) be a random variable and let \(\beta \in (0,1)\) be given. Then, the \(\beta \)-CVaR of \(\theta \) is defined as

$$\begin{aligned} {{\,\mathrm{CVaR}\,}}_\beta [\theta ]&= \min _{\zeta \in \mathbb {R}} \Big \{ \zeta + \tfrac{1}{1-\beta } \mathbb {E}_\theta \big [ (\theta - \zeta )^+ \big ] \Big \}. \end{aligned}$$

Our choice for CVaR is motivated by the fact that this risk measure satisfies several desirable theoretical properties. First of all, CVaR is a coherent risk measure [38], and thus satisfies the axiomatic properties proposed by Artzner et al. [7]. In contrast, several popular risk measures such as value-at-risk violate some of these properties [1]. Second, Ogryczak and Ruszczyński [34] show that mean-CVaR recourse models are consistent with second-order stochastic dominance, a tool that establishes a preorder of random variables. This is relevant, since consistency with second-order stochastic dominance is desirable for accurately modeling risk aversion [26]. Third, Schultz and Tiedemann [44] show that mixed-integer mean-CVaR recourse models exhibit desirable properties such as continuity and stability. Furthermore, they show that under mild technical conditions an optimal solution to these models exist.

Due to its desirable properties, CVaR is one of the most popular risk measures in the literature on risk-averse optimization under uncertainty. For instance, it is the most popular choice for applications in supply chain network design under uncertainty [19]. See, e.g., [18, 36, 43, 46,47,48, 54] for applications of mean-CVaR recourse models in this field. Other areas of application include disaster relief planning [5, 32, 33], (energy) production planning [4, 9, 21, 27], transportation network protection [29], and water allocation [56]. The popularity of CVaR, and of mean-CVaR recourse models in particular, underlines the relevance of the models studied in this paper.

2.3 Solution methods for risk-neutral mixed-integer recourse models

Traditional solution methods for risk-neutral mixed-integer recourse models combine solution methods from deterministic mixed-integer and stochastic continuous optimization. See, e.g., Laporte and Louveaux [25] for the integer L-shaped method, Carøe and Schultz [12] for dual decomposition, Ahmed et al. [3] for branch-and-bound, Sen and Higle [45] for disjunctive decomposition, and [6, 8, 11, 16, 22, 35, 55] for recent work on cutting plane techniques. In general, however, these solution methods have difficulties solving large problem instances because they aim at finding an exact optimal solution. In contrast, we merely aim at finding good or near-optimal solutions to our mixed-integer mean-CVaR recourse model by means of convex approximations. For this reason, the remainder of this subsection is devoted to the literature on convex approximations for the corresponding risk-neutral case.

Convexity properties of risk-neutral mixed-integer stochastic programming problems were first analyzed by Klein Haneveld et al. [23] for the special case of simple integer recourse models. In fact, they exactly identified the probability distributions for which the mean recourse function Q in such models is convex. For all other cases, they derive so-called \(\alpha \)-approximations \(\tilde{Q}_\alpha \) of Q and corresponding error bounds. These convex approximations are extended by van der Vlerk to TU integer recourse models [50] and mixed-integer recourse models with a single recourse constraint [51]. However, only for the latter type of model does he derive an error bound for these convex approximations.

Recently, substantial progress has been made in deriving error bounds for convex approximations of mixed-integer recourse models with multiple non-separable recourse constraints. For example, for TU integer recourse models, Romeijnders et al. [39] derive an error bound for the \(\alpha \)-approximations from [50]. This error bound depends on the total variations of the density functions of the random right-hand side variables in the model. In particular, if these total variations are small, then the error bound is small and hence, the convex approximation is good. This is confirmed by numerical experiments in [42]. A tighter error bound is derived for an alternative convex approximation, called the shifted LP-relaxation approximation; see [41]. In fact, it is shown that the error bound is the best possible in a worst-case sense. The main building blocks in the derivation of this error bound are total variation bounds for the expectation of periodic functions.

The latest developments in this area are the extension of these convex approximations to the general case of two-stage mixed-integer recourse models. In particular, Romeijnders et al. [40] extend the shifted LP-relaxation approximation to this case, while van der Laan and Romeijnders [49] generalize the \(\alpha \)-approximations. For both approximations, a corresponding asymptotic error bound is derived, which converges to zero as the total variations of the density functions in the model go to zero. These bounds are derived by exploiting asymptotic periodicity of the second-stage value functions in combination with the total variation bounds from [41].

In this paper we generalize several results from this convex approximation literature to the risk-averse case. In particular, in Sect. 3 we use the asymptotic periodicity of mixed-integer value functions to derive convex approximations for general mixed-integer mean-CVaR recourse models. Moreover, we derive error bounds for these convex approximations using the total variation error bounds on the expectation of periodic functions from [41]. We also use these total variation bounds in Sect. 4 in a specialized analysis of TU integer and simple integer mean-CVaR recourse models.

2.3.1 Total variation

Similar to the error bounds for risk-neutral models from the literature, the error bounds in this paper will depend on the total variation of the one-dimensional conditional density functions of the random right-hand side variables in the model. Therefore, we conclude this section by defining the notion of total variation and some related concepts.

Definition 2

Let \(f: \mathbb {R} \rightarrow \mathbb {R}\) be a real-valued function and let \(I \subset \mathbb {R}\) be an interval. Let \(\varPi (I)\) denote the set of all finite ordered sets \(P = \{z_1, \ldots , z_{N+1} \}\) with \(z_1< \cdots < z_{N+1}\) in I. Then, the total variation of f on I, denoted by \(|\varDelta |f(I)\), is defined by

$$\begin{aligned} |\varDelta |f(I) := \sup _{P \in \varPi (I)} V_f(P), \end{aligned}$$

where \(V_f(P) := \sum _{i = 1}^{N} | f(z_{i+1}) - f(z_i) |\). We write \(|\varDelta |f := |\varDelta |f(\mathbb {R})\). We say that f is of bounded variation if \(|\varDelta |f < +\infty \).

Since the error bounds that we derive in this paper depend on the total variations of the one-dimensional conditional density functions of the random right-hand side variables in the model, we assume that these conditional density functions are of bounded variation.

Definition 3

For every \(i = 1,\ldots ,m\) and \(t_{-i} \in \mathbb {R}^{m-1}\), define the ith conditional density function\(f_i(\cdot |t_{-i})\) of the m-dimensional joint pdf f as

$$\begin{aligned} f_i(t_i|t_{-i}) = {\left\{ \begin{array}{ll} \frac{f(t)}{f_{-i}(t_{-i})}, &{}\quad \text {if } f_{-i}(t_{-i})>0, \\ 0, &{}\quad \text {if } f_{-i}(t_{-i})=0, \end{array}\right. } \end{aligned}$$

where \(f_{-i}\) represents the (marginal) joint density function of \(h_{-i}\), the random vector obtained by removing the ith element of h.

Definition 4

We denote by \(\mathscr {H}^m\) the set of all m-dimensional joint pdfs f whose conditional density functions \(f_i(\cdot |t_{-i})\) are of bounded variation for all \(t_{-i} \in \mathbb {R}^{m-1}\), \(i=1,\ldots ,m\).

3 General two-stage mixed-integer mean-CVaR recourse models

In this section we will derive convex approximations with corresponding error bounds for general mixed-integer mean-CVaR recourse models. The approach is based on the analysis by Romeijnders et al. [40] for the risk-neutral case. Although our mean-CVaR recourse model can be reformulated as a risk-neutral recourse model, the resulting model differs in structure from the model considered in [40]. We first lay out this structural difference.

To reformulate our model as a risk-neutral model, note that by Definition 1,

$$\begin{aligned} R^{\beta }(x) = \min _{\zeta \in \mathbb {R}} \Big \{ \zeta + \tfrac{1}{1-\beta } \mathbb {E}_\xi \big [ (v(\xi , x) - \zeta )^+ \big ] \Big \}, \quad x \in \mathbb {R}^{n_1}. \end{aligned}$$
(7)

Based on this expression we introduce a new recourse function

$$\begin{aligned} R^*(x, \zeta ) = \mathbb {E}_\xi \big [ v^{\zeta }(\xi , x) \big ], \quad x \in \mathbb {R}^{n_1}, \ \zeta \in \mathbb {R}, \end{aligned}$$
(8)

where \(v^{\zeta }\) is the corresponding second-stage value function, defined as

$$\begin{aligned} v^{\zeta }(\xi , x) := (v(\xi , x) - \zeta )^+, \quad \xi \in \varXi , \ x \in \mathbb {R}^{n_1}, \ \zeta \in \mathbb {R}. \end{aligned}$$
(9)

Using these two functions the mixed-integer mean-CVaR recourse model (1) can be reformulated as

$$\begin{aligned} \min _{ x \in X, \zeta \in \mathbb {R}} \big \{ cx + (1-\rho )Q(x) + \rho \zeta + \rho \tfrac{1}{1-\beta }R^*(x, \zeta ) \big \}. \end{aligned}$$
(10)

Interpreting \(\zeta \) as a first-stage variable, as suggested by [38], we observe that (10) reduces to a risk-neutral mixed-integer recourse problem. Here, for any \(\xi \in \varXi \) and \(x \in \mathbb {R}^{n_1}\) the second-stage value function \(v^{\zeta }\) can be written as

$$\begin{aligned} v^{\zeta }(\xi , x) = \min _{y,\eta ,z} \{\eta \ | \ Tx + Wy = h, \ \eta - qy - z = -\,\zeta , \ y \in \mathbb {Z}^{n_2}_+ \times \mathbb {R}^{n_3}_+, \ \eta ,z \in \mathbb {R}_+ \}. \end{aligned}$$

Observe that the right-hand side of the constraint \(\eta - qy - z =-\,\zeta \) does not depend on h, but only on the first-stage variable \(\zeta \). This means that, in contrast with Romeijnders et al. [40], the problem in (10) corresponds to a risk-neutral mixed-integer recourse model in which not all right-hand side variables are random. Since the results in [40] heavily rely on the pdfs of these (continuously distributed) random right-hand side variables, they are not applicable to the risk-neutral reformulation above and hence, an additional analysis is necessary. Moreover, this subtle difference in the right-hand side has surprising consequences for the type of convex approximation that we will derive.

3.1 Asymptotic semi-periodicity of \(v^{\zeta }\)

The first step in our analysis is proving that the value function \(v^{\zeta }\) is asymptotically semi-periodic in h; see Proposition 1. By asymptotic semi-periodicity we mean that on particular unbounded subsets of its domain, \(v^{\zeta }\) is the sum of a linear and periodic function. Gomory [17] identified this for the pure integer case and Romeijnders et al. [40] generalized it to the mixed-integer case. In this section we use the notation of the latter reference. We also repeat some of the definitions they introduced for the sake of completeness.

To understand why \(v^{\zeta }\) exhibits semi-periodicity, consider the LP-relaxation \(v_\text {LP}\) of the mixed-integer value function v and let \(q \in \varXi ^q\) be fixed. By the basis decomposition theorem by Walkup and Wets [53], we can identify basis matrices \(B^k\) and corresponding polyhedral cones \(\varLambda ^k \subseteq \mathbb {R}^m\), \(k \in K^q\), such that for all \(h - Tx \in \varLambda ^k\), the function \(v_\text {LP}(\xi , x)\) attains its value through the basis matrix \(B^k\), i.e., \(v_\text {LP}(\xi , x) =q_{B^k}(B^k)^{-1}(h - Tx)\). We will see that a similar result holds for the mixed-integer value function v, but only on shifted versions \(\varLambda ^k(d^k)\) of the cones \(\varLambda ^k, \ k \in K^q\).

Remark 1

Throughout this paper we omit the dependence of, e.g., \(\varLambda ^k\) and \(d^k\) on q. Instead, we assume without loss of generality that the index sets \(K^q\), \(q \in \varXi ^q\), are disjoint, i.e., \(K^{q_1} \cap K^{q_2} = \emptyset \) for all \(q_1, q_2 \in \varXi ^q\) with \(q_1 \ne q_2\). Note, however, that it is still possible that, e.g., \(B^{k_1} = B^{k_2}\) for some \(k_1 \in K^{q_1}\), \(k_2 \in K^{q_2}\), with \(q_1 \ne q_2\).

Definition 5

Let \(\varLambda \subset \mathbb {R}^m\) be a closed convex cone and let \(d \in \mathbb {R}_+\) be given. Then, we define \(\varLambda (d)\) as the set of points in \(\varLambda \) with at least Euclidean distance d to the boundary of \(\varLambda \).

Romeijnders et al. [40] show that there exist constants \(d^k>0\), \(k \in K^q\), such that for all \(h - Tx \in \varLambda ^k(d^k)\), the mixed-integer value function \(v(\xi , x)\) attains its value through the basis matrix \(B^k\). That is, \(v(\xi , x) = q_{B^k}(B^k)^{-1}(h - Tx) + \psi ^k(h - Tx)\), where the function \(\psi ^k\) represents the “penalty” incurred from having integer decision variables. These functions \(\psi ^k\) are \(B^k\)-periodic on \(\varLambda ^k(d^k)\). It turns out that \(v^{\zeta }\) exhibits the same type of periodicity.

Definition 6

Let the function \(g : \mathbb {R}^m \rightarrow \mathbb {R}^n\) be given and let B be an \(m \times m \) matrix. Then, g is called B-periodic if \(g(x) =g(x+Bl)\) for every \(x \in \mathbb {R}^m\) and \(l \in \mathbb {Z}^m\).

Proposition 1

Consider the second-stage value function \(v^{\zeta }\) from (9) for a fixed \(q \in \varXi ^q\). Then, there exist dual feasible basis matrices \(B^k\) of \(v_\text {LP}\), closed convex polyhedral cones \(\varLambda ^k := \{ t \in \mathbb {R}^m \ | \ (B^k)^{-1}t \ge 0 \}\), positive constants \(d^k\) and \(r^k\), and \(B^k\)-periodic functions \(\psi ^k\), \(k \in K^q\), such that

  1. (i)

    \(\cup _{k = 1}^K \varLambda ^k = \mathbb {R}^m\),

  2. (ii)

    \(({{\,\mathrm{int}\,}}\varLambda ^k) \cap ({{\,\mathrm{int}\,}}\varLambda ^l) =\emptyset \) for every \(k, l \in K^q\) with \(k\ne l\),

  3. (iii)

    for every \(k \in K^q\),

    $$\begin{aligned} v^{\zeta }(\xi , x)&= \big ( q_{B^k} (B^k)^{-1} (h - Tx) + \psi ^k (h - Tx) - \zeta \big )^+, \quad h - Tx \in \varLambda ^k(d^k), \end{aligned}$$

    where \(\psi ^k \equiv \psi ^l\) if \(q_{B^k}(B^k)^{-1} =q_{B^l}(B^l)^{-1}\),

  4. (iv)

    for every \(k \in K^q\)

    $$\begin{aligned} 0 \le \psi ^k(s) \le r^k, \quad s \in \mathbb {R}^m. \end{aligned}$$

Proof

Since W is an integer matrix by Assumption 1(c), the result follows directly from Theorem 2.9 in [40] and the definition of \(v^{\zeta }\). \(\square \)

Proposition 1 shows that on shifted convex cones \(\varLambda ^k(d^k)\), the approximating value function \(v^{\zeta }\) is the positive part of the sum of a linear and a periodic function in h. Hence, \(v^{\zeta }\) is indeed asymptotically semi-periodic in h.

3.2 Convex approximations of \(v^{\zeta }\) and \(R^{\beta }\)

In this subsection we construct two convex approximations \(\hat{v}^{\zeta }\) and \(\tilde{v}^{\zeta }_\alpha \) of the second-stage value function \(v^{\zeta }\), yielding two corresponding convex approximations \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) of the CVaR recourse function \(R^{\beta }\). Moreover, we derive a characterization of the differences \(v^{\zeta } - \hat{v}^{\zeta }\) and \(v^{\zeta } - \tilde{v}^{\zeta }_\alpha \). These characterizations are used in Sect. 3.3 to derive upper bounds on the approximation errors \(| R^{\beta } - \hat{R}^{\beta } |\) and \(| R^{\beta } -\tilde{R}^{\beta }_\alpha |\).

3.2.1 Construction of the convex approximations

We will use the asymptotic periodicity of v from Proposition 1 in order to construct two types of convex approximations of \(v^{\zeta }\). For \(q \in \varXi ^q\), \(k \in K^q\), and \(\zeta \in \mathbb {R}\) given, we know from Proposition 1 that

$$\begin{aligned} v^{\zeta }(\xi , x)&= \big (q_{B^k} (B^k)^{-1} (h - Tx) + \psi ^k(h - Tx) - \zeta \big )^+, \quad h -Tx \in \varLambda ^k(d^k). \end{aligned}$$

Observe that the first-stage decision vector x appears as an argument of the \(B^k\)-periodic function \(\psi ^k\). This means that for \(h -Tx \in \varLambda ^k(d^k)\), the function \(v^{\zeta }(\xi , x)\) is periodic in x. This periodicity is the cause of the non-convexity of \(v^{\zeta }(\xi , x)\) in x. In order to construct convex approximations of \(v^{\zeta }\), we propose two “convexifying” adjustments to this periodic term \(\psi ^k(h - Tx)\).

A first convex approximation of \(v^{\zeta }\) is obtained by replacing \(\psi ^k\) by its mean value \(\varGamma ^k\). This results in a shifted version of the LP-relaxation with shifting constant \(\varGamma ^k\). Hence, we refer to this kind of approximation as the shifted LP-relaxation approximation. Since every \(B^k\)-periodic function is also \(p_kI_m\)-periodic with \(p_k := |\det (B^k)|\) (see [40]), we can characterize the mean value of \(\psi ^k\) as

$$\begin{aligned} \varGamma ^k := p_k^{-m} \int _0^{p_k} \cdots \int _{0}^{p_k} \psi ^k(s) ds_1 \cdots ds_m. \end{aligned}$$
(11)

Surprisingly, however, in our mean-CVaR recourse model we need to make an adjustment in order to be able to derive an asymptotically converging error bound. In particular, for \(k \in K^q\) with \(q_{B^k}=0\), we should use the mean value of \((\psi ^k - \zeta )^+ +\zeta \) instead of \(\varGamma ^k\). In Example 2 we illustrate in more detail why this adjustment is needed.

To construct a second convex approximation of \(v^{\zeta }\), we replace the term Tx in the argument of \(\psi ^k\) by a constant vector \(\alpha \in \mathbb {R}^m\), yielding \(\psi ^k(h - \alpha )\). We call the resulting approximation a generalized\(\alpha \)-approximation; cf. [49]. This approximation is still semi-periodic in h, and thus not convex in h. However, it is convex in x, which is what we desire for optimization purposes.

Both approaches above yield an approximation of \(v^{\zeta }(\xi ,x)\) for \(h - Tx \in \varLambda ^k(d^k)\) for each \(k \in K^q\). We combine these approximations by taking the pointwise maximum over all \(k \in K^q\).

Definition 7

Consider the mixed-integer value function \(v^{\zeta }\) from (9) and let \(B^k\), \(q_{B^k}\), and \(\psi ^k\), \(k \in K^q\), \(q \in \varXi ^q\), be the basis matrices, corresponding cost vectors, and \(B^k\)-periodic functions from Proposition 1, respectively. Then, we define the shifted LP-relaxation approximation\(\hat{v}^{\zeta }\) of \(v^{\zeta }\) by

$$\begin{aligned} \hat{v}^{\zeta }(\xi , x)&= \left( \max _{k \in K^q} \left\{ q_{B^k}(B^k)^{-1}(h - Tx) + \varGamma ^k_\zeta \right\} - \zeta \right) ^+, \quad \xi \in \varXi , \ x \in \mathbb {R}^{n_1}, \ \zeta \in \mathbb {R}. \end{aligned}$$

where for every \(k \in K^q\),

$$\begin{aligned} \varGamma _\zeta ^k&:= {\left\{ \begin{array}{ll} p_k^{-m} \int _0^{p_k} \cdots \int _{0}^{p_k} \psi ^k(s) ds_1 \cdots ds_m, &{}\quad \text {if }\,\, q_{B^k}\ne 0,\\ p_k^{-m} \int _0^{p_k} \cdots \int _{0}^{p_k} (\psi ^k(s) - \zeta )^+ ds_1 \cdots ds_m + \zeta , &{}\quad \text {if }\,\, q_{B^k}=0, \\ \end{array}\right. } \end{aligned}$$

with \(p_k := |\det (B^k)|\). Moreover, for every \(\xi \in \varXi \), \(x \in \mathbb {R}^{n_1}\), and \(\zeta \in \mathbb {R}\), we define the generalized\(\alpha \)-approximation\(\tilde{v}^{\zeta }_\alpha \) of \(v^{\zeta }\) with parameter \(\alpha \in \mathbb {R}^m\) by

$$\begin{aligned} \tilde{v}^{\zeta }_\alpha (\xi , x)&= \left( \max _{k \in K^q} \left\{ q_{B^k}(B^k)^{-1}(h - Tx) + \psi ^k(h - \alpha )\right\} - \zeta \right) ^+. \end{aligned}$$

As mentioned before, we make an adjustment to the shifted LP-relaxation approximation in the case \(q_{B^k} = 0\). Instead of using the mean value \(\varGamma ^k\) of \(\psi ^k\), we use the mean value of \((\psi ^k - \zeta )^+ + \zeta \). In the example below we show that this adjustment is necessary in order to derive error bounds that are asymptotically converging, in the sense that they converge to zero as the total variations of the conditional density functions of the random right-hand side variables \(h_i\), \(i=1,\ldots ,m\), go to zero.

Example 2

Consider a mixed-integer value function v given by

$$\begin{aligned} v(\xi , x) = \min \{ u \ | y^+ - y^- + u = h - x, \ y^+,y^- \in \mathbb {Z}_+, \ u \in \mathbb {R}_+ \}, \quad \xi \in \varXi , x \in \mathbb {R}, \end{aligned}$$

where \(\varXi ^q = \{1\}\), \(\varXi ^T = \{[1]\}\), and \(\varXi ^h = \mathbb {R}\). The LP-relaxation \(v_{LP}\) of v equals \(v_{LP} \equiv 0\), since for every \(\hat{h} := h - x \in \mathbb {R}\) with \(\hat{h} \ge 0\) we can select \(y^+ = \hat{h}, \ y^ - = u = 0\) and for \(\hat{h} < 0\) we can select \(y^- = -\hat{h}, \ y^+ = u = 0\). Indeed, if \(\hat{h} > 0\), then \(y^+\) is the basic variable corresponding to basis matrix \(B^1 =[1]\) with costs \(q_{B^1} = 0\) and if \(\hat{h} < 0\), then \(y^-\) is the basic variable corresponding to \(B^2 = [-1]\) with \(q_{B^2} = 0\). Since the mixed-integer value function v equals \(v(\xi , x) =\psi (\hat{h}) := \hat{h}-\lfloor \hat{h}\rfloor \) for all \(\hat{h} \in \mathbb {R}\), we have \(\psi ^1 = \psi ^2 = \psi \) and thus \(\varGamma ^1 = \varGamma ^2 =\int _0^1\psi (s)ds = \frac{1}{2}\).

Now suppose that we simply use \(\varGamma ^k\) (rather than \(\varGamma ^k_\zeta \)) to construct the convex approximation

$$\begin{aligned} \bar{v}^{\zeta }(\xi , x) = \left( \max _{k=1,2}\{ q_{B^k}(B^k)^{-1} (h - x) + \varGamma ^k - \zeta \}\right) ^+ = \left( \tfrac{1}{2} - \zeta \right) ^+, \quad \xi \in \varXi , \ x \in \mathbb {R}, \end{aligned}$$

of \(v^{\zeta }\) and the corresponding convex approximation \(\bar{R}^{\beta }(x) := \min _{\zeta \in \mathbb {R}}\big \{ \zeta +\tfrac{1}{1-\beta } \bar{R}^*(x,\zeta ) \big \}\) of \(R^{\beta }\), where \(\bar{R}^*(x, \zeta ) := \mathbb {E}_\xi \big [ \bar{v}^{\zeta }(\xi , x) \big ]\). We will show that the resulting approximation error \(\Vert R^{\beta } - \bar{R}^{\beta } \Vert _\infty \) is not asymptotically converging in general.

First note that for every \(x \in \mathbb {R}\) we have \(\bar{R}^\beta (x) = \min _{\zeta \in \mathbb {R}} \big \{ \zeta + \tfrac{1}{1-\beta } (\tfrac{1}{2} - \zeta )^+ \big \} = \text {CVaR}_\beta \big [\tfrac{1}{2}\big ] = \tfrac{1}{2}\) by definition of CVaR. Now, suppose that h is uniformly distributed on the interval [0, N], where N is a positive integer, and consider the value \(x=0\) for the first-stage decision variable. Then, since h is continuously distributed we know from [38] that \(R^{\beta }(x) = {{\,\mathrm{CVaR}\,}}_\beta [v(\xi , x)] = \mathbb {E}_h[v(\xi , x) \ | \ v(\xi , x) \ge q_\beta (x)]\), where \(q_\beta (x)\) is the \(\beta \)-quantile of \(v(\xi , x) = \psi (\hat{h}) = h - \lfloor h\rfloor \). It follows by straightforward computation that \(R^{\beta }(x) = 1 - \beta /2\). Hence, \(|R^{\beta }(x) -\bar{R}^{\beta }(x)| = |\tfrac{1}{2} - \beta /2|\), which is positive if \(\beta \ne \tfrac{1}{2}\). Note that this expression does not depend on N. Hence, as N goes to infinity (i.e., the total variation of the density function of h goes to zero), the approximation error remains constant, i.e., it does not converge to zero asymptotically. \(\triangle \)

Using the approximating value functions from Definition 7, we define corresponding convex approximations of the CVaR recoure function \(R^{\beta }\). These can be seen as extensions of the convex approximations in [40, 49] to our mean-CVaR setting.

Definition 8

Consider the CVaR recourse function \(R^{\beta }\) from (4). We define the shifted LP-relaxation approximation\(\hat{R}^{\beta }\) of \(R^{\beta }\) by

$$\begin{aligned} \hat{R}^{\beta }(x) := \min _{\zeta \in \mathbb {R}}\big \{ \zeta +\tfrac{1}{1-\beta } \hat{R}^*(x, \zeta ) \big \}, \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$

where \(\hat{R}^*(x, \zeta ) := \mathbb {E}_\xi \big [ \hat{v}^{\zeta }(\xi , x) \big ]\), with \(\hat{v}^{\zeta }\) defined in Definition 7. Moreover, we define the generalized\(\alpha \)-approximation\(\tilde{R}^{\beta }_\alpha \) of \(R^{\beta }\) with parameter \(\alpha \in \mathbb {R}^m\) by

$$\begin{aligned} \tilde{R}^{\beta }_\alpha (x) := \min _{\zeta \in \mathbb {R}}\big \{ \zeta + \tfrac{1}{1-\beta } \tilde{R}^*_\alpha (x, \zeta ) \big \}, \quad x \in \mathbb {R}^{n_1}, \end{aligned}$$

where \(\tilde{R}^*_\alpha (x, \zeta ) := \mathbb {E}_\xi \big [ \tilde{v}^{\zeta }_\alpha (\xi , x) \big ]\), with \(\tilde{v}^{\zeta }_\alpha \) defined in Definition 7.

Since the approximations from Definition 8 are convex, the resulting convex approximation models can be solved using techniques from convex optimization. As a result, they can be solved much more efficiently than the original (non-convex) model in (1). This is indeed true for the generalized \(\alpha \)-approximations, whereas for the shifted LP-relaxation approximation some computational challenges remain.

The first computational challenge is that the shifted LP-relaxation approximation \(\hat{R}^{\beta }\) requires computing the means \(\varGamma _\zeta ^k\) for all \(k \in K^q\). For special cases, such as pure integer recourse models with a totally unimodular recourse matrix W (cf. Sect. 4), it is possible to derive analytic expressions for these means. However, in general they need to be approximated in practical computations. In contrast, the generalized \(\alpha \)-approximations only need computation of the function values \(\psi ^k(h - \alpha )\), which are obtained by solving a single mixed-integer linear program, or in fact a Gomory relaxation of this mixed-integer linear program.

The second computational challenge is that the convex approximations are defined as the maximum over all dual feasible basis matrices \(B^k\), \(k \in K^q\), of which there are exponentially many in general. This challenge can be overcome for both approximations by taking the optimal basis matrix of the LP-relaxation instead of the maximum, see also [49]. This is again an approximation, but van der Laan and Romeijnders [49] show both theoretically and using numerical experiments that it yields good results.

Finally, we remark that for computational purposes the continuously distributed random vectors in the model need to be discretized. For instance, using Jensen [20] and Edmundson–Madansky [14, 30] lower and upper bounds or using a sample average approximation (SAA), see [24]. However, if the discretization is fine enough, this does not affect the quality of the convex approximations.

3.2.2 Properties of \(\hat{v}^{\zeta }\) and \(\tilde{v}^{\zeta }_\alpha \)

We now present several properties of the approximating value functions \(\hat{v}^{\zeta }\) and \(\tilde{v}^{\zeta }_\alpha \). In particular, we focus on the differences \(v^{\zeta } - \hat{v}^{\zeta }\) and \(v^{\zeta } - \tilde{v}^{\zeta }_\alpha \), which can be interpreted as the underlying difference functions in the approximation errors \(|R^{\beta } - \hat{R}^{\beta }|\) and \(|R^{\beta } -\tilde{R}^{\beta }_\alpha |\). Since several proofs of the results in this subsection are similar to the proofs of corresponding results in [40] for the risk-neutral case, we postpone them to the Appendix. Moreover, since the derivations for \(\hat{v}^{\zeta }\) and \(\tilde{v}^{\zeta }_\alpha \) are analogous, we will avoid repetition and focus on \(\hat{v}^{\zeta }\) in our discussions.

First we show that the difference between \(v^{\zeta }\) and its shifted LP-relaxation approximation \(\hat{v}^{\zeta }\) is uniformly bounded.

Lemma 1

Consider the value function \(v^{\zeta }\) from (9) and its shifted LP-relaxation approximation \(\hat{v}^{\zeta }\) and generalized \(\alpha \)-approximation \(\tilde{v}^{\zeta }_\alpha \) from Definition 7. Then, there exists a constant \(\gamma > 0\) such that for every \(\zeta \in \mathbb {R}\),

$$\begin{aligned} \Vert v^{\zeta } - \hat{v}^{\zeta }\Vert _\infty \le \gamma \quad \text {and} \quad \Vert v^{\zeta } - \tilde{v}^{\zeta }_\alpha \Vert _\infty \le \gamma . \end{aligned}$$

Proof

See Appendix. \(\square \)

Next, we work towards a characterization of the difference \(v^{\zeta } - \hat{v}^{\zeta }\) in terms of periodic functions. Recall from Proposition 1 that for any given \(q \in \varXi ^q\), \(k \in K^q\), and \(h - Tx \in \varLambda ^k(d_k)\), the value of \(v^{\zeta }(\xi , x)\) is generated by the dual feasible basis matrix \(B^k\), i.e., \(v^{\zeta }(\xi , x) = \big (q_{B^k}(B^k)^{-1}(h-Tx) +\psi ^k(h - Tx) - \zeta \big )^+\). The following lemma shows that on a subset \(\sigma ^k + \varLambda ^k\) of \(\varLambda ^k(d_k)\), the convex approximation \(\hat{v}^{\zeta }(\xi , x)\) is generated by the same basis matrix \(B^k\).

Lemma 2

Consider the value function \(v^{\zeta }\) from (9) and its shifted LP-relaxation approximation \(\hat{v}^{\zeta }\) from Definition 8. Moreover, let \(B^k\), \(\varLambda ^k\), and \(d^k\) be the basis matrices, cones, and scalars from Proposition 1. Then, for every \(q \in \varXi ^q\) and \(k \in K^q\), there exists a vector \(\sigma ^k \in \varLambda ^k(d_k)\) such that

$$\begin{aligned} \hat{v}^{\zeta }(\xi , x) = \big ( q_{B^k}(B^k)^{-1}(h - Tx) +\varGamma ^k_\zeta - \zeta \big )^+, \quad h - Tx\in \sigma ^k+\varLambda ^k, \end{aligned}$$

and

$$\begin{aligned} \tilde{v}^{\zeta }_\alpha (\xi , x) = \big ( q_{B^k}(B^k)^{-1}(h - Tx) +\psi ^k(h - \alpha ) - \zeta \big )^+, \quad h-Tx\in \sigma ^k+\varLambda ^k, \end{aligned}$$

Proof

See Appendix. \(\square \)

Since \(\sigma ^k + \varLambda ^k \subseteq \varLambda ^k(d^k)\), it now follows that for all \(h - Tx \in \sigma ^k + \varLambda ^k\), both \(v^{\zeta }\) and \(\hat{v}^{\zeta }\) are generated by the same basis matrix \(B^k\). Using this fact, we can derive subsets of \(\sigma ^k + \varLambda ^k\), \(k \in K^q\), on which the difference \(v^{\zeta } - \hat{v}^{\zeta }\) is \(B^k\)-periodic with a mean value of zero. In particular, if \(q_{B^k} \ne 0\), then (using \(0 \le \psi ^k \le r^k\)),

$$\begin{aligned} v^{\zeta }(\xi , x) - \hat{v}^{\zeta }(\xi , x) = {\left\{ \begin{array}{ll} \psi ^k(h - Tx) - \varGamma ^k, &{}\quad \text {if }\,\, q_{B^k}(B^k)^{-1}(h - Tx) \ge \zeta , \\ 0, &{}\quad \text {if } \,\, q_{B^k}(B^k)^{-1}(h - Tx) \le \zeta - r^k, \end{array}\right. } \end{aligned}$$

whereas if \(q_{B^k} = 0\) we have (using the definition of \(\varGamma ^k_\zeta \))

$$\begin{aligned} v^{\zeta }(\xi , x) - \hat{v}^{\zeta }(\xi , x)&= \big (\psi ^k(h - Tx) -\zeta \big )^+ - \mu ^k_\zeta , \end{aligned}$$

where \(\mu ^k_\zeta := p_k^{-m} \int _0^{p_k} \cdots \int _{0}^{p_k} (\psi ^k(s) - \zeta )^+ ds_1 \ldots ds_m\). Indeed the right-hand sides above are \(B^k\)-periodic functions of h. Moreover, it can be shown that the complement of these subsets on which \(v^{\zeta } - \hat{v}^{\zeta }\) is \(B^k\)-periodic, \(k \in K^q\), is “relatively small”, in the sense that it can be covered by finitely many hyperslices. We summarize these results below.

Definition 9

A hyperslice in \(\mathbb {R}^m\) is a set H of the form

$$\begin{aligned} H := \{ s \in \mathbb {R}^m \ | \ b \le a^T s \le b + \delta \}, \end{aligned}$$

where \(a \in \mathbb {R}^m {\setminus } \{0\}\), \(b \in \mathbb {R}\), and \(\delta \in \mathbb {R}\) with \(\delta > 0\).

Proposition 2

Consider the value function \(v^{\zeta }\) from (9) and its convex approximations \(\hat{v}^{\zeta }\) and \(\tilde{v}^{\zeta }_\alpha \) from Definition 7. Then, for every \(q \in \varXi ^q\) and \(\zeta \in \mathbb {R}\), there exists a finite number of closed convex polyhedral sets \(\mathscr {A}_j \subseteq \mathbb {R}^m\), \(j \in J^q_\zeta \), whose interiors are mutually disjoint, such that

  1. (i)

    for all \(h - Tx \in \mathscr {A}_j\), \(j \in J^q_\zeta \), we can write

    $$\begin{aligned} v^{\zeta }(\xi , x) - \hat{v}^{\zeta }(\xi , x) = \phi ^{\zeta }_j(h - Tx), \ \ \text {and} \ \ \ v^{\zeta }(\xi , x) - \tilde{v}^{\zeta }_\alpha (\xi , x) =\bar{\phi }^{\zeta }_j(h - Tx), \end{aligned}$$

    where \(\phi ^{\zeta }_j\) and \(\bar{\phi }^{\zeta }_j\) are bounded \(B^k\)-periodic functions for some \(k \in K^q\) with mean value equal to zero.

  2. (ii)

    the set \(\mathscr {N}^q_\zeta := \mathbb {R}^m {\setminus } \bigcup _{j \in J^q_\zeta } \mathscr {A}_j\) can be covered by finitely many hyperslices.

Proof

See Appendix. \(\square \)

3.3 Total variation error bounds

We now derive upper bounds on the approximation errors \(|R^{\beta }(x) -\hat{R}^{\beta }(x)|\) and \(|R^{\beta }(x) -\tilde{R}^{\beta }_\alpha (x)|\) using the results from Sect. 3.2.2. We outline our approach for \(\hat{R}^{\beta }\); the analysis for \(\tilde{R}^{\beta }_\alpha \) is analogous.

We first derive an upper bound on \(|R^*(x, \zeta ) - \hat{R}^*(x, \zeta )|\). For every \(x \in \mathbb {R}^{n_1}\) and \(\zeta \in \mathbb {R}\), we have by definition of \(R^*\) and \(\hat{R}^*\) that

$$\begin{aligned} |R^*(x, \zeta ) - \hat{R}^*(x, \zeta )|&= \big | \mathbb {E}_\xi \big [v^{\zeta }(\xi , x) \big ] - \mathbb {E}_\xi \big [\hat{v}^{\zeta }(\xi , x) \big ]\big | \nonumber \\&\le \mathbb {E}_{q,T}\Big [ \big | \mathbb {E}_h\big [ v^{\zeta }(\xi , x) - \hat{v}^{\zeta }(\xi , x) \big ] \big | \Big ] \nonumber \\&= \mathbb {E}_{q,T}\left[ \big | \int _{\mathbb {R}^m}\big ( v^{\zeta }(q, T, s, x) - \hat{v}^{\zeta }(q, T, s, x) \big ) f(s) ds \big | \right] , \end{aligned}$$
(12)

where we use that the right-hand side vector h is independent from (qT) by Assumption 1(e). Consider the integral over \(\mathbb {R}^m\) in (12) for a fixed \(q \in \varXi ^q\) and \(T \in \varXi ^T\). The main idea is to use Proposition 2 to split up this integral into integrals over two types of subsets of \(\mathbb {R}^m\): subsets \(\mathscr {A}_j\), \(j \in J^q_\zeta \), on which the expression \(v^{\zeta } - \hat{v}^{\zeta }\) in the integrand is a \(B^k\)-periodic function for some \(k \in K^q\), and the complement \(\mathscr {N}^q_\zeta \) of these subsets. Then, the integrals over \(\mathscr {A}_j\), \(j \in J^q_\zeta \), can be bounded using a result from [40] that exploits periodicity in the integrand. Furthermore, the integral over the complement set \(\mathscr {N}^q_\zeta \) can be bounded using Lemma 1 and another result in [40] that provides an upper bound on the probability \(\mathbb {P}\{ h - Tx \in \mathscr {N}^q_\zeta \ | \ q, T \}\). Together, this yields a uniform upper bound on \(|R^*(x, \zeta ) -\hat{R}^*(x, \zeta )|\). Finally, is not hard to prove that this also constitutes an upper bound on \(\Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty \).

Theorem 1

Consider the CVaR recourse function \(R^{\beta }\) from (4). Moreover, consider its shifted LP-relaxation approximation \(\hat{R}^{\beta }\) and generalized \(\alpha \)-approximation \(\tilde{R}^{\beta }_\alpha \) with parameter \(\alpha \in \mathbb {R}^m\) from Definition 8. Then, there exist finite, positive constants \(C_1\) and \(C_2\) such that for all \(f \in \mathscr {H}^m\) we have

$$\begin{aligned} \Vert R^{\beta } - \hat{R}^{\beta }\Vert _\infty \le \frac{1}{1-\beta }C_1 \sum _{i=1}^{m} \mathbb {E}_{{h}_{-i}} \big [ |\varDelta |f_i(\cdot |h_{-i}) \big ], \end{aligned}$$
(13)

and

$$\begin{aligned} \Vert R^{\beta } - \tilde{R}^{\beta }_\alpha \Vert _\infty \le \frac{1}{1-\beta }C_2 \sum _{i=1}^{m} \mathbb {E}_{{h}_{-i}} \big [ |\varDelta |f_i(\cdot |h_{-i}) \big ]. \end{aligned}$$
(14)

Proof

We will prove (13); the proof of (14) is completely analogous. First, we show that \(\Vert R^{\beta } -\hat{R}^{\beta } \Vert _\infty \le \tfrac{1}{1-\beta } \Vert R^* - \hat{R}^* \Vert _\infty \). Fix \(x \in \mathbb {R}^{n_1}\) and let \(\zeta ^*\) be the minimizer in the minimization representation of \(R_\beta (x)\) in (7). Since \(\zeta ^*\) is not necessarily optimal for the minimization problem defining \(\hat{R}_\beta (x)\) in Definition 8, we have \(\hat{R}_\beta (x) -R_\beta (x) \le \tfrac{1}{1-\beta }\big (\hat{R}^*_\beta (x,\zeta ^*) -R^*_\beta (x,\zeta ^*)\big ) \le \tfrac{1}{1-\beta } \Vert \hat{R}^*_\beta - R^*_\beta \Vert _\infty \). Using an analogous argument for the reverse difference, we obtain \(\Vert R_\beta - \hat{R}_\beta \Vert _\infty \le \tfrac{1}{1-\beta } \Vert \hat{R}^*_\beta - R^*_\beta \Vert _\infty \).

Next, we derive a constant \(C_1\) such that \(\Vert R^* -\hat{R}^*\Vert _\infty \le C_1 \sum _{i=1}^{m} \mathbb {E}_{{h}_{-i}} \left[ |\varDelta |f_i(\cdot |h_{-i}) \right] \). Let \(x \in \mathbb {R}^{n_1}\) and \(\zeta \in \mathbb {R}\) be given and take (12) as a starting point. Splitting up the integral in the right-hand side of (12) according to Proposition 2 yields

$$\begin{aligned} \Big | \int _{\mathbb {R}^m}\big ( v^{\zeta }(\xi _s, x) - \hat{v}^{\zeta }(\xi _s, x) \big ) f(s) ds \Big |&\le \sum _{j \in J^q_\zeta } \Big | \int _{Tx + \mathscr {A}_j}\phi ^{\zeta }_j(s - Tx) f(s) ds \Big | \nonumber \\&\quad + \int _{Tx + \mathscr {N}^q_\zeta }\big | v^{\zeta }(\xi _s, x) -\hat{v}^{\zeta }(\xi _s, x) \big | f(s) ds, \end{aligned}$$
(15)

where we write \(\xi _s := (q, T, s)\), \(s \in \mathbb {R}^m\). Consider the first term in the right-hand side of (15). Since \(Tx + \mathscr {A}_j\) is a convex set and \(\phi ^j_\zeta \) is a bounded zero-mean \(B^{k_j}\)-periodic function for some \(k_j \in K^q\), we can apply Theorem 4.13 from [40] to obtain

$$\begin{aligned} \Big |\int _{Tx + \mathscr {A}_j} \phi ^j_\zeta (s) f(s) ds \Big | \le \frac{1}{4}r^{k_j} |\det (B^{k_j})| \sum _{i=1}^{m} \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ]. \end{aligned}$$
(16)

Next, consider the second term in the right-hand side of (15). Applying Lemma 1 to this integral, we obtain

$$\begin{aligned} \int _{Tx + \mathscr {N}^q_\zeta } \big | v^{\zeta }(\xi _s, x) -\hat{v}^{\zeta }(\xi _s, x) \big | f(s) ds&\le \gamma \int _{Tx + \mathscr {N}^q_\zeta } f(s) ds \nonumber \\&= \gamma \mathbb {P}\{ h - Tx \in \mathscr {N}^q_\zeta \ | \ q, T \}. \end{aligned}$$
(17)

By Proposition 2(ii), the set \(\mathscr {N}^q_\zeta \) in the right-hand side above can be covered by finitely many hyperslices. By Theorem 4.6 from [40], this implies that there exists a constant \(D^q > 0\) such that \(\mathbb {P}\{ h - Tx \in \mathscr {N}^q_\zeta \ | \ q, T \} \le D^q \sum _{i=1}^{m} \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ]\). Substituting this into (17) yields

$$\begin{aligned} \int _{\mathscr {N}^q_\zeta }\big | v^{\zeta }(\xi _s, x) -\hat{v}^{\zeta }(\xi _s, x) \big | f(s) ds&\le \gamma D^q \sum _{i=1}^{m} \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ], \end{aligned}$$
(18)

for some constant \(D^q>0\). Now, defining \(C^q := \gamma D^q +\sum _{j \in J^q_\zeta }\frac{1}{4}r^{k_j} |\det (B^{k_j})|\), and substituting (16) and (18) into (15), we obtain

$$\begin{aligned} \Big | \int _{\mathbb {R}^m}\big ( v^{\zeta }(\xi _s, x) - \hat{v}^{\zeta }(\xi _s, x) \big ) f(s) ds \Big | \le C^q \sum _{i=1}^{m} \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ]. \end{aligned}$$
(19)

Finally, defining \(C_1 := \max _{q \in \varXi ^q} C^q\) and substituting (19) into (12) yields

$$\begin{aligned} |R^*(x, \zeta ) - \hat{R}^*(x, \zeta )|&\le \mathbb {E}_{q, T} \left[ C^q \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ |\varDelta |f_i(\cdot | h_{-i}) \big ] \right] \\&\le C_1 \sum _{i=1}^{m} \mathbb {E}_{h_{-i}}\big [ |\varDelta |f_i(\cdot | h_{-i}) \big ]. \end{aligned}$$

Now, (13) follows from the inequality \(\Vert R_\beta - \hat{R}_\beta \Vert _\infty \le \tfrac{1}{1-\beta } \Vert \hat{R}^*_\beta - R^*_\beta \Vert _\infty \) and the observation that the right-hand side above does not depend on the value of x or \(\zeta \). \(\square \)

The error bounds from Theorem 1 are asymptotically converging, i.e., they converge to zero as the total variations of the density functions of the random right-hand side variables in the model converge to zero. For instance, for independently distributed normal random variables this is the case if all standard deviations \(\sigma _i\) go to \(\infty \). In fact, Theorem 1 implies that any mixed-integer CVaR recourse function \(R_\beta \) can be approximated reasonably well by a convex approximation \(\hat{R}_\beta \) or \(\tilde{R}^{\beta }_\alpha \) if the aforementioned total variations are small.

Interestingly, the error bounds from Theorem 1 differ from their risk-neutral counterparts in Proposition 3 only by an additional factor \(\tfrac{1}{1-\beta }\). Hence, combining these error bounds with corresponding risk-neutral error bounds as suggested in (6) results in an expression for the joint error bound with a similar asymptotic behavior.

4 Two-stage TU integer mean-CVaR recourse models

In this section we derive tighter error bounds for the special case of two-stage TU integer mean-CVaR recourse models. That is, we consider the model from Sect. 2.1 and we make the additional assumption that the second-stage value function can be written as

$$\begin{aligned} v(\xi , x) := \min _{\bar{y}} \left\{ \bar{q}\bar{y} \ | \ \bar{W}\bar{y} \ge h - Tx, \ \bar{y} \in \mathbb {Z}_+^{n_2} \right\} , \end{aligned}$$
(20)

where \(\bar{W}\) is a totally unimodular matrix. This is indeed a special case of the value function (5) from Sect. 2.1, with \(n_3 = m\), \(q = (\bar{q}, 0)\), \(y = (\bar{y}, z)\), and \(W = [\bar{W} \,{-}\,I_m]\), where \(I_m\) is the \(m \times m\) identity matrix. We exploit the special structure of this model to derive sharper error bounds for the shifted LP-relaxation and generalized \(\alpha \)-approximation than those in Theorem 1.

4.1 Convex approximations

The TU integer structure of the value function v from (20) allows for simplified representations of the convex approximations \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) from Definition 7. These will be used in the proofs of the tighter error bounds in Theorem 2 and 3. We first derive a simplified representation of v itself.

Since \(\bar{W}\) is a TU (and thus, integer) matrix, it follows that

$$\begin{aligned} v(\xi , x)&= \min _{\bar{y}} \big \{ \bar{q}\bar{y} \ | \ \bar{W}\bar{y} \ge \lceil h - Tx \rceil , \ \bar{y} \in \mathbb {Z}_+^{n_2} \big \} \\&= \min _{\bar{y}} \big \{ \bar{q}\bar{y} \ | \ \bar{W}\bar{y} \ge \lceil h - Tx \rceil , \ \bar{y} \in \mathbb {R}_+^{n_2} \big \}, \end{aligned}$$

where the round-up operator \(\lceil \cdot \rceil \) is defined element-wise for vectors. By Assumption 1(a) and strong LP-duality, we obtain the dual maximization problem

$$\begin{aligned} v(\xi , x)&= \max _{\lambda } \big \{ \lambda \lceil h - Tx\rceil \ | \ \lambda \bar{W} \le \bar{q}, \ \lambda \in \mathbb {R}_+^m \big \}. \end{aligned}$$

Here, the dual feasible region \(\{ \lambda \in \mathbb {R}_+^m \ | \ \lambda \bar{W} \le \bar{q} \}\) is a non-empty, bounded polyhedron for every \(q \in \varXi ^q\), and hence it has a positive, finite number of extreme points. These extreme points can be characterized as \(\lambda ^k := q_{B^k}(B^k)^{-1}, \ k \in K^q\). Note that at least one of these points is optimal in the dual problem. Hence, we can write

$$\begin{aligned} v(\xi , x) = \max _{k \in K^q} \big \{ \lambda ^k \lceil h - Tx\rceil \big \}. \end{aligned}$$
(21)

Based on (21) we can derive simplified representations of the convex approximations \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) from Definition 7.

Lemma 3

Let \(R^{\beta }(x) = {{\,\mathrm{CVaR}\,}}_\beta [v(\xi , x)]\) be the CVaR recourse function from (4), where v is the TU integer value function from (20). Then, the convex approximations \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) from Definition 8 can be represented as

$$\begin{aligned} \hat{R}^{\beta }(x) = {{\,\mathrm{CVaR}\,}}_\beta \big [\hat{v}(\xi , x)\big ], \quad \tilde{R}^{\beta }_\alpha (x) = {{\,\mathrm{CVaR}\,}}_\beta \big [\tilde{v}_\alpha (\xi , x)\big ], \end{aligned}$$

for all \(x \in \mathbb {R}^{n_1}\), where \(\hat{v}\) and \(\tilde{v}_\alpha \) are defined by

$$\begin{aligned} \hat{v}(\xi , x)&= \max _{k \in K^q} \big \{ \lambda ^k \big (h - Tx + \tfrac{1}{2} \iota _m\big ) \big \},&\tilde{v}_\alpha (\xi , x) = \max _{k \in K^q} \big \{ \lambda ^k (\lceil h - \alpha \rceil + \alpha - Tx) \big \}, \end{aligned}$$

for all \(\xi \in \varXi \), \(x \in \mathbb {R}^{n_1}\), where \(\iota _m = (1, \ldots , 1) \in \mathbb {R}^m\).

Proof

Let \(\xi \in \varXi \), \(\zeta \in \mathbb {R}\), and \(x \in \mathbb {R}^{n_1}\) be given and consider the function \(\hat{v}^{\zeta }(\xi , x)\) from Definition 7. By Example 3.4 in [40] it follows from straightforward analysis that \(\hat{v}^{\zeta }(\xi ,x) = (\hat{v}(\xi , x) - \zeta )^+\). Then, from the definition of \(\hat{R}^{\beta }\) and the definition of CVaR, it follows that \(\hat{R}^{\beta }(x) = {{\,\mathrm{CVaR}\,}}[\hat{v}(\xi , x)]\). The proof for \(\tilde{R}^{\beta }_\alpha \) is analogous. \(\square \)

Note that the convex approximations \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) in Lemma 3 are structurally similar to the original CVaR recourse function \(\hat{R}^{\beta }\), while the approximating value functions \(\hat{v}\) and \(\tilde{v}_\alpha \) are structurally similar to the mixed-integer value function v in (21).

4.2 Error bounds

In this subsection we derive tight error bounds for the shifted LP-relaxation approximation \(\hat{R}^{\beta }\) and the generalized \(\alpha \)-approximation \(\tilde{R}^{\beta }_\alpha \) by exploiting the TU integer structure of the value function v. Since the derivations for \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) are analogous, we only discuss the derivation for the former.

Our approach to derive sharp error bounds consists of three main steps. First, in Lemma 4 we find an upper bound on the approximation error \(\hat{R}^{\beta }(x) -R^{\beta }(x)\) in terms of the approximation error for a risk-neutral recourse function, under a conditional probability distribution. Second, we apply existing results from the risk-neutral literature to this approximation error to obtain an error bound, in terms of this conditional probability distribution. Finally, we rewrite this error bound in terms of the original probability distribution; the resulting error bounds are presented in Theorems 2 and 3.

By definition of CVaR we have

$$\begin{aligned} R^{\beta }(x) = \min _{\zeta \in \mathbb {R}} \big \{ \zeta + \tfrac{1}{1 - \beta } \mathbb {E}_{\xi }[ (v(\xi , x) - \zeta )^+] \big \}, \end{aligned}$$

where an optimal argument \(\zeta \) is given by the \(\beta \)-value-at-risk (VaR) of \(v(\xi , x)\), defined by \(\zeta ^{\beta }(x) := \min \big \{ \zeta \in \mathbb {R}\ | \ \mathbb {P}\{ v(\xi , x) \le \zeta \} \ge \beta \big \}\) ; see [38]. By Lemma 3, the approximation \(\hat{R}^{\beta }(x)\) has a similar representation, with the \(\beta \)-VaR of \(\hat{v}(\xi , x)\) as an optimal argument: \(\hat{\zeta }^{\beta }(x) := \min \big \{ \zeta \in \mathbb {R}\ | \ \mathbb {P}\{\hat{v}(\xi , x) \le \zeta \} \ge \beta \big \}\). Note that \(\zeta ^{\beta }(x) \ne \hat{\zeta }^{\beta }(x)\) in general. However, since \(\zeta ^{\beta }(x)\) is optimal for \(R^{\beta }(x)\) and feasible for \(\hat{R}^{\beta }(x)\), we obtain the inequality

$$\begin{aligned} \hat{R}^{\beta }(x) - R^{\beta }(x) \le \frac{1}{1-\beta } \mathbb {E}_{q,T} \Big [\mathbb {E}_{h} \big [ (\hat{v}(\xi , x) - \zeta ^{\beta }(x))^+ - (v(\xi , x) -\zeta ^{\beta }(x))^+ \big ] \Big ]. \end{aligned}$$
(22)

Using this inequality as a starting point, we will derive an upper bound on the approximation error \(\hat{R}^{\beta }(x) - R^{\beta }(x)\). An analogous derivation will yield an upper bound on the reverse difference \(R^{\beta }(x) - \hat{R}^{\beta }(x)\).

We start by deriving an upper bound on the expression

$$\begin{aligned} \varDelta ^{\beta }(x;q,T) := \mathbb {E}_{h} \big [ (\hat{v}(\xi , x) -\zeta ^{\beta }(x))^+ - (v(\xi , x) - \zeta ^{\beta }(x))^+ \big ] \end{aligned}$$
(23)

in the right-hand side of (22). For the sake of argument, suppose that we could remove the positive part operators in (23). Then, we would obtain \(\varDelta ^{\beta }(x;q,T) = \mathbb {E}_{h} \big [ \hat{v}(\xi , x) - v(\xi , x) \big ]\). Note that this is the approximation error for a risk-neutral recourse function. Hence, we could directly apply existing results from the risk-neutral literature [41] to obtain an upper bound. Using this idea, we take the approach of conditioning on two complementary cases. In the first case, the positive part operators indeed drop out, while the second case reduces to zero.

Lemma 4

Let \(q \in \varXi ^q\), \(T \in \varXi ^T\), and \(x \in \mathbb {R}^{n_1}\) be given and consider \(\varDelta ^{\beta }(x;q,T)\) from (23). Then,

$$\begin{aligned} \varDelta ^{\beta }(x;q,T) \le \mathbb {P}\{ \hat{v}(\xi , x)> \zeta ^{\beta }(x) \ | \ q, T \} \mathbb {E}_{h}\left[ \hat{v}(\xi , x) - v(\xi , x) \ | \ \hat{v}(\xi , x) > \zeta ^{\beta }(x) \right] . \end{aligned}$$

Proof

We take (23) as a starting point and consider the complementary cases \(\hat{v}(\xi , x) > \zeta ^{\beta }(x)\) and \(\hat{v}(\xi , x) \le \zeta ^{\beta }(x)\). First, suppose that \(\hat{v}(\xi , x) > \zeta ^{\beta }(x)\). Then, \((\hat{v}(\xi , x) - \zeta ^{\beta }(x))^+ = \hat{v}(\xi , x) - \zeta ^{\beta }(x)\). Using this fact and \((v(\xi , x) - \zeta ^{\beta }(x))^+ \ge v(\xi , x) - \zeta ^{\beta }(x)\), we obtain

$$\begin{aligned} (\hat{v}(\xi , x) - \zeta ^{\beta }(x))^+ - (v(\xi , x) -\zeta ^{\beta }(x))^+&\le \hat{v}(\xi , x) - v(\xi , x). \end{aligned}$$
(24)

Second, suppose that \(\hat{v}(\xi , x) \le \zeta ^{\beta }(x)\). Then, \((\hat{v}(\xi , x) - \zeta ^{\beta }(x))^+ = 0\). Using \((v(\xi , x) -\zeta ^{\beta }(x))^+ \ge 0\), we get

$$\begin{aligned} (\hat{v}(\xi , x) - \zeta ^{\beta }(x))^+ - (v(\xi , x)-\zeta ^{\beta }(x))^+&\le 0. \end{aligned}$$
(25)

Using (24) and (25) and defining \(p^{\beta }_x := \mathbb {P}\{ \hat{v}(\xi , x) > \zeta ^{\beta }(x) \ | \ q, T \} = 0\), we find by conditioning on \(\hat{v}(\xi , x) > \zeta ^{\beta }(x)\) and \(\hat{v}(\xi , x) \le \zeta ^{\beta }(x)\) that

$$\begin{aligned} \varDelta ^{\beta }(x;q,T)\le & {} p^{\beta }_x \mathbb {E}_{h}\big [\hat{v}(\xi , x) -v(\xi , x) \ | \ \hat{v}(\xi , x) > \zeta ^{\beta }(x) \big ] \\&\quad +\, (1 - p^{\beta }_x) \mathbb {E}_{h}\big [0 \ | \ \hat{v}(\xi , x) \le \zeta ^{\beta }(x) \big ] . \end{aligned}$$

The result follows from the observation that the second term above equals zero. \(\square \)

Remark 2

In Lemma 4 it could be that \(\mathbb {P}\{\hat{v}(\xi , x) > \zeta ^{\beta }(x) \ | \ q, T \} = 0\), in which case the conditional expectation \(\mathbb {E}_{h}[\hat{v}(\xi , x) -v(\xi , x) \ | \ \hat{v}(\xi , x) > \zeta ^{\beta }(x) ]\) is ill-defined. In that case, we define this conditional expectation as zero. Then, we clearly have that \(\varDelta ^{\beta }(x; q, T) \le 0\), so Lemma 4 remains valid. \(\triangle \)

Lemma 4 provides an upper bound on \(\varDelta ^{\beta }(x;q,T)\) in terms of the approximation error of a risk-neutral model under a conditional probability distribution. This means that we can directly apply existing error bounds for risk-neutral recourse functions to obtain an upper bound on \(\varDelta ^{\beta }(x; q, T)\) and thus, on \(\hat{R}^{\beta }(x) - R^{\beta }(x)\). Note, however, that this upper bound will be in terms of the conditional pdf of h, given \(\hat{v}(\xi , x) > \zeta ^{\beta }(x)\). By rewriting this upper bound in terms of the original pdf f of h, we obtain the error bounds in Theorem 2. These uniform error bounds can be interpreted as the risk-averse generalizations of Proposition 4 in the Appendix.

Theorem 2

Consider the CVaR recourse function \(R^{\beta }\) from (4), where v is the TU integer value function from (20), and consider its shifted LP-relaxation approximation \(\hat{R}^{\beta }\) and generalized \(\alpha \)-approximation \(\tilde{R}^{\beta }_\alpha \) from Definition 8. Then, if \(f \in \mathscr {H}^m\), we have

$$\begin{aligned} \Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty&\le \frac{1}{2(1-\beta )}\sum _{i=1}^m \bar{\lambda }^*_i g \Big (\mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ] \Big ), \end{aligned}$$
(26)
$$\begin{aligned} \Vert R^{\beta } - \tilde{R}^{\beta }_\alpha \Vert _\infty&\le \frac{1}{1-\beta } \sum _{i=1}^m \bar{\lambda }^*_i g\Big ( \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ] \Big ), \end{aligned}$$
(27)

where for every \(i=1,\ldots ,m\), we have \(\bar{\lambda }^*_i := \mathbb {E}_q[\lambda ^*_{q,i} ]\), with \(\lambda ^*_{q,i} := \max _{k \in K^q} \{\lambda ^k_i \}\), \(q \in \varXi ^q\), and the function \(g:\mathbb {R}_+ \rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} g(t) ={\left\{ \begin{array}{ll} t/8, &{}\quad 0 \le t \le 4, \\ 1 - 2/t, &{}\quad t > 4. \end{array}\right. } \end{aligned}$$
(28)

Proof

See Appendix. \(\square \)

In comparison with Theorem 1, Theorem 2 provides tractable analytic expressions (in terms of \(\bar{\lambda }^*_i\)) for the constants \(C_1\) and \(C_2\). Using these expressions, the error bounds from Theorem 2 are generally much tighter than those from Theorem 1. Moreover, observe that the error bounds from Theorem 2 differ from their risk-neutral counterparts in Proposition 4 only in the additional factor \(\tfrac{1}{1-\beta }\), similar as for the error bounds from Theorem 1 in Sect. 3. Finally, it should be noted that the error bounds for the shifted LP-relaxation approximation \(\hat{R}^{\beta }\) are a factor 2 smaller than those for the \(\alpha \)-approximation \(\tilde{R}^{\beta }_\alpha \).

It turns out that we can derive even tighter bounds by exploiting the fact that the expectation in Lemma 4 is conditional on \(\hat{v}(\xi , x) > \zeta ^{\beta }(x)\). Intuitively, this means that the (upper bound on the) approximation error \(\hat{R}^{\beta }(x) - R^{\beta }(x)\) is only determined by values of \(\xi \) for which \(\hat{v}(\xi , x)\) is large. Since the TU integer approximating value function \(\hat{v}\) is monotone in \(h_i\), it follows that for a given x, q, T, and \(h_{-i}\), this is equivalent to \(h_i \ge \tau _i\) for some \(\tau _i \in \mathbb {R}\). Hence, we only need to account for the total variation over the interval \([\tau _i, +\infty )\), for some appropriately defined scalar \(\tau _i\).

Definition 10

Let v be the second-stage value function from (20) and let \(\hat{v}\) and \(\tilde{v}_\alpha \) be as in Lemma 3. Furthermore, let \(\zeta ^{\beta }(x) := \min \big \{ \zeta \in \mathbb {R}\ | \ \mathbb {P}\{ v(\xi , x) \le \zeta \} \ge \beta \big \}\) denote the \(\beta \)-VaR of \(v(\xi , x)\) and similarly, let \(\hat{\zeta }^{\beta }(x)\) and \(\tilde{\zeta }^{\beta }_\alpha (x)\) denote the \(\beta \)-VaR of \(\hat{v}(\xi , x)\) and \(\tilde{v}_\alpha (\xi , x)\), respectively. Finally, let \(i=1,\ldots ,m\), be given and define \(\xi _{-i} := (q, T, h_{-i})\). Then, for every \(\xi _{-i} \in \varXi ^q \times \varXi ^T \times \mathbb {R}^{m-1}\), we define

$$\begin{aligned} \hat{\tau }^{\beta }_{x,i}(\xi _{-i})&:= \inf \big \{ h_i \in \mathbb {R}\ | \ \big ( \hat{v}(\xi , x)> \zeta ^{\beta }(x) \big ) \ \vee \ \big ( v(\xi , x)> \hat{\zeta }^{\beta }(x) \big ) \big \}, \quad \text {and} \\ \tilde{\tau }^{\beta ,\alpha }_{x,i}(\xi _{-i})&:= \inf \big \{ h_i \in \mathbb {R}\ | \ \big (\tilde{v}_\alpha (\xi , x)> \zeta ^{\beta }(x) \big ) \ \vee \ \big ( v(\xi , x) > \tilde{\zeta }^{\beta }_\alpha (x) \big ) \big \}. \end{aligned}$$

Theorem 3

Consider the setting of Theorem 2 If \(f \in \mathscr {H}^m\), then for every \(x \in \mathbb {R}^{n_1}\) we have

$$\begin{aligned} |R^{\beta }(x) - \hat{R}^{\beta }(x)|&\le \frac{1}{1-\beta }\sum _{i=1}^m \mathbb {E}_{q,T}\Big [ \lambda ^*_{q,i} g\Big ( \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i})\big ( [\hat{\tau }^{\beta }_{x,i}(\xi _{-i}), +\infty ) \big ) \big ] \Big ) \Big ], \end{aligned}$$
(29)
$$\begin{aligned} |R^{\beta }(x) - \tilde{R}^{\beta }_\alpha (x)|&\le \frac{2}{1-\beta } \sum _{i=1}^m \mathbb {E}_{q,T}\Big [ \lambda ^*_{q,i} g\Big ( \mathbb {E}_{h_{-i}} \big [ |\varDelta |f_i(\cdot | h_{-i}) \big ( [\tilde{\tau }^{\beta ,\alpha }_{x,i}(\xi _{-i}), +\infty ) \big ) \big ] \Big ) \Big ], \end{aligned}$$
(30)

where g is the function from Theorem 2 and for every \(i=1,\ldots ,m\), the constants \(\lambda ^*_{q,i} := \max _{k \in K^q} \{ \lambda ^k_i \}\), \(q \in \varXi ^q\), are as in Theorem 2, and \(\hat{\tau }^{\beta }_{x,i}\) and \(\tilde{\tau }^{\beta ,\alpha }_{x,i}\) are defined in Definition 10.

Proof

See Appendix. \(\square \)

Theorem 3 exploits the fact that CVaR represents the expected value of the \((1-\beta )\times 100\%\) worst-case values only. As a result, the error bounds in Theorem 3 only depend on the total variation of the conditional pdfs of h over that part of its support that corresponds to these worst-case values. Since this support decreases if \(\beta \) increases, this total variation is non-increasing in \(\beta \). This effect explains why, contrary to what Theorem 1 suggests, the approximation errors \(|R^{\beta }(x) - \hat{R}^{\beta }(x)|\) and \(|R^{\beta }(x) - \tilde{R}^{\beta }_\alpha (x)|\) may actually be decreasing in \(\beta \). We illustrate this for the special case of simple integer recourse models in the next subsection.

4.3 Simple integer recourse

In this subsection we study the behavior of the error bounds from Theorem 3 in the special case of so-called one-dimensional simple integer recourse (SIR). Similar as in the risk-neutral case [23, 28, 41], we can exploit the special structure of this problem to construct a convex approximation with a sharp error bound. Surprisingly, for random variables h with a non-increasing positive tail, the error bound depends on the hazard rate of the distribution of h. Contrary to the bound in Theorem 1 from Sect. 3, this error bound is not necessarily large if \(\beta {\uparrow } 1\). This is a desirable property, since we are generally interested in large values for the CVaR parameter \(\beta \in (0,1)\). In fact, we prove that for heavy-tailed distributions with a decreasing hazard rate the error bound converges to zero if \(\beta {\uparrow } 1\).

The one-dimensional simple integer recourse model is defined as a special case of the TU integer recourse model defined by (20), with \(n_2 = 1\), \(\bar{W} = [1]\), \(\bar{q}=1\) and \(T=[1]\). Note that q and T are assumed to be deterministic; only the right-hand side vector \(h \in \mathbb {R}\) is random, with pdf f and cdf F. The second-stage value function can then be written as

$$\begin{aligned} v(h, x) = \lceil h - x\rceil ^+, \quad h, x \in \mathbb {R}, \end{aligned}$$
(31)

while its convex approximations \(\hat{v}\) and \(\tilde{v}_\alpha \) reduce to

$$\begin{aligned} \hat{v}(h, x)&= (h - x + 1/2)^+ \quad \text {and} \quad \tilde{v}_\alpha (h, x) = (\lceil h - \alpha \rceil + \alpha - x)^+, \end{aligned}$$

for all \(h, x \in \mathbb {R}\). Below we analyze the error bounds from Theorem 3 for these convex approximations. However, since the bounds for \(\hat{R}^{\beta }\) and \(\tilde{R}^{\beta }_\alpha \) differ only by a factor 2, we present the results for the shifted LP-relaxation \(\hat{R}^{\beta }\) only. We start by presenting a simplified version of the error bound in (29) from Theorem 3.

Corollary 1

Let \(R^{\beta }\) be the CVaR recourse function from (4), where v is the SIR value function from (31). Moreover, let \(\hat{R}^{\beta }\) be the shifted LP-relaxation approximation from Definition 8. Then,

$$\begin{aligned} \Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty&\le \frac{1}{1-\beta }g \Big (|\varDelta |f\big ( [\tau ^{\beta }, +\infty ) \big ) \Big ), \end{aligned}$$
(32)

where \(\tau ^{\beta } := F^{-1}(\beta ) - 1\) and g is defined in (28).

Proof

See Appendix. \(\square \)

It is not immediately clear whether the error bound in Corollary 1 is increasing or decreasing in \(\beta \). On the one hand, the fraction \(\tfrac{1}{1-\beta }\) increases in \(\beta \) and goes to \(+\infty \) as \(\beta {\uparrow } 1\). On the other hand, \(g\big (|\varDelta |f\big ( [\tau ^{\beta }, +\infty ) \big )\big )\) decreases in \(\beta \) and goes to zero as \(\beta {\uparrow } 1\), since the left end-point \(\tau ^{\beta }\) of the interval over which we take the total variation of f goes to \(+\infty \). Below, we identify conditions on the tail of the pdf f under which the error bound goes to zero as \(\beta {\uparrow } 1\). We do so for random variables h for which the pdf f has a positive, non-increasing right tail; see Assumption 2. This includes many commonly-used probability distributions such as the normal, gamma, Weibull, and lognormal distribution.

Assumption 2

The pdf f of the random variable h has a positive, non-increasing right tail. That is, there exists a scalar \(z \in \mathbb {R}\) such that f is positive and non-increasing on \([z, +\infty )\).

Corollary 2

Consider the setting of Corollary 1 and suppose that Assumption 2 holds. Then, for \(\beta \ge F(z+1)\), we have

$$\begin{aligned} \Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty \le \frac{f(\tau ^{\beta })}{8(1-\beta )} . \end{aligned}$$

Proof

Since \(\beta \ge F(z + 1)\), it follows that \(\tau ^{\beta } \ge z\). Since f has a non-increasing right tail, this implies that \(|\varDelta |f \big ( [\tau ^{\beta }, +\infty ) \big ) = f(\tau _\beta )\). The result now follows from the observation that \(g(t) \le 1/8\) for all \(t \ge 0\). \(\square \)

The error bound from Corollary 2 is closely related to the hazard rate of h. It turns out that the error bound (and hence, also the error itself) converges to zero if this hazard rate goes to zero.

Definition 11

Let h be a continuous random variable with pdf f and cdf F. Then, the hazard rate\(\lambda \) of h is defined as

$$\begin{aligned} \lambda (t) = \frac{f(t)}{1 - F(t)}, \quad t \in \mathbb {R}. \end{aligned}$$

We say h has a decreasing hazard rate if \(\lim \nolimits _{t \rightarrow \infty } \lambda (t) = 0\).

Theorem 4

Let \(R^{\beta }\) be the CVaR recourse function from (4), where v is the SIR value function from (31). Moreover, let \(\hat{R}^{\beta }\) be the shifted LP-relaxation approximation from Definition 8. Suppose that Assumption 2 holds and that h has a decreasing hazard rate. Then,

$$\begin{aligned} \lim \limits _{\beta {\uparrow } 1} \Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty = 0. \end{aligned}$$

Proof

For \(\beta \) sufficiently close to 1, the condition \(\beta \ge F(z+1)\) of Corollary 2 holds. Hence, by Corollary 2 it follows that

$$\begin{aligned} \lim \limits _{\beta {\uparrow } 1}\Vert R^{\beta } - \hat{R}^{\beta } \Vert _\infty \le \tfrac{1}{8}\lim \limits _{\beta {\uparrow } 1} \tfrac{f(\tau ^{\beta })}{1-\beta } = \tfrac{1}{8} \lim \limits _{\beta {\uparrow } 1} \tfrac{f(F^{-1}(\beta ) - 1)}{1-F(F^{-1}(\beta ))}. \end{aligned}$$
(33)

Consider the limit in the right-hand side above. Performing a change of variable \(t = F^{-1}(\beta )\), and using that \(\lim \nolimits _{\beta {\uparrow } 1} F^{-1}(\beta ) = +\infty \), we get

$$\begin{aligned} \lim \limits _{\beta {\uparrow } 1} \tfrac{f(F^{-1}(\beta ) -1)}{1-F(F^{-1}(\beta ))}&= \lim \limits _{t \rightarrow \infty } \tfrac{f(t-1)}{1 - F(t)} = 0, \end{aligned}$$

where the last equality follows by Lemma 5 in Appendix B. Substituting this into (33) completes the proof. \(\square \)

Theorem 4 shows that the convex approximation \(\hat{R}^{\beta }\) is good for large values of \(\beta \) if h has a decreasing hazard rate. Since every distribution with a decreasing hazard rate has a heavy tail [15], the convex approximation is good in cases where extreme events are relatively likely to occur. Interestingly, this is precisely the situation in which explicit modeling of risk is desired. More generally, contrary to what the error bounds from Theorem 1 suggest, Theorem 4 provides evidence that the approximation errors of our convex approximations need not explode as \(\beta {\uparrow } 1\). In fact, they may even converge to zero.

5 Summary and conclusions

We considered two-stage mean-CVaR recourse models, where the second-stage problem is a mixed-integer linear program. These models are non-convex due to the presence of integer variables and hence, they are extremely hard to solve. Inspired by results from the literature on corresponding risk-neutral models we construct convex approximation models, which can be solved efficiently using techniques from convex optimization. In particular, we define two types of convex approximations of the CVaR recourse function \(R_\beta \).

In order to guarantee the performance of the resulting approximate solutions, we derive error bounds: upper bounds on the approximation errors. These error bounds depend on the total variations of the one-dimensional conditional density functions of the random right-hand side variables in the model. In particular, the error bounds converge to zero if all these total varations go to zero. This implies that all CVaR recourse functions \(R_\beta \) can be approximated arbitrarily well by a convex function if these total variations are small enough.

For the special case of two-stage TU integer mean-CVaR recourse models, we derive sharper error bounds by exploiting the special structure of these problems. In particular, for simple integer recourse models we show that the error bound is small if the random right-hand side variable in the model has a decreasing hazard rate, implying that its distribution is heavy-tailed. In such a situation, explicit modeling of risk aversion is desired to accurately model the underlying practical decision problem. Hence, our convex approximation approach works well in precisely those cases in which risk-averse optimization is relevant.

Future research efforts may be aimed at finding sharper error bounds for other special cases of two-stage mixed-integer mean-CVaR recourse models. Other directions for future research include assessing the actual performance of the approximations (compared to their error bounds) in a numerical study and constructing convex approximations for mixed-integer mean-risk recourse models with other risk measures than CVaR.