1 Introduction

An important problem in quantitative risk management is to aggregate several individually studied types of risks into an overall position. Mathematically, this translates into studying the worst-case distribution tails of \(\varPsi (X)\), where \(\varPsi :\mathbb {R}^n\rightarrow \mathbb {R}\) is a given function that represents the risk (or undesirability) of an outcome, and where \(X\) is a random vector that takes values in \(\mathbb {R}^n\) and whose distribution is only partially known. For example, one may only have information about the marginals of \(X\) and possibly partial information about some of the moments.

To solve such problems, duality is often exploited, as the dual may be easier to approach numerically or analytically [2–5, 14]. Being able to formulate a dual is also important in cases where the primal is approachable algorithmically, as solving the primal and dual problems jointly provides an approximation guarantee throughout the run of the algorithm: if the duality gap (the difference between the primal and dual objective values) falls below a chosen threshold relative to the primal objective value, the algorithm can be stopped with the guarantee that the optimum has been approximated to a fixed precision that depends on the chosen threshold. This is a well-known technique in convex optimization, see, e.g., [1].
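A minimal sketch of this stopping rule (our illustration, independent of any particular solver; primal_step and dual_step are hypothetical callbacks returning feasible primal and dual objective values of the current iterates):

```python
# Duality-gap stopping rule (a sketch, not from [1]); primal_step and
# dual_step are assumed callbacks returning feasible objective values.
def solve_with_gap_stopping(primal_step, dual_step, tol=1e-6, max_iter=10_000):
    for it in range(max_iter):
        p = primal_step()  # feasible primal value (lower bound for a sup)
        d = dual_step()    # feasible dual value (upper bound, weak duality)
        if abs(d - p) <= tol * max(abs(p), 1.0):  # relative duality gap
            return p, d, it  # the optimum lies in [p, d]
    raise RuntimeError("requested precision not reached")
```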

Although for some special cases of the marginal problem analytic solutions and powerful numerical heuristics exist [6, 12, 13, 18, 19], these techniques do not apply when additional constraints are imposed to force the probability measures over which we maximize the risk to conform with empirical observations: In a typical case, the bulk of the empirical data may be contained in a region \(D\) that can be approximated by an ellipsoid or the union of several (disjoint or overlapping) polyhedra. For a probability measure \(\mu \) to be considered a reasonable explanation of the true distribution of (multidimensional) losses, one would require the probability mass contained in \(D\) to lie in an empirically estimated confidence region, that is, \(\ell \le \mu (D)\le u\) for some estimated bounds \(\ell <u\). In such a situation, the derivation of robust risk aggregation bounds via dual problems remains a powerful and interesting approach.

In this chapter, we formulate a general optimization problem, which can be seen as a doubly infinite linear programming problem, and we show that the associated dual generalizes several well-known special cases. We then apply this duality framework to a new class of risk management models that we propose in Sect. 4.

2 A General Duality Relation

Let \((\varPhi ,\mathfrak {F})\), \((\varGamma ,\mathfrak {G})\) and \((\varSigma ,\mathfrak {S})\) be complete measure spaces, and let \(A:\,\varGamma \times \varPhi \rightarrow \mathbb {R}\), \(a:\,\varGamma \rightarrow \mathbb {R}\), \(B:\,\varSigma \times \varPhi \rightarrow \mathbb {R}\), \(b:\,\varSigma \rightarrow \mathbb {R}\), and \(c:\,\varPhi \rightarrow \mathbb {R}\) be bounded measurable functions on these spaces and the corresponding product spaces. Let \(\fancyscript{M}_{\mathfrak {F}}\), \(\fancyscript{M}_{\mathfrak {G}}\) and \(\fancyscript{M}_{\mathfrak {S}}\) be the set of signed measures with finite variation on \((\varPhi ,\mathfrak {F})\), \((\varGamma ,\mathfrak {G})\), and \((\varSigma ,\mathfrak {S})\) respectively. We now consider the following pair of optimization problems over \(\fancyscript{M}_{\mathfrak {F}}\) and \(\fancyscript{M}_{\mathfrak {G}}\times \fancyscript{M}_{\mathfrak {S}}\), respectively,

$$\begin{aligned} \text {(P)}\quad \sup _{\fancyscript{F}\in \fancyscript{M}_{\mathfrak {F}}}\,&\int \limits _{\varPhi }c(x){{\mathrm{d}}}\fancyscript{F}(x)\\ \text {s.t. }&\int \limits _{\varPhi }A(y,x){{\mathrm{d}}}\fancyscript{F}(x)\le a(y),\quad (y\in \varGamma ),\\&\int \limits _{\varPhi }B(z,x){{\mathrm{d}}}\fancyscript{F}(x)= b(z),\quad (z\in \varSigma ),\\&\fancyscript{F}\ge 0, \end{aligned}$$

and

$$\begin{aligned} \text {(D)}\quad \inf _{(\fancyscript{G},\fancyscript{S})\in \fancyscript{M}_{\mathfrak {G}}\times \fancyscript{M}_{\mathfrak {S}}} \,&\int \limits _{\varGamma }a(y){{\mathrm{d}}}\fancyscript{G}(y)+\int \limits _{\varSigma }b(z){{\mathrm{d}}}\fancyscript{S}(z),\\ \text {s.t. }&\int \limits _{\varGamma }A(y,x){{\mathrm{d}}}\fancyscript{G}(y)+\int \limits _{\varSigma } B(z,x){{\mathrm{d}}}\fancyscript{S}(z)\ge c(x),\quad (x\in \varPhi ),\\&\fancyscript{G}\ge 0. \end{aligned}$$

We claim that the infinite-programming problems (P) and (D) are duals of each other.

Theorem 1

(Weak Duality) For every (P)-feasible measure \(\fancyscript{F}\) and every (D)-feasible pair \((\fancyscript{G},\fancyscript{S})\) we have

$$\begin{aligned} \int \limits _{\varPhi }c(x){{\mathrm{d}}}\fancyscript{F}(x)\le \int \limits _{\varGamma }a(y){{\mathrm{d}}}\fancyscript{G}(y)+ \int \limits _{\varSigma }b(z){{\mathrm{d}}}\fancyscript{S}(z). \end{aligned}$$

Proof

Using Fubini’s Theorem, we have

$$\begin{aligned} \int \limits _{\varPhi }c(x){{\mathrm{d}}}\fancyscript{F}(x)&\le \int \limits _{\varGamma \times \varPhi }A(y,x){{\mathrm{d}}}(\fancyscript{G}\times \fancyscript{F})(y,x)+ \int \limits _{\varSigma \times \varPhi }B(z,x){{\mathrm{d}}}(\fancyscript{S}\times \fancyscript{F})(z,x)\\&\le \int \limits _{\varGamma }a(y){{\mathrm{d}}}\fancyscript{G}(y)+ \int \limits _{\varSigma }b(z){{\mathrm{d}}}\fancyscript{S}(z). \end{aligned}$$

The first inequality follows by integrating the (D)-feasibility constraint against the nonnegative measure \(\fancyscript{F}\); the second follows by integrating the constraints of (P) against \(\fancyscript{G}\ge 0\) and \(\fancyscript{S}\), with Fubini's Theorem justifying the exchange of the order of integration.

In various special cases, such as those discussed in Sect. 3, strong duality is known to hold subject to regularity assumptions, that is, the optimal values of (P) and (D) coincide. Another special case in which strong duality applies is when the measures \(\fancyscript{F}\), \(\fancyscript{G}\), and \(\fancyscript{S}\) have densities in appropriate Hilbert spaces; see the forthcoming DPhil thesis of the second author [17].

We remark that the quantifiers in the constraints can be weakened if the set of allowable measures is restricted. For example, if \(\fancyscript{G}\) is restricted to lie in a set of measures that are absolutely continuous with respect to a fixed measure \(\fancyscript{G}_0\in \fancyscript{M}_{\mathfrak {G}}\), then the quantifier \((y\in \varGamma )\) can be weakened to \((\fancyscript{G}_0\text {-almost all }y\in \varGamma )\).

3 Classical Examples

Our general duality relation of Theorem 1 generalizes many classical duality results, of which we now point out a few examples. Let \(p(x_1,\ldots ,x_k)\) be a function of \(k\) arguments. Then we write

$$\begin{aligned} 1_{\{x: p(x)\ge 0\}}:=1_{\{y: p(y)\ge 0\}}(x)={\left\{ \begin{array}{ll}1\quad &{}\text {if }p(x)\ge 0,\\ 0\quad &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

In other words, we write the argument \(x\) of the indicator function directly into the set \(\{y: p(y)\ge 0\}\) that defines the function, rather than using a separate set of variables \(y\). This abuse of notation will make it easier to identify which inequality is satisfied by the arguments where the function \(1_{\{y: p(y)\ge 0\}}(x)\) takes the value \(1\).

We start with the Moment Problem studied by Bertsimas and Popescu [2], who considered generalized Chebyshev inequalities of the form

$$\begin{aligned} \text {(P')}\quad \sup _{X}\;&{{\mathrm{P}}}[r(X)\ge 0]\\ \text {s.t. }&{{\mathrm{E}}}[X_1^{k_1}\cdots X_n^{k_n}]=b_{k},\quad (k\in J),\\&X\text { a random vector taking values in }\mathbb {R}^n, \end{aligned}$$

where \(r:\mathbb {R}^n\rightarrow \mathbb {R}\) is a multivariate polynomial and \(J\subset \mathbb {N}^n\) is a finite set of multi-indices. In other words, some moments of \(X\) are known. By choosing \(\varPhi =\mathbb {R}^n\), \(\varGamma =\varvec{\emptyset }\), \(\varSigma =J\cup \{0\}\),

$$\begin{aligned}&B(k,x)=x_1^{k_1}\cdots x_n^{k_n},\quad b(k)=b_k,\quad (k\in J),\\&B(0,x)={{\mathrm{1}}}_{\mathbb {R}^n},\quad b(0)=1, \end{aligned}$$

and \(c(x)={{\mathrm{1}}}_{\{x:\,r(x)\ge 0\}}\), where we made use of the abuse of notation discussed above, problem (P’) becomes a special case of the primal problem considered in Sect. 2,

$$\begin{aligned} \text {(P)}\quad \sup _{\fancyscript{F}}\,&\int \limits _{\mathbb {R}^n}{{\mathrm{1}}}_{\{x:\,r(x)\ge 0\}}{{\mathrm{d}}}\fancyscript{F}(x)\\ \text {s.t. }&\int \limits _{\mathbb {R}^n}x_1^{k_1}\cdots x_n^{k_n}{{\mathrm{d}}}\fancyscript{F}(x)=b_{k},\quad (k\in J),\\&\int \limits _{\mathbb {R}^n}1{{\mathrm{d}}}\fancyscript{F}(x)=1,\\&\fancyscript{F}\ge 0. \end{aligned}$$

Our dual

$$\begin{aligned} \text {(D)}\quad \inf _{(z,z_0)\in \mathbb {R}^{|J|+1}}\,&\sum _{k\in J} z_k b_k + z_0\\ \text {s.t. }&\sum _{k\in J} z_k x_1^{k_1}\cdots x_n^{k_n} + z_0\ge {{\mathrm{1}}}_{\{x: r(x)\ge 0\}}, \quad (x\in \mathbb {R}^n) \end{aligned}$$

is easily seen to be identical with the dual (D’) identified by Bertsimas and Popescu,

$$\begin{aligned} \text {(D')}\quad \inf _{(z,z_0)\in \mathbb {R}^{|J|+1}}\,&\sum _{k\in J} z_k b_k + z_0\\ \text {s.t. }&\forall \,x\in \mathbb {R}^n, r(x)\ge 0\Rightarrow \sum _{k\in J} z_k x_1^{k_1}\cdots x_n^{k_n} + z_0-1\ge 0,\\&\forall \,x\in \mathbb {R}^n, \sum _{k\in J} z_k x_1^{k_1}\cdots x_n^{k_n} + z_0\ge 0. \end{aligned}$$

Note that since \(\varGamma ,\varSigma \) are finite, the constraints of (D') are polynomial copositivity constraints. The numerical solution of semi-infinite programming problems of this type can be approached via a nested hierarchy of semidefinite programming relaxations that yield increasingly accurate approximations to (D'). The highest level problem within this hierarchy is guaranteed to solve (D') exactly, although the corresponding SDP is of exponential size in the dimension \(n\), in the degree of the polynomial \(r\), and in \(\max _{k\in J}(\sum _i k_i)\). For further details see [2, 7, 10], and Sect. 4.6 below.
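To illustrate the primal-dual pair in the simplest possible setting, the following sketch (our illustration, not the SDP hierarchy of [2, 7, 10]) solves (D') for \(n=1\), \(J=\{1\}\), \({{\mathrm{E}}}[X]=1\), and \(c(x)={{\mathrm{1}}}_{\{x\ge 2\}}\) by crudely discretizing the copositivity constraint on a grid; the support bound and grid resolution are assumptions, and the optimal value recovers the Markov bound \(1/2\):

```python
# Grid discretization of the copositivity constraint of (D') in one
# dimension (illustration only; the support bound [0, 10] and the grid
# are assumptions, not part of the method in [2]).
import numpy as np
from scipy.optimize import linprog

b = np.array([1.0, 1.0])              # moments: E[X] = 1, total mass 1
xs = np.linspace(0.0, 10.0, 2001)     # discretized support
c_x = (xs >= 2.0).astype(float)       # c(x) = 1_{x >= 2}

# Variables (z_1, z_0); enforce z_1*x + z_0 >= c(x) at all grid points.
A_ub = -np.column_stack([xs, np.ones_like(xs)])
res = linprog(c=b, A_ub=A_ub, b_ub=-c_x, bounds=[(None, None)] * 2)
print(res.x, res.fun)                 # approx. z = (0.5, 0.0), value 0.5
```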

Next, we consider the Marginal Problem studied by Rüschendorf [15, 16] and Ramachandran and Rüschendorf [14],

$$\begin{aligned} \text {(P')}\quad \sup _{\fancyscript{F}\in \fancyscript{M}_{F_1,\ldots ,F_n}}\,\int \limits _{\mathbb {R}^n}h(x){{\mathrm{d}}}\fancyscript{F}(x), \end{aligned}$$

where \(\fancyscript{M}_{F_1,\ldots ,F_n}\) is the set of probability measures on \(\mathbb {R}^n\) whose marginals have the cdfs \(F_i\) \((i=1,\ldots ,n)\). Problem (P') can easily be seen as a special case of the framework of Sect. 2 by setting \(c(x)=h(x)\), \(\varPhi =\mathbb {R}^n\), \(\varGamma =\varvec{\emptyset }\), \(\varSigma =\mathbb {N}_n\times \mathbb {R}\), \(B(i,z,x)={{\mathrm{1}}}_{\{y:\,y_i\le z\}}\) (using the abuse of notation discussed earlier), and \(b(i,z)=F_i(z)\) \((i\in \mathbb {N}_n,\,z\in \mathbb {R})\),

$$\begin{aligned} \text {(P)}\quad \sup _{\fancyscript{F}}\,&\int \limits _{\mathbb {R}^n}h(x){{\mathrm{d}}}\fancyscript{F}(x)\\ \text {s.t. }&\int \limits _{\mathbb {R}^n}{{\mathrm{1}}}_{\{x_i\le z\}}{{\mathrm{d}}}\fancyscript{F}(x) =F_i(z),\quad (z\in \mathbb {R}, i\in \mathbb {N}_n)\\&\fancyscript{F}\ge 0. \end{aligned}$$

Taking the dual, we find

$$\begin{aligned} \text {(D)}\quad \inf _{\fancyscript{S}_1,\ldots ,\fancyscript{S}_n}\,&\sum _{i=1}^n\int \limits _{\mathbb {R}}F_i(z){{\mathrm{d}}}\fancyscript{S}_i(z)\\ \text {s.t. }&\sum _{i=1}^n\int \limits _{\mathbb {R}}{{\mathrm{1}}}_{\{x_i\le z\}}{{\mathrm{d}}}\fancyscript{S}_i(z) \ge h(x),\quad (x\in \mathbb {R}^n). \end{aligned}$$

The signed measures \(\fancyscript{S}_i\) being of finite variation, the functions \(S_i(z)=\fancyscript{S}_i((-\infty ,z])\) and the limits \(s_i=\lim _{z\rightarrow \infty }S_i(z)=\fancyscript{S}_i((-\infty ,+\infty ))\) are well defined and finite. Furthermore, using \(\lim _{z\rightarrow -\infty }F_i(z)=0\) and \(\lim _{z\rightarrow +\infty }F_i(z)=1\), we have

$$\begin{aligned} \sum _{i=1}^n\int \limits _{\mathbb {R}}F_i(z){{\mathrm{d}}}\fancyscript{S}_i(z)&= \sum _{i=1}^n\left( F_i(z)S_i(z)\big |^{+\infty }_{-\infty }-\int \limits _{\mathbb {R}}S_i(z){{\mathrm{d}}}F_i(z) \right) \\&=\sum _{i=1}^n s_i - \sum _{i=1}^n\int \limits _{\mathbb {R}}S_i(z){{\mathrm{d}}}F_i(z)\\&=\sum _{i=1}^n\int \limits _{\mathbb {R}}(s_i-S_i(z)){{\mathrm{d}}}F_i(z), \end{aligned}$$

and likewise,

$$\begin{aligned} \sum _{i=1}^n\int \limits _{\mathbb {R}}{{\mathrm{1}}}_{\{x_i\le z\}}{{\mathrm{d}}}\fancyscript{S}_i(z)= \sum _{i=1}^n\int \limits _{x_i}^{+\infty }1{{\mathrm{d}}}\fancyscript{S}_i(z) =\sum _{i=1}^n (s_i - S_i(x_i)). \end{aligned}$$

Writing \(h_i(z)=s_i-S_i(z)\), (D) is, therefore, equivalent to

$$\begin{aligned} \text {(D')}\quad \inf _{h_1,\ldots ,h_n}\,&\sum _{i=1}^n\int \limits _{\mathbb {R}}h_i(z){{\mathrm{d}}}F_i(z)\\ \text {s.t. }&\sum _{i=1}^n h_i(x_i)\ge h(x),\quad (x\in \mathbb {R}^n). \end{aligned}$$

This is the dual identified by Ramachandran and Rüschendorf [14]. Due to the general form of the functions \(h_i\), the infinite programming problem (D’) is not directly usable in numerical computations. However, for specific \(h(x)\), (D’)-feasible functions \((h_1,\ldots ,h_n)\) can sometimes be constructed explicitly, yielding an upper bound on the optimal objective function value of (P’) by virtue of Theorem 1. Embrechts and Puccetti [3–5] used this approach to derive quantile bounds on \(X_1+\cdots +X_n\), where \(X\) is a random vector with known marginals but unknown joint distribution. In this case, the relevant primal objective function is defined by \(h(x)={{\mathrm{1}}}_{\{x:\,e^{{{\mathrm{T}}}}x\ge t\}}\), where \(t\in \mathbb {R}\) is a fixed level. More generally, \(h(x)={{\mathrm{1}}}_{\{x:\,\varPsi (x)\ge t\}}\) can be chosen, where \(\varPsi \) is a relevant risk aggregation function, or \(h(x)\) can model any risk measure of choice.
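For instance (a simple illustration of this mechanism, much cruder than the bounds of [3–5]): for \(h(x)={{\mathrm{1}}}_{\{x:\,e^{{{\mathrm{T}}}}x\ge t\}}\), the functions \(h_i(z)={{\mathrm{1}}}_{\{z\ge t/n\}}\) are (D')-feasible, since \(\sum _i x_i\ge t\) forces \(x_i\ge t/n\) for at least one \(i\), and Theorem 1 then yields the union-type bound

$$\begin{aligned} {{\mathrm{P}}}[X_1+\cdots +X_n\ge t]\le \sum _{i=1}^n {{\mathrm{P}}}[X_i\ge t/n]. \end{aligned}$$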

Our next example is the Marginal Problem with Copula Bounds, an extension of the marginal problem mentioned in [3]. The copula defined by a probability measure \(\fancyscript{F}\) with multivariate cdf \(F\) and marginal cdfs \(F_i\) is the function

$$\begin{aligned} \fancyscript{C}_{\fancyscript{F}}:\,[0, 1]^n&\rightarrow \,[0, 1],\\ u&\mapsto F\left( F_1^{-1}(u_1),\ldots ,F_n^{-1}(u_n)\right) . \end{aligned}$$

A copula is any function \(\fancyscript{C}:\,[0, 1]^n\rightarrow \,[0,1]\) that satisfies \(\fancyscript{C}=\fancyscript{C}_{\fancyscript{F}}\) for some probability measure \(\fancyscript{F}\) on \(\mathbb {R}^n\). Equivalently, a copula is the multivariate cdf of any probability measure on the unit cube \([0, 1]^n\) with uniform marginals. In quantitative risk management, using the model

$$\begin{aligned} \quad \sup _{\fancyscript{F}\in \fancyscript{M}_{F_1,\ldots ,F_n}}\,\int \limits _{\mathbb {R}^n}h(x){{\mathrm{d}}}\fancyscript{F}(x) \end{aligned}$$

to bound the worst-case risk for a random vector \(X\) with marginal distributions \(F_i\) can be overly conservative, as no dependence structure between the coordinates of \(X\) is assumed at all. Since this dependence structure is determined by the copula \(\fancyscript{C}_{\fancyscript{F}}\), where \(\fancyscript{F}\) is the multivariate distribution of \(X\), Embrechts and Puccetti [3] suggest problems of the form

$$\begin{aligned} \text {(P')}\quad \sup _{\fancyscript{F}\in \fancyscript{M}_{F_1,\ldots ,F_n}}\,&\int \limits _{\mathbb {R}^n}h(x){{\mathrm{d}}}\fancyscript{F}(x), \\ \text {s.t. }&\fancyscript{C}_{\text {lo}}\le \fancyscript{C}_{\fancyscript{F}}\le \fancyscript{C}_{\text {up}}, \end{aligned}$$

as a natural framework to study the situation in which partial dependence information is available. In problem (P’), \(\fancyscript{C}_{\text {lo}}\) and \(\fancyscript{C}_{\text {up}}\) are given copulas, and inequality between copulas is defined by pointwise inequality,

$$\begin{aligned} \fancyscript{C}_{\text {lo}}(u)\le \fancyscript{C}_{\fancyscript{F}}(u)\le \fancyscript{C}_{\text {up}}(u)\quad (u\in [0, 1]^n). \end{aligned}$$

Once again, (P’) is a special case of the general framework studied in Sect. 2, as it is equivalent to write

$$\begin{aligned} \text {(P)}\quad \sup _{\fancyscript{F}}\,&\int \limits _{\mathbb {R}^n}h(x){{\mathrm{d}}}\fancyscript{F}(x)\\ \text {s.t. }&\int \limits _{\mathbb {R}^n}{{\mathrm{1}}}_{\{x\le (F_1^{-1}(u_1),\ldots ,F_n^{-1}(u_n))\}} (u,x){{\mathrm{d}}}\fancyscript{F}(x)\le \fancyscript{C}_{\text {up}}(u),\quad (u\in [0, 1]^n),\\&\int \limits _{\mathbb {R}^n}-{{\mathrm{1}}}_{\{x\le (F_1^{-1}(u_1),\ldots ,F_n^{-1}(u_n))\}} (u,x){{\mathrm{d}}}\fancyscript{F}(x)\le -\fancyscript{C}_{\text {lo}}(u),\quad (u\in [0, 1]^n),\\&\int \limits _{\mathbb {R}^n}{{\mathrm{1}}}_{\{x_i\le z\}} (z,x){{\mathrm{d}}}\fancyscript{F}(x)=F_i(z),\quad (i\in \mathbb {N}_n,\,z\in \mathbb {R}),\\&\fancyscript{F}\ge 0. \end{aligned}$$

The dual of this problem is given by

$$\begin{aligned} \text {(D)}\quad \inf _{\fancyscript{G}_{\text {up}},\fancyscript{G}_{\text {lo}},\fancyscript{S}_1,\ldots ,\fancyscript{S}_n}\,&\int \limits _{[0, 1]^n}\fancyscript{C}_{\text {up}}(u){{\mathrm{d}}}\fancyscript{G}_{\text {up}}(u)- \int \limits _{[0, 1]^n}\fancyscript{C}_{\text {lo}}(u){{\mathrm{d}}}\fancyscript{G}_{\text {lo}}(u) +\sum _{i=1}^n\int \limits _{\mathbb {R}}F_i(z){{\mathrm{d}}}\fancyscript{S}_i(z)\\ \text {s.t. }&\int \limits _{[0, 1]^n}{{\mathrm{1}}}_{\{x\le (F_1^{-1}(u_1),\ldots ,F_n^{-1}(u_n))\}} (u,x){{\mathrm{d}}}\fancyscript{G}_{\text {up}}(u)\\&-\int \limits _{[0, 1]^n}{{\mathrm{1}}}_{\{x\le (F_1^{-1}(u_1),\ldots ,F_n^{-1}(u_n))\}} (u,x){{\mathrm{d}}}\fancyscript{G}_{\text {lo}}(u)\\&+\sum _{i=1}^n\int \limits _{\mathbb {R}}{{\mathrm{1}}}_{\{x_i\le z\}}{{\mathrm{d}}}\fancyscript{S}_i(z) \ge h(x),\quad (x\in \mathbb {R}^n),\\&\fancyscript{G}_{\text {lo}},\fancyscript{G}_{\text {up}}\ge 0. \end{aligned}$$

Using the notation \(s_i, S_i\) introduced above, this problem can be written as

$$\begin{aligned} \inf _{\fancyscript{G}_{\text {up}},\fancyscript{G}_{\text {lo}},\fancyscript{S}_1,\ldots \fancyscript{S}_n}\,&\int \limits _{[0, 1]^n}\fancyscript{C}_{\text {up}}(u){{\mathrm{d}}}\fancyscript{G}_{\text {up}}(u)- \int \limits _{[0, 1]^n}\fancyscript{C}_{\text {lo}}(u){{\mathrm{d}}}\fancyscript{G}_{\text {lo}}(u) +\sum _{i=1}^n\int \limits _{\mathbb {R}}(s_i-S_i(z)){{\mathrm{d}}}F_i(z)\\ \text {s.t. }&\fancyscript{G}_{\text {up}}(\fancyscript{B}(x)) -\fancyscript{G}_{\text {lo}}(\fancyscript{B}(x)) +\sum _{i=1}^n(s_i-S_i(x_i))\ge h(x),\quad (x\in \mathbb {R}^n),\\&\fancyscript{G}_{\text {up}},\fancyscript{G}_{\text {lo}}\ge 0, \end{aligned}$$

where \(\fancyscript{B}(x)=\{u\in [0, 1]^n:\,u\ge (F_1(x_1),\ldots ,F_n(x_n))\}\). To the best of our knowledge, this dual has not been identified before.

Due to the high dimensionality of the space of variables and constraints both in the primal and dual, the marginal problem with copula bounds is difficult to solve numerically, even for very coarse discrete approximations.

4 Robust Risk Aggregation via Bounds on Integrals

In quantitative risk management, distributions are often estimated within a parametric family from the available data. For example, the tails of marginal distributions may be estimated via extreme value theory, or a Gaussian copula may be fitted to the multivariate distribution of all risks under consideration, to model their dependencies. The choice of a parametric family introduces model uncertainty, while fitting a distribution from this family via statistical estimation introduces parameter uncertainty. In both cases, a more robust alternative would be to study models in which the available data is only used to estimate upper and lower bounds on finitely many integrals of the form

$$\begin{aligned} \int \limits _{\varPhi }\phi (x){{\mathrm{d}}}\fancyscript{F}(x), \end{aligned}$$
(1)

where \(\phi (x)\) is a suitable test function. A practical way of estimating upper and lower bounds on such integrals from sample data \(x_i\) \((i\in \mathbb {N}_k)\) is to compute confidence bounds via bootstrapping.
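A minimal sketch of such a bootstrap estimate (our illustration; the percentile bootstrap, the confidence level, and the data-generating example are assumptions made for concreteness):

```python
# Percentile-bootstrap confidence bounds for the integral (1), i.e.,
# for E[phi(X)], from sample data x_1, ..., x_k (illustration only).
import numpy as np

def bootstrap_bounds(samples, phi, level=0.95, n_boot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    vals = phi(samples)                       # phi evaluated at the data
    k = len(vals)
    means = np.array([vals[rng.integers(0, k, size=k)].mean()
                      for _ in range(n_boot)])
    alpha = 1.0 - level
    return np.quantile(means, alpha / 2), np.quantile(means, 1 - alpha / 2)

# Example: bounds on E[1_{X >= 2}] for a lognormal sample.
data = np.random.default_rng(1).lognormal(size=500)
lo, hi = bootstrap_bounds(data, lambda x: (x >= 2.0).astype(float))
```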

4.1 Motivation

To motivate the use of constraints in the form of bounds on integrals (1), we offer the following explanations: First of all, discretized marginal constraints are of this form with piecewise constant test functions, as the requirement that \(F_i(\xi _{k})-F_i(\xi _{k-1})=b_k\) \((k=1,\ldots ,\ell )\) for a fixed set of discretization points \(\xi _0<\cdots <\xi _{\ell }\) can be expressed as

$$\begin{aligned} \int \limits _{\varPhi }1_{\{\xi _{k-1}\le x_i\le \xi _{k}\}}{{\mathrm{d}}}\fancyscript{F}(x)=b_k,\quad (k=1,\ldots ,\ell ). \end{aligned}$$
(2)

It is, furthermore, quite natural to relax each of these equality constraints to two inequality constraints

$$\begin{aligned} b^{\ell }_{k,i}\le \int \limits _{\varPhi }1_{\{\xi _{k-1}\le x_i\le \xi _{k}\}}{{\mathrm{d}}}\fancyscript{F}(x)\le b^u_{k,i} \end{aligned}$$

when \(b_k\) is estimated from data.

More generally, constraints of the form \({{\mathrm{P}}}[X\in S_j]\le b^u_j\) for some measurable \(S_j\subseteq \mathbb {R}^n\) of interest can be written as

$$\begin{aligned} \int \limits _{\varPhi } 1_{S_j}(x){{\mathrm{d}}}\fancyscript{F}(x)\le b^u_j. \end{aligned}$$

A collection of \(\ell \) constraints of this form can be relaxed by replacing them by a convex combination

$$\begin{aligned} \int \limits _{\varPhi }\sum _{j=1}^{\ell }w_j 1_{S_j}(x){{\mathrm{d}}}\fancyscript{F}(x)\le \sum _{j=1}^{\ell } w_j b^u_j, \end{aligned}$$

where the weights \(w_j>0\) satisfy \(\sum _j w_j =1\) and express the relative importance of each constituent constraint. Nonnegative test functions thus have a natural interpretation as importance densities in sums-of-constraints relaxations. This allows one to place greater emphasis on getting the probability mass right in regions where it particularly matters (e.g., values of \(X\) that account for the bulk of the profits of a financial institution), while maximizing the risk in the tails without having to resort to too fine a discretization.
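A small sketch of this sums-of-constraints relaxation (our illustration; encoding the sets \(S_j\) as boolean masks over a finite grid of evaluation points is an assumption made purely for concreteness):

```python
# Aggregate l indicator constraints P[X in S_j] <= b_j^u into a single
# weighted constraint with importance weights w_j (illustration only).
import numpy as np

def aggregate_constraints(masks, bounds, weights):
    # masks: (l, K) booleans, masks[j, k] = 1_{S_j}(x_k) at grid point x_k
    w = np.asarray(weights, dtype=float)      # w_j > 0, sum_j w_j = 1
    phi = w @ np.asarray(masks, dtype=float)  # sum_j w_j 1_{S_j} on the grid
    return phi, w @ np.asarray(bounds)        # aggregated test fn and bound
```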

While this suggests using a piecewise approximation of a prior estimate of the density of \(X\) as a test function, the results are robust under mis-specification of this prior: as long as \(\phi (x)\) is nonconstant, constraints that involve the integral (1) tend to force the probability weight of \(X\) into the regions where the sample points are denser. To illustrate this, consider a univariate random variable with density \(f(x)=\frac{2}{3}(1+x)\) on \(x\in [0, 1]\) and test function \(\phi (x)=1+a x\) with \(a\in [-1,1]\). Then \(\int _{0}^{1}\phi (x) f(x){{\mathrm{d}}}x=1+5a/9\). The most dispersed probability measure on \([0, 1]\) that satisfies

$$\begin{aligned} \int \limits _{0}^{1}\phi (x){{\mathrm{d}}}\fancyscript{F}(x)=1+\frac{5a}{9} \end{aligned}$$
(3)

has an atom of weight \(4/9\) at \(0\) and an atom of weight \(5/9\) at \(1\) independently of \(a\), as long as \(a\ne 0\). The constraint (3) thus forces more probability mass into the right half of the interval \([0, 1]\), where the unknown (true) density \(f(x)\) has more mass and produces more sample points.

As a second illustration, take the density \(f(x)=3 x^2\) on \([0, 1]\) and the same linear test function as above. This time we find \(\int _{0}^{1}\phi (x) f(x){{\mathrm{d}}}x=1+3a/4\), and the most dispersed probability measure on \([0, 1]\) that satisfies

$$\begin{aligned} \int \limits _{0}^{1}\phi (x){{\mathrm{d}}}\fancyscript{F}(x)=1+\frac{3a}{4} \end{aligned}$$

has an atom of weight \(1/4\) at \(0\) and an atom of weight \(3/4\) at \(1\) independently of \(a\ne 0\), with similar conclusions as above, except that the effect is even stronger, correctly reflecting the qualitative features of the density \(f(x)\).
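These claims are easy to verify numerically; the following sketch (our illustration, interpreting "most dispersed" as variance-maximizing, which is equivalent to maximizing the second moment about any fixed point here since the mean is pinned by the constraint) recovers the atoms of the second example for \(a=1\):

```python
# Maximize the dispersion of F over a grid on [0, 1] subject to the
# test-function constraint (second example, a = 1); illustration only.
import numpy as np
from scipy.optimize import linprog

a = 1.0
xs = np.linspace(0.0, 1.0, 101)
c = -(xs - 0.5) ** 2                      # maximize dispersion around 1/2
A_eq = np.vstack([1.0 + a * xs, np.ones_like(xs)])
b_eq = np.array([1.0 + 3 * a / 4, 1.0])   # integral constraint, total mass
res = linprog(c, A_eq=A_eq, b_eq=b_eq)    # weights p_i >= 0 by default
print(xs[res.x > 1e-8], res.x[res.x > 1e-8])  # atoms {0: 1/4, 1: 3/4}
```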

4.2 General Setup and Duality

Let \(\varPhi \) be decomposed into a partition \(\varPhi =\bigcup _{i=1}^k\varXi _i\) of polyhedra \(\varXi _i\) with nonempty interior, chosen as regions in which a reasonable number of data points are available to estimate integrals of the form (1).

Each polyhedron has a primal description in terms of generators,

$$\begin{aligned} \varXi _i={{\mathrm{conv}}}(q^i_1,\ldots ,q^i_{n_i})+{{\mathrm{cone}}}(r^i_1,\ldots ,r^i_{o_i}) \end{aligned}$$

where \({{\mathrm{conv}}}(q^i_1,\ldots ,q^i_{n_i})\) is the polytope with vertices \(q^i_1,\ldots ,q^i_{n_i}\in \mathbb {R}^n\), and

$$\begin{aligned} {{\mathrm{cone}}}(r^i_1,\ldots ,r^i_{o_i})=\left\{ \sum _{m=1}^{o_i}\xi _m r^i_m:\, \xi _m\ge 0\;(m\in \mathbb {N}_{o_i})\right\} \end{aligned}$$

is the polyhedral cone with recession directions \(r^i_m\in \mathbb {R}^n\). Each polyhedron also has a dual description in terms of linear inequalities,

$$\begin{aligned} \varXi _i=\bigcap _{j=1}^{k_i}\left\{ x\in \mathbb {R}^n:\, \langle f^i_{j}, x\rangle \ge \ell ^i_j\right\} , \end{aligned}$$

for some vectors \(f^i_j\in \mathbb {R}^n\) and bounds \(\ell ^i_j\in \mathbb {R}\). The main case of interest is where \(\varXi _i\) is either a finite or infinite box in \(\mathbb {R}^n\) with faces parallel to the coordinate axes, or an intersection of such a box with a linear half-space, in which case it is easy to pass between the primal and dual descriptions. Note, however, that the dual description is preferable, as the description of a box in \(\mathbb {R}^n\) requires only \(2n\) linear inequalities, while the primal description requires \(2^n\) extreme vertices.
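For boxes, the dual description is immediate; a small helper (our sketch, with unbounded faces simply dropped) might look as follows:

```python
# Dual description of the box [lo_1, up_1] x ... x [lo_n, up_n] as
# constraints <f_j, x> >= l_j; infinite faces contribute no inequality.
import numpy as np

def box_dual_description(lo, up):
    n = len(lo)
    F, L = [], []
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        if np.isfinite(lo[i]):
            F.append(e.copy()); L.append(lo[i])    #  x_i >= lo_i
        if np.isfinite(up[i]):
            F.append(-e); L.append(-up[i])         # -x_i >= -up_i
    return np.array(F), np.array(L)
```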

Let us now consider the problem

$$\begin{aligned} \text {(P)}\quad \sup _{\fancyscript{F}\in \fancyscript{M}_{\mathfrak {F}}}\,&\int \limits _{\varPhi }h(x){{\mathrm{d}}}\fancyscript{F}(x)\\ \text {s.t. }&\int \limits _{\varPhi }\phi _{s}(x){{\mathrm{d}}}\fancyscript{F}(x) \le a_{s},\quad (s=1,\ldots ,M),\\&\int \limits _{\varPhi }\psi _{t}(x){{\mathrm{d}}}\fancyscript{F}(x) = b_{t},\quad (t=1,\ldots ,N),\\&\int \limits _{\varPhi }1{{\mathrm{d}}}\fancyscript{F}(x)=1,\\&\fancyscript{F}\ge 0, \end{aligned}$$

where the test functions \(\psi _t\) are piecewise linear on the partition \(\varPhi =\bigcup _{i=1}^k\varXi _i\), and where \(-h(x)\) and the test functions \(\phi _s\) are piecewise linear on the infinite polyhedra of the partition, and either jointly linear, concave, or convex on the finite polyhedra (i.e., polytopes) of the partition. The dual of (P) is

$$\begin{aligned} \text {(D)}\quad \inf _{(y,z)\in \mathbb {R}^{M+N+1}} \,&\sum _{s=1}^{M}a_s y_s+\sum _{t=1}^{N}b_t z_t + z_0,\nonumber \\ \text {s.t. }&\sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0{{\mathrm{1}}}_{\varPhi }(x) -h(x)\ge 0,\; (x\in \varPhi ),\\&y\ge 0.\nonumber \end{aligned}$$
(4)

We remark that (P) is a semi-infinite programming problem with infinitely many variables and finitely many constraints, while (D) is a semi-infinite programming problem with finitely many variables and infinitely many constraints. However, the constraint (4) of (D) can be rewritten as copositivity requirements over the polyhedra \(\varXi _i\),

$$\begin{aligned} \sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0{{\mathrm{1}}}_{\varPhi }(x) -h(x)\ge 0,\quad (x\in \varXi _i), \quad (i=1,\dots ,k). \end{aligned}$$

Next we will see how these copositivity constraints can be handled numerically, often by relaxing all but finitely many constraints. Nesterov’s first-order method can be adapted to solve the resulting problems, see [8, 9, 17].

In what follows, we will use the notation

$$\begin{aligned} \varvec{\varphi }_{y,z}(x)=\sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0-h(x). \end{aligned}$$

4.3 Piecewise Linear Test Functions

The first case we discuss is when \(\phi _s|_{\varXi _i}\) and \(h|_{\varXi _i}\) are jointly linear. Since we furthermore assumed that the functions \(\psi _t|_{\varXi _i}\) are linear, there exist vectors \(v^i_s\in \mathbb {R}^n\), \(w^i_t\in \mathbb {R}^n\), \(g^i\in \mathbb {R}^n\) and constants \(c^i_s\in \mathbb {R}\), \(d^i_t\in \mathbb {R}\) and \(e^i\in \mathbb {R}\) such that

$$\begin{aligned} \phi _s|_{\varXi _i}(x)&=\langle v^i_s, x\rangle +c^i_s,\\ \psi _t|_{\varXi _i}(x)&=\langle w^i_t, x\rangle +d^i_t,\\ h|_{\varXi _i}(x)&=\langle g^i, x\rangle +e^i. \end{aligned}$$

The copositivity condition

$$\begin{aligned} \sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0{{\mathrm{1}}}_{\varPhi }(x) -h(x)\ge 0,\quad (x\in \varXi _i) \end{aligned}$$

can then be written as

$$\begin{aligned}&\langle f^i_j, x\rangle \ge \ell ^i_j,\quad (j=1,\ldots ,k_i)\Longrightarrow \\&\qquad \qquad \qquad \qquad \left\langle \sum _{s=1}^{M}y_s v^i_s+\sum _{t=1}^{N}z_t w^i_t-g^i\;,\; x\right\rangle \ge e^i-\sum _{s=1}^{M}y_s c^i_s - \sum _{t=1}^{N}z_t d^i_t - z_0. \end{aligned}$$

By Farkas’ Lemma, this is equivalent to the constraints

$$\begin{aligned} \sum _{s=1}^{M}y_s v^i_s+\sum _{t=1}^{N}z_t w^i_t-g^i&=\sum _{j=1}^{k_i}\lambda ^i_j f^i_j,\end{aligned}$$
(5)
$$\begin{aligned} e^i-\sum _{s=1}^{M}y_s c^i_s - \sum _{t=1}^{N}z_t d^i_t - z_0&\le \sum _{j=1}^{k_i}\lambda ^i_j \ell ^i_j,\end{aligned}$$
(6)
$$\begin{aligned} \lambda ^i_j&\ge 0,\quad (j=1,\ldots ,k_i), \end{aligned}$$
(7)

where \(\lambda ^i_j\) are additional auxiliary decision variables.

Thus, if all test functions are linear on all polyhedral pieces \(\varXi _i\), then the dual (D) can be solved as a linear programming problem with \(M+N+1+\sum _{i=1}^k k_i\) variables and \(k(n+1)\) linear constraints, plus bound constraints on \(y\) and the \(\lambda ^i_j\). More generally, if some but not all polyhedra correspond to jointly linear test function pieces, then jointly linear pieces can be treated as discussed above, while other pieces can be treated as discussed below.

Let us briefly comment on numerical implementations, further details of which are described in the second author's thesis [17]: An important case of the framework described above corresponds to a discretized marginal problem in which the \(\phi _s(x)\) are piecewise constant functions chosen as follows for \(s=(\iota ,j)\) \((\iota =1,\ldots ,n;\; j=1,\ldots ,m)\): Introduce \(m+1\) breakpoints \(\xi ^{\iota }_0<\xi ^{\iota }_1<\cdots <\xi ^{\iota }_m\) along each coordinate axis \(\iota \), and consider the infinite slabs

$$\begin{aligned} S_{\iota ,j}=\left\{ x\in \mathbb {R}^n:\, \xi ^{\iota }_{j-1}\le x_{\iota }\le \xi ^{\iota }_{j}\right\} , \quad (j=1,\ldots ,m). \end{aligned}$$

Then choose \(\phi _{\iota ,j}(x)=1_{S_{\iota ,j}}(x)\), the indicator function of slab \(S_{\iota ,j}\). We remark that this approach corresponds to discretizing the constraints of the Marginal Problem described in Sect. 3, but not to discretizing the probability measures over which we maximize the aggregated risk.

While the number of test functions is \(nm\) and thus linear in the problem dimension, the number of polyhedra to consider is exponentially large, as all intersections of the form

$$\begin{aligned} \varXi _{\mathbf {j}}=\bigcap _{\iota =1}^n S_{\iota , j_{\iota }} \end{aligned}$$

for the \(m^n\) possible choices of \(\mathbf {j}\in \mathbb {N}_{m}^n\) have to be treated separately. In addition, in VaR applications \(h(x)\) is taken as the indicator function of an affine half-space \(\{x:\sum x_\iota \ge \tau \}\) for a suitably chosen threshold \(\tau \), and for CVaR applications \(h(x)\) is chosen as the piecewise linear function \(h(x)=\max (0, \sum x_\iota -\tau )\). Thus, polyhedra \(\varXi _{\mathbf {j}}\) that meet the affine hyperplane \(\{x:\,\sum x_\iota =\tau \}\) are further sliced into two separate polyhedra. A straightforward application of the LP framework described above would thus lead to an LP with exponentially many constraints and variables. Note however that the constraints (5)–(7) now read

$$\begin{aligned} -g^i&=\sum _{j=1}^{k_i}\lambda ^i_j f^i_j,\end{aligned}$$
(8)
$$\begin{aligned} e^i-\sum _{s=1}^{M}y_s c^i_s - z_0&\le \sum _{j=1}^{k_i}\lambda ^i_j \ell ^i_j,\end{aligned}$$
(9)
$$\begin{aligned} \lambda ^i_j&\ge 0,\quad (j=1,\ldots ,k_i), \end{aligned}$$
(10)

as \(v^i_s=0\) and no test functions \(\psi _t(x)\) were used, with \(g^i=\mathbf {1}\) (the all-ones vector) when \(\varXi _i\subseteq \{x:\, \sum x_{\iota }\ge \tau \}\) and \(g^i=0\) otherwise. That is, the vector that appears on the left-hand side of Constraint (8) is fixed by the polyhedron \(\varXi _i\) alone and does not depend on the decision variables \(y,z_0\). Since \(z_0\) is to be chosen as small as possible in an optimal solution of (D), the constraint (9) has to be made as slack as possible. Therefore, the optimal values of \(\lambda ^i_j\) are also fixed by the polyhedron \(\varXi _i\) alone and are identifiable by solving the small-scale LP

$$\begin{aligned} (\lambda ^{i}_j)^*=\arg \max _{\lambda }\;&\sum _{j=1}^{k_i}\lambda ^{i}_j\ell ^{i}_j\\ \text {s.t. }-g^i&=\sum _{j=1}^{k_i}\lambda ^i_j f^i_j,\\ \lambda ^i_j&\ge 0,\quad (j=1,\ldots ,k_i). \end{aligned}$$

In other words, when the polyhedron \(\varXi _i\) is considered for the first time, the variables \((\lambda ^i_j)^*\) can be determined once and for all, after which the constraints (8)–(10) can be replaced by

$$\begin{aligned} e^i-\sum _s y_{s}c^i_{s}-z_0\le C_i, \end{aligned}$$

where \(C_i=\sum _{j=1}^{k_i}(\lambda ^i_j)^* \ell ^i_j\), and where the sum on the left-hand side only extends over the \(n\) indices \(s\) that correspond to test functions that are nonzero on \(\varXi _i\). Thus, only the \(nm+1\) decision variables \((y,z_0)\) are needed to solve (D). Furthermore, the exponentially many constraints correspond to an extremely sparse constraint matrix, making the dual of (D) an ideal candidate to apply the simplex algorithm with delayed column generation. A similar approach is possible for the situation where \(\phi _s\) is of the form

$$\begin{aligned} \phi _{s}(x)=1_{S_{s}}(x)\times \left( \langle v_{s}, x\rangle + c_{s}\right) , \end{aligned}$$

for all \(s=(\iota ,j)\). The advantage of using test functions of this form is that fewer breakpoints \(\xi _{\iota ,j}\) are needed to constrain the distribution appropriately.
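The small-scale LP above is straightforward to set up; a minimal sketch (our illustration, with the polyhedron data passed as a matrix whose rows are the \(f^i_j\) and a vector of bounds \(\ell ^i_j\)):

```python
# Solve the small-scale LP fixing (lambda^i)* for one polyhedron Xi_i:
# maximize sum_j lambda_j l_j  s.t.  -g = sum_j lambda_j f_j, lambda >= 0.
import numpy as np
from scipy.optimize import linprog

def fix_lambda(F_i, l_i, g_i):
    # F_i: (k_i, n) matrix with rows f_j; l_i: (k_i,); g_i: (n,)
    res = linprog(c=-l_i, A_eq=F_i.T, b_eq=-g_i)  # lambda >= 0 by default
    if not res.success:
        raise ValueError("no feasible lambda for this polyhedron")
    return res.x, float(l_i @ res.x)              # (lambda*, C_i)
```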

4.4 Piecewise Convex Test Functions

When \(\phi _s|_{\varXi _i}\) and \(-h|_{\varXi _i}\) are jointly convex, then \(\varvec{\varphi }_{y,z}(x)\) is convex (recall that \(y\ge 0\)). The copositivity constraint

$$\begin{aligned} \sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0{{\mathrm{1}}}_{\varPhi }(x) -h(x)\ge 0,\quad (x\in \varXi _i) \end{aligned}$$

can then be written as

$$\begin{aligned} \langle f^i_j, x\rangle \ge \ell ^i_j,\quad (j=1,\ldots ,k_i)\Longrightarrow \varvec{\varphi }_{y,z}(x)\ge 0, \end{aligned}$$

and by Farkas’ Theorem (see, e.g., [11]), this condition is equivalent to

$$\begin{aligned} \varvec{\varphi }_{y,z}(x)+\sum _{j=1}^{k_i}\lambda ^i_j\left( \ell ^i_j-\langle f^i_j, x\rangle \right)&\ge 0,\quad (x\in \mathbb {R}^n),\\ \lambda ^i_j&\ge 0,\quad (j=1,\ldots ,k_i),\nonumber \end{aligned}$$
(11)

where \(\lambda ^i_j\) are once again auxiliary decision variables. While (11) does not reduce to finitely many constraints, the validity of this condition can be checked numerically by globally minimizing the convex function \(\varvec{\varphi }_{y,z}(x)+\sum _{j=1}^{k_i}\lambda ^i_j\left( \ell ^i_j-\langle f^i_j, x\rangle \right) \). The constraint (11) can then be enforced explicitly if a line-search method is used to solve the dual (D).
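A sketch of this feasibility check (our illustration; varphi is assumed to evaluate the convex function \(\varvec{\varphi }_{y,z}\), and the polyhedron data are as above):

```python
# Numerical check of condition (11): globally minimize the convex
# penalized function and test nonnegativity of the minimum (sketch only).
import numpy as np
from scipy.optimize import minimize

def copositive_on_polyhedron(varphi, F_i, l_i, lam, x0):
    # varphi: convex map x -> varphi_{y,z}(x); lam: multipliers, lam >= 0
    obj = lambda x: varphi(x) + lam @ (l_i - F_i @ x)
    res = minimize(obj, x0)   # convexity: any local minimum is global
    return res.fun >= 0.0
```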

4.5 Piecewise Concave Test Functions

When \(\phi _s|_{\varXi _i}\) and \(-h|_{\varXi _i}\) are jointly concave but not linear, then \(\varvec{\varphi }_{y,z}(x)\) is concave and \(\varXi _i={{\mathrm{conv}}}(q^i_1,\ldots ,q^i_{n_i})\) is a polytope. The copositivity constraint

$$\begin{aligned} \sum _{s=1}^{M}y_s\phi _s(x)+\sum _{t=1}^{N}z_t\psi _t(x)+z_0{{\mathrm{1}}}_{\varPhi }(x) -h(x)\ge 0,\quad (x\in \varXi _i) \end{aligned}$$
(12)

can then be written as

$$\begin{aligned} \varvec{\varphi }_{y,z}(q^i_j)\ge 0,\quad (j=1,\ldots ,n_i). \end{aligned}$$

Thus, (12) can be replaced by \(n_i\) linear inequality constraints on the decision variables \(y_s\) and \(z_t\).

4.6 Piecewise Polynomial Test Functions

Another case that can be treated via finitely many constraints is when \(\phi _s|_{\varXi _i}\), \(\psi _t|_{\varXi _i}\), and \(h|_{\varXi _i}\) are jointly polynomial. The approach of Lasserre [7] and Parrilo [10] can be applied to turn the copositivity constraint

$$\begin{aligned} \langle f^i_j, x\rangle \ge \ell ^i_j,\quad (j=1,\ldots ,k_i)\Longrightarrow \varvec{\varphi }_{y,z}(x)\ge 0, \end{aligned}$$

into finitely many linear matrix inequalities. However, this approach is generally limited to low-dimensional applications.
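As a brief illustration of the certificates involved (a standard sum-of-squares representation; the degree bounds below are a modeling choice, not prescribed by [7, 10]): to certify \(\varvec{\varphi }_{y,z}(x)\ge 0\) on \(\varXi _i\), one searches for sum-of-squares polynomials \(\sigma _0,\sigma _1,\ldots ,\sigma _{k_i}\) with

$$\begin{aligned} \varvec{\varphi }_{y,z}(x)=\sigma _0(x)+\sum _{j=1}^{k_i}\sigma _j(x)\left( \langle f^i_j, x\rangle -\ell ^i_j\right) . \end{aligned}$$

For fixed degree bounds on the \(\sigma _j\), each sum-of-squares condition is a linear matrix inequality, so the search becomes a semidefinite program, and raising the degree bounds produces the hierarchy of relaxations mentioned in Sect. 3.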

5 Conclusions

Our analysis shows that a wide range of duality relations used in quantitative risk management can be understood from the single perspective of the generalized duality relation discussed in Sect. 2. An interesting class of special cases is obtained by formulating a finite number of constraints in the form of bounds on integrals. The duals of such models are semi-infinite optimization problems that can often be reformulated as finite optimization problems by making use of standard results on copositivity.