Distributionally robust shortfall risk optimization model and its approximation
Abstract
Utility-based shortfall risk measures (SR) have received increasing attention over the past few years for their potential to quantify the risk of large tail losses more effectively than conditional value at risk. In this paper, we consider a distributionally robust version of the shortfall risk measure (DRSR) where the true probability distribution is unknown and the worst distribution from an ambiguity set of distributions is used to calculate the SR. We start by showing that the DRSR is a convex risk measure and under some special circumstances a coherent risk measure. We then move on to study an optimization problem with the objective of minimizing the DRSR of a random function and investigate numerical tractability of the optimization problem with the ambiguity set being constructed through a \(\phi \)-divergence ball and a Kantorovich ball. In the case when the nominal distribution in the balls is an empirical distribution constructed through iid samples, we quantify convergence of the ambiguity sets to the true probability distribution as the sample size increases under the Kantorovich metric and consequently the optimal values of the corresponding DRSR problems. Specifically, we show that the error of the optimal value is linearly bounded by the error of each of the approximate ambiguity sets and subsequently derive a confidence interval of the optimal value under each of the approximation schemes. Some preliminary numerical test results are reported for the proposed modeling and computational schemes.
Keywords
DRSR · Kantorovich metric · \(\phi \)-divergence ball · Kantorovich ball · Quantitative convergence analysis
Mathematics Subject Classification
90C15 · 90C47 · 90C31 · 91B30
1 Introduction
Quantitative measure of risk is a key element for financial institutions and regulatory authorities. It provides a way to compare different financial positions. A financial position can be mathematically characterized by a random variable \(Z: (\varOmega ,\mathscr {F},P) \rightarrow \mathrm{I\!R}\), where \(\varOmega \) is a sample space with sigma algebra \(\mathscr {F}\) and P is a probability measure. A risk measure \(\rho \) assigns to Z a number that signifies the risk of the position. A good risk measure should have some virtues, such as being sensitive to excessive losses, penalizing concentration and encouraging diversification, and supporting dynamically consistent risk management over multiple horizons [15].
Artzner et al. [1] considered the axiomatic characterizations of risk measures and first introduced the concept of coherent risk measure, which satisfies: (a) positive homogeneity (\(\rho (\alpha Z)=\alpha \rho (Z)\) for \(\alpha \ge 0\)); (b) subadditivity (\(\rho (Z+Y) \le \rho (Z)+ \rho (Y)\)); (c) monotonicity (if \(Z \ge Y\), then \(\rho (Z) \le \rho (Y)\)); (d) translation invariance (if \(m \in \mathrm{I\!R}\), then \(\rho (Z+m)=\rho (Z)-m\)). Frittelli and Rosazza Gianin [12], Heath [17] and Föllmer and Schied [9] extended the notion of coherent risk measure to convex risk measure by replacing positive homogeneity and subadditivity with convexity, that is, \(\rho (\alpha Z+(1-\alpha ) Y) \le \alpha \rho (Z) +(1-\alpha ) \rho (Y)\), for all \(\alpha \in [0,1]\). Obviously positive homogeneity and subadditivity imply convexity but not vice versa. In other words, a coherent risk measure is a convex risk measure but the converse may not be true.
A well-known coherent risk measure is conditional value at risk (CVaR) defined by \(\text{ CVaR }_{\alpha }(Z):=\frac{1}{\alpha } \int _0^{\alpha } \text{ VaR }_{\lambda } (Z) d \lambda \), where \(\text{ VaR }_{\lambda } (Z)\) denotes the value at risk (VaR) which in this context is the smallest amount of cash that needs to be added to Z such that the probability of the financial position falling into a loss does not exceed a specified level \(\lambda \), that is, \(\text{ VaR }_{\lambda }(Z):=\inf \{t \in \mathrm{I\!R}: P(Z+t < 0) \le \lambda \}\). In a financial context, CVaR has a number of advantages over the commonly used VaR, and CVaR has been proposed as the primary tool for banking capital regulation in the draft Basel III standard [2]. However, CVaR has a couple of deficiencies.
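For intuition, both quantities defined above can be estimated directly from samples of the position. The sketch below is our own illustration, not part of the paper's scheme; it assumes the sign convention above, where Z records gains and losses are negative values.

```python
import numpy as np

def var(z, lam):
    """Empirical VaR: smallest cash amount t with P(Z + t < 0) <= lam,
    i.e. the negated lam-quantile of the position Z (gains positive)."""
    return -np.quantile(z, lam)

def cvar(z, alpha):
    """Empirical CVaR: average of VaR_lambda over lambda in (0, alpha],
    i.e. the mean loss on the worst alpha-fraction of outcomes."""
    z = np.sort(z)
    k = max(int(np.ceil(alpha * len(z))), 1)
    return -z[:k].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)  # hypothetical P&L samples
# for N(0,1), var(z, 0.05) is close to 1.645 and cvar(z, 0.05) to 2.06
```

Since CVaR averages VaR over the whole tail, the estimate always satisfies \(\text{CVaR}_\alpha \ge \text{VaR}_\alpha\), illustrating its stronger sensitivity to large tail losses.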
Dunkel and Weber [7] are perhaps the first to discuss the computational aspects of SR. They characterized SR as a stochastic root finding problem and proposed the stochastic approximation (SA) method combined with importance sampling techniques to calculate it. Hu and Zhang [18] proposed an alternative approach by reformulating SR as the optimal value of a stochastic optimization problem and applying the well-known sample average approximation (SAA) method to solve the latter when either the true probability distribution is unknown or it is prohibitively expensive to compute the expected value of the underlying random functions. A detailed asymptotic analysis of the optimal values obtained from solving the sample average approximated problem was also provided.
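To make the root-finding characterization concrete, here is a minimal bisection sketch. It is our own illustration under one of the standard conventions (z records losses, the loss function l is increasing, and SR is the smallest t with \({\mathbb {E}}[l(Z-t)]\le \lambda \)); sign conventions differ across the literature.

```python
import numpy as np

def shortfall_risk(z, loss, lam, lo=-50.0, hi=50.0, tol=1e-8):
    """Empirical shortfall risk: smallest t with mean(loss(z - t)) <= lam.
    Since `loss` is increasing, t -> mean(loss(z - t)) is decreasing,
    so bisection applies -- this mirrors the stochastic root-finding view."""
    while hi - lo > tol:
        t = 0.5 * (lo + hi)
        if np.mean(loss(z - t)) <= lam:
            hi = t  # feasible: try a smaller capital requirement
        else:
            lo = t
    return hi

rng = np.random.default_rng(1)
z = rng.normal(size=50_000)              # samples of the loss Z
sr = shortfall_risk(z, np.exp, lam=1.0)  # with l = exp, SR ~ log E[e^Z] ~ 0.5
```

Note that translating the loss by a constant shifts the computed SR by exactly that constant, consistent with translation invariance.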
As far as we are concerned, the main contributions of the paper can be summarized as follows. First, we demonstrate that DRSR is the worst-case SR (Proposition 1) and hence it is a convex risk measure. Second, we investigate tractability of (DRSRP) by considering particular cases where the ambiguity set \({\mathcal {P}}\) is constructed respectively through a \(\phi \)-divergence ball and a Kantorovich ball. Since the structure of \({\mathcal {P}}\) often involves sample data, we analyse convergence of the ambiguity set as the sample size increases (Propositions 3 and 5). To quantify how the errors arising from the ambiguity set propagate to the optimal value of (DRSRP), we then show under some moderate conditions that the error of the optimal value is linearly bounded by the error of the ambiguity set and subsequently derive a finite sample guarantee (Theorem 1) and confidence intervals for the optimal value of (DRSRP) associated with the ambiguity sets (Theorem 2 and Corollary 1). Finally, as an application, we apply the (DRSRP) model to a portfolio management problem and carry out various out-of-sample tests on the numerical schemes for the (DRSRP) model with simulated data and real data (Sect. 5).
The rest of the paper is organised as follows. In Sect. 2, we present the properties of DRSR, that is, it is a convex risk measure and it is the worst-case SR. In Sect. 3, we derive the formulation of (DRSRP) when the ambiguity set is constructed through a \(\phi \)-divergence ball and a Kantorovich ball and then establish the convergence of the ambiguity sets as the sample size increases. In Sect. 4, the finite sample guarantees on the quality of the optimal solutions and convergence of the optimal values as the sample size increases are discussed. In Sect. 5, we report results of numerical experiments.
2 Properties of DRSR
Proposition 1
Proof
Remark 1
 (i) The relationship established in (6) means that DRSR is the worst-case SR. This observation allows one to calculate DRSR via SR for each \(P\in {\mathcal {P}}\) if it is easy to do so. Moreover, Giesecke et al. [15] showed that SR is a coherent risk measure if and only if the loss function l takes the specific form
$$\begin{aligned} l(z):=\lambda -\alpha [z]_- +\beta [z]_+, \quad \beta \ge \alpha \ge 0, \end{aligned}$$
where \([z]_-\) denotes the negative part of z and \([z]_+\) denotes the positive part. In this case, the SR gives rise to an expectile, see [3, Theorem 4.9]. Using this result, we can easily show through equation (6) that \(\text{ DRSR }\) is a coherent risk measure when l takes this specific form, in that the operation \(\sup _{P\in {\mathcal {P}}}\) preserves positive homogeneity and subadditivity.
 (ii)
The restriction of Z to \(L^{\infty }\) implies that the support^{1} of the probability distribution of Z is bounded. This condition may be relaxed to the case when there exist \(t_l, t_u \in \mathrm{I\!R}\) such that \(\sup _{P \in {\mathcal {P}}} {\mathbb {E}}_P[l(Z-t_l)] >\lambda \) and \(\sup _{P \in {\mathcal {P}}} {\mathbb {E}}_P[l(Z-t_u)] < \lambda \), see [18].
We now move on to discuss the property of DRSR when it is applied to a random function. This is to pave the way for a full investigation of (DRSRP) in Sects. 3 and 4. To this end, we need to make some assumptions on the random function \(c(\cdot ,\cdot )\) and the loss function \(l(\cdot )\). Throughout this section, we use \(\varXi \) to denote the image space of the random variable \(\xi (\omega )\) and \(\mathscr {P}(\varXi )\) to denote the set of all probability measures defined on the measurable space \((\varXi , \mathscr {B})\) with Borel sigma algebra \(\mathscr {B}\). To ease notation, we will use \(\xi \) to denote either the random vector \(\xi (\omega )\) or an element of \(\mathrm{I\!R}^k\) depending on the context.
Assumption 1
The proposition below summarises some important properties of \(l(c(x,\xi )-t)\) and \(\displaystyle \sup \nolimits _{P\in {\mathcal {P}}}{\mathbb {E}}_P[l(c(x,\xi )-t)]-\lambda \) as a function of (x, t).
Proposition 2
 (i)
Under Assumption 1 (b) and (c), \(g(\cdot ,\cdot ,\xi )\) is convex w.r.t. (x, t) for each fixed \(\xi \in \varXi \), \(g(x,t,\cdot )\) is uniformly Lipschitz continuous w.r.t. \(\xi \) with modulus \(L \kappa \), and v(x, t) is a convex function w.r.t. (x, t).
 (ii) If, in addition, Assumption 1 (a) holds and \(\lambda \) is a prespecified constant in the interior of the range of l, then there exist a point \((x_0,t_0) \in X \times \mathrm{I\!R}\) and a constant \(\eta >0\) such that
$$\begin{aligned} \displaystyle \sup _{P\in {\mathcal {P}}}{\mathbb {E}}_P[l(c(x_0,\xi )-t_0)]-\lambda <-\eta \end{aligned}$$(7)
and \(\text{(DRSRP) }\) has a finite optimal value.
Proof

Part (i). It is well known that the composition of a convex function with a monotonically increasing convex function preserves convexity. The remaining claims can also be easily verified.

Part (ii). Since \(c(x,\xi )\) is finite valued and convex in x, it is continuous in x for each fixed \(\xi \). Together with its uniform continuity in \(\xi \), we are able to show that \(c(x,\xi )\) is continuous over \(X\times \varXi \). By the boundedness of X and \(\varXi \), there is a positive constant \(\alpha \) such that \(|c(x,\xi )|\le \alpha \) for all \((x,\xi )\in X\times \varXi \). With the boundedness of c and the fact that l is monotonically increasing, convex and nonconstant, we can easily show Part (ii) analogously to the proof of the first part of Proposition 1. We omit the details. \(\square \)
3 Structure of (DRSRP’) and approximation of the ambiguity set
In the literature of distributionally robust optimization, various statistical methods have been proposed to build ambiguity sets based on available information of the underlying uncertainty, see for instance [27, 28] and the references therein. Here we consider the \(\phi \)-divergence ball and Kantorovich ball approaches and discuss tractable formulations of the corresponding (DRSRP’).
3.1 Ambiguity set constructed through \(\phi \)-divergence
Let us now consider the case where the only available information about the random vector \(\xi \) is its empirical data and the size of such data is limited (not very large). In stochastic programming, a well-known approach in such a situation is to use the empirical distribution constructed through the data to approximate the true probability distribution. However, if the sample size is not big enough or there is a reason from a computational point of view to use a small size of empirical data (e.g., in multistage decision-making problems), then the quality of such an approximation may be compromised. The \(\phi \)-divergence approach has been proposed to address this dilemma.
 (a)
Kullback–Leibler: \(I_{\phi _{KL}}(p,q)=\sum _i p_i \log \left( \frac{p_i}{q_i}\right) \) with \(\phi _{KL}(t)=t \log t-t+1\);
 (b)
Burg entropy: \(I_{\phi _B}(p,q)=\sum _i q_i \log \left( \frac{q_i}{p_i}\right) \) with \(\phi _B(t)=-\log t+t-1\);
 (c)
J-divergence: \(I_{\phi _J}(p,q)=\sum _i (p_i-q_i) \log \left( \frac{p_i}{q_i}\right) \) with \(\phi _J(t)=(t-1) \log t\);
 (d)
\(\chi ^2\)-distance: \(I_{\phi _{\chi ^2}}(p,q)=\sum _i \frac{(p_i-q_i)^2}{p_i}\) with \(\phi _{\chi ^2}(t)=\frac{1}{t} (t-1)^2\);
 (e)
Modified \(\chi ^2\)-distance: \(I_{\phi _{m\chi ^2}}(p,q)=\sum _i \frac{(p_i-q_i)^2}{q_i}\) with \(\phi _{m\chi ^2}(t)= (t-1)^2\);
 (f)
Hellinger distance: \(I_{\phi _H}(p,q)=\sum _i (\sqrt{p_i}-\sqrt{q_i})^2\) with \(\phi _H(t)=(\sqrt{t}-1)^2\);
 (g)
Variation distance: \(I_{\phi _V}(p,q)=\sum _i |p_i-q_i|\) with \(\phi _V(t)=|t-1|\).
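The list above translates directly into code. The following minimal sketch (our own illustration) uses the standard representation \(I_\phi (p,q)=\sum _i q_i\,\phi (p_i/q_i)\), which recovers each of the formulas above when p and q are probability vectors with strictly positive entries:

```python
import numpy as np

# Each divergence as I_phi(p, q) = sum_i q_i * phi(p_i / q_i),
# with phi taken from the list above (all q_i > 0 assumed).
PHI = {
    "kl":        lambda t: t * np.log(t) - t + 1,
    "burg":      lambda t: -np.log(t) + t - 1,
    "j":         lambda t: (t - 1) * np.log(t),
    "chi2":      lambda t: (t - 1) ** 2 / t,
    "mod_chi2":  lambda t: (t - 1) ** 2,
    "hellinger": lambda t: (np.sqrt(t) - 1) ** 2,
    "variation": lambda t: np.abs(t - 1),
}

def divergence(p, q, name):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * PHI[name](p / q)))

p, q = [0.2, 0.3, 0.5], [0.25, 0.25, 0.5]
tv = divergence(p, q, "variation")  # 0.1 for these vectors
```

Each \(\phi \) vanishes at \(t=1\), so every divergence is zero when \(p=q\); the code can also be used to check the inequalities of Lemma 1 numerically.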
Lemma 1
 (i)
\(I_{\phi _V}(p,q) \le \min \left( \sqrt{2I_{\phi _{KL}}(p,q)}, \sqrt{2I_{\phi _{B}}(p,q)},\sqrt{I_{\phi _{J}}(p,q)}, \sqrt{I_{\phi _{\chi ^2}}(p,q)},\right. \)
\(\left. \sqrt{I_{\phi _{m\chi ^2}}(p,q)}\right) ; \)
 (ii)
\(I_{\phi _H}(p,q) \le I_{\phi _V}(p,q) \le 2\sqrt{I_{\phi _H}(p,q)}\).
We omit the proof as the results can be easily derived from the definitions of the divergence functions \(\phi \).
It is important to note that the reformulation (11) relies heavily on the discrete structure of the nominal distribution. It is possible to use a continuous nominal distribution, in which case the summation in the first constraint of problem (11) becomes \({\mathbb {E}}[\phi ^*([l(c(x,\zeta )-t)-\tau ]/u)]\) (before introducing the new variables \(s_i\)). In such a case, we would need to use the SAA approach to deal with the expected value.
The reallocation of the probabilities through Voronoi partition provides an effective way to reduce the scenarios of the discretized problem and hence the size of problem (11). It remains to be explained how the ambiguity set approximates the true probability distribution.
Proposition 3
Proof
For general \(\phi \)divergences, we are unable to establish the quantitative convergence as in Proposition 3. However, if \(P^*\) follows a discrete distribution with support \(\{\zeta ^1,\ldots , \zeta ^M\}\), the following qualitative convergence result holds.
Proposition 4
[19, Proposition 2] Suppose that \(\phi (t) \ge 0\) has a unique root at \(t=1\) and the samples are independent and identically distributed from the true distribution \(P^*\). Then \( \mathbb {H}_K({\mathcal {P}}_N^M,P^*)\rightarrow 0, \text{ w.p.1 }, \) as \(N \rightarrow \infty \), where r is defined as in (25).
Note that in [19, Proposition 2] the convergence is established under the total variation metric; since the probability distributions here are discrete, this is equivalent to convergence under the Kantorovich metric. We refer readers to [19] for the details of the proof.
3.2 Kantorovich ball
Proposition 5
Proof
Let us now estimate the first term in (32), i.e., \(\mathsf {dl}_K(P_N,P^*)\). By the definition of \(r_N(\delta )\), we have with probability \(1-\delta \) that \(\mathsf {dl}_K(P_N,P^*) \le r_N(\delta )\). The conclusion follows. \(\square \)
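On the real line, the Kantorovich metric between two empirical distributions with equally many atoms can be computed in closed form by matching sorted samples. The sketch below is our own illustration of this fact, not part of the paper's scheme:

```python
import numpy as np

def kantorovich_1d(x, y):
    """Kantorovich (Wasserstein-1) distance between two empirical
    distributions on the real line with the same number of equally
    weighted atoms: in one dimension the optimal transport plan
    simply matches sorted samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
```

For unequal sample sizes one would instead integrate the difference of the quantile functions; translating a sample by a constant c moves it a distance of exactly |c| from the original, which gives a quick sanity check.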
Before concluding this section, we note that it is possible to use other statistical methods for constructing the ambiguity sets, such as moment conditions and mixture distributions. We omit them due to length limitations; interested readers may find details in [16] and the references therein.
4 Convergence of (DRSRP’)
In Sect. 3, we discussed two approaches for constructing the ambiguity set of the (DRSRP’) model, each of which is defined through iid samples.
Theorem 1
 (i)
Suppose the true probability distribution \(P^*\) is discrete, i.e., \(\varXi =\{\zeta ^1,\ldots ,\zeta ^M\}\). Let \({\mathcal {P}}_N^M\) be defined as in (10) with r given as in (25). Then with \({\mathcal {P}}_N={\mathcal {P}}_N^M\), the finite sample guarantee (37) holds.
 (ii)
Let \({\mathcal {P}}_N\) be defined as in (26) with \(r=r_N(\delta )\) being given in (29). Under condition (27), the finite sample guarantee (37) holds.
Proof
The results follow straightforwardly from (25), (28), (29) and the definition of finite sample guarantee. \(\square \)
We now move on to investigate convergence of \({\vartheta }_N\) and \(S_N\). From the discussion in Sect. 3, we know that \(\mathbb {H}_K({\mathcal {P}}_N,P^*)\rightarrow 0\). However, to broaden the coverage of the convergence results, we present them by considering a slightly more general case with \(P^*\) being replaced by a set \({\mathcal {P}}^*\).
Theorem 2
Proof
Now, we move on to show (39). Let \((x_N,t_N) \in S_N\). Since X and T are compact, there exist a subsequence \(\{(x_{N_k},t_{N_k})\}\) and a point \((\hat{x},\hat{t})\in X\times T\) such that \((x_{N_k},t_{N_k})\rightarrow (\hat{x},\hat{t})\). It follows by (43) and (38) that \( (\hat{x},\hat{t})\in {\mathcal {F}}^*\) and \(\hat{t}= {\vartheta }^*\). This shows \((\hat{x},\hat{t}) \in S^*\). \(\square \)
Theorem 2 is instrumental in that it provides a unified quantitative convergence result for the optimal value of \({\hbox {(DRSRP'N)}}\) in terms of \(\mathbb {H}_K({\mathcal {P}}_N,{\mathcal {P}}^*)\) when \({\mathcal {P}}_N\) is constructed in various ways discussed in Sect. 3. Based on the theorem and some quantitative convergence results about \(\mathbb {H}_K({\mathcal {P}}_N,{\mathcal {P}}^*)\), we can establish confidence intervals for the true optimal value \({\vartheta }^*\) in the following corollary.
Corollary 1
 (i) If \({\mathcal {P}}^*\) comprises the true probability distribution only and \({\mathcal {P}}_N\) is defined by (10), then under the conditions of Proposition 3, \( {\vartheta }^* \in [{\vartheta }_N-\varTheta , {\vartheta }_N+\varTheta ] \) with probability \(1-M\delta \), where
$$\begin{aligned} \varTheta :=\frac{2 D_XL \kappa }{\eta }\left[ \beta _M+\frac{D}{2} (\max \{2\sqrt{r},r\} +\varDelta (M,N,\delta ))\right] \end{aligned}$$
with \(\varDelta (M,N,\delta )=\min \left( \frac{M}{\sqrt{N}}\left( 2+\sqrt{2 \ln \frac{1}{\delta }}\right) , 4+ \frac{1}{\sqrt{N}}\left( 2+\sqrt{2 \ln \frac{1}{\delta }}\right) \right) \), \(\beta _M\) being defined as in (13) and D being the diameter of \(\varXi \).
 (ii) If \({\mathcal {P}}^*\) comprises the true probability distribution only and \({\mathcal {P}}_N\) is defined by (26), then under the conditions of Proposition 5,
$$\begin{aligned} {\vartheta }^* \in \left[ {\vartheta }_N-\frac{4 D_XL \kappa r_N(\delta )}{\eta }, {\vartheta }_N+\frac{4 D_XL \kappa r_N(\delta )}{\eta }\right] \end{aligned}$$
with probability \(1-\delta \).
4.1 Extension
Let \(\hat{{\mathcal {F}}}\), \(\hat{S}\) and \(\hat{{\vartheta }}\) denote respectively the feasible set, the set of the optimal solutions and the optimal value of (DRSRCP). Likewise, we define \(\hat{{\mathcal {F}}}_N\), \(\hat{S}_N\) and \(\hat{{\vartheta }}_N\) for its approximate problem \(\text{(DRSRCPN) }\).
Theorem 3
 (i) There is a constant \(C>0\) such that
$$\begin{aligned} {\mathbb {H}}(\hat{{\mathcal {F}}}_N,\hat{{\mathcal {F}}}) \le C\mathbb {H}_K({\mathcal {P}}_N,{\mathcal {P}}) \end{aligned}$$
for N sufficiently large.
 (ii)
\( \displaystyle \lim \nolimits _{N \rightarrow \infty } \hat{{\vartheta }}_N=\hat{{\vartheta }}\,\text{ and }\, \displaystyle \limsup \nolimits _{N\rightarrow \infty } \hat{S}_N =\hat{S}. \)
 (iii) If, in addition, f is Lipschitz continuous with modulus \(\beta \), then
$$\begin{aligned} \hat{{\vartheta }}_N-\hat{{\vartheta }} \le \beta {\mathbb {H}}(\hat{{\mathcal {F}}}_N,\hat{{\mathcal {F}}}). \end{aligned}$$(46)
Moreover, if \(\text{(DRSRCP) }\) satisfies the second order growth condition at the optimal solution set \(\hat{S}\), i.e., there exist positive constants \(\alpha \) and \(\varepsilon \) such that
$$\begin{aligned} f(x)-\hat{{\vartheta }} \ge \alpha d(x,\hat{S})^2,\quad \forall x \in \hat{{\mathcal {F}}} \cap (\hat{S}+\varepsilon \mathbb {B}), \end{aligned}$$
then
$$\begin{aligned} \mathbb {D}(\hat{S}_N,\hat{S}) \le \max \left\{ 2C, \sqrt{8C\beta /\alpha }\right\} \sqrt{\mathbb {H}_K({\mathcal {P}}_N,{\mathcal {P}})} \end{aligned}$$(47)
when N is sufficiently large.
Proof

Part (i) can be established through an analogous proof of Theorem 2. We omit the details.
 Part (ii). First we rewrite (DRSRCP) and (DRSRCPN) as
$$\begin{aligned} \displaystyle \inf _{x \in \mathrm{I\!R}^n} \tilde{f}(x):=f(x)+\delta _{\hat{{\mathcal {F}}}}(x) \quad \text{ and } \quad \displaystyle \inf _{x \in \mathrm{I\!R}^n} \tilde{f}_N(x):=f(x)+\delta _{\hat{{\mathcal {F}}}_N}(x), \end{aligned}$$
where \(\delta _{\hat{{\mathcal {F}}}}(x)\) is the indicator function of \(\hat{{\mathcal {F}}}\), i.e., \( \delta _{\hat{{\mathcal {F}}}}(x):= \left\{ \begin{array}{ll} 0, &{}\text{ if }\, x\in \hat{{\mathcal {F}}},\\ +\infty , &{}\text{ if }\, x\notin \hat{{\mathcal {F}}}. \end{array} \right. \) Note that the epigraph of \(\delta _{\hat{{\mathcal {F}}}}(\cdot )\) is defined as
$$\begin{aligned} \text{ epi }\,\delta _{\hat{{\mathcal {F}}}}(\cdot ):=\{(x,\alpha ): \delta _{\hat{{\mathcal {F}}}}(x) \le \alpha \}=\hat{{\mathcal {F}}} \times \mathrm{I\!R}_+. \end{aligned}$$
The convergence of \(\hat{{\mathcal {F}}}_N\) to \(\hat{{\mathcal {F}}}\) implies \( \displaystyle \lim \nolimits _{N \rightarrow \infty } \text{ epi }\,\delta _{\hat{{\mathcal {F}}}_N}(\cdot )=\text{ epi }\,\delta _{\hat{{\mathcal {F}}}}(\cdot ), \) and through [24, Definition 7.39] that \(\delta _{\hat{{\mathcal {F}}}_N}(\cdot )\) epi-converges to \(\delta _{\hat{{\mathcal {F}}}}(\cdot )\). Furthermore, it follows from [24, Theorem 7.46] that \(\tilde{f}_N\) epi-converges to \(\tilde{f}\). Since f is continuous and \(\hat{{\mathcal {F}}}\) and \(\hat{{\mathcal {F}}}_N\) are compact sets, any sequence \(\{x_N\}\) in \(\hat{S}_N\) has a subsequence converging to some \(\overline{x}\). By [6, Proposition 4.6], \( \displaystyle \lim \nolimits _{N \rightarrow \infty } \hat{{\vartheta }}_N=\hat{{\vartheta }} \) and \(\overline{x}\in \hat{S}\).
Analogous to Corollary 1, we can derive confidence intervals and regions for the optimal values with different \({\mathcal {P}}_N\).
5 Application in portfolio optimization
Our main numerical experiments focus on problem (49) with the ambiguity set being defined through the Kantorovich ball. We report the details in Example 1.
Example 1
In the first set of experiments, we investigate the impact of the radius r of the Kantorovich ball on the out-of-sample performance of the optimal portfolio. For any fixed portfolio \(x_N(r)\) obtained from problem (51), the out-of-sample performance is defined as \( J(x_N(r)):=\text{ SR }^{P^*}_{l,\lambda }(x_N(r)^T\xi ), \) which can in principle be computed exactly since the true probability distribution \(P^*\) is known by design, although in the experiments we generate a set of validation samples of size \(2\times 10^5\) for the evaluation. Following the same strategy as in [8], we generate training datasets of cardinality \(N \in \{30,300,3000\}\) to solve problem (51) and then use the same validation samples to evaluate \(J(x_N(r))\). Each of the experiments is carried out through 200 simulation runs.
Figure 1 depicts the tubes between the 20 and \(80\%\) quantiles (shaded areas) and the means (solid lines) of the out-of-sample performance \(J(x_N(r))\) as a function of the radius r. The dashed lines represent the empirical probability of the event \(J(x_N(r)) \le J_N(r)\) over 200 independent runs, which is called reliability in Esfahani and Kuhn [8]. The reliability is nondecreasing in r because the true probability distribution \(P^*\) is more likely to be contained in \({\mathcal {P}}_N\) as r grows, and hence the event \(J(x_N(r)) \le J_N(r)\) occurs more often. The out-of-sample performance of the portfolio first improves (decreases) and then deteriorates (increases).
Figure 2a shows the tubes between the 20 and \(80\%\) quantiles (shaded areas) and the means (solid lines) of the out-of-sample performance \(J(x_N)\) as a function of the sample size N based on 200 independent simulation runs, where \(x_N\) is the minimizer of (51) or of its SAA counterpart (\(r=0\)). The constant dashed line represents the optimal value of the SAA problem with \(N=10^6\) samples, which is regarded as the optimal value of the original problem with the true probability distribution. It is observed that the DRO model (51) outperforms the SAA model in terms of out-of-sample performance. Figure 2b depicts the optimal values of the DRO model and the SAA counterpart, which are the in-sample estimates of the obtained portfolio performance. Both approaches display asymptotic consistency, which is consistent with the out-of-sample and in-sample results. Figure 2c describes the empirical probability of the event \(J(x_N) \le J_N\) over 200 independent runs, where \(x_N\) is the optimal solution of the DRO model or the SAA model and \(J_N\) is the optimal value of the corresponding problem. It is clear that the performance of the DRO model is better than that of the SAA model.
Example 2
In the last experiment, we evaluate the performance of problem (49) with the ambiguity set constructed through the KL-divergence ball and the Kantorovich ball. The tests are carried out on problem (49) with 10 stocks (Apple Inc., Amazon.com, Inc., Baidu Inc., Costco Wholesale Corporation, DISH Network Corp., eBay Inc., Fox Inc., Alphabet Inc Class A, Marriott International Inc., QUALCOMM Inc.), whose historical data are collected from the National Association of Securities Dealers Automated Quotations (NASDAQ) index over 4 years (from 3rd May 2011 to 23rd April 2015), giving a total of 1000 records of historical stock returns.
We have carried out out-of-sample tests with a rolling window of 500 days, that is, we use the first 500 data points to calculate the optimal portfolio strategy for day 501 and then move on a rolling basis. The radii of the two ambiguity sets are selected through cross validation. Figure 3 depicts the performance of the three models over 500 trading days. It seems that the KL-divergence model and the SAA model perform similarly, whereas the Kantorovich model outperforms both over most of the time period.
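The rolling-window scheme described above can be sketched as follows. This is our own illustration: `solve_portfolio` is a hypothetical stand-in for whichever model (SAA, KL-divergence ball or Kantorovich ball) is being evaluated, and the equally weighted rule below is only a trivial placeholder solver.

```python
import numpy as np

def rolling_backtest(returns, solve_portfolio, window=500):
    """Rolling out-of-sample test: estimate the portfolio on the most
    recent `window` observations, hold it for one day, record the
    realized return, then roll the window forward by one day."""
    T, _ = returns.shape
    realized = []
    for t in range(window, T):
        x = solve_portfolio(returns[t - window:t])  # train on the window
        realized.append(float(returns[t] @ x))      # next-day realized return
    return np.array(realized)

# trivial stand-in solver: the equally weighted portfolio
equal_weight = lambda block: np.full(block.shape[1], 1.0 / block.shape[1])
```

With 1000 daily records and a 500-day window, this produces 500 out-of-sample returns per model, which can then be compared as in Figure 3.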
Footnotes
 1.
The support of the probability distribution P is the smallest closed set \(C \subset \mathrm{I\!R}\) such that \(P(C)=1\).
Notes
Acknowledgements
We would like to thank Peyman M. Esfahani for sharing with us some programmes for generating Figs. 1 and 2 and instrumental discussions about implementation of the numerical experiments. We would also like to thank three anonymous referees and the Guest Editor for insightful comments which help us significantly strengthen the paper.
References
 1. Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Finance 9, 203–228 (1999)
 2. Basel Committee on Banking Supervision: Fundamental review of the trading book: a revised market risk framework. Bank for International Settlements (2013). http://www.bis.org/publ/bcbs265.htm
 3. Bellini, F., Bignozzi, V.: On elicitable risk measures. Quant. Finance 15, 725–733 (2015)
 4. Ben-Tal, A., den Hertog, D., De Waegenaere, A., Melenberg, B., Rennen, G.: Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 59, 341–357 (2013)
 5. Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968)
 6. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
 7. Dunkel, J., Weber, S.: Stochastic root finding and efficient estimation of convex risk measures. Oper. Res. 58, 1505–1521 (2010)
 8. Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. (2017). https://doi.org/10.1007/s10107-017-1172-1
 9. Föllmer, H., Schied, A.: Convex measures of risk and trading constraints. Finance Stochast. 6, 429–447 (2002)
 10. Föllmer, H., Schied, A.: Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, Berlin (2011)
 11. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162, 707–738 (2015)
 12. Frittelli, M., Rosazza Gianin, E.: Putting order in risk measures. J. Bank. Finance 26, 1473–1486 (2002)
 13. Gao, R., Kleywegt, A.J.: Distributionally robust stochastic optimization with Wasserstein distance (2016). arXiv preprint arXiv:1604.02199
 14. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435 (2002)
 15. Giesecke, K., Schmidt, T., Weber, S.: Measuring the risk of large losses. J. Invest. Manag. 6, 1–15 (2008)
 16. Guo, S., Xu, H.: Distributionally robust shortfall risk optimization model and its approximation (2018). http://www.personal.soton.ac.uk/hx/research/Published/Manuscript/2018/Shaoyan/DRSR20Feb_2018_online.pdf
 17. Heath, D.: Back to the future. Plenary Lecture at the First World Congress of the Bachelier Society, Paris (2000)
 18. Hu, Z., Zhang, D.: Convex risk measures: efficient computations via Monte Carlo (2016). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2758713
 19. Love, D., Bayraksan, G.: Phi-divergence constrained ambiguous stochastic programs for data-driven optimization. Available on Optimization Online (2016)
 20. Moulton, J.: Robust fragmentation: a data-driven approach to decision-making under distributional ambiguity. Ph.D. Dissertation, University of Minnesota (2016)
 21. Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman and Hall/CRC, Boca Raton (2005)
 22. Pflug, G.C., Pichler, A.: Multistage Stochastic Optimization. Springer, Cham (2014)
 23. Robinson, S.M.: An application of error bounds for convex programming in a linear space. SIAM J. Control 13, 271–273 (1975)
 24. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, New York (1998)
 25. Shawe-Taylor, J., Cristianini, N.: Estimating the moments of a random vector with applications. In: Proceedings of GRETSI 2003 Conference, pp. 47–52 (2003)
 26. Weber, S.: Distribution-invariant risk measures, information, and dynamic consistency. Math. Finance 16, 419–442 (2006)
 27. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62, 1358–1376 (2014)
 28. Xu, H., Liu, Y., Sun, H.: Distributionally robust optimization with matrix moment constraints: Lagrange duality and cutting-plane methods. Math. Program. (2017). https://doi.org/10.1007/s10107-017-1143-6
 29. Zhao, C., Guan, Y.: Data-driven risk-averse stochastic optimization with Wasserstein metric. Available on Optimization Online (2015)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.