A Review on Ambiguity in Stochastic Portfolio Optimization

In mean-risk portfolio optimization, it is typically assumed that the assets follow a known distribution P0, which is estimated from observed data. Aiming at an investment strategy which is robust against possible misspecification of P0, the portfolio selection problem is solved with respect to the worst-case distribution within a Wasserstein-neighborhood of P0. We review tractable formulations of the portfolio selection problem under model ambiguity, as it is called in the literature. For instance, it is known that high model ambiguity leads to equally-weighted portfolio diversification. However, it often happens that the marginal distributions of the assets can be estimated with high accuracy, whereas the dependence structure between the assets remains ambiguous. This leads to the problem of portfolio selection under dependence uncertainty. We show that in this case portfolio concentration becomes optimal as the uncertainty with respect to the estimated dependence structure increases. Hence, distributionally robust portfolio optimization can have two very distinct implications: Diversification on the one hand and concentration on the other hand.

and the constraints are well known, deterministic functions. This strict assumption was weakened already in the early work of George Dantzig [9], who allowed some parameters to be unknown but assumed that they are realizations of known distributions and thus random. Other pioneering works in the area of stochastic programming include Charnes and Cooper [7], Tintner [53] and Wets [57]. More recently, the assumption of known distributions has been replaced by the weaker assumption that only a set of distributions is known which includes the unknown true distribution. We call this set of distributions the ambiguity set. Optimizing with respect to the worst case distribution, i.e., the distribution contained in the ambiguity set which yields the worst value of the objective function, leads to distributionally robust decisions.
In this context, the portfolio selection problem is arguably one of the most prominent problems. Introduced by Markowitz in the early 1950s, more than 25000 citations of his celebrated work [34] speak for themselves. We refer to Brandt [6] and Steinbach [52] for surveys listing some of the most important contributions in the vast field of portfolio selection.
Adapting the ideas mentioned above, this paper discusses portfolio selection strategies which are robust with respect to misspecification of the assets' distribution. In other words, we account for model uncertainty in the portfolio selection problem. Hence, our approach falls into the field of robust portfolio optimization. Two perceptive reviews of the developments in this area were written by Fabozzi et al. [17] and Kim et al. [28]. The present paper differs from most of the existing literature in that we rely on a purely non-parametric approach which builds on the Wasserstein distance to construct the ambiguity set. Within this setting, we contrast two fundamentally different ways to address the problem: Firstly, we consider the case of model uncertainty with respect to the joint distribution of the portfolio's assets, which we will refer to as model ambiguity. Secondly, we assume that the marginal distributions of the assets are known and the model uncertainty lies solely on the level of the dependence structure. The latter case is known as dependence uncertainty. The gist of our analysis is that, as the level of model uncertainty increases, portfolio diversification becomes optimal in the former case of model ambiguity, while portfolio concentration becomes optimal in the latter case of dependence uncertainty.
The remainder of this paper is structured as follows. Section 2 introduces ambiguity in stochastic optimization in a general context. The portfolio selection problem under model ambiguity is studied in Section 3, whereas Section 4 looks at dependence uncertainty in the context of portfolio selection. The empirical study in Section 5 compares the two presented approaches. Section 6 concludes.

Ambiguity Sets
The concept that probability models are given by ambiguity sets is quite popular in statistics, as there is a plethora of parametric and semi-parametric models that are based on confidence regions of parameters. The basic construction for parametric models, i.e., probability models of the form $(P_\theta, \theta \in \Theta)$, allows us to define an ambiguity set as a subset $(P_\theta, \theta \in \hat{\Theta})$, which is determined by a parametric estimate $\hat{\theta}$ of $\theta$ and some confidence region $\hat{\Theta}$ around it. Typical examples include:
- The scenario distribution stems from a normal distribution $N(\mu, \sigma^2)$, where $\mu$ and $\sigma$ are only known to lie within given bounds, i.e., $\mu_1 \le \mu \le \mu_2$ and $\sigma_1^2 \le \sigma^2 \le \sigma_2^2$.
- The multivariate distributions possess first and second moments, where the mean vector $\mu$ satisfies $\mu_1 \le \mu \le \mu_2$ (componentwise) and the covariance matrix $\Sigma$ satisfies $\Sigma_1 \preceq \Sigma \preceq \Sigma_2$ (in the sense of positive semidefinite ordering).
When it comes to semi-parametric and fully nonparametric ambiguity models, different approaches have emerged in recent years. For instance, ambiguity sets containing all distributions with given constraints on the moments are studied, e.g., in [10,13,31]. This approach can be extended to ambiguity sets which are characterized, in addition to the moment bounds, by constraints on structural properties such as symmetry, unimodality or independence patterns, see Hanasusanto et al. [24]. Another field of research deals with sample average approximations, where ambiguity sets appear as confidence regions, see e.g. [5,37,59]. An overview of these and related topics can be found in the book by Consigli, Kuhn and Brandimarte [8].
In contrast to the concepts cited above, this paper relies on a very intuitive approach, where ambiguity sets are given by some baseline model $P_0$ and a ball around $P_0$ with respect to some (pseudo-)distance for probability measures. Examples of such distances are:
- the bounded Lipschitz distance;
- the Wasserstein distance (here of order 1);
- the Fortet-Mourier distance, where $L(x, y) = \max(\|x\|^r, \|y\|^r, 1)$;
- the relative entropy (Kullback-Leibler pseudo-distance);
- the VC distance, where $\mathcal{K}$ is a Vapnik-Cervonenkis class, see Shorack and Wellner [51].
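For orientation, the distances named above are commonly written as follows. This is a sketch of the standard textbook forms, stated under the assumption that $d$ is the ground metric, $h$ ranges over real-valued test functions and $L(x, y)$ is the weight function quoted in the text; the normalizations used in the original displays may differ.

```latex
\begin{align*}
d_{BL}(P, Q) &= \sup\Big\{ \int h \, dP - \int h \, dQ \;:\; |h(x)| \le 1,\ |h(x) - h(y)| \le d(x, y) \Big\}, \\
d_{W}(P, Q)  &= \sup\Big\{ \int h \, dP - \int h \, dQ \;:\; |h(x) - h(y)| \le d(x, y) \Big\}
              = \inf\big\{ \mathbb{E}[d(X, Y)] \;:\; X \sim P,\ Y \sim Q \big\}, \\
d_{FM}(P, Q) &= \sup\Big\{ \int h \, dP - \int h \, dQ \;:\; |h(x) - h(y)| \le L(x, y)\, d(x, y) \Big\}, \\
d_{KL}(P, Q) &= \int \log\Big( \frac{dP}{dQ} \Big) \, dP \quad \text{if } P \ll Q, \\
d_{VC}(P, Q) &= \sup\big\{ |P(K) - Q(K)| \;:\; K \in \mathcal{K} \big\}.
\end{align*}
```

The first three distances differ only in the class of test functions $h$, which is why they are naturally ordered by the strength of the topology they generate.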
More details on the ideas behind the above definitions can be found in [41]. The following facts are well known: The bounded Lipschitz distance metrizes the weak topology of probability measures and therefore the empirical measure converges to the true one a.s. in this distance. If the true measure has a finite first moment, then a.s. convergence also holds with respect to the Wasserstein distance; this is an easy consequence of the law of large numbers and the dominated convergence theorem. Rates of convergence, however, have only been established recently, see Fournier and Guillin [21]. Existence of higher moments is required for the convergence in Fortet-Mourier distance. It was shown in [40] that the Fortet-Mourier distance is a special case of the Wasserstein distance with a different metric on the real line. In contrast, the relative entropy and the VC distance generate stronger topologies, implying that the empirical measure does not converge to the true one in these topologies. For this reason, these distances are excluded from our considerations. While the relative entropy as a concept for closeness of models was introduced by Hansen and Sargent [25], Wasserstein neighborhoods were considered for the first time by Pflug and Wozabal [44] and recently followed by Esfahani and Kuhn [16] as well as Gao and Kleywegt [22] and Ji and Lejeune [26]. In the remainder of this paper we restrict our considerations to ambiguity sets defined by Wasserstein neighborhoods. In particular, we set the distance $d(x, y) = \|x - y\|_1$ (1). This choice is motivated by the fact that the expected return as well as the Average-Value-at-Risk functional, defined below, are Lipschitz continuous with respect to this distance. This property will prove to be useful when we discuss the portfolio selection problem below.
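To make the Wasserstein distance of order 1 concrete, the following minimal sketch computes it in the simplest possible setting: two empirical measures on the real line with the same number of equally weighted atoms, where the optimal coupling simply matches sorted atoms. The function name `wasserstein_1d` is hypothetical, and the univariate restriction is an assumption made for illustration; the multivariate case with the distance (1) requires solving a transportation problem.

```python
def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two empirical measures on the
    real line with equally many, equally weighted atoms (illustrative only)."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of atoms")
    # On the real line the optimal transport plan pairs the sorted atoms.
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Shifting every atom by 1 moves the measure by exactly 1.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# Spreading two atoms at 0 to -1 and +1 also costs 1 on average.
spread = wasserstein_1d([0.0, 0.0], [1.0, -1.0])
```

The second call illustrates that the distance does not depend on the order in which the atoms are listed, only on the two measures themselves.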
In all the approaches cited above, the model uncertainty aspect in decision making is taken into account by the following concept: While for a single model $P$ one considers the standard stochastic program
$$\min_{x \in X} \mathbb{E}_P[Q(x, \xi)],$$
where $Q$ is a cost function depending on the decision $x$ and the random part $\xi$, in case of an ambiguity set $\mathcal{P}$ one solves the minimax stochastic program
$$\min_{x \in X} \max_{P \in \mathcal{P}} \mathbb{E}_P[Q(x, \xi)].$$
Such a problem formulation is called risk-neutral. As an extension, we might also look at the risk-averse formulation
$$\min_{x \in X} \max_{P \in \mathcal{P}} R_P[Q(x, \xi)],$$
where $R$ is some risk functional. The existence of a solution of the minimax problem (3) is related to the existence of a saddle point of its objective $f(x, P)$: If $(x^*, P^*)$ is a saddle point, then $x^*$ is a minimizer of $x \mapsto \max_P f(x, P)$ and $P^* \in \operatorname{argmax}_P f(x^*, P)$. The converse is not true: by searching for all saddle points, one in general does not obtain all possible solutions of the minimax problem. In order to characterize all solutions of the minimax problem as saddle points, one has to impose stronger global conditions, such as quasi-convexity and quasi-concavity: A function $(x, P) \mapsto f(x, P)$ is quasi convex-concave if $\{x : f(x, P) \le c\}$ is convex or empty for all $P \in \mathcal{P}$ and all $c \in \mathbb{R}$ and at the same time $\{P : f(x, P) \ge c\}$ is convex for all $x \in X$ and all $c \in \mathbb{R}$. For such functions, Sion's saddle point theorem holds, which is given in Appendix B.
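The difference between optimizing under a single model and under an ambiguity set can be seen in a toy instance. The sketch below assumes a hypothetical quadratic cost, two scenarios and an ambiguity set consisting of just two models; all numbers and names are illustrative and not taken from the paper.

```python
# Two scenarios for xi and two candidate models (probability vectors).
SCENARIOS = [0.0, 1.0]
AMBIGUITY_SET = [(0.8, 0.2), (0.3, 0.7)]  # hypothetical models


def cost(x, xi):
    """Hypothetical cost function Q(x, xi)."""
    return (x - xi) ** 2


def expected_cost(x, model):
    """Expected cost of decision x under one model."""
    return sum(p * cost(x, xi) for p, xi in zip(model, SCENARIOS))


def worst_case_cost(x, models):
    """Inner maximization over the (finite) ambiguity set."""
    return max(expected_cost(x, model) for model in models)


GRID = [i / 100 for i in range(101)]
# Standard stochastic program: trust the first model only.
x_nominal = min(GRID, key=lambda x: expected_cost(x, AMBIGUITY_SET[0]))
# Minimax stochastic program: hedge against both models.
x_robust = min(GRID, key=lambda x: worst_case_cost(x, AMBIGUITY_SET))
```

In this instance the nominal decision is 0.2, whereas the robust decision is 0.5, the point at which both models yield the same expected cost.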

Portfolio Selection Under Model Ambiguity
We start this section by reviewing the classical mean-risk framework, which goes back to Markowitz [34] and his mean-variance approach. In the literature, several alternatives to Markowitz' choice of the variance as a risk functional have been proposed, for example the semi-variance [35], the mean absolute deviation [29], the Value-at-Risk [4], the Average-Value-at-Risk [47] or, more generally, convex risk measures [19] as well as coherent risk measures [3].
In the remainder of this paper, let the random vector $\xi = (\xi_1, \dots, \xi_m)$ describe the random returns of the $m$ risky assets in the capital market under consideration. Let the decision variable $x = (x_1, \dots, x_m) \in X := \{x \in \mathbb{R}^m_+ : \sum_{j=1}^m x_j = 1\}$ denote the proportions of the portfolio to be invested in the $m$ risky assets. Notice that we do not allow for short-selling. Moreover, assume that $\lambda \in \mathbb{R}_+$ quantifies the investor's risk aversion and $R$ is her preferred risk functional. The portfolio selection problem can then be formulated as the following single-stage stochastic program
$$\max_{x \in X} \; \mathbb{E}_P[x^\top \xi] - \lambda R_P(-x^\top \xi), \qquad (4)$$
where $P$ refers to the known probability distribution of the random vector $\xi$ representing the asset returns. To put problem (4) into words, we aim to maximize the risk-adjusted expected return of the portfolio, where the risk adjustment is controlled by the risk aversion coefficient $\lambda$ as well as the risk functional $R$. Together, $\lambda$ and $R$ are assumed to capture the investor's risk preferences. Notice that for a fixed probability space $(\Omega, \sigma, \mu)$ the risk functional $R$ assigns a real value to a random variable $X$ on this space, which represents the random future losses. To be more precise, $R(X)$ quantifies the riskiness of $X$, which means that high values of $R$ indicate more risk and hence a less desirable situation. We choose the Average-Value-at-Risk as a risk functional throughout most of our analysis:
$$\mathrm{AV@R}_\alpha(X) = \frac{1}{1 - \alpha} \int_\alpha^1 F_X^{-1}(t) \, dt,$$
where $F_X^{-1}$ denotes the inverse distribution function of the random variable $X$. Notice that this definition implies that values of $\alpha$ are typically chosen close to 1, as we consider the upper tail of the loss distribution. As shown in Rockafellar and Uryasev [47], the AV@R allows us to reformulate problem (4) as a linear program in case $P$ is a discrete probability distribution.
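For an empirical loss distribution, the AV@R can be evaluated directly. The following sketch assumes equally weighted loss scenarios and integrates the empirical quantile function over the upper tail; it also evaluates the minimization formula of Rockafellar and Uryasev, which is the representation that makes the linear programming reformulation possible. The function names are hypothetical.

```python
def avar(losses, alpha):
    """AV@R of equally weighted loss scenarios: the average of the
    empirical quantile function over the interval (alpha, 1]."""
    n = len(losses)
    total = 0.0
    for i, loss in enumerate(sorted(losses)):
        lo, hi = i / n, (i + 1) / n  # quantile function equals loss on (lo, hi]
        total += loss * max(0.0, hi - max(lo, alpha))
    return total / (1 - alpha)


def avar_rockafellar_uryasev(losses, alpha):
    """Same quantity via min over a of a + E[(X - a)^+] / (1 - alpha).
    For a discrete distribution the minimum is attained at an atom."""
    n = len(losses)

    def value(a):
        return a + sum(max(l - a, 0.0) for l in losses) / (n * (1 - alpha))

    return min(value(a) for a in losses)


sample = [1.0, 2.0, 3.0, 4.0]
tail_average = avar(sample, 0.5)  # average of the two largest losses
plain_mean = avar(sample, 0.0)    # alpha = 0 recovers the expectation
```

For the four scenarios above, the worst half of the outcomes consists of the losses 3 and 4, so the AV@R at level 0.5 equals 3.5.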
In practice, the distribution $P$ of the future returns is typically not known. However, statistical methods allow us to come up with a (hopefully accurate) estimate $P_0$ of the true, unknown probability distribution $P$. Therefore, it seems reasonable to assume that the decision maker takes the available information in the form of $P_0$ into account, while incorporating the ambiguity with respect to $P_0$ into her decision. We model this situation by defining the ambiguity set as the set of distributions whose Wasserstein distance to the so-called reference distribution $P_0$ does not exceed a certain threshold $\kappa$. That is, we define the ambiguity set as
$$B_\kappa(P_0) := \{Q \in \mathcal{P}(\mathbb{R}^m) : d_W(P_0, Q) \le \kappa\},$$
where $\mathcal{P}(\mathbb{R}^m)$ denotes the space of all Borel probability measures on $\mathbb{R}^m$. We can now formulate the distributionally robust counterpart of the stochastic program (4) as the following maximin problem
$$\max_{x \in X} \min_{Q \in B_\kappa(P_0)} \; \mathbb{E}_Q[x^\top \xi] - \lambda R_Q(-x^\top \xi). \qquad (5)$$
As stated in Wozabal [58], problem (5) is algorithmically intractable for the following three reasons: Firstly, it contains infinitely many constraints. Secondly, the solutions of the maximin problem (5) are elements of an infinite dimensional space. Lastly, solving for a saddle point of the objective function is usually harder than simply finding a maximum or a minimum. Nevertheless, three distinct approaches to solve problem (5) overcome the mentioned difficulties: Pflug and Wozabal [44], who first considered the problem, restricted their analysis to the discrete measures with atoms identical to those of $P_0$, which is assumed to be the empirical measure $\hat{P}_n$ constructed from $n$ observations of the random vector $\xi$. They then proposed a successive convex programming solution method to solve the resulting semi-infinite program. As an alternative, Pflug et al. [42] directly characterized the worst case distribution $Q^*$, which satisfies $d_W(P_0, Q^*) = \kappa$. Hence, problem (5) can be solved by a linear program.
Recently, Esfahani and Kuhn [16] derived a tractable reformulation of (5) as an LP by relying on state-of-the-art global optimization techniques. Loosely speaking, the authors change the location of the atoms of the empirical measure $\hat{P}_n$ rather than their probability weights, as done in [44]. Figure 1 illustrates this difference. In the remaining part of this section, we review and compare these three approaches to solve problem (5).
Proceeding in chronological order, we start our literature review with the paper by Pflug and Wozabal [44]. The authors proposed the following successive convex programming algorithm:
1. Set $i = 0$ and $\mathcal{P}_0 = \{\hat{P}_n\}$.
2. Solve the outer problem and call the optimizer $(x_i, t_i)$.
3. Set $Y_x = x_i^\top \xi$, solve the inner problem, call the optimizer $P_i$ and let $\mathcal{P}_{i+1} = \mathcal{P}_i \cup \{P_i\}$.
4. If either $\mathcal{P}_{i+1} = \mathcal{P}_i$ or the optimal value of the inner problem equals $t_i$, then a saddle point of problem (5) is found and the algorithm stops. Otherwise, set $i = i + 1$ and go to step 2.

Fig. 1 The left panel illustrates an empirical measure $\hat{P}_3$ in $\mathbb{R}^2$ with support points $(\xi_1, \xi_2, \xi_3)$. The middle panel illustrates how [44] find the worst case probability measure, namely by changing the probability weights $(q_1, q_2, q_3)$ of the support points $(\xi_1, \xi_2, \xi_3)$. The right panel illustrates the approach in [42] as well as [16]: they compute the worst case distribution by changing the support points $(\xi_1, \xi_2, \xi_3)$ while leaving their probability weights unchanged.
As the set of measures entering the outer problem contains only finitely many elements, the outer problem can be solved as an LP. The inner problem, however, has a more complicated structure. As indicated above, [44] restrict their analysis to probability measures $Q$ which live on the same $n$ points as the empirical measure $\hat{P}_n$. Hence, in the inner problem one optimizes over the probability weights $(q_i)_{i=1,\dots,n}$ of the $n$ sample points, which have equal weight under the empirical measure $\hat{P}_n$. Still, the inner problem is non-convex due to bilinear terms defining the Wasserstein distance and the AV@R. We refer to Section 4.2 of the follow-up paper by Wozabal [58], which proposes a method to approximately solve this program by rewriting it in terms of differences of convex (D.C.) functions.
Let us now come to the paper by Pflug, Pichler and Wozabal [42], who explicitly derived the worst case distribution in $B_\kappa(P_0)$ for a fixed portfolio weight $x$. In order to adapt the more general results in [42], notice that the maximin problem (5) is equivalent to the minimax problem (6), whose objective $A_\alpha$ is a mixture of AV@R's with different $\alpha$'s, i.e., it has a Choquet representation. We denote by $Q^*$ the worst case distribution in the Wasserstein ball $B_\kappa(P_0)$ for a fixed $x$. Applying the result in [42] to $A_\alpha$, we find that the random vector of the asset returns $\xi^{Q^*} = (\xi_1^{Q^*}, \dots, \xi_m^{Q^*})$ under the worst case distribution $Q^*$ is given explicitly in terms of the index set $\{j \in \{1, \dots, m\} : |x_j| = \|x\|_\infty\}$. Setting the reference distribution $P_0$ to the empirical distribution $\hat{P}_n$, problem (6) can be written as the linear program (7). Notice that a very important implication of the above is that under high model ambiguity (i.e., $\kappa \to \infty$) portfolio diversification becomes optimal. We refer to [42] for a detailed and more general proof of this result. It should be mentioned that this result is confirmed by Esfahani and Kuhn [16], which we review next.
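The pull towards equal weights can be illustrated numerically. The sketch below does not implement the linear program (7). Instead it assumes, for illustration only, that the worst case over the Wasserstein ball reduces the nominal objective by a penalty $\kappa (1 + \lambda / (1 - \alpha)) \|x\|_\infty$; this constant is our own Lipschitz bound under the distance (1) with AV@R as risk functional, and need not coincide with the expression in [42]. The scenarios are hypothetical and chosen so that the second asset dominates the first.

```python
def avar(losses, alpha):
    """AV@R of equally weighted loss scenarios (tail average of quantiles)."""
    n = len(losses)
    total = 0.0
    for i, loss in enumerate(sorted(losses)):
        lo, hi = i / n, (i + 1) / n
        total += loss * max(0.0, hi - max(lo, alpha))
    return total / (1 - alpha)


# Hypothetical return scenarios: asset 2 beats asset 1 in every scenario.
ASSET_1 = [-0.02, 0.00, 0.01, 0.03]
ASSET_2 = [r + 0.01 for r in ASSET_1]


def nominal_objective(x2, lam, alpha):
    """Mean minus lam * AV@R of losses for weights (1 - x2, x2)."""
    returns = [(1 - x2) * a + x2 * b for a, b in zip(ASSET_1, ASSET_2)]
    mean = sum(returns) / len(returns)
    return mean - lam * avar([-r for r in returns], alpha)


def robust_objective(x2, lam, alpha, kappa):
    """Nominal objective minus the ASSUMED worst-case penalty."""
    sup_norm = max(x2, 1 - x2)  # ||x||_inf for two long-only weights
    penalty = kappa * (1 + lam / (1 - alpha)) * sup_norm
    return nominal_objective(x2, lam, alpha) - penalty


def best_weights(lam, alpha, kappa, steps=100):
    """Grid search over the weight of asset 2."""
    grid = [i / steps for i in range(steps + 1)]
    x2 = max(grid, key=lambda w: robust_objective(w, lam, alpha, kappa))
    return (1 - x2, x2)


without_ambiguity = best_weights(lam=1.0, alpha=0.75, kappa=0.0)
with_ambiguity = best_weights(lam=1.0, alpha=0.75, kappa=0.1)
```

Without ambiguity the dominating asset receives all the capital, whereas a moderate radius already makes the equally weighted portfolio optimal, because the penalty is smallest where $\|x\|_\infty$ is smallest.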
In the above discussion of [44], we have seen that quite involved techniques are needed to solve problem (5) when restricting the feasible set to discrete measures with the same number of atoms as $P_0$, fixing the location of these atoms and optimizing over their probabilities. But what happens when one is more ambitious and instead fixes the probabilities and optimizes over the location of the atoms? Esfahani and Kuhn [16] show that this leads to a tractable reformulation of the problem as an LP. We only restate their result here and refer to the original paper [16] for the derivation of more general results than needed in our context. Their starting point is yet another equivalent reformulation of the original problem (5) as a minimax problem. Assuming that $P_0 = \hat{P}_n$ and that the support of the true distribution $P$ of $\xi$ is contained in the set $\Xi := \{\xi \in \mathbb{R}^m : C\xi \le d\}$, where $C$ and $d$ are of appropriate size, Esfahani and Kuhn [16] show that this minimax problem can be written as the linear program (8).
Comparing the three discussed approaches, it is clear that solving the LP (7) takes the least computational effort, as the worst case distribution is known a priori. The LP (8) yields the same solution as (7), while being computationally considerably more expensive. Still, the results in Esfahani and Kuhn [16] allow us to solve the problem surprisingly fast, given that the worst case distribution is not computed a priori. Lastly, we should mention that the results in [44] cannot keep up. The SCP algorithm is computationally very intensive and a large sample size $n$ is required if the algorithm is to produce results similar to those of the other two approaches. Here the follow-up paper by Wozabal [58] should be mentioned, which slightly relaxes the requirement of a large sample size $n$ but does so at the cost of even more computational complexity.
We refer to Section 5, where we present an empirical study based on problem (7) or equivalently (8) and consider the optimal portfolio decomposition when the level κ of model ambiguity increases.

Portfolio Selection under Dependence Uncertainty
In many cases of practical interest, one might be able to determine the distribution of a portfolio's individual assets but fail to estimate their dependence structure accurately. In such a situation only the dependence structure of the asset returns is ambiguous, not their marginal distributions. This situation is known as dependence uncertainty. The main motivation to study this topic comes from risk management, as it might lie in the nature of an application at hand that the dependence structure of different risks cannot be estimated with high accuracy. Still, the dependence structure is crucial when it comes to aggregating the individual risks. We refer to Aas and Puccetti [1] for an illustrative example. Since risk measurement is the cornerstone of any portfolio selection strategy, the methods and results concerning dependence uncertainty are particularly relevant in our context. As it turns out, computing bounds for aggregated risks with given marginal distributions is a challenging task on its own. Therefore, we provide a short literature review and some examples in the following.

Background
First of all, let us introduce the notion of copulas. The intriguing yet simple idea underlying copulas is that they allow us to split the modeling of multivariate random variables into two parts: firstly, the modeling of the marginal distributions and secondly, the modeling of the dependence between the univariate random variables. The latter is done via copula functions. Thus, in the presence of dependence uncertainty the first task is straightforward, whereas the second task is complicated due to the lack of information. In a nutshell: dependence uncertainty means that the marginals are known and the copula is unknown.
Formally, let $H$ be an $m$-dimensional cumulative distribution function with margins $F_1, F_2, \dots, F_m$. Then there exists a function $C$, which is called an $m$-copula, such that
$$H(x_1, \dots, x_m) = C(F_1(x_1), \dots, F_m(x_m)).$$
This basic connection is known as Sklar's Theorem. For a precise definition of copulas and an introduction to the topic, the reader is referred to Nelsen [38]. Let us consider a simple example to understand the essence of dependence uncertainty and its connection to the concept of copulas: What is the Average-Value-at-Risk of the sum $U + V$ of two random variables $U$ and $V$ with given marginal distributions? Given that we have no information regarding the copula linking $U$ and $V$, the best we can do is compute bounds for $\mathrm{AV@R}_\alpha(U + V)$ that hold for any bivariate copula $C$. Note that these bounds are very far apart. This issue is addressed below. Let us first discuss the upper bound. A nice property of the AV@R is that the copula which yields the upper bound turns out to be the comonotonic copula $M$. This property holds true in arbitrary dimensions due to the fact that the AV@R is subadditive, see [39]. Intuitively, it is clear that perfect positive dependence implies no diversification benefit when trying to minimize the overall risk of a portfolio. Hence, this dependence should be the least favored by any investor. The AV@R reflects this, whereas other risk functionals, like the Value-at-Risk (V@R), do not. In fact, finding upper bounds for the V@R is much harder. Makarov [33] and Rüschendorf [48] solved this problem in two dimensions, showing that the copula given in Eq. 9 is the one maximizing the V@R with confidence level $\alpha$. As can be seen from its construction, counter-monotonicity in the upper tail is needed. This is precisely the reason why the result cannot be generalized to higher dimensions and leads us directly to the discussion of the difficulties that arise when computing lower bounds in dimensions strictly larger than 2. As a matter of fact, the concept of countermonotonicity does not naturally extend to higher dimensions.
To give some intuition, perfect negative dependence between two random variables implies that whenever one random variable moves in one direction, the other one moves in the opposite direction. Arguably, considering perfect negative dependence between three random variables is not as simple, since it is not clear when three directions stand in maximal contrast. More formally, this is reflected by the fact that the multivariate extension of the lower Fréchet-Hoeffding bound $W(u_1, \dots, u_n) = \max\{\sum_{i=1}^n u_i - n + 1, 0\}$ ceases to be a copula for $n \ge 3$, see [38]. We refer to Wang and Wang [55] for a discussion of possible ways to define perfect negative dependence in arbitrary dimensions.
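Returning to the two-dimensional case, the gap between the best and the worst case of the AV@R of a sum can be made tangible numerically. The sketch below assumes, purely for illustration, that both $U$ and $V$ are Uniform[0,1] and approximates them on a midpoint grid; it compares the comonotonic coupling $V = U$ with the countermonotonic coupling $V = 1 - U$.

```python
def avar(losses, alpha):
    """AV@R of equally weighted scenarios (tail average of quantiles)."""
    n = len(losses)
    total = 0.0
    for i, loss in enumerate(sorted(losses)):
        lo, hi = i / n, (i + 1) / n
        total += loss * max(0.0, hi - max(lo, alpha))
    return total / (1 - alpha)


n = 1000
alpha = 0.9
# Midpoint grid as a deterministic stand-in for a Uniform[0,1] sample.
u = [(i + 0.5) / n for i in range(n)]

comonotone_sum = [a + a for a in u]              # copula M: V = U
countermonotone_sum = [a + (1 - a) for a in u]   # copula W: V = 1 - U

upper_bound = avar(comonotone_sum, alpha)        # close to 1 + alpha
lower_bound = avar(countermonotone_sum, alpha)   # the sum is constant 1
```

Under these assumptions the countermonotonic sum is constant, so its AV@R coincides with its mean, while the comonotonic sum attains the value $1 + \alpha$; every other bivariate copula leads to a value in between.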
In the context of dependence uncertainty, the best and worst case values of the V@R as well as the best case value of the AV@R are in general unknown. Nevertheless, these bounds have been extensively studied in recent years and partial solutions have been obtained, see Puccetti and Wang [46] for an overview. Most importantly, Puccetti and Rüschendorf [45] introduced the so-called rearrangement algorithm, which is a fast procedure to numerically compute the bounds of interest. Under quite restrictive assumptions, analytical bounds can be computed based on the notion of complete mixability, which was introduced in Wang and Wang [54].
In practical applications, these lower and upper bounds are often of limited use, as they are too far apart. Thus, the case where additional dependence information is available has received some attention in recent years. The literature often refers to this topic as partial dependence uncertainty, in contrast to complete dependence uncertainty, which we discussed above. Bounds for aggregated risks have been computed in the case where (a) the copula is given on a subset of the unit cube, (b) bounds for the copula itself are given or (c) bounds for the variance of the aggregated risks are given. We refer to the survey by Rüschendorf [50] and references therein. Of particular interest is the work by Lux and Papapantoleon [32], who, among other results, derived V@R bounds for the case where the copula is known to be close, in the sense of a statistical distance, to some reference copula.
This short detour into the existing literature on dependence uncertainty should serve as a motivation to model, in the context of portfolio selection, the situation where the marginal distributions of the individual assets are assumed to be known, whereas their dependence structure (i.e., the copula linking them) is only partially known. The fact that the combination of portfolio optimization with dependence uncertainty is of practical and theoretical interest has of course been noted by others before. For instance, Kakouris and Rustem [27] introduce an investment strategy which is robust against possible misspecification of the chosen parametric copula family. In contrast, the more involved framework by Doan et al. [12] does not need any parametric assumptions. The authors derive AV@R bounds for the Fréchet class of discrete, multivariate and overlapping marginals with finite support. Optimizing these bounds leads to an investment strategy based on a minimum of distributional assumptions.
We, however, take a different and novel approach to the portfolio selection problem under dependence uncertainty, which is very much in the spirit of Section 3:
$$\max_{x \in X} \min_{C \in \mathcal{C}_\varepsilon(C_0)} \; \mathbb{E}(x^\top \xi) - \lambda R(-x^\top \xi^C), \qquad (10)$$
where $\mathcal{C}$ denotes the set of all $m$-dimensional copulas and the ambiguity set $\mathcal{C}_\varepsilon(C_0) := \{C \in \mathcal{C} : d_W(C_0, C) \le \varepsilon\}$ refers to the set of $m$-copulas which are close to some reference copula $C_0$ with respect to the Wasserstein distance. Since the copula has no impact on the expected value of $x^\top \xi$, we omit the letter $C$ in the superscript of the first $\xi$ in (10). We use the Wasserstein distance as a distance measure between copulas in order to guarantee that problem (10) is defined similarly to the portfolio selection problem under model ambiguity introduced in (5). In fact, the only difference is that in (10) we assume that the marginal distributions of the asset returns are known, whereas in (5) this assumption is not made. Still, the question may arise why the Wasserstein distance is a meaningful distance measure for copulas. The reason lies in the following fact: Suppose that the pairs of random variables $(X_1, X_2)$ resp. $(Y_1, Y_2)$ have the same marginals $F_1$ and $F_2$ but differ in their copulas $C_X$ and $C_Y$. If $h$ is a Lipschitz function with two arguments and Lipschitz constant $L$ and both quantile functions $F_1^{-1}$ and $F_2^{-1}$ are Lipschitz with constant $K$, then
$$|\mathbb{E}\, h(X_1, X_2) - \mathbb{E}\, h(Y_1, Y_2)| \le c \, d_W(C_X, C_Y)$$
with a constant $c$ depending only on $L$ and $K$. More generally, take a distribution $P$ on $\mathbb{R}^m$ with non-Lipschitz quantile functions, which are assumed not to grow faster than $-x^{-\gamma}$ near zero and $(1 - x)^{-\gamma}$ near 1, and consider $\beta$-Lipschitz functions $h$ with $\beta < 1$. If $\beta < 1/\gamma$, then $P \mapsto \int h \, dP$ is, for fixed marginals, Lipschitz in the Wasserstein distance of the copula.
Concerning the choice of the risk functional $R$, we want to point out that the AV@R is aggregation-robust, see Embrechts et al. [14], which implies that the AV@R is less sensitive to model uncertainty at the level of the dependence structure than, for instance, the Value-at-Risk. Hence, the AV@R will be of special interest in our subsequent analysis.

Complete Dependence Uncertainty
Before discussing how to approach problem (10) computationally, we first want to consider the case of complete dependence uncertainty: Assume that the degree of dependence uncertainty, given by the radius $\varepsilon$ of the ambiguity set in (10), converges to infinity. Then the ambiguity set $\mathcal{C}_\varepsilon(C_0)$ extends to the set of all copulas $\mathcal{C}$. Thus, we obtain the problem
$$\max_{x \in X} \min_{C \in \mathcal{C}} \; \mathbb{E}(x^\top \xi) - \lambda R(-x^\top \xi^C). \qquad (11)$$
In the following we show that the above problem can be simplified significantly when we restrict our considerations to (a) subadditive, (b) comonotone additive and (c) positively homogeneous risk functionals $R$. A risk functional $R$ is said to be
(a) subadditive, if for all random variables $X$ and $Y$ it holds that $R(X + Y) \le R(X) + R(Y)$;
(b) comonotone additive, if for any comonotonic random variables $X$ and $Y$ it holds that $R(X + Y) = R(X) + R(Y)$;
(c) positively homogeneous, if for all random variables $X$ and all constants $c \ge 0$ it holds that $R(cX) = cR(X)$.
We refer to [43] for a detailed discussion of these and other properties of risk functionals. In the following we use the notation $\xi_j$ to denote the $j$-th component of the $m$-dimensional (random) vector $\xi$.

Proposition 1
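If the risk functional $R$ is subadditive, comonotone additive and positively homogeneous, then
$$\max_{x \in X} \min_{C \in \mathcal{C}} \; \mathbb{E}(x^\top \xi) - \lambda R(-x^\top \xi^C) = \mathbb{E}(\xi_{j^*}) - \lambda R(-\xi_{j^*})$$
for any $j^* \in \operatorname{argmax}_{j \in \{1, \dots, m\}} \mathbb{E}(\xi_j) - \lambda R(-\xi_j)$.
Proof First notice that
$$R(-x^\top \xi^C) \le \sum_{j=1}^m x_j R(-\xi_j) = R(-x^\top \xi^M)$$
for all copulas $C$, where $M$ denotes the comonotonic copula. The inequality follows from subadditivity and positive homogeneity, the equality from comonotone additivity and positive homogeneity. Hence, we have that $\max_{C \in \mathcal{C}} R(-x^\top \xi^C) = R(-x^\top \xi^M)$ and we obtain
$$\max_{x \in X} \min_{C \in \mathcal{C}} \; \mathbb{E}(x^\top \xi) - \lambda R(-x^\top \xi^C) = \max_{x \in X} \sum_{j=1}^m x_j \big( \mathbb{E}(\xi_j) - \lambda R(-\xi_j) \big) = \mathbb{E}(\xi_{j^*}) - \lambda R(-\xi_{j^*})$$
for any $j^* \in \operatorname{argmax}_{j \in \{1, \dots, m\}} \mathbb{E}(\xi_j) - \lambda R(-\xi_j)$. In the last equality we use the fact that $X$ is the $m$-dimensional unit simplex.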
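The statement of Proposition 1 can be checked on simulated data. The following sketch uses AV@R as the risk functional and three hypothetical normally distributed marginal samples; the comonotonic coupling is obtained by sorting each marginal sample, and an independent reshuffle serves as a comparison. All parameter values are illustrative assumptions.

```python
import random


def avar(losses, alpha):
    """AV@R of equally weighted loss scenarios (tail average of quantiles)."""
    n = len(losses)
    total = 0.0
    for i, loss in enumerate(sorted(losses)):
        lo, hi = i / n, (i + 1) / n
        total += loss * max(0.0, hi - max(lo, alpha))
    return total / (1 - alpha)


def objective(weights, scenarios, lam, alpha):
    """Mean return minus lam times the AV@R of the portfolio loss."""
    returns = [sum(w * r for w, r in zip(weights, row)) for row in scenarios]
    mean = sum(returns) / len(returns)
    return mean - lam * avar([-r for r in returns], alpha)


random.seed(0)
n, lam, alpha = 500, 1.0, 0.9
# Hypothetical marginal samples (mean, standard deviation) for three assets.
marginals = [
    [random.gauss(0.05, 0.10) for _ in range(n)],
    [random.gauss(0.08, 0.20) for _ in range(n)],
    [random.gauss(0.03, 0.05) for _ in range(n)],
]
# Comonotonic coupling: all assets attain their k-th smallest value together.
comonotone = list(zip(*(sorted(sample) for sample in marginals)))
# Some other coupling with the same marginals, for comparison.
shuffled = list(zip(*(random.sample(sample, n) for sample in marginals)))

vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
grid = [(i / 20, j / 20, 1 - (i + j) / 20)
        for i in range(21) for j in range(21 - i)]
best_single = max(objective(v, comonotone, lam, alpha) for v in vertices)
best_overall = max(objective(w, comonotone, lam, alpha) for w in grid)
equal = (1 / 3, 1 / 3, 1 / 3)
```

Under the comonotonic coupling the objective is linear in the weights, so no point of the simplex grid improves on the best single asset, while any other coupling can only make a diversified portfolio look better.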
Note that due to the high precision of modern data processors the argmax set will in practice only contain one element. Thus, we find that complete dependence uncertainty implies that concentration of the portfolio into one single asset is optimal. At first, this insight might be somewhat counterintuitive since putting all the capital into a single asset does not seem to be a robust decision. In fact, the decision when facing high model uncertainty on the level of the joint distribution could not differ more from the decision which is optimal under complete dependence uncertainty: the first implies portfolio diversification, the latter portfolio concentration. However, looking at it from the technical side, on the one hand subadditivity rewards portfolio diversification but on the other hand comonotone additivity does not attribute any diversification benefits in case of comonotone losses. Hence, without any knowledge of the dependence between the asset returns, the worst case is that all asset prices rise and fall simultaneously. Basing our decision on this worst case, portfolio diversification cannot be beneficial. Rather, investing into a single asset, namely one of those which performs best, is optimal.
Alternatively, we can approach the result from an information theoretical point of view: a decision is better the more informed it is. The equally-weighted portfolio investment strategy requires no information at all. Thus, equally-weighted portfolio diversification is optimal in case of high model ambiguity and suboptimal under complete dependence uncertainty. Portfolio concentration, however, requires full information on the marginal distributions to determine which asset is the best. Therefore, it is not surprising that portfolio concentration is optimal when the model uncertainty lies solely on the level of the dependence structure.
We conclude that subadditivity, comonotone additivity and positive homogeneity are indeed reasonable properties for risk functionals in the portfolio selection problem under dependence uncertainty. Let us therefore take a closer look at risk functionals with these three properties. Spectral risk measures, introduced by Acerbi [2], and distortion risk measures, which developed from research on premium principles by Wang [56], are well-known families of risk functionals with these properties, as they are coherent (in the sense of Artzner et al. [3]), comonotone additive and law invariant, see [20,49]. Arguably, the most relevant representative of these families is the AV@R, as it can be seen as the basic building block for spectral risk measures, see Kusuoka [30].
Nevertheless, several well-known risk functionals violate at least one of the mentioned properties: the variance (which has none of these properties), the standard deviation (which is not comonotone additive), the Value-at-Risk (which is not subadditive) and expectiles (which are in general not comonotone additive), see Emmer et al. [15]. The question arises what happens if one of these risk functionals is chosen in the portfolio selection problem (11) under complete dependence uncertainty. The following example addresses this question.
Example 1 Let some portfolio be composed of two assets. We represent the two assets by random variables U and W, where $W = 2V^2$ and U and V are both Uniform[0,1]. Thus, the second asset promises a higher return at a higher risk. For 0 ≤ x ≤ 1, the portfolio with weights 1 − x resp. x has return $Y_x^C = (1-x)\,U + x\,W$, where C denotes the copula linking U and V. Assuming the copula C is unknown leads to the portfolio selection problem (12), in which the weight x is chosen optimally against the worst-case copula $C \in \mathcal{C}$ for a given risk aversion λ. We analyze four different risk measures R: i) the variance (Var), ii) the standard deviation (Std), iii) the Average-Value-at-Risk (AV@R) and iv) the Value-at-Risk (V@R). In cases i) and ii), the risk of $Y_x^C$ is maximal when the covariance of U and $V^2$ is maximal. According to Theorem 5.25 in [36], this means that U and $V^2$ are comonotone, i.e., the worst case copula is given by the comonotonic copula M and we can set V = U. The same statement holds in case iii), where $\max_{C \in \mathcal{C}} \mathrm{AV@R}(Y_x^C) = \mathrm{AV@R}(Y_x^M)$. In case iv), as mentioned above, the maximum is attained at the worst case copula $C_\alpha$ given in Eq. 9. Note that $C_\alpha$ depends on the confidence level α but not on the portfolio weight x.

Problem (12) can now be solved explicitly for the four different choices of R. The optimal portfolio decomposition for the four different risk functionals is illustrated in Fig. 2 as a function of λ: for low risk aversion λ the portfolio is concentrated in the second asset, which promises a higher return, whereas in case of high risk aversion the portfolio is concentrated in the less risky first asset. As indicated by Proposition 1, when using the AV@R as a risk measure, the optimal portfolio is concentrated for all λ (see the second panel from the bottom in Fig. 2). The panel at the bottom of Fig. 2 shows that the same also holds true when using the V@R. In cases i) and ii) we observe that the portfolio is composed for certain values of risk aversion. For instance, in case ii), $\lambda = \sqrt{695}/49 \approx 0.54$ implies that portfolio diversification into half/half is optimal.
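The half/half claim for the standard deviation can be checked numerically. The sketch below assumes that the objective of problem (12) has the mean-risk form E[Y] − λ·R(Y) and that the worst case is the comonotone coupling V = U, so that Y = (1 − x)U + 2xU². The function names and the grid search are illustrative choices, not part of the original example.

```python
import math


def mean_and_variance(x):
    """Closed-form moments of Y = (1 - x) * U + 2 * x * U**2, U ~ Uniform[0, 1].

    Uses E[U] = 1/2, E[U^2] = 1/3, E[U^3] = 1/4, E[U^4] = 1/5, hence
    Var(U) = 1/12, Var(2U^2) = 16/45 and Cov(U, 2U^2) = 1/6.
    """
    mean = (1 - x) / 2 + 2 * x / 3
    var = (1 - x) ** 2 / 12 + 16 * x ** 2 / 45 + x * (1 - x) / 3
    return mean, var


def optimal_weight(lam, risk="std", grid_size=1001):
    """Grid search for the weight x of the second asset.

    Assumed objective (not taken verbatim from the paper): mean - lam * risk.
    """
    best_x, best_val = 0.0, -math.inf
    for k in range(grid_size):
        x = k / (grid_size - 1)
        mean, var = mean_and_variance(x)
        r = math.sqrt(var) if risk == "std" else var
        val = mean - lam * r
        if val > best_val:
            best_x, best_val = x, val
    return best_x


if __name__ == "__main__":
    lam = math.sqrt(695) / 49
    print(lam, optimal_weight(lam))
```

Under these assumptions the first-order condition at x = 1/2 reduces to λ = √695/49, which agrees with the value stated in the example; for λ = 0 the search returns full concentration in the second asset.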
This example shows that, independently of the chosen risk functional, optimal portfolios under complete dependence uncertainty are typically concentrated. Only for specific values of the risk aversion coefficient λ can it happen that the portfolio is composed. For subadditive, comonotone additive and positively homogeneous risk functionals, composed portfolios can even be ruled out a priori, which constitutes another advantage of this class of risk measures. In fact, we can go even further and conjecture that for any risk functional the optimal portfolio under complete dependence uncertainty is never composed of more than two assets.
Summing up, portfolio concentration is a robust decision when facing complete dependence uncertainty and the choice of subadditive, comonotone additive and positively homogeneous risk functionals is particularly convenient in this context.

Partial Dependence Uncertainty
Let us now discuss problem (10) for ε > 0. In particular, we assume that a reference copula $C_0$ is given and we consider an ambiguity set of copulas which are close to this reference copula. In the spirit of Section 3, we avoid all kinds of parametric assumptions and choose the empirical copula $\hat C_n$ to be the reference copula $C_0$. The empirical m-copula $\hat C_n$ generated by n points is defined as

$$\hat C_n(u_1, \dots, u_m) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{m} \mathbb{1}\{U_{i,j} \le u_j\} \qquad (13)$$

with $u_1, \dots, u_m \in [0, 1]$ and $U_{i,j} = n \hat F_j(\xi_{i,j})/(n+1)$ for all i = 1, ..., n and j = 1, ..., m, where $\hat F_j$ denotes the empirical marginal distribution function of asset j, the matrix $(\xi_{i,j})_{i=1,\dots,n;\, j=1,\dots,m}$ contains n observations of the m asset returns and the scaling factor n/(n + 1) is only introduced to avoid potential problems at the boundary of $[0, 1]^m$, see Genest et al. [23]. Various conditions under which $\hat C_n$ is a consistent estimator of the true underlying copula are given in Fermanian et al. [18].

In the formulation of problem (10) the ambiguity set $\mathcal{C}_\varepsilon(C_0)$ contains infinitely many objects for all ε > 0. This reminds us of the more general, yet similar problem (5) and the ambiguity set $\mathcal{B}_\kappa(P_0)$. As shown in [16,42] and discussed in Section 3, when setting $P_0 = \hat P_n$, the worst case distribution also lives on n points. Thus, we conjecture that when setting $C_0 = \hat C_n$, the worst case copula also lives on n points. We therefore restrict our considerations to the smaller set $\mathcal{C}_{\varepsilon,n}(\hat C_n) := \{C \in \mathcal{C}_n : d_W(C, \hat C_n) \le \varepsilon\}$ of all empirical m-copulas resulting from n observations which are close to $\hat C_n$.

Fig. 2 The optimal portfolio decomposition of the portfolio selection problem under complete dependence uncertainty, which is given in Eq. 12, is shown for four different risk functionals R as a function of the risk aversion λ ∈ [0, 1] and for fixed α = 0.95. For a fixed λ, the bright area corresponds to the fraction x which should be invested in the second asset whereas the dark area gives the optimal weight 1 − x of the first asset
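The construction of the empirical copula from ranks can be sketched in a few lines of Python. This is a minimal illustration assuming there are no ties within a column, so that $n \hat F_j(\xi_{i,j})$ equals the rank of observation i in column j. The function names `pseudo_observations` and `empirical_copula` are hypothetical.

```python
def pseudo_observations(data):
    """Map an n x m list of returns to support points U[i][j] = rank / (n + 1).

    Assumes no ties within a column, so the rank equals n times the
    empirical marginal distribution function evaluated at the observation.
    """
    n, m = len(data), len(data[0])
    U = [[0.0] * m for _ in range(n)]
    for j in range(m):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            U[i][j] = rank / (n + 1)
    return U


def empirical_copula(U, u):
    """Evaluate the empirical copula at u = (u_1, ..., u_m).

    Returns the fraction of support points lying componentwise below u.
    """
    n = len(U)
    return sum(all(a <= b for a, b in zip(row, u)) for row in U) / n


if __name__ == "__main__":
    returns = [[0.1, 0.3], [0.2, 0.1], [0.3, 0.2]]
    U = pseudo_observations(returns)
    print(U, empirical_copula(U, (0.5, 0.5)))
```

Each column of `U` is a permutation of 1/(n+1), ..., n/(n+1), so the support points depend on the data only through the ranks.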
This restriction, together with the choice of the AV@R as risk functional, leads to problem (14). In order to solve this problem, we have to understand the objects in the set $\mathcal{C}_{\varepsilon,n}(\hat C_n)$. From definition (13) we know that any empirical copula $C \in \mathcal{C}_n$ is determined by its support points $U_{i,j} = n \hat F_j(\xi_{i,j})/(n+1)$, where i = 1, ..., n and j = 1, ..., m. The construction in (13) guarantees that the margins of the distribution induced by $(U_{i,j})$ are uniformly distributed on [0, 1]. Figure 3 shows an example of points $(U_{i,j})_{i=1,\dots,n;\, j=1,2}$ generating an empirical bivariate copula. This graph illustrates the so-called chess-tower property, which is generally satisfied by all points $(U_{i,j})_{i=1,\dots,n;\, j=1,\dots,m}$ generating an empirical copula $C \in \mathcal{C}_n$: for fixed j ∈ {1, ..., m} and fixed k ∈ {1, ..., n} there exists a unique i ∈ {1, ..., n} such that $U_{i,j} = k/(n+1)$, since $U_{i,j} = n \hat F_j(\xi_{i,j})/(n+1) = R_{i,j}/(n+1)$, where $(R_{i,j})$ denotes the ranks corresponding to the observations $(\xi_{i,j})$. After ordering the points by their first coordinate, each of the remaining m − 1 coordinates is an arbitrary permutation of the ranks, which implies that there are $(n!)^{m-1}$ elements in the set $\mathcal{C}_n$.

Fig. 3 The left panel shows a scatter plot of points $(U_{i,j})_{i=1,\dots,10;\, j=1,2}$ generating the empirical copula $\hat C_n$ with n = 10 plotted in the right panel
Hence, solving problem (14) for fixed portfolio weights x amounts to a combinatorial problem. Solving it by enumeration requires computing the Wasserstein distance $d_W(C, \hat C_n)$ for all $(n!)^{m-1}$ empirical copulas C in $\mathcal{C}_n$. Even for fixed portfolio weights x, this task is computationally infeasible for reasonable values of m and n. However, we can propose a heuristic which allows us to rapidly compute a lower bound $C_L \in \mathcal{C}_{\varepsilon,n}(\hat C_n)$ and an upper bound $C_U \in \mathcal{C}_n \setminus \mathcal{C}_{\varepsilon,n}(\hat C_n)$ such that the bounds in (15) hold for all x ∈ X. An illustration of the "true" worst case copula and the corresponding bounds can be seen in Fig. 8 in the appendix. Let us sketch the main idea behind the proposed heuristic. First, note that the discrete counterpart $M_n$ of the comonotonic copula M is defined by the points $U_{i,j} = i/(n+1)$ for all i and j, i.e., $M_n$ lives on the main diagonal of the unit cube. Therefore, our heuristic iteratively transports points to the main diagonal of the unit cube, starting with the smallest points in the lower tail of the copula, until the bound of the maximal feasible Wasserstein distance is exceeded. The last feasible discrete copula is set to be $C_L$ and the first infeasible copula defines $C_U$. Appendix A describes this heuristic for the case m = 2. The procedure extends naturally to higher dimensions m. We omit a further discussion here.
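The idea of moving points to the diagonal until the Wasserstein budget is exhausted can be illustrated for m = 2 and very small n. The sketch below is not the algorithm of Appendix A; it rests on several assumptions made only for illustration: a bivariate empirical copula is encoded as a permutation of ranks, a point is moved to the diagonal by swapping two ranks (which preserves the chess-tower property), the Wasserstein distance is of order 1 with an L1 ground metric, and it is computed by brute force over all matchings.

```python
from itertools import permutations


def points(perm):
    """Support points ((i+1)/(n+1), perm[i]/(n+1)) of a bivariate empirical copula."""
    n = len(perm)
    return [((i + 1) / (n + 1), perm[i] / (n + 1)) for i in range(n)]


def wasserstein1(p, q):
    """Order-1 Wasserstein distance between two uniform n-point measures.

    Brute force over all matchings with an L1 ground metric (an assumption);
    only usable for small n.
    """
    n = len(p)

    def cost(a, b):
        return sum(abs(s - t) for s, t in zip(a, b))

    return min(
        sum(cost(p[i], q[s[i]]) for i in range(n))
        for s in permutations(range(n))
    ) / n


def diagonal_heuristic(perm, eps):
    """Move points to the diagonal, lowest ranks first, until eps is exceeded.

    Returns (lower, upper): the last feasible rank permutation and the first
    infeasible one, or None as upper if the comonotone copula is feasible.
    """
    start = points(perm)
    current = list(perm)
    lower = list(perm)
    for i in range(len(perm)):
        if current[i] == i + 1:
            continue  # point already lies on the diagonal
        j = current.index(i + 1)
        current[i], current[j] = current[j], current[i]  # rank swap
        if wasserstein1(start, points(current)) > eps:
            return lower, list(current)
        lower = list(current)
    return lower, None


if __name__ == "__main__":
    print(diagonal_heuristic([2, 1, 3], 0.1))
    print(diagonal_heuristic([2, 1, 3], 0.2))
```

For realistic n the brute-force distance would have to be replaced by a proper assignment solver; the sketch only shows how a feasible and an infeasible copula bracket the budget ε.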

Empirical Study: Concentration Versus Diversification
In the example studied in this section, the asset universe consists of the following six indices: S&P 500, TOPIX, FTSE China B35, EURO STOXX 50, FTSE 100 and NIFTY 500. The markets corresponding to these indices are USA, Japan, China, Eurozone, UK and India. We look at daily returns from the year 2016 (i.e., January 1, 2016 to December 31, 2016) computed from the prices in USD of the indices, which we normalized to one at the initial time. Figure 4 shows these normalized prices and summarizes the expected value and the AV@R of the corresponding returns. The data $(\xi_{i,j})_{i=1,\dots,n;\, j=1,\dots,m}$, consisting of n = 260 observations of daily returns of m = 6 assets, is retrieved from the Thomson Reuters Datastream licensed at the University of Vienna.

Fig. 4 The graph on the left shows six equity indices in 2016, which have been normalized to 1 on January 1, 2016. The table on the right shows the expected value as well as the 95%-AV@R of the corresponding daily losses
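Summary statistics of the kind reported with Fig. 4 can be computed directly from a sample of daily losses. The following sketch is an assumption about how such a figure might be obtained, not the authors' code: it evaluates the empirical AV@R through the Rockafellar-Uryasev minimization formula, whose minimum over a finite sample is attained at one of the sample points. The function name `empirical_avar` is hypothetical.

```python
def empirical_avar(losses, alpha):
    """Empirical Average-Value-at-Risk of a loss sample at level alpha < 1.

    Uses AV@R = min_q { q + E[max(L - q, 0)] / (1 - alpha) }. For an
    empirical distribution the objective is piecewise linear and convex
    with kinks at the sample points, so it suffices to search over them.
    """
    n = len(losses)
    tail = (1.0 - alpha) * n
    return min(
        q + sum(max(loss - q, 0.0) for loss in losses) / tail
        for q in losses
    )


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(empirical_avar(sample, 0.5))   # mean of the two largest losses
    print(empirical_avar(sample, 0.0))   # plain sample mean
```

With α = 0.95 and n = 260 daily losses, the quantity is a weighted average of the 13 largest observations.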
The aim of this empirical study is to graphically compare the two distinct approaches in Sections 3 and 4. Firstly, we solve the portfolio selection problem under model ambiguity as given in (5), relying on the methods of Pflug et al. [42] and Esfahani and Kuhn [16]. Thus, we solve the LPs (7) and (8) given ξ, α and λ for different values of κ. It should be mentioned once more that the two LPs indeed yield the same solution. Studying the optimal portfolio decomposition given by the portfolio weights x as the level of model ambiguity κ increases, we expect to find that portfolio diversification becomes optimal. Indeed, this is confirmed by the right plot in Fig. 5, in which we set α = 0.95, λ = 10 and ξ as discussed in the previous paragraph and plot the optimal portfolio decomposition x for different values of κ. For small values of κ, which means that the investor trusts the empirical distribution function computed from the observed returns, the right plot in Fig. 5 indicates that investing more than half of the capital into the S&P 500 is optimal. This fraction decreases to 1/6 as the level κ of model ambiguity increases. This stands in contrast to the left plot of Fig. 5, which illustrates the optimal portfolio decomposition under dependence uncertainty. Here, we assume that the empirical marginal distribution functions indeed provide an accurate model for the marginal returns, whereas the dependence structure is ambiguous and only known to be within an ε-neighborhood of the empirical copula computed from the observations. In other words, we consider the maximin problem (14). If ε is close to zero, we are in the same situation as discussed before when κ ≈ 0. For sufficiently large ε, however, the worst case dependence structure is comonotonicity and portfolio concentration becomes optimal, as proven in Section 4.2.
In our example we find that, as ε increases, more and more capital should be invested into the S&P 500, which is in our example the best performing asset. It should be mentioned that we have used the heuristic algorithm presented in Section 4.3 and Appendix A to generate the left graph in Fig. 5. In fact, this graph displays the portfolio weights x which are optimal with respect to the feasible lower bound $C_L$ in Eq. 15. The lower bound $C_L$ corresponds to the upper bound shown in the left graph of Fig. 6, which plots the value of the objective function in Eq. 14 as a function of ε. The absolute difference between the upper and the lower bound of the objective function's value is 0.0051 ± 0.0037 when averaged over 20 different values of $\varepsilon \in (10^{-4}, 10^{-1})$. Hence, we conclude that the results of the proposed heuristic algorithm need to be treated carefully when a fixed degree of dependence uncertainty is considered. Nevertheless, the heuristic allows us to determine the structure of the optimal portfolio decomposition as the degree of dependence uncertainty increases.

Fig. 5 In the graph on the right, the optimal portfolio decomposition is shown as a function of the level κ of model ambiguity. The graph on the left shows an approximation to the optimal portfolio decomposition as the degree of dependence uncertainty increases from right to left

Figure 5 sums up the discussion of model ambiguity and dependence uncertainty in this paper. In the center of the figure, we can see the optimal portfolio decomposition when we assume that the returns follow a known distribution $P_0$. Aiming at an investment strategy which is robust with respect to this distributional assumption, we can go in two different directions. On the one hand, we can assume that the true distribution lies in a κ-neighborhood of the distribution $P_0$ and optimize the portfolio with respect to the worst case distribution within the κ-neighborhood. As shown in the right plot of Fig. 5, the higher the level of model ambiguity κ gets, the more diversified the portfolio becomes. On the other hand, we might only consider ambiguity with respect to the dependence structure of the distribution $P_0$ while fixing the marginal distributions. This leads to portfolio optimization under dependence uncertainty and is shown in the left plot of Fig. 5: the higher the degree of dependence uncertainty gets, the more concentrated the optimal portfolio becomes. Although the set-up of these two approaches is very similar, as all parametric assumptions are avoided and the definition of the ambiguity set relies in both cases on the Wasserstein distance, the implications could not be more dissimilar.

Fig. 8 The support points of the initial copula $\hat C_n$ from Fig. 3 are compared with those of the modified copulas $C_L$ and $C_U$, which are computed by Algorithm 1, and to those of the "true" worst case copula C for fixed ε = 0.003

such that $P(K) \ge 1 - 2\varepsilon$ for all $P \in \mathcal{P}$. Since $P_0$ is Borel, there is a compact set $K_\varepsilon$ such that $P_0(K_\varepsilon) \ge 1 - \varepsilon$. Let
Let $K = \{x : d(x, K_\varepsilon) \le \varepsilon/\eta\}$. Then K is compact and $P(K) \ge 1 - 2\varepsilon$ for all $P \in \mathcal{P}$.
Remark 1 With the same argument one sees that closed Wasserstein balls of copulas are weakly compact, since the set of all probability measures on $[0, 1]^n$ with uniform marginals is weakly closed and the intersection of a weakly closed and a weakly compact set is weakly compact.