Robust Portfolio Optimization with Respect to Spectral Risk Measures Under Correlation Uncertainty

This paper proposes a distributionally robust multi-period portfolio model with ambiguity on asset correlations with fixed individual asset return mean and variance. The correlation matrix bounds can be quantified via corresponding confidence intervals based on historical data. We employ a general class of coherent risk measures namely the spectral risk measure, which includes the popular measure conditional value-at-risk (CVaR) as a particular case, as our objective function. Specific choices of spectral risk measure permit flexibility for capturing risk preferences of different investors. A semi-analytical solution is derived for our model. The prominent stochastic dual dynamic programming (SDDP) algorithm adapted with intricate modifications is developed as a numerical method under the discrete distribution setting. In particular, our new formulation accounts for the unknown worst-case distribution in each iteration. We verify the convergence property of this algorithm under the setting of finite scenarios. Our results show that the optimal solution favours a certain degree of anti-diversification due to dependence ambiguity and exhibits its protection ability during the financial crisis period.

more sophisticated models are established, including models that incorporate various practical constraints. For instance, constraints on transaction costs are imposed to capture the costs in buying and selling the assets and constraints on maximum holding proportion for each asset to cover the investor perference on risk diversification. A summary of portfolio optimization models can be found in [6], for example. The role of numerical optimization methods to solve these models has also been an active research area. Instead of considering single-period models, a more realistic class of multi-period models are studied as they take the complete investment horizon into consideration with the possibility of adjusting one's investment position at certain times regularly. Under some specific settings, the multi-period problem can be solved analytically (see, for example, [31] and [18]).
Risk measures and related theories have undergone rapid development in recent years as attempts to incorporate the effect due to risk aversion on investment decision. This consideration of risk measures has become more crucial after the 2008 financial crisis, after which investors have developed preference for robust portfolios. In addition to the prevalent risk measure conditional value-at-risk (CVaR) discussed in [52], which has been widely adopted in financial sector as required in Basel Accord III, a more general class of risk measures, namely, the spectral risk measure, is proposed in [1] and further discussed in [11] and [7] amongst others. This class includes some novel risk measures in the literature such as second-order superquantile [51] and Gini-shortfall [17]. Readers may also refer to [46] and [55] for relevant discussion on such risk measures. In this paper, we focus on this general class of spectral risk measures under which many different well-known risk measures are covered.
While the mean-variance model is convenient, variance is a symmetric measure of volatility and it does not provide effective quantification for downside risk. In modern portfolio theory, rational and prudent investors are cautious about downside losses at the cost of sacrificing partially the upward potentials of the associated investments. The use of alternative risk measures for portfolio models has become increasingly attractive. An extensive amount of research effort has been devoted to construct implementable risk-averse portfolio models. Fundamental theories for mean-risk model are developed in [53] and the context under portfolio optimization is discussed in [42]. Consideration of high-order conditional risk measures based upon the scenario generation method is recently investigated in [24] followed by [2] which extends the scope to the use of entropic value-at-risk.
With the growing concerns for reliability of parameter estimation, distributionally robust portfolio models are developed to protect investors from suffering tremendous losses due to misspecification of model parameters (see [15]). Two types of uncertainty set construction prevail in the literature, namely distance-type uncertainty and moment-type uncertainty. For the former, it can be constructed using the empirical distribution as the reference probability measure through specific metrics; a thorough discussion using Wasserstein distance can be found in [5]. For the latter, moment information from the observed data is captured. In this paper, we consider a multi-period problem with the objective function being the worst-case spectral risk measure, characterized by moment-type uncertainty. To be more specific, the moment uncertainty set is quantified by the estimated individual asset's mean and variance while the correlation is quantified by a specific interval, say for instance, a relevant confidence interval so as to capture the model uncertainty. Unlike most of the papers on moment uncertainty set that focus on an ellipsoidal set on the first two moments (see, for example, [9]), the construction is relatively straight-forward. Ellipsoidal uncertainty set requires the choice of the scale and shape parameters that are hard to interpret compared to the use of confidence intervals. A recent discussion on a deliberate construction appears in [38], which relies on estimates provided by experts. Moreover, the existing ellipsoidal set cannot capture the correlation uncertainty between assets. A similar multi-period portfolio problem with ellipsoidal uncertainty sets is investigated in [35] while the single-period of the problem using CVaR objective is covered in [38]. In this line of research, related works also include [30,33,34] and [36], amongst others.
The motivation of correlation ambiguity emerges from the concern about accuracy in estimating correlation coefficients. It is inevitable that correlation affects investment decisions tremendously. It is well known that obtaining accurate estimation of highdimensional correlation matrix is especially challenging. Also, through experiments, [14] observed a lack of confidence in decision making under correlation uncertainty. It implies that aversion towards correlation uncertainty inheres in the portfolio selection process without which the resulting strategies tend to be mistakenly optimistic. Thus, we are interested in the correlation ambiguity in portfolio models. Fouque et al. [16] studied the situation with two risky assets and one risk-free asset in continuous-time economy while the multiple-asset case in mean-variance framework is studied in [28]. [25] considered a single-period setting with normally distributed payoffs.
Multi-stage optimization is typically challenging. In the discrete setting, it is obvious that the number of scenarios grows exponentially as time proceeds. That is, if we have N scenarios for each period, we would have N T scenarios over these T periods, which is computationally intractable in general. Under the stagewise independence assumption of the randomness, one can circumvent the problem with the help of stochastic dual dynamic programming (SDDP) method, pioneered in [43]. The method is summarized and analyzed in [54] for the linear case and the risk averse version is described in [56]. Recently, [23] proposed a variant for linear stochastic programs with optimality cut selection steps to enhance computational efficiency. For robust stochastic linear programs, [20] developed a robust version of SDDP, which involves both upper and lower linear approximations of the recourse function. Timedependent variants could be found in [37] and [4]. The convergence proof of the method for risk neutral and risk averse case in general convex setting appear in [21] and [22], respectively.
In this paper, we consider a multi-period distributionally robust portfolio optimization problem with the spectral risk measure objective under correlation ambiguity. The use of spectral risk measure permits flexibility on preferences from different investors. Our numerical method under discrete setting is considered with historical scenarios. In this area of active research, [26] investigated the risk neutral case with l ∞ uncertainty set while [47] studied the one with l 2 uncertainty set. Duque et al. [12] discussed the use of Wasserstein distance ambiguity set in linear stochastic programs with convergence analysis. To the best of our knowledge, multi-period portfolio optimization problem with moment-type uncertainty set has not been investigated in the literature. Moment information crucial to investors, such as the mean and variance of random returns, can be captured explicitly while it is difficult to capture correlation ambiguity via distance-type uncertainty set formulations.
The contribution of this paper is three-fold. First, we obtain a semi-analytical solution for the problem, which exploits the explicit formula of worst-case risk measures presented in [32]. Different from [35], our model works with general spectral risk measures and we include the no short-selling constraint in our model. This is a realistic assumption since short selling is not always allowed. Moreover, the worst-case distribution cannot be obtained analytically as in their formulation. Despite all the complexities, our solution can still be computed efficiently via interior-point methods. Second, to incorporate the support of return vectors, we apply the scenario-based approach to our problem and propose a variant of SDDP to solve, which considers the moment-type uncertainty set in the form of correlation ambiguity. This numerical method can incorporate lower bounds for the random return vectors via the generated scenarios. The use of SDDP in distributionally robust optimization using l 2 distance appeared recently in [47] and has been investigated in other contexts. We provide the convergence proof of our proposed algorithm, which extends the previous works [21] and [22] to the distributionally robust context. In the numerical experiment, our formulation performs better than the l 2 formulation in the 2008 crisis during which the market experienced structural changes. Our model with correlation uncertainty set can provide a more secured decision if the correlation structure of the assets changes significantly. Third, we observe that the use approximation of spectral function introduced in [57] can be potentially useful under the distributionally robust situation in the sense that the computational time per iteration would reduce with a reasonable marginal error in optimal value. It is noteworthy that although formulation is developed under a portfolio optimization framework, the algorithm and the theoretical results can be readily extended to a more general setting with linear constraints, which can be readily modified to cover applications such as operation planning problem in [55] and inventory problem in [3].
The remainder of the paper is organized as follows: Sect. 2 introduces the problem, including the notion of spectral risk measures and construction of uncertainty set. A semi-analytic solution is derived in Sect. 3. Section 4 discusses the method to obtain worst-case distribution under uncertainty based on discrete setting, which would be used in the numerical SDDP method. The convergence of the method would be verified in Sect. 5. Some computational results are demonstrated in Sect. 6 and the paper is concluded in Sect. 7. All the proofs are provided in the Supplementary material.

Multi-stage Portfolio Optimization Problem Formulation
We consider a multi-period problem with finite scenarios and T stages. Assume that there are n risky assets under investment consideration with individual mean μ i and variance σ 2 i for i = 1, . . . , n. The sample space at time t is given by Ω t = R n for t ∈ T = {1, . . . , T }. Stagewise independence of return vectors is assumed. Denote {F t } as a filtration accumulated up to time t ∈ T and r t as the random price ratio vector at time t (i.e., asset price at time t divided by that at time t − 1). For simplicity, we also call r t as the return vector as in the literature. Let W t be the wealth at time t

Wealth process
Decisions Fig. 1 Illustration of Portfolio Horizon and the initial investment amount W 0 be, without loss of generality, standardized as 1.
The multi-stage portfolio optimization problem considered is formulated as follows: where x t is the investment decision at time t, P t is the uncertainty set, a subset of probability measures on (Ω t , F t ), and ρ t is a chosen coherent risk mapping conditioned on F t−1 . The first and second constraints describe the investment process: the wealth obtained from the previous period is invested in the next period. The third constraint restricts that short selling is not allowed. Figure 1 illustrates the investment process.

Remark 1
In problem (1), we use a recursive (i.e., nested) risk measure as our objective. This facilitates the use of dynamic programming to solve the problem. Note that if a policy satisfies the associated dynamic equations in (6)-(7), then such a policy is also time consistent (see [49] and [55] for relevant discussions).
Different risk measures are proposed in the literature to capture the risks involved. Spectral risk measures is a class of risk measures which covers many of the commonly used risk measures. For a random variable Z , we denote F Z as its distribution function.

Definition 1
Let φ : [0, 1) → R be a non-negative and non-decreasing function satisfying 1 0 φ( p)dp = 1. The spectral risk measure based on the spectral function φ is defined as where F −1 Z ( p) = inf{t ∈ R|F Z (t) ≥ p} is the left-continuous quantile function.

Remark 2
The choice of the space of random variables such that the spectral risk measure is finite is a complicated issue, which is discussed in [48]. In this paper, we assume that φ 2 is finite (see Theorem 1). Together with the second moment finiteness of random return vector under our construction of uncertainty set, the finiteness is always guaranteed. Also, this is guaranteed for the finite number of scenarios considered in the numerical method.
To see that this general class of risk measures covers many existing choices, consider φ( p) = (1 − α) −1 1( p ∈ [α, 1)). The measure specified in (2) gives the conditional value-at-risk of level α (CVaR α ), where 1(A) is the indicator funcion taking value one on A and zero otherwise. Another equivalent formulation for CVaR α is given as It is well-known that spectral risk measures are coherent risk measures, which satisfy convexity, monotonicity, translation equivariance and positive homogeneity axioms. The uncertainty sets P t are constructed using moment information which are fixed at time 0. We assume that the first two individual moments are known with partial information being used for correlation. To avoid notational ambiguity, we remove the dependence on t and consider the random vector ξ = (ξ 1 , . . . , ξ n ). Then, the uncertainty set is defined as where D is the set of all probability distributions and C π is a symmetric (positive semidefinite) correlation matrix. The two-sided bounds for C π are interpreted elementwisely, which capture the correlation ambiguity. The diagonal entries of C and C are, by default, set to be 1. It is easy to observe that the set P is convex (see Supplementary material for a proof). Inspired by [35], due to the recursive nature of the defined risk measure and the monotonicity analyzed in [59], we can rewrite (1) in terms of dynamic programming equations. The equivalence of the multi-stage problem and the dynamic programming formulation is detailed in [49]. First, define the constraint sets With the stagewise independence assumption, the dynamic equations are established recursively as

Semi-analytical Solution
In this section, we derive a semi-analytical solution for problem (1) with uncertainty set (4). An explicit solution is difficult to obtain due to the constraint x t ≥ 0. The problem without such a constraint and with a different construction of the uncertainty set has been studied by [35]. Their uncertainty set formulation as some ellipsoidal set is not compatible with the one we consider (see Remark 3). Our formulation explicitly addresses correlation ambiguity, which naturally emerges in decision making processes. Also, our model incorporates the practical no short-selling constraints. These complicate the analysis but the solution obtained can be efficiently computed.
To obtain the solution, we first derive the worst-case spectral risk measure formula.
It is easy to observe that C is convex, bounded and closed (with respect to norm topology in R n 2 ). Adopting the above notation, for any vector a ∈ R n , we have We can then define v(a) = min C∈C Var(a ξ ) and v(a) = max C∈C Var(a ξ ), where the existence is guaranteed by the continuity of the variance function in C and the compactness of C.
Theorem 1 Let a ∈ R n be a fixed vector. Then, the worst-case spectral risk measure defined in (2) using spectral function φ with finite φ 2 2 can be written as 3 We can compare the use of ellipsoidal sets in the literature and our choice of uncertainty set in terms of worst-case distribution. We assume that the mean is known and Σ 0 is some reference covariance matrix. Consider [9,35]. The worst-case distribution over ρ φ (−a ξ ) will have a covariance matrix kΣ 0 . From this, we can see that the correlation structure is indeed fixed by the input covariance matrix Σ 0 . In [38], the ellipsoidal uncertainty set is used as where S is the number of scenarios. The covariance matrix of the worst-case distribution is given by (1 + δ √ 2/(S − 1))Σ 0 . This also shows that the correlation matrix remains the same. Neither P 1 nor P 2 can address our concern over correlation uncertainty. Moreover, the choice of the parameter in the uncertainty set, i.e. k and δ, may not have a direct interpretation while the use of confidence interval can be understood easily, especially for portfolio managers.
With the above results, we can make use of the dynamic programming equations to obtain the expressions for the solution. We first give a lemma, which would be useful throughout the derivation of the semi-analytical solution. It concerns the following minimization problem for any κ > 0.
Lemma 1 Assume that the problem P(1), namely P(γ ) with γ = 1, has an optimal solution x * . Then, γ x * is an optimal solution to the problem P(γ ) for any γ > 0. Moreover, the dual solution λ * associated to the constraint x ≥ 0 is the same for any γ > 0.
We then state the main theorem of this section, which gives an expression for the solution under uncertainty set of the form P. For simplicity, all the uncertainty sets P t in (1) are assumed to be the same so that we can omit the subscript t. Otherwise, the parameters μ, σ i and C in the uncertainty set P t depend on t, which are specified at time 0, and our formulation is still implementable. For a correlation matrix C, define . It is possible that the constant A t (C, μ) is not real as the expression inside the square root can be negative. In this case, the value of A * t (μ) is set to be positive infinity. Note that we incorporate the nonnegativity constraint on x t in our derivation since short selling of some assets may be restricted. Instead of having a fully analytical result for each problem in the dynamic programming equations (see, for example, Corollary 2.2 of [10] and Theorem 2 of [35]), our formula includes the Lagrangian dual variables. The solution also applies to the problem without no short-selling constraint by setting all λ t = 0 which gives a simpler formula. Technical details for the proof are provided, followed by the theorem. (1) with uncertainty set of the form P defined in (4). Let λ * t be the optimal dual variable associated to the constraint x t ≥ 0 in (6). Assume that for t = 0, . . . , T − 1,

Theorem 2 Consider problem
Remark 4 Assumption (a) secures the recurrent use of minimax theorem and subsequent arguments. Indeed, if assumption (a) is violated for some C ∈ C, thenÂ * t would be negative. For instance, the function in the proof, would not be convex in x and convave in C due to the negative constant. Hence, the minimax theorem fails to apply and the arguments could not follow. For assumption (b), the violation of it would lead to infeasibility of the infimum problem and hence the optimal value could be considered as negative infinity. These two assumptions can be checked via the optimal value of (9) (see Remark 5 and the discussion followed after it).

Remark 5
We can also make the following observations based on Theorem 2. First, Lemma 1 ensures thatÂ * t involving λ * t−1 could be treated as constant so that it can be taken out from supremum and infimum operators. Second, we notice that if the uncertainty sets are the same independent of t, the optimal trading strategy would be a static strategy, meaning that investors only invest in the beginning and do not adjust the portfolio throughout the period.
The solution to the corresponding single-period problem can be reformulated as a semidefinite programming (SDP) problem discussed in [38]. From Theorem 2, we can obtain the optimal solution and value by solving the reduced single-period problem as: The solution of this optimization problem can be computed efficiently using interiorpoint methods.

The Numerical Algorithm Under Practical Setting
The solution we obtain from the previous section is based on Ω t = R n . Although it is common in the literature (see, for example, [35] and [38]), this may not be realistic for the following reasons. First, the price ratio vector r t has a natural lower bound of 0. It is natural to impose the constraint of r t ≥ 0 almost surely. In particular, r t = 0 corresponds to a total loss for the investment. Second, it is natural to restrict the support of the return vector r t on a compact domain. For instance, if r t represents the daily return, it would be realistic that r t would not deviate a lot from 1. Thus, with a more confined support, this may avoid over-conservative investment strategy. However, in the presence of the support information, tractable reformulation of the worst-case risk measure may not be possible. For instance, in [13], they only provide an upper bound for a single-period portfolio optimization problem. In particular, for moment-based ambiguity set, the reformulation of the worst-case expectation typically results in a semi-infinite program and may require specialized numerical algorithm such as cutting-plane methods to solve (see, for example, [9,19,39] and [41]).
A common way to handle the situation that we do not have a tractable reformulation, in distributionally robust optimization literature, is through discrete scenarios and this has been successfully employed in various applications (see, for example, [8,27,44,47] and [58]). Our work is the first to address the multi-period portfolio optimization under correlation uncertainty with scenario-based approach and we propose a variant of SDDP to solve the problem. Specifically, we assume that Ω t = {ω 1 , . . . , ω N t } and consider the uncertainty set of the form where Σ = σ Cσ , Σ = σ Cσ and r i correspond to the return scenarios for i ∈ {1, . . . , N }.
Although it may not be realistic to consider Ω t = R n , the semi-analytical solution derived, which could be obtained efficiently, serves as a valid upper bound when we have support information. It also provides us a way to verify our numerical method proposed. In the following two sections, we discuss and analyze a numerical method that can solve the problem under P d . In Sect. 4.1, we first discuss our methodology to obtain the worst-case distribution, which will be useful in a part of the proposed numerical method in Sect. 4.2.

Worst-Case Distribution in a Discrete Setting
The numerical method we introduced is based on a finite number of scenarios and therefore, we are interested in spectral risk measures under discrete distributions; see [57] for instance. Consider a probability space (Ω, F, P) and a random variable Z . Suppose that Z takes finite realizations Z 1 , . . . , . . Fig. 2 Cumulative distribution function and quantile function for discrete distributions respectively. Without loss of generality, assume that for some index i, we can combine them and consider N − 1 losses only. As mentioned in Remark 2, ρ φ (Z ) is finite. We set F −1 Figure 2 shows the corresponding cumulative distribution function and quantile function.
Define the points y 0 = 0 and y i = i j=1 p j for i = 1, . . . , N . As the quantile function is piecewise constant, it is possible to write the spectral risk measure defined in (2) as Our next task is to find a distribution p = ( p 1 , . . . , p N ) such that the function ρ φ is maximized. The following lemma shows that the objective function is concave and hence, any convex optimization solver can be deployed to obtain the worst-case distribution. Then, ψ( p) is concave.
Lemma 2 shows that we can employ conventional convex optimization solvers to obtain the global maximizer. However, it may be costly if the number of scenarios N is large. For a special case of spectral risk measure CVaR, the convex problem can be reduced to a linear programming (LP) problem. Notice that . . , N is a convex compact set. To determine the worst-case distribution, we need to maximize p under the moment constraints as well, which can be cast into following maximization problem.
where we remark that moment constraints on p are linear. To reformulate it into an LP problem, let The computational cost can be significantly reduced by transforming the general convex program into an LP problem. The mixed-CVaR measure defined as For a more general spectral function φ, we may want to approximate the spectral function by step functions so that the resulting risk measure becomes the mixed-CVaR measure. The l 2 approximation is discussed and applied in [57]. We will apply the approximation and examine its performance in this paper on portfolio problem.

Remark 6
It is worthwhile to notice that in Step 3 of the algorithm proposed (Algorithm 4) in [57], the authors suggested using a simple sorting to maintain the order of newly generated iteration points. However, this involves a delicate choice of step sizes. For gradient projection method, it can be modified by replacing the sorting component by a quadratic programming problem. The choice of step sizes would impose less influence to the optimization as long as a sufficiently large number of iterations are permitted. The details of the approximation and one particular example is provided in the Supplementary material (Section B) due to space limitations.

The Numerical SDDP Method
In this section, we present a variant of SDDP to solve (1) under finite scenarios. It should be noted that the algorithm proposed is not restricted to this particular problem. For instance, the method works for inventory problem or unit commitment problem. For the return random variables r t , t = 1, . . . , T , we assume that they take N t realizations. Without loss of generality and for notational simplicity, we assume that N t = N is fixed. Denote r i t , i = 1, . . . , N be a particular realization of r t , which can be represented using a scenario tree with a total number of N T scenarios. The SDDP algorithm consists of two crucial operations, namely the forward pass and the backward pass. The two operations would repeat alternatively until certain termination criteria are fulfilled. In the forward pass, a trial solution is obtained, which will be used later in the backward pass. For the first iteration, this refers to the search of any feasible solutions. For instance, one could start with the equally-weighted portfolio or the semi-analytical solution. Assume that we have an approximation Q(x t−1 ) of Q t (x t−1 ), which is obtained in the backward pass. We then sample uniformly from all possible scenarios for a path (r 1 , r 2 , . . . , r T ). To obtain a trial point, for t = 0, . . . , T − 1, we solve forwardly in time for x t via the problem of wherex t−1 is the solution of the problem at t − 1 and we let r 0x −1 = W 0 . Indeed, the approximation functions constructed are piecewise linear taking the form Q t+1 (x t ) = max l∈L {α l t+1 + (β l t+1 ) x t }, for some index set L. Then, (11) can be recast as A trial point is thus, generated and denoted as {x 1 ,x 2 , . . . ,x T −1 }.
In the backward pass, we obtain linear approximations of the functions Q t (x t−1 ) by proceeding backwardly in time, from t = T to t = 0. The idea is the use of subgradient inequality at the trial points. Givenx t−1 , we solve, for i = 1, . . . , N , and denote the optimal value as Q i t . Notice that Q i T = −(r i T ) x T −1 . Similar to the case in forward pass, for t = T − 1, . . . , 1, we can recast (13) as Denote (π i t , μ i t ) be the corresponding optimal dual solutions to the equality and inequality constraints respectively. More precisely, for t = T − 1, . . . , 1, and let π i T = 1 for i = 1, . . . , N . Next, we need to obtain the worst-case distribution based on the losses { Q i t } as discussed in previous section. Denote the optimal weight The coefficients for the linear approximation function are computed as The optimal value of the first stage problem (14) with t = 0 is denoted as z k , where k is the iteration number. We would verify later in Lemma 6 in Sect. 5 that z k is an improving lower bound for the problem as k increases. Concerning the termination criterion of the algorithm, it is well known that maintaining an upper bound for risk averse problems in each iteration is difficult (see, for instance, [56]). Hence, we resort to the maximum iteration number and stabilization of the lower bound. To be precise, we fix K as the maximum number of iterations and as the lower bound tolerance. The complete algorithm is summarized in Algorithm 1. Compute α k t and β k t using (16).

10
Update Update the lower bound z k by solving (14) with t = 0. Set k = k + 1.

Convergence Property
In this section, we verify the convergence property of the algorithm. The analysis is established mainly based on the results of [21] and [22]. One should notice that in [21], their contribution is establishing the convergence of multistage risk neutral programs while [22] extends this to the risk averse setting. However, in [22], the proof is based on the direct use of dual representation of coherent risk measure, which implies the existence of a reference probability while for the current problem, such a reference probability does not exist. Therefore, some modification is necessary in order to justify the convergence of the algorithm.

Recourse Function and Subgradient
To guarantee the convergence of the algorithm, we utilize some fundamental properties of the recourse function Q t (x t−1 ) defined in (7). Recall that we have N scenarios in each future time point and they are positive in value. It follows that for any feasible policy {x 0 , . . . , x t−1 } and realization of {r 1 , . . . , r t }, the feasibility set X t (x t−1 , r t ) is nonempty, convex and compact. We suppress the dependence of ρ on φ for simplicity. Next, define, for t = 1, . . . , T , c t = max i=1, ..., N r i t ∞ and C t = t j=1 c j with c 0 = C 0 = 1. Under finite scenarios, we have c t and C T being finite. The sets X t = [0, C t ] n ⊆ R n are also nonempty, convex and compact. With initial wealth W 0 = 1, we can restrict feasible sets X t on X t .

Lemma 3
The recourse functions Q t (x t−1 ) for t = 1, . . . , T are finite and convex on X t .

Remark 7
For any given decision x i ∈ X i for i = 1, . . . , t − 1 and random return vector, the feasibility set X t is always non-empty. As described in [22], this follows that Q t (x t−1 ) is continuous on X t . Indeed, SDDP method utilizes the convexity of recourse functions Q t (x t−1 ) and constructs lower approximations for them. The principle is similar to the popular cutting-plane method. The construction of lower approximations is based on subgradient inequality and the following lemma provides a justification for the computation of the coefficients in (16). The lemma extends Proposition 1 in [47] to the use of risk measure instead of expectation. (Z (x, ω)). Let Z : R n × Ω → Z be a convex function in x for fixed ω ∈ Ω and z(x, ω) ∈ ∂ Z (x, ω) for somex ∈ dom(ψ). Assume that for ρ φ (Z (x, ω)), φ * i is the optimal weight corresponding to the worstcase distribution P * ∈ P (identified as an R N vector). Then, g :

Convergence Analysis
In this subsection, we shall verify the theoretical properties of the algorithm, including the convergence. The two assumptions imposed throughout the analysis are stated below. Assumption 2 Every node has a positive probability of being selected in forward pass, i.e.P(r k t = r i t ) > 0 for t = 1, . . . , T , i = 1, . . . , N and k ∈ N, wherer k t is the random variable representing the selection process. Also,r k t is independent of the previous selection process.
We first give a lemma, followed by a corollary that can reduce significantly the computational cost in the backward pass. Without further specification, we shall omit P-a.s. in the proof for simplicity. Notice by the fact that the feasibility sets are nonempty for any implementable policies and realizations, the coefficients α l t and β l t are finite. Recall the dual problem of (14).
Lemma 5 WithP-almost surely, for t = 1, . . . , T , α l t = 0 and β l t ≤ 0 for all l ∈ N. Corollary 1 WithP-almost surely, the dual optimal solutions are the same within each time step, i.e. π i t = π j t for t = 1, . . . , T − 1 and i, j = 1, . . . , N . Making use of Lemma 5 and Corollary 1, we do not need to compute the coefficient α as it is always equal to zero. Moreover, instead of solving N problems to obtain Q i t and π i t , we need to solve only one program. By rescaling, we obtain all other Q i t while the dual solutions are the same. By doing so, the computational cost in backward pass can be substantially reduced.
As a final step, we now justify the convergence of the algorithm: The lower approximation property is going to be verified and the a.s. convergence is established at the end. Throughout the algorithm, we introduce the following two functions at iteration k, namely Function (17) is the constructed approximation function of the original recourse functions Q t (x t−1 ) while function (18) is the optimization problem in each stage by replacing the unknown recourse function Q t (x t−1 ) by its current approximation. We emphasize that the notation Q i t (see description above (14)) should not be confused with the function defined in (18). Precisely, We may denote the value of ρ under distribution P as ρ P where necessary. Lemma 6 shows that the algorithm constructs lower approximation to the recourse function and Lemma 7 is a technical lemma which helps develop a crucial inequality in the convergence proof.
With the help of previous two lemmas, we can develop the convergence property of the proposed algorithm, which demonstrates the point-wise convergence of the approximation function (17) to the true recourse function and hence, the convergence to the optimal solution. Theorem 3 Consider Algorithm 1. The following two statements holdP-almost surely:

Computational Results
In this section, we first demonstrate the use of approximation and the semi-analytical solutions. Next, we compare problem (1) with the one without correlation ambiguity under simulation. That is, we assume that the first two moments are known completely and the same algorithm can be applied to solve by equating the lower and upper correlation bounds. Finally, we apply the model to a real data set. The distributional robust model, as expected, performs better under shocks. All the computations are performed using MATLAB in a computer using 3.60 GHz Intel Core i7 with 16 GB RAM and the time is presented in second.

Convergence and Approximation
To visualize the convergence, we consider a case with small numbers of stages. In particular, we consider T = 1 and T = 2, where the upper bound can be computed by evaluating the objective function directly using the trial solution x k 1 in each iteration. Interior-point methods are used to solve LP problems with the termination criterion being the stabilization of the gap between lower and upper bound. The gap tolerance is set to be 10 −6 . We use 45 stocks from the Hang Seng Index (HSI) with relatively longer data history in Hong Kong with the daily historical data obtained from either 26/03/2008 or 27/05/2017 to 27/05/2019. The data is obtained from Yahoo! Finance through the MATLAB function hist_stock_data developed by [50]. The returns are computed using closing prices. The individual asset mean and variance are estimated using sample quantities while the correlation bounds are quantified by 95% confidence intervals.
The spectral risk measure adopted in this example is the exponential type, which is proposed in [11] and followed by [7]. The spectral function of this type is defined as φ( p) = h(1 − e −h ) −1 e −h(1− p) for h > 0. A larger value of h corresponds to a higher degree of risk aversion. Figure 3 shows the convergence of the case h = 2 with N = 2739 return scenarios and T = 1 while the graphs of other testing cases are similar. To examine the solution quality between the one using spectral risk measure and the one using the approximation, the solution obtained via approximation is used to perform the full evaluation of the objective in spectral risk measure. Relative error is used as a criterion for measuring the performance, which is computed as the absolute difference between the two obtained objective function values divided by the true value. Table 1 summarizes the results under different settings identified as a triple (T , h, N ).
We observe that in all the testing cases, convergence is achieved after a relatively small number of iterations. This corroborates the convergence result discussed in Theorem 3. For the computational time, as expected, the use of approximation can remarkably reduce the computational effort by around 6 to 10 times. This can be elucidated by the use of LP solver for CVaR measures while the general spectral risk measure resorts to convex optimization solver. By exploiting the linear structure, LP solver could perform better in time. Though the computational time is much reduced, the solution remains reasonably accurate, with relative errors staying in the order of 10 −5 . We also remark that the resulting solutions exhibit sparsity. The solutions in all the cases contain less than 20 non-zero weights and in most cases, the number of nonzero weights is around 10. The under-diversification result agrees with recent literature, for example, [45]. Conventionally, robust portfolios are thought to be conservative with a large degree of diversification. Indeed, uncertainty in individual asset performance usually leads to a well diversified portfolio as a result of our confident view towards the estimated correlation structure. However, when we consider the risk associated to the unknown correlation structure, anti-diversification would be counter-intuitively more desirable since the assets, in the worst-case scenario, may fluctuate in a similar fashion. Focusing on a few assets appears to be a more robust choice. In other words, ignoring ambiguity in the correlation matrix concerned may oftentimes result in a misleading conclusion that a more diversified portfolio is robust and risk averse.
As we could expect, by increasing the number of scenarios, it would lead to an increment in the objective value. In the Supplementary material, we conduct a simulation study, based on the same data set, to study the effect of number of scenarios. We also examine the convergence under various choices of the initial guess (with results provided in the Supplementary material). We observe that our algorithm is relatively insensitive to the choice of initial guess. Finally, note that the semi-analytical solution provides an upper bound to the discretized problem. Indeed, in this experiment, we  (11.97 times) also observe that the optimal value from the numerical scheme is always less than the one given by the semi-analytical solution. We further investigate the difference between the semi-analytical solution and the numerical scheme if we have extra support information on the return vector in Sect. 6.2.

Use of Support Information
In order to compare the semi-analytical solution and the solution obtained from the numerical method, we conduct a simulation study to examine the effect of support information. We assume that there are 5 assets with mean 1, standard deviations {0.1, 0.05, 0.09, 0.1, 0.08} and correlation matrix The lower (upper) correlation bounds are constructed by subtracting (adding) 0.1 from (to) the correlation matrix C.
In the experiment, we first generate N 1 scenarios from uniform distribution on (1 − δ 1 , 1 + δ 1 ) entry-wisely. We use uniform distribution to confine the support of the return vector on a bounded region. Next, we add additional N 2 − N 1 scenarios generated from uniform distribution on (1 − δ 2 , 1 + δ 2 ) for some δ 2 > δ 1 . That is, we enlarge the scenario set and the support of the scenarios. We repeat this process to generate 4 sets of scenarios with N 1 = 1000, N 2 = 2000, N 3 = 4000, N 4 = 8000, δ 1 = 0.12, δ 2 = 0.15, δ 3 = 0.2 and δ 4 = 0.3. We solve the resulting problem using the 4 generated sets of scenarios by our proposed numerical method with T ∈ {1, 2, 5} and we use the approximation of exponential type spectral risk measure with h ∈ {2, 10} as the objective. We record the results after 200 iterations, where we observe the lower bound is already stabilized. Table 2 presents the results of the simulation study. Each column presents one generated scenario set with total number of scenarios N and support (1 − δ, 1 + δ). The last column shows the optimal value from the semi-analytical solution via the SDP reformulation. For each pair of (T , h), we present the terminal value from the algorithm and compute the l 1 norm difference between the solution obtained from the algorithm and the one by the semi-analytical solution. We observe that when we enlarge the support (i.e., increase δ), the optimal value, as well as the optimal solution, approach the one obtained from the semi-analytical solution. Also, the optimal value from the semi-analytical solution always serves as an upper bound. We notice that if there exists support information (i.e., δ is small), the optimal solution could behave differently from the one from the semi-analytical solution.

Use of Correlation Ambiguity
In order to examine the importance of correlation ambiguity, we perform a simulation study on three different models: (a) model (1) which excludes correlation ambiguity (complete knowledge in second moments), (b) model (1) which includes correlation ambiguity and (c) a simple single-period global minimum variance model using sample estimates. In this study, we use multivariate normal distribution for scenario generation under Setting I, which is provided in the Supplementary material. First batch of 2000 scenarios based on the settings are generated, which are considered as historical data and they are used to obtain the investment decision. Afterwards, second batch of 10,000 future scenarios are generated and the terminal wealth under three different decisions are compared to investigate the performance. The second batch of scenarios is generated using the worst-case correlation matrix obtained when solving problem (1). For the model setting, we consider T = 21 as a monthly trading exercise such that the period is long enough to observe the alteration in correlation structure. The obtained portfolio weight is adopted throughout the whole period. We employ the approximation of exponential type spectral risk measure with h = 10, which is given by 0.3546E(Z ) + 0.6454 CVaR 0.884 (Z ). The histogram of the simulated terminal wealth is provided in Fig. 4. For other values of h, similar patterns are observed.
We observe that the terminal wealth using the model with correlation ambiguity result in a right shift of the empirical terminal wealth distribution when compared with the other two models. This demonstrates the superiority of the model using correlation ambiguity in terms of protection under change in correlation structure. Similar observations can also be found under other settings and the corresponding results are summarized in the Supplementary material. If the period T is not long Fig. 4 Simulation study under setting I enough, the advantage of our proposal tends to become less noticeable, which is reasonable as the model aims at protecting investors from changes in the underlying correlation structure amongst individual assets. To summarize, this study indicates the importance of correlation ambiguity in portfolio optimization model.

Real Data Application
This section presents a real data study with reference to that presented in [29]. We use historical data of six major indices representing different parts of the world, namely Nikkei225, FTSE100, Nasdaq, DAX30, Sensex and Bovespa, from 01/11/1998 to 16/06/2019. The indices are normalized to 1 on 01/07/2003. Figure 5 displays the historical paths of the selected six indices. We compare the following three models: the proposed one with correlation ambiguity, single-period global minimum variance model and distributionally robust model with the expectation objective under l 2 norm in [47] with the parameter r = 0.05 indicating the maximum l 2 distance from nomial probability. The parameter is chosen with reference to the three choices (0.033, 0.066, 0.133) in [47]. We also test for some other choices of r and the results are presented in the Supplementary material.
First, we compare the overall performance of the proposed model under different values of parameter h. To examine the protective effect under a shock in financial market, prices prior to 04/01/2007 are used for training purpose and the trading ends on 31/12/2013. We also perform another analysis using a longer period, which is from 01/07/2003 to 16/06/2019 and the results are demonstrated in the Supplementary material. The approximation of exponential type spectral risk measure with three different parameters h = 2, 10, 15 are employed, which are given in Table 3. The numerical result is based on the monthly rebalancing portfolio management strategy. At the first day of each month, a T period problem (1) is solved with correlation ambiguity using all previously observed data with T being the number of calendar days. Number of calendar days is used instead of trading days since the non-trading days vary with each index.
The result is shown in Fig. 6. The upper panel indicates the wealth process with unit initial wealth while the lower panel gives the difference between the reference model h = 15 and the other two choices. We observe that the three models perform similarly with profits at the end of the testing period. However, it could be noticed that h = 15 eventually outperforms the other two models, which is reasonable since this is the most risk-averse one when compared with the other two. The difference between h = 10 and h = 15 is not significantly distinguishable mainly due to the similarity of the approximated risk measures. For the model with h = 2, it is natural that, under a huge market downturn, as a result of a smaller degree of risk aversion, the model ends up with the poorest performance among the three, while it is surprising that the difference is more distinct at the end of testing period, where an upward trend starts. In particular, we examine the portfolios starting from 2013 for the three models. We notice that from Fig. 7, the weight on BOVESPA for that model is larger than the other two models most of the time. Indeed, the arithmetic return over the whole period of the last index is the largest among all indices while it performs the poorest in terms of index value normalized at the beginning of 2013. This could be served as one possible explanation for such a deviation.
The out-of-sample performance statistics are summarized in Table 3. We observe that the overall mean using model h = 15 is the largest the variance is the lowest among the three. Also, the mean-to-standard deviation (SD) ratio outperforms the other two models. Another possible measure for out-of-sample performance is the  The weights distribution in each period for different choices of h drawdown over the period, which is defined as the current value subtracting the historical high and normalized by the historical high. The normalization is used to ensure fair comparison in relative sense. This can measure how shocks may adversely affect the wealth performance. The colored regions indicate that model using h = 15 has the smallest drawdown in Fig. 8. It is observable that the shaded region covers the financial crisis and the concomitant recovery period, which indicates that the model has a strong protective power against shocks.
Finally, we compare our proposed models with the aforementioned two models. The single-period global minimum variance portfolio is computed at the beginning of each month while the multi-period model with the same value of T in [47] is employed. Figure 9 shows the computational result. The upper panel demonstrates the wealth process while the lower panel gives the differences. Each column represents a choice of h described previously: h = 15 for the left-most column, followed by h = 10 and h = 2. As expected, both risk-averse model h = 10 and h = 15 outperform the other two models. In particular, a larger deviation can be observed in the comparison with single-period global minimum variance model, implying the importance of robustness. The advantage of our proposed model over the one in [47] using l 2 norm could be explained by the change in dependence structure during the period, which coincides with our intention for proposing the model. More importantly, their model uses expectation in the objective function while we use a risk measure. The greater level of risk averseness could also account for the edge. On the rightmost column, we observe that the model with h = 2 underperforms all the other two models. The global minimum variance model appears to be a less aggressive model which contributes to this result. For the second one, it performs better since they do not have a moment constraint. In our construction of uncertainty set, the means of the returns are fixed while in their construction, moment constraints are not required. Therefore, the flexibility in moment offered from their model yields a better result. . Lower panel shows the difference between our model and the remaining two models in percentage with respect to initial unit investment in the following order: out-performance of our model compared with DRO Exp and that compared with GMVP

Conclusion
Distributionally robust portfolio optimization originates from the uncertainty in the correlation structure amongst different assets, hence the return distributions. Such kind of models is of surging interest due to their protection against unfavorable outcomes. In this paper, we study the multi-period portfolio problem under correlation uncertainty. While previous works focus mainly on mean-variance or utility objective, we incorporate the use of spectral risk measure in our model so that investors can incorporate their risk preference in the portfolio optimization exercise. To achieve the goal, we first extend the work [35] by considering the momenttype uncertainty set instead of ellipsoidal set. The use of the existing ellipsoidal sets cannot capture explicitly the correlation ambiguity, which is the key concern in this paper.Also, our semi-analytical solution is derived for spectral risk measure objective, which includes their mean-CVaR objective as a special case. Second, we discuss a numerical SDDP method to solve the problem under the discrete distribution setting. The method relies on convex optimization subproblems and can be applied to a more general distributionally robust problem with moment constraints with the almost sure convergence of the algorithm being guaranteed. Our work extends previous literature in the convergence of SDDP method to a distributionally robust setting.
In numerical exercises, we demonstrate that the approximation of spectral risk measure is feasible, which can significantly reduce the computational burden of the algorithm. We also illustrate the importance and usefulness of correlation ambiguity in our model. Specifically, consideration of such ambiguity leads to a higher terminal wealth as demonstrated in both the simulation study and the real data analysis. We also observe the robustness of the model during the period of financial shocks through the drawdown plot, which highlights the purpose of our proposed model. While the almost sure convergence of the proposed algorithm is guaranteed, it is noteworthy that historical data is employed in the numerical setting and this may not be able to fully evaluate the worst-case spectral risk measures. Possible future research directions include the convergence of the solution under the discrete grid to the true solution and the possibility of using generated scenarios instead of historical data. The quantification of such a convergence may improve the efficiency of the numerical procedure by reducing the computational burden.