Multiobjective portfolio optimization via Pareto front evolution

Portfolio optimization is about building an investment decision on a set of candidate assets with finite capital. Generally, investors should devise rational compromise to return and risk for their investments. Therefore, it can be cast as a biobjective problem. In this work, both the expected return and conditional value-at-risk (CVaR) are considered as the optimization objectives. Although the objective of CVaR can be optimized with existing techniques such as linear programming optimizers, the involvement of practical constraints induces challenges to exact mathematical methods. Hence, we propose a new algorithm named F-MOEA/D, which is based on a Pareto front evolution strategy and the decomposition based multiobjective evolutionary algorithm. This strategy involves two major components, i.e., constructing local Pareto fronts through exact methods and picking the best one via decomposition approaches. The empirical study shows F-MOEA/D can obtain better approximations of the test instances against several alternative multiobjective evolutionary algorithms with a same time budget. Meanwhile, on two large instances with 7964 and 9090 assets, F-MOEA/D still performs well given that a multiobjective mathematical method does not finish in 7 days.


Introduction
Portfolio optimization models the investment decision mathematically, in which investors are going to allocate their finite capital into a set of assets such as stocks, funds and derivatives. The decision is supposed to devise a compromise to both return and risk. Modern portfolio theory is introduced by Markowitz [1], and his mean-variance (MV) model lays the foundation of the investment research. In the MV model, return and risk of an asset is statistically quantified with its expected return and standard deviation [2,3]. Various alternative risk measures in addition to variance are also considered by Markowitz. However, due to computational problems, Markowitz adopts variance as the risk measure in his analyzes [4]. Variance is not theoretically robust because it evaluates both positive and negative returns as risk when the distributions of returns are asymmetric [5,6]. Therefore, the downside risk measures, only considering relatively poor returns, are developed [7]. The notable down-B Aimin Zhou amzhou@cs.ecnu.edu.cn 1 Shanghai Key Laboratory of Multidimensional Information Processing, School of Computer Science and Technology, East China Normal University, Shanghai, China side risk measures involve semicovariance [8], lower partial moments [4], Value-at-Risk (VaR) [9] and Conditional Valueat-Risk (CVaR) [10].
Among these measures, VaR is popular and it has achieved a high status of being adopted in the industry field. Nevertheless, it meets several significant problems. First, it is unstable with empirical discreteness loss [11]. Secondly, it fails to be coherent [12]. Thirdly, minimizing a VaR is a computational complex task because it is non-convex and has many local minima [13]. On the contrary, these problems are no longer the obstacles when CVaR is employed [11]. Hence, we adopt CVaR as the measure of risk in this work. With a given confidence level 99%, if the VaR of a portfolio is 10,000 dollars, it means that with a 99% chance the loss will not exceed 10,000 dollars. However, the CVaR denotes the conditional expectation of losses above 10,000 dollars. It should be noted that the optimization of CVaR involves the optimization of VaR because these definitions ensure that VaR will never exceeds CVaR.
Basic portfolio problems can be easily solved through mathematical programming methods [10,14]. Notwithstanding, in the real investment, this basic model is extended when various trading restrictions and preferences of investors are introduced as constraints. The efficiency of exact mathemat-ical methods will deteriorate when constraints are included. Further, the deterioration will be catastrophic for large-scale portfolio problems if a cardinality constraint, restricting the number of selected assets, is involved [15]. For these constrained portfolio problems, they have been intensively studied in the past several decades [3,16]. Since they are non-convex in most cases, heuristics, such as evolutionary algorithms (EAs), and hybrid algorithms are thus broadly adopted for them.
In the field of portfolio optimization, researchers are used to devise tailored EAs for specific constraints and models (resulting in different objective functions). Most of the studies focus on the MV model [3,[16][17][18][19]. Chang et al. develop three heuristic algorithms based on genetic algorithms, tabu search and simulated annealing for cardinality constrained portfolio optimization. The results indicate that no algorithm outperformed others [20]. This work is among the first to solve the constrained portfolio problem via EAs. Chen et al. devise a tailored variable representation and initialization scheme for the EA. These not only make the generated solutions naturally satisfy the constraints but also significantly expedite the convergence of the algorithm [21]. Kalayci et al. present a hybrid metaheuristic algorithm, it is composed of elitism mechanism from genetic algorithms, modification rate from artificial bee colony optimization approach and the Gaussian formulation from continuous ant colony optimization. The comparisons against other methods denote the proposed algorithm is efficient [22]. Drezewski et al. design an agent-based co-evolutionary multiobjective algorithm for portfolio problems. This algorithm uses two subpopulation to optimize the objectives and employs a competition mechanism for resources. The results show that the algorithm can find good solutions with a high level of diversity [23].
In the past few years, downside risk measures have been investigated more frequently [7]. EAs are also adapted for these problems. Lwin et al. propose a learning guided nonparametric approach (MODE-GL) for the mean-VaR model. The algorithm adopts a learning mechanism for the selection of assets and it shows the potential for good convergence [13]. Najafi and Mushakhian develop a hybrid algorithm with genetic algorithm and particle swarm optimization algorithm for the portfolio problem, measuring the risk with both semivariance and CVaR. Computational results show that the algorithm approximated the Pareto front (PF) well [24]. Mohuya et al. present a new biobjective fuzzy portfolio model, considering Sharp ratio and VaR as the objectives. Meanwhile, they solve this model through three EAs. According to their observations, multiobjective cellular genetic algorithm is the best among these employed EAs [25]. Konstantinos Liagkouras develops a three-dimensional encoding multiobjective evolutionary algorithm (TDMEA). The algorithm is expected to be efficient on the large-scale problems, because the three-dimensional encoding can keep the length of chromosome invariant. This algorithm is applied to MV, mean-VaR and mean-CVaR model with the cardinality constraint and the results obtained are significant better than two compared well-known MOEAs [26].
Since EAs belong to a heuristic algorithm cluster, these algorithms generally can only get approximate optimal solutions. Therefore, researchers also attempt to improve the accuracy of the EAs with the help of exact mathematical methods.
The hybrid algorithms for portfolio optimization can be divided into two categories regarding the algorithm framework. Moral-Escudero et al. present a hybrid strategy to make use of a genetic algorithm and a quadratic programming optimizer. The genetic algorithm is adopted to tackle the cardinality constraint and the quadratic programming optimizer is implemented for finding the optimal capital weights of the selected assets [27]. Gaspero et al. incorporate a hill climbing strategy with a quadratic programming optimizer. The asset combinations are determined by a hill climbing strategy and the optimal allocations of capital are obtained from the quadratic optimizer [28]. These two hybrid algorithms demand multiple runs for aggregation functions to approximate the PF. These hybrid algorithms based on the single-objective framework may not be efficient enough. On the other hand, Branke et al. propose a hybrid MOEA for the MV model, basing on the multiobjective framework. It is able to approximate the PF once it is implemented. Notwithstanding, their algorithm can only work for the MV model with a continuous PF, because it involves frequent judgements of the intersections for local PFs, which are almost non-existent when the PF is discrete [29].
In this work, we propose a hybrid algorithm for constrained portfolio problems. Specifically, a multiobjective mathematical method, AUGMECON2 [30], and a linear programming optimizer are utilized to construct the local PFs for every determined asset combination. Then, the decomposition approach is applied to rank these local PFs [31]. This proposed MOEA, named as F-MOEA/D, maintains a local PF for each individual. It is different from general pointbased MOEA, preserving only one point for an individual. F-MOEA/D employs an exact mathematical method for finding the optimal capital weights of selected assets. So it might be more accuracy than general MOEAs. On the other hand, it takes the advantages of MOEAs and thus may require less computational resources than an exact mathematical method on large-scale problems.
The rest of this paper is organized as follows. The next section introduces the definitions of multiobjective optimization and CVaR. In the subsequent section, a practical mean-CVaR model is presented followed by which the proposed F-MOEA/D, including the main framework, the decomposition approach and the generate operators for this front-based MOEA is detailed. The empirical results are presented and discussed next. The final section concludes the paper.

Problem formulation
This section introduces the optimization model studied in the paper.

Multiobjective optimization
A multiobjective optimization problem 1 is stated as where is the decision space, F : → R m consists of m real-valued objective functions and R m is the objective space. Since the objective values are vectors rather than scalars, a partial order relation, dominance, is introduced. For two solutions a and b, the solution a is considered to dominate the The optimal (non-dominated) solutions of this problem constitute a Pareto set (PS) and the optimal objective values compose a Pareto front (PF) [14,32].

Risk measures
Variance is an important and widely studied risk measure for the portfolio problem [3,19]. Nonetheless, this risk measure assumes that the distribution for return of assets should be normal. For example, Fig. 1 exemplifies the variance risk measure for a normal distributed asset. According to the perception of risk for investors, the proportion with lower return than the expected return is considered as the risk. This risk measure is applicable when the distribution is symmetrical about the mean line, otherwise the variance can barely depict the downside risk.
VaR measures the maximum downside risk of a portfolio with a given confidence level and CVaR is the expected loss above the VaR. Figure 2 illustrates the VaR and CVaR intuitively. The vertical axis is about the frequency and the horizontal is about the loss. It depicts a distribution of the loss when the confidence level is α and the maximum downside risk is β.
Formally, assume there are n available assets. r denotes the return of the assets and its probability distribution is assumed as p(r) for convenience. Define w = (w 1 , . . . , w n ) T as the  invested portfolio weight. f (w, r) = −r T w is the loss and thus the cumulative distribution function is given as For a given confidence level α, CVaR of the loss is where VaR α = inf{β| (w, β) ≥ α}. Further, Eq. (1) can be approximated by sampling the probability distribution in r [10]. The approximated alteration of Eq. (1) can be stated as where (·) + indicates only the positive numbers are considered in this expectation. Nevertheless, directly minimizing CVaR via Eq. (2) is unachievable with existing mathematical programming methods because it is a conditional expectation instead of a direct formula. Rockafellar and Uryasev; hence, remodel it in a linear formulation [10]. Let Γ be the set of assets, |Γ | = n, and Π be the set of time intervals, |Π | = t. The sampling from real markets generates a collection of normalized return matrix R as With the return matrix Eq. (3), the linear formulation for CVaR is given as where φ and γ i that i ∈ Π are auxiliary variables. They have proved the solutions for these two CVaR models are same [10].

Practical mean-CVaR model
To build a practical mean-CVaR model, a normalized stock market indices over time are included for measuring the loss. Further, four practical constraints are also included [15,18]. They are (i) cardinality constraint: it restricts the number of assets (K ) for composing a portfolio, (ii) floor and ceiling constraint: it determines the minimum and maximum limits of the invested proportion for each asset, (iii) pre-assignment constraint: it denotes which asset must be selected regarding the preference of investors, and (iv) round lot constraint: it demands each asset should be held as multiples of trading lots. Thereby, the practical mean-CVaR model, involving the normalized stock market index ζ = (ζ 1 , . . . , ζ t ) is stated as where the stock market indices, ζ , is considered in the risk CVaR. The return R is inverted for minimization and the vector μ = (μ 1 , . . . , μ n ) T includes the mean values for every rows in the return matrix. Equation (6) requires that all the capital should be invested. Equation (7) is the cardinality constraint (i.e., just K assets are selected) and s = {s 1 , . . . , s n } T is an indicator vector; s i = 1 if the asset i is selected, and s i = 0 otherwise. Equation (8) is the floor and ceiling constraint. Besides, Eq. (9) represents that the asset i must be included in a portfolio if z i = 1. It is a pre-assignment constraint. Then Eq. (10) defines the round lot constraint and τ is a multiple. The multiples for every assets are set to be a same constant, with which 1 is divisible, since it will be really hard to handle flexible ones. In that case, it will be beyond the scope of the main discussion in this work. Finally, Eq. (11), which is the discrete constraint, implies that s i must be binary. In summary, the decision variables in this practical mean-CVaR model are selection variable s (constrained to Eqs. (7), (9) and (11)), weight variable w (constrained to Eqs. (6), (8) and (10)), and auxiliary variables φ and γ . Note that we evaluate a return of one portfolio by the historical data, although there are some alternatives [1,33,34]. Because this article mainly discusses about developing an algorithm for the mean-CVaR model, discussions of different measures for the return will be beyond the scope of this article.

Constructing the local PFs
The main computational burden for the exact mathematical method on the problem (P) is introduced by the integer variables (cardinality constraint), casting it as a mixed-integer problem (MIP). The demanded computational resource of mathematical methods for MIPs, such as branch-and-bound and cut plane methods, grows exponentially when the amount of the integer variables increases [35,36].
According to the problem (P), there are two pivotal variables (s and w) and they are integer and real numbers. For a portfolio, the solution can be presented as (s, w). Fortunately, mathematical methods can still be efficient if the integer variables s, denoting the combinations of selected assets, are handled in advance. For instance, Fig. 3a exemplifies three sub-objective spaces for three different combinations of selected assets, s 1 , s 2 and s 3 on a real-world portfolio problem. A solution with determinate selected assets and  indeterminate weights is presented as (s, ·). The solutions, bundling a variable s with all feasible weights w, will constitute the sub-objective space of (s, ·). The optimal solutions for each sub-objective space will constitute the local PFs, as shown in Fig. 3b. Specifically, the local PFs for variables s 1 and s 2 are two curves, and the local PF for variable s 3 is a data point. For mathematical methods, it is much easier to find the local PFs for the sub-objective spaces than finding the global PF for the original problem (P), when the integer variables are determined in advance.
In particular, when the integer part, variable s, can be found, the problem (P) can be converted into the following problem with an additional ε-constraint. i∈Γ where the objective R is involved as a constraint, Γ ∈ Γ , and |Γ | = K . In the problem (P ε ), since the n-dimensional binary variables s is eliminated, much less computational resource is required when the exact mathematical method is invoked. There are some alternative algorithms for the εconstraint method. Most of the algorithms will approximate the PF via setting a number of different values of ε [14].
These different values of ε are called grids. AUGMECON2 is employed in F-MOEA/D to deal with the problem (P ε ). It is a multiobjective mathematical method based on ε-method [30]. It inherits some novel strategy from AUGMECON, escaping from weakly Pareto optimal solutions and redundant iterations [37]. Further, AUGMECON2 enhances the computational efficiency since many redundant grids can avoided. To solve the problem (P ε ), AUGMECON2 is combined with a linear programming optimizer from Matlab.
According to the discussion above, we can construct an estimation of the local PF with a number of grids when the integer variable s is determined. In the next subsection, the method about ranking the local PFs will be presented.

Ranking the local PFs
The optimum of the problem (P ε ) contains a set of local Pareto optimal solutions rather than a data point. Therefore, we need to rank the corresponding local PFs for the integer variable s appropriately for the environmental selection in MOEAs. Although there should be many different shapes of the local PFs, without loss of generality, we discuss how to perform the elitist selection among local PFs as illustrated in Fig. 4. These curves denote three local PFs for different s and all of them includes infinite points. Performing the elitist selection among these local PFs is tricky for existing algorithms. We exploit the decomposition strategy to decompose the multiobjective problem into several scalar optimization subproblems [31,38].

Fig. 4 Elitist selection among local PFs
Suppose a local PF F is constructed when the integer variable s is determined. In a minimization problem, we have to evaluate all the points of the local PF F for a scalar subproblem. Then the point with the minimal value can stand for the local PF on the subproblem. In this manner, the comparison between the local PFs is converted into the comparison of some representatives and any decomposition method is applicable. In this work, the Tchebycheff approach is used and it is stated as where F is the local PF, F = ( f 1 , . . . , f m ) is an objective vector. λ is a weight vector, λ = (λ 1 , . . . , λ m ) T , and z * i is the optimal value of the ith objective. The right side of Eq. (18) involves two parts. First, all the points F in the local PF F are evaluated by the Tchebycheff approach. Second, the minimal value for the Tchebycheff approach can be determined via traversing the all the points of the local PF. If g(F 1 |λ, z * ) is smaller than g(F 2 |λ, z * ), the local PF F 1 is considered to be better than local PF F 2 . Figure 4 depicts this elitist selection for two direction vectors (v 1 and v 2 ), corresponding to two weight vectors. In Fig. 4a, as for local PF 1 , point A has the smallest Tchebycheff value with the direction vector v 1 . For local PF 2 and PF 3 , points B and C have the smallest Tchebycheff values. Since the smaller g(F|λ, z * ) the better, we can hence conclude the local PF 1 is the best while the point A has the smallest value for the Tchebycheff approach. For the same reason, the local PF 2 is the best in the Fig. 4b.

Framework of F-MOEA/D
MOEA/D is a decomposition-based multiobjective evolutionary algorithm, in which variables are optimized iteratively [31,39,40]. The main idea is using the neighborhood information to assist the search of every individual in the population. At each generation, MOEA/D maintains N solutions x 1 , . . . , x N , where x i is the solution to subproblem i. Assume each subproblem is a minimization problem and the objective function of subproblem is g i (x). According to the distances among weight vectors, a T -neighborhood of each solution can be determined since each one corresponds to a weight vector. Thereby, a solution in the T -neighborhood of solution x i is defined as a T -neighbor of x i . The main loop of a basic MOEA/D framework works as follows.
For each subproblem i: AUGMECON2 and a linear optimizer. As for the inputs, P is the problem data, x i is the integer variable s, and 2 √ N is the number of grids for the estimated local PF. For more details, audiences can refer to [30]. Afterwards, the reference point z and the fitness value F V i for every subproblem i are initialized (line 3). Lines 4-27 describe the main loop of F-MOEA/D. From lines 6-10, the indices of the mating and replacement pool P are determined. Next, new candidates are generated and their local PFs are estimated (lines 11 and 12). The generate operators are detailed in "Representation and generate operators". From lines 13-17, the reference point z is updated. Then, no more than n r subproblems will be updated if the new candidate is better (lines [18][19][20][21][22][23][24][25]. A more complete re-estimation of the local PFs will be performed by setting the grid number as N instead of 2 √ N (line 28).

Representation and generate operators
Various alternative representations can be employed for the binary vector s. We follow and slightly alter the representation and generate operators in [15]. They are tailored for the constrained portfolio optimization and the simulator experiments have demonstrated their effectiveness. Specifically, a n-dimensional continuous vector x is employed for the binary vector s. If L pre-assigned assets are selected, among the assets which are not pre-assigned, the K − L assets with highest values are selected. Then two generate operators are implemented with same chances. First, generate operators for continuous variables can be directly employed here. They are differential evolution (DE) [42] and polynomial mutation [43], which never use prior knowledge. They are defined as follows: where x 1 , x 2 and x 3 are three solutions randomly selected from the population, F is a scaling factor in DE, polymt (u i , η m , p m ) is a polynomial mutation function on x i with an index parameter η m and a mutation probability p m , and x new is the new solution.
Second, a tailored operator that utilizes the known information of the portfolio optimization problem is slightly altered as follows.
O2 Swap the values of x i and x j : x i x j , where the asset i is randomly chosen from the selected assets and the asset j is chosen randomly from the non-selected assets through one of the following strategies S1 Choose an asset which has the highest return (μ j ) S2 Choose an asset which has the lowest CVaR α S3 Choose an asset which has the lowest VaR α .
-NSGA-II [44] is among the most popular MOEAs. It is based on the non-dominated sorting and crowding distance measure. -MOEA/D [31] is a decomposition-based MOEA. It decomposes a multiobjective problem with aggregation approaches and optimizes the decomposed problems simultaneously. -SMS-EMOA [45] is an indicator-based MOEA. It evaluates the generated solutions with hypervolume, thereby this algorithm searches in the direction of the best hypervolume. -Compressed Coding Scheme (CCS) is a tailored representation for MOEAs on portfolio optimization [15]. It maps a real-valued vector to both selection and allocation of assets. In this manner, the dependence among variables is expected to be utilized among the search. CCS has been incorporated into three mainstream MOEA frameworks [39]. In this work, CCS-MOEA/D is adopted. -The efficient learning-guided hybrid multi-objective evolutionary algorithm (MODE-GL) is proposed to solve portfolio optimization problems with real-world constraints [13]. In this work, a learning-guided strategy is devised to promote the algorithm search towards promising regions. -A three-dimensional encoding multiobjective evolutionary algorithm (TDMEA) is devised for the portfolio optimization problem [26]. A coding structure, specially developed to keep the size of chromosome invariant to Maximum replacement size n r 2 The probability that parents from neighborhood p δ 0.9 Parameter for AUGMECON2 Number of grids 100 the size of test instances, makes it efficient for large-scale problems. -AUGMECON2 is a multiobjective mathematical method based on ε-method [30].
Concerning the parameters, they are summarized in Table 1

Benchmark problems
The benchmarks like OR-library 3 are widely used and play a significant role in promoting the study of portfolio optimization. However, a distinct feature of the financial markets nowadays is that the scale of problems raises dramatically year after year. This requires us to tackle large-scale problems, involving thousands candidate assets. Hence, we test the compared algorithms in practical stock markets (Chinese stock markets) nowadays. The data is composed of daily 3 http://people.brunel.ac.uk/~mastjjb/jeb/info.html.

Performance measures
Regarding the performance measures, two metrics are applied, Hypervolume (HV) [46] and Generational Distance (GD) [47], they are easy to use in PlatEMO [48]. HV prefers heuristics because it takes the diversity of solutions into account. Heuristics, like EAs, always pay much attention to the diversity of solutions while exact methods hardly do it [14,49]. It is given as where Q is the set of obtained solutions of algorithms, and hc i is the hypercube bounded by solution i and the reference point r . Better solutions lead to higher HV. On the other hand, GD evaluates the convergence ability of algorithms. In this problem, it tends to prefer exact mathematical programming approaches since exact methods are going to find the optimum if possible yet heuristics search for approximations. It is defined as where Q is the set of obtained solutions of algorithms and d i is the shortest Euclidean distance among solution i and the representatives of PF. Better solutions lead to lower GD. Note that the implementations of HV and GD require the true PFs or reference points. The PFs are obtained by imple-menting AUGMECON2&LP with 100 grids. However, they are unavailable while running the exact method after 7 days for some large-scale problems in this work. For these cases, all the solutions from two MOEAs with 31 independent runs are collected to construct the temporary PFs. Concerning the reference point, it is recommended to set the reference point r slightly dominated by the nadir point of the PF. Specifically, r is set as (1.1, 1.1).

Simulation experiments
In the experimental studies, we discuss about the running time of the exact mathematical method on the constrained portfolio problems. Then the performance of the implemented algorithms are compared numerically and graphically. Besides, algorithms with the best performance in tables will be highlighted with the gray shade. Figure 6 shows the running time of AUGMECON2&LP on D1 to D8. The exact mathematical method is very effective for small problems. It just takes several hundred seconds for addressing D1, D2 and D3. Nevertheless, the required times rise non-linearly. It takes more than 1 h to solve D5 and D6, while the demanded running times raise to 32 h for D7 and D8. Furthermore, the exact mathematical method even can not finish in 7 days for the largest two problems. These computational costs for large-scale problems are unaffordable when the investors are going do some high frequency trading operations. This confirms the necessity of devising efficient heuristic algorithms. Tables 3 and 4 summarize the results of seven algorithms regarding HV and GD. The results of AUGMECON2&LP are not reported in these tables but are plotted in the figures as the true PFs for comparison. Because if the exact mathematical method completes, the HV and GD of it must be the best since it can find the proved optimal solutions for these MIPs.
Concerning HV, F-MOEA/D outperforms other six algorithms on almost all the problems while its performance is similar with CCS-MOEA/D and worse than TDMEA on D 7 . The three MOEAs, CCS-MOEA/D, MODE-GL and TDMEA, tailored for the constrained portfolio optimization outperform the generic MOEAs. This result is expected since the tailored MOEAs either utilize specialized representation nor make use of some priori information, such as the return and risk of assets. Among these tailored MOEAs, CCS-MOEA/D and TDMEA have similar performance while MODE-GL is defeated by them on all problems. The solutions obtained by SMS-EMOA are worst while NSGA-II and MOEA/D perform similarly. It is rational because the CPU time is set as the stop criterion and SMS-EMOA will calculate the performance metrics frequently, occupying lots of computational resources. Moreover, the Wilcoxon ranksum test at 5% significant level is also adopted. Symbols '+, −, ∼' means the performance of the corresponding algo- Time(s) 10 4 Fig. 6 Running times of AUGMECON2&LP for eight instances, from D1 to D8 rithm is better than, worse than or similar to F-MOEA/D. For instance, '0/9/1' denotes the CCS-MOEA/D is worse than F-MOEA/D on 9 problems, and competitive with F-MOEA/D on 1 problem. F-MOEA/D is still the best algorithm regarding the Wilcoxon rank-sum test.
Regarding GD, similar conclusion can be made from the Table 4. The results reconfirm the superiority of F-MOEA/D since it has better performance on almost problems and hardly loses to others in terms of the employed Wilcoxon rank-sum test. The three tailored MOEAs still performs well and their performance is better than three generic MOEAs. Meanwhile, SMS-EMOA still can not compete with other algorithms on all the problems.
In Tables 3 and 4, F-MOEA/D is the best algorithm on most test problems, but it is worse than some MOEAs on a few datasets. There are probably two reasons why F-MOEA/D sometimes fails. (1) Because F-MOEA/D can only ensure the solutions for the linear programming problem are optimal but the optimality for the selection of assets from EAs is not guaranteed. (2) In a same time budget, the number of sampling times for the selection of assets is very limited in F-MOEA/D compared to MOEAs since constructing the local PFs is time consuming. Figure 7 illustrates the obtained efficient fronts of every algorithm, hence, an intuitive comparison can be made. Approximated fronts with median HVs for every algorithm, except MOEA/D and SMS-EMOA, are plotted. MOEA/D and SMS-EMOA are omitted because the performance of three generic MOEAs is similar. Due to the space limit, fronts on 6 problems are plotted. In Fig. 7a-d,

Ablation study
Besides the comparison among peer algorithms, this subsection intends to analyze the swap operator (O2) for the F-MOEA/D. There are three swap strategies based on the priori information of the data, they are devised for searching the asset with the highest return, the lowest CVaR and the lowest VaR, respectively. We hence perform an ablation study on these strategies. Due to the space limitation, the   Table 5 shows the results, the first, second, third and all the swap strategies are eliminated. We denote these methods as The results indicate that the first swap strategy play the most important role, because once it is eliminated the solutions become significant worse compared to the original F-MOEA/D on these 4 problems. The algorithm will get exact benefit from the first strategy because the objective, return R, in the portfolio optimization is a linear function. It means we can directly get the best solution for the return R with a greedy method. On the other hand, since the risk CVaR is not a linear function, a portfolio can not be guaranteed to have the lowest CVaR when each asset in the portfolio has a lowest CVaR. Therefore, the second and third strategy will not make such contributions as the first one. Nevertheless, the results in the table denote the algorithm will perform better if these two strategy are included. Figure 8 illustrates the front obtained by each algorithm. Compared to the original F-MOEA/D, the fronts obtained by the algorithm without the swap operator will distinctly shift to the upper right. Since the upper side implies lower return and the right side denotes higher risk, these results indicate that the swap operator assists the algorithm search for better solutions. Furthermore, on D 7 , the solutions obtained by the algorithm without the first swap strategy are very far away from the high return area, this indicates in the large-scale problem the first swap strategy will help the algorithm search for the high return solutions efficiently.

A bit more experiments with different reproduction operators
So far, we have compared F-MOEA/D with three generic and two tailored MOEAs. In this subsection, comparisons among F-MOEA/D and three variants of MOEA/D, namely MOEA/D-DE [41], MOEA/D-PSO [50] and MOEA/D-GA [31], are also made. These variants have shown excellent performance on some traditional black box problems and the parameters of these reproduction operators are kept the same as in the literature.
These three variants of MOEA/D and F-MOEA/D are implemented for 4 problems, D 1 , D 5 , D 7 and D 10 . Statistical results are provided in Table 6 and graphical results are presented in Fig. 9. In the Table 6, it is easy to tell that all these variants of MOEA/D with different reproduction operators are worse than F-MOEA/D not only on the mean values of HVs but also on the Wilcoxon ranksum-test. In Fig. 9, the solutions of these three variants of MOEA/D roughly approximate the PF on D 1 , but they can hardly reach the PFs, on D 5 , D 7 and D 10 . These results are almost anticipated, because these invoked reproduction operators are generic for black box problems and they are not tailored for portfolio optimization. For the small-scale problem, D1 (N = 475), these reproduction operators are barely workable, but for the large-scale problems, D 5 , D 7 and D 10 (N ≥ 3406), these operators almost fail.

Conclusion
This work models the portfolio problem practically. It characterizes the risk of the problem with a popular and easy to solve measure, CVaR. CVaR is more recognized than the variance measure in the industry filed and it is easier to be solved for existing optimization tools. Further, four real-world constraints are included in the problem. This constrained mean-CVaR model is an MIP and it induces challenges to optimization algorithms. Hence, we propose a Pareto front evolution strategy and it is embedded into MOEA/D. This novel algorithm is named F-MOEA/D and it facilitates the complementary advantages in the exact and heuristic optimization realms. Two parts are involved in this algorithms, i.e., constructing local PF through exact methods and picking the best one via decomposition approaches. Specifically, the local PFs are constructed via AUGME-CON2 when the integer variables are determined by the EAs. Then, the best local PF is picked through the decomposition approach. Experimental studies demonstrate F-MOEA/D is more scalable than the mathematical method when the lat-  ter one even can not be completed in 7 days on large-scale problems. Furthermore, the performance of F-MOEA/D is much better than the alternative MOEAs on most of the problems.
For the future work, we are planning to extend this single-period model to multi-period since it is a fundamental requirement of the investors in the real financial markets. Moreover, we believe this kind of hybrid algorithm is suitable for many practical MIPs. So we will continue to find suitable MIPs and propose efficient algorithms based on the basic idea of uniting exact and heuristic methods.