## Introduction

Conventional gradient-based methods cannot solve nonanalytic optimization problems. Over the past two decades, researchers have confirmed that evolutionary algorithms (EAs) are effective on black-box optimization problems. Moreover, the need to simultaneously optimize multiple indicators is very common in engineering optimization, which can be modeled as a multi-objective optimization problem (MOP) with M objectives as follows:

\begin{aligned} \begin{array}{l} \min F(\mathbf{{x}}) = (f_1 (\mathbf{{x}}),f_2 (\mathbf{{x}}), \ldots ,f_M (\mathbf{{x}})) \\ \text {s.t.}\ \mathbf{{x}} \in X. \end{array} \end{aligned}
(1)

Here, X is the feasible region of D decision variables ($$\mathbf{{x}} = (x_1,\dots ,x_D)$$) [1].

Solving MOPs involves two difficulties. First, traditional mathematical methods cannot solve MOPs by gradients. Second, the optimum of an MOP is a solution set, due to the conflicts among the multiple objective functions. The Pareto dominance relation is a classical and widely used concept in multi-objective optimization. It can be stated as follows: for any two solutions of the MOP in Eq. (1), solution $$S_A$$ Pareto dominates solution $$S_B$$ if and only if $$f_i(S_A) \le f_i(S_B)$$ for all $$i \in \{1, 2, \ldots ,M\}$$ and there exists at least one objective $$f_j$$ ($$j\in \{1, 2, \ldots ,M\}$$) such that $$f_j(S_A)<f_j(S_B)$$. The set of all Pareto optimal solutions in the decision space is called the Pareto set (PS), and its image in the objective space is called the Pareto front (PF).
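The dominance relation above can be sketched as a small predicate; the helper names below are our own illustrative choices, not part of any cited algorithm:

```python
def pareto_dominates(fa, fb):
    """True if objective vector fa Pareto dominates fb (minimization)."""
    # fa must be no worse in every objective...
    no_worse = all(a <= b for a, b in zip(fa, fb))
    # ...and strictly better in at least one objective.
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better

def non_dominated(points):
    """The non-dominated subset: points no other point dominates."""
    return [p for p in points
            if not any(pareto_dominates(q, p) for q in points if q != p)]
```

For example, `(1, 2)` dominates `(2, 2)` but is incomparable with `(2, 1)`; an MOEA's output approximates the set returned by `non_dominated`.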

So far, multi-objective evolutionary algorithms (MOEAs) have been a successful tool for solving MOPs [2], because they can output an approximation of the Pareto set in a single run. MOEAs using Pareto dominance for selection are popular, and many Pareto-based MOEAs have their own selection methodology. For example, NSGA-II [3] is based on a fast non-dominated sorting approach for selecting good solutions, and SPEA2 [4] is based on a dominance-strength-based fitness assignment. Aggregation-based algorithms, such as MOEA/D [5], transform an MOP into a number of single-objective sub-problems and optimize them simultaneously. Furthermore, performance measures can be employed in the selection process of MOEAs: IBEA [6], SMS-EMOA [7], and HypE [8] define their optimization goals in terms of different performance indicators.

As the number of objectives increases, MOPs become many-objective optimization problems (MaOPs), and the ability of Pareto dominance to filter out good individuals degenerates. To remedy this, many algorithms combine the Pareto dominance-based criterion with additional convergence-related metrics: solutions are selected first by Pareto dominance and then by the convergence-related metric. Representative approaches include the grid-based evolutionary algorithm (GrEA) [9], the preference-inspired co-evolutionary algorithm (PICEA-g) [10], the knee point driven evolutionary algorithm (KnEA) [11], and the many-objective evolutionary algorithm based on directional diversity and favorable convergence (MaOEA-DDFC) [12].

MOPs with a large number of decision variables (termed large-scale MOPs, LSMOPs) pose challenges to existing MOEAs, because the search space is high-dimensional, which exponentially increases the computational cost. The most straightforward way to deal with LSMOPs is to search the high-dimensional space directly; for example, CMOPSO [13, 14] uses the competitive particle swarm optimizer [15], which is good at high-dimensional optimization problems. To shrink the search space, grouping strategies have been widely used in the cooperative co-evolutionary (CC) framework for large-scale optimization, in which the decision variables are decomposed into groups that form sub-problems for cooperative co-evolution. However, most CC algorithms are applied to single-objective optimization, with only a few designed for MOPs [16]. In fact, information detected about an LSMOP can further assist the grouping strategies; for example, DRMOS [17, 18], MOEA/DVA [19], LMEA [20], and S3CMAES [21] divide the decision variables into different categories and apply different search strategies to each. In addition, problem transformation is an alternative way to reduce the high complexity of LSMOPs: LSMOF [22] greatly reduces the number of decision variables by transforming the original LSMOP into a number of weight optimization sub-problems, and WOF [23] is another problem transformation method combined with a grouping strategy.

As mentioned above, both the high-dimensional decision space and the high-dimensional objective space increase the hardness and computational cost of solving large-scale MaOPs with existing MOEAs, which are specially designed for either MaOPs or large-scale MOPs. To address both issues, we borrow the parallelism of multifactorial optimization (MFO) [24], in which a single population is employed to optimize multiple optimization problems simultaneously, for solving large-scale many-objective optimization problems (LSMaOPs). Our proposed algorithm is a two-stage framework combined with multi-tasking optimization (termed TSMTF).

The rest of this paper is organized as follows. The next section introduces related work on LSMaOPs and the MFO idea. The proposed algorithm TSMTF is then detailed, followed by a section presenting the experimental results. Finally, the last section concludes the paper.

## Related work

### Large-scale multi-objective optimization

To address problems with a large number of decision variables, several representative MOEAs that reduce the large search space (CMOPSO [13, 14], S3CMAES [21], MOEA/DVA [19], LMEA [20], and WOF [23]) have been proposed. We discuss them in detail below.

CMOPSO [13] employs a competitive swarm optimizer [15]. Owing to the good search ability of the competitive swarm optimizer in high-dimensional spaces, CMOPSO outperforms other MOEAs, but it still cannot solve large-scale MOPs. Therefore, it is necessary to further reduce the decision space.

The algorithm in [14] adopts a two-stage framework. Unlike conventional particle swarm optimization algorithms, which focus on updating velocities, it proposes a new strategy that updates particle positions with a new competitive mechanism: each particle's position is first pre-updated by its previous velocity and then learns from the leader's position. The experimental results show that this position-updating strategy is effective on LSMOPs.

MOEA/DVA [19], LMEA [20], and S3CMAES [21] are three examples of decision variable analysis-based algorithms. MOEA/DVA [19] solves problems with a clustering method based on the dominance relationship; it divides the decision variables into the following three groups:

• Convergence-related variables: decision variables that contribute to convergence.

• Diversity-related variables: decision variables that contribute to diversity.

• Mixed variables: decision variables that contribute to both convergence and diversity.

With these groups, MOEA/DVA restructures the problem into a set of sub-problems. The convergence-related variables are optimized first to obtain candidate solutions close to the PF. Then, MOEA/DVA treats both the diversity-related and the mixed variables as diversity-related variables to enhance diversity.

In LMEA [20], the decision variables are divided into two classes (convergence-related and diversity-related) based on a perturbation test on each decision variable. Both classes are then optimized within a single population to improve convergence and diversity separately. LMEA also employs a fast non-dominated sorting method [25] to further decrease the computational cost.

S3CMAES [21] divides the decision variables into diversity-related and convergence-related groups via a clustering strategy. The convergence-related variables are then divided into sub-groups, and each sub-group is optimized independently using the covariance matrix adaptation evolution strategy (CMA-ES). S3CMAES shows good performance on LSMOPs; however, CMA-ES is invoked repeatedly for each sub-group, which makes its cost high.

However, the decision variable clustering methods in the above three MOEAs require a large number of function evaluations, which makes them hard to apply to real-world problems.

Furthermore, problem transformation methods are an alternative way to deal with LSMOPs. In both [26] and [22], the algorithms change the original problem by introducing a set of weight variables in the decision space, and the experimental results show the effectiveness of this approach for convergence. For example, a solution $${\varvec{X}}=(x_1,x_2,\ldots ,x_D)$$ of the problem $$f({\varvec{X}})$$ can be transformed into sub-problems $$f(\phi (\omega ,{\varvec{X}}))$$, where the weight vector $$\omega$$ has k components and $$\phi (\omega ,{\varvec{X}})$$ can be defined as:

\begin{aligned} \phi (\omega ,{\varvec{X}}) = (\omega _1 x_1 ,\omega _1 x_2 ,\ldots ,\omega _1 x_{D/k} ,\ldots ,\omega _k x_{D - D/k + 1} ,\omega _k x_{D - D/k + 2} ,\ldots ,\omega _k x_D ). \end{aligned}
(2)

Thus, the problem with D decision variables is transformed into several k-dimensional problems. Thanks to the reduction of the decision space, the search cost is significantly reduced. However, such a transformation can be viewed as a lossy dimension reduction, which causes stability issues. In particular, the transformed objective function of LSMOF [22] is based on the hypervolume (HV) indicator, which is computationally expensive for MaOPs; therefore, LSMOF cannot solve LSMaOPs.
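A minimal sketch of this transformation, assuming the D variables are split into k equally sized contiguous groups (the group size D/k and the function names are our own illustrative choices, not the exact implementation of [22] or [26]):

```python
def transform(weights, x):
    """Scale each contiguous group of variables by one weight (cf. Eq. (2))."""
    k, D = len(weights), len(x)
    assert D % k == 0, "illustration assumes equally sized groups"
    size = D // k
    return [weights[i // size] * xi for i, xi in enumerate(x)]

def transformed_objective(f, x):
    """Wrap f so the search runs over k weights instead of D variables."""
    return lambda weights: f(transform(weights, x))
```

Optimizing the returned function over the k weights reduces a D-dimensional search to a k-dimensional one, at the price of only reaching points representable as group-wise rescalings of the anchor solution `x` (the lossy reduction discussed above).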

By combining a grouping method and a problem transformation method, WOF [23] also shows good performance on LSMOPs. It divides the decision variables into groups with its grouping strategies, and each group is assigned a weight. Unlike other grouping-based algorithms, WOF does not optimize sub-populations separately; like LSMOF [22], it transforms the original LSMOP into weight optimization problems and optimizes the transformed problems with a single population.

### Multifactorial optimization

MFEA [24] is a parallel method that solves multiple problems at the same time. It defines the following properties for each individual $$p_i$$ in the population P:

• Factorial cost: the factorial cost of $$p_i$$ on task j is denoted $$T_i^j$$.

• Factorial rank: each individual is assigned a rank for each task; in terms of task j, $$p_i$$'s rank is $$R_i^j$$.

• Scalar fitness: each individual is assigned a fitness $$\varphi _i$$ based on its best-performing task.

• Skill factor: each individual is marked with its best-performing task, $$\tau _i = \mathrm{argmin}_j\,{R_i^j}$$, where j ranges over all tasks.

Thus, multiple tasks are turned into a single-objective problem in which the scalar fitness $$\varphi$$ of each individual serves as the selection criterion; in other words, the algorithm focuses on the best individuals of each task. Offspring generation is also based on the skill factor: parents $$P_a$$ and $$P_b$$ are randomly chosen from the population; if their skill factors differ, they undergo crossover only with a low probability, and otherwise they undergo the crossover or mutation operations. More details of MFEA can be found in Algorithms 1 and 2.
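The factorial rank, scalar fitness, and skill factor can be computed from a matrix of factorial costs roughly as follows. This is a simplified sketch of MFEA's bookkeeping: the 1/rank fitness follows the common MFEA convention, and ties are broken arbitrarily.

```python
def mfea_bookkeeping(costs):
    """costs[i][j] = factorial cost of individual i on task j (lower is better).
    Returns (ranks, scalar_fitness, skill_factor) for every individual."""
    n, m = len(costs), len(costs[0])
    # Factorial rank: 1-based position of individual i when sorted on task j.
    ranks = [[0] * m for _ in range(n)]
    for j in range(m):
        order = sorted(range(n), key=lambda i: costs[i][j])
        for pos, i in enumerate(order):
            ranks[i][j] = pos + 1
    # Scalar fitness: reciprocal of the best rank over all tasks.
    fitness = [1.0 / min(r) for r in ranks]
    # Skill factor: the task on which the individual ranks best.
    skill = [min(range(m), key=lambda j: r[j]) for r in ranks]
    return ranks, fitness, skill
```

With two individuals and two tasks where each individual is best on a different task, both receive scalar fitness 1.0 but different skill factors, which is what steers assortative mating in MFEA.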

## Proposed algorithm

### Framework

To address the issue of LSMOF [22] on LSMaOPs, we borrow the main idea of KnEA [11]: the proposed algorithm optimizes the distance from knee points to the hyperplane, replacing the HV calculation of LSMOF. To further reduce the required number of function evaluations, we employ MFEA in TSMTF.

The structure of the proposed algorithm contains two main stages, as highlighted in Algorithm 3. The first stage aims to find solutions with good convergence; the second stage focuses on diversity improvement; the resulting population, with good convergence and diversity, is then further evolved by NSGA-III [27]. When the computation budget runs out, the proposed algorithm outputs the obtained solutions.

### Stage 1: Bi-directional weight variable associated multi-tasking strategy

This stage aims to find better individuals by reducing the dimension of the search space. An individual $${\varvec{A}}$$ in the decision space of Fig. 1 can be expressed as $$\varvec{A_o} = (x_1,\ldots ,x_n)$$. Two directions rooted at the extreme points $${\varvec{O}}$$ and $${\varvec{T}}$$ of the decision space can then be generated as $$\varvec{A_u}={\varvec{O}}+\varvec{A_o}$$ and $$\varvec{A_d}={\varvec{T}}-\varvec{A_o}$$. We then search for the best solutions along both directions by optimizing the weights $$\lambda$$ of the two resulting sub-problems, as shown in Fig. 2.
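The bi-directional construction can be sketched as follows. We read the weight λ as a scaling factor along each direction, with `lower` and `upper` standing for the extreme points O and T; the function and parameter names are illustrative assumptions, not the exact operator of TSMTF.

```python
def bidirectional_candidates(anchor, lam_u, lam_d, lower, upper):
    """Build two candidate solutions from one anchor point:
    one along the direction rooted at the lower bound O,
    one along the direction rooted at the upper bound T."""
    # Candidate toward/beyond the anchor, starting from the lower bound.
    up = [lo + lam_u * (a - lo) for a, lo in zip(anchor, lower)]
    # Candidate starting from the upper bound, moving toward the anchor.
    down = [hi - lam_d * (hi - a) for a, hi in zip(anchor, upper)]
    return up, down
```

With λ = 1 both candidates coincide with the anchor itself, and varying a single scalar λ per direction sweeps a one-dimensional line through the D-dimensional decision space, which is what makes the sub-problems cheap to optimize.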

In this stage, the proposed algorithm chooses $$2R_1$$ directions generated from $$R_1$$ diverse individuals on the current PF. Since HV is computationally very expensive for MaOPs, it cannot be employed as the objective function of these $$2R_1$$ sub-problems. In this work, we instead use the distance to the estimated ideal point as the objective function of these $$2R_1$$ sub-problems.

To find the optimal $$\lambda$$ of each sub-problem simultaneously, we employ MFEA as the optimizer. After normalization, the directions can be expressed as $${\varvec{D}} = (d_1,d_2,\ldots ,d_{2R_1})$$. We use a weight vector $$\varvec{V_0} = (\lambda _1,\lambda _2,\ldots ,\lambda _{2R_1})$$ to represent the solutions $$P_1,P_2,\ldots ,P_{2R_1}$$ through $$\varvec{V_0}\cdot {\varvec{D}}$$.

We consider each sub-problem as one task: $$P_i$$ is generated from direction $$d_i$$ and weight $$\lambda _i$$ (via $$\varvec{V_0}\cdot {\varvec{D}}$$) for optimizing the ith task, so $$P_i$$ can be seen as an instance of task $$T_i$$; thus, $$P_1,P_2,\ldots ,P_{2R_1}$$ correspond to the instances of $$T_1,T_2,\ldots ,T_{2R_1}$$. As in MFEA, each instance is evaluated by a function called the factorial cost; here, it is the distance from $$P_i$$ to the estimated ideal point. In this scheme, the weight vectors are the individuals that MFEA optimizes. After sorting the obtained factorial costs, MFEA solves the $$2R_1$$ sub-problems efficiently, as shown in Algorithm 4.

### Stage 2: Diversity improvement with multi-tasking strategy

This stage aims to improve diversity. First, we choose $$R_2$$ individuals as reference points: one is the individual with the shortest distance to the obtained ideal point, and the rest are chosen from the most dispersed individuals. As in NSGA-III, each solution is associated with its nearest reference point; the most dispersed individuals are those associated with the reference points that have the fewest associated solutions.

Second, we expect the population to evolve toward these different reference points to improve diversity. We use the PBI value [5] as the criterion to evaluate the similarity to each reference point in the objective space, and to push the population toward these reference points, we treat the PBI distance to each direction as a task. Taking $$R_2$$ reference individuals $$L = (l_1,l_2,\ldots ,l_{R_2})$$ as an example, the multi-tasking population P minimizes $$\mathrm{PBI}(P,l_i)$$ for $$i \in \{1,\ldots ,R_2\}$$ simultaneously; each individual's factorial cost is the set of its PBI values to the $$R_2$$ reference individuals. As in the first stage, we employ MFEA to optimize these $$R_2$$ sub-problems.
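For reference, the PBI value of a solution with respect to one reference direction can be computed as in this sketch, following the standard MOEA/D definition (d1 + θ·d2); the default penalty θ = 5 is an assumption taken from common MOEA/D practice, not a value stated here.

```python
import math

def pbi(f, ref, ideal, theta=5.0):
    """Penalty-based boundary intersection distance of objective vector f
    to reference direction ref, measured from the ideal point."""
    diff = [fi - zi for fi, zi in zip(f, ideal)]
    norm = math.sqrt(sum(r * r for r in ref))
    # d1: length of the projection of (f - ideal) onto the direction.
    d1 = abs(sum(d * r for d, r in zip(diff, ref))) / norm
    # d2: perpendicular deviation from the direction line.
    proj = [d1 * r / norm for r in ref]
    d2 = math.sqrt(sum((d - p) ** 2 for d, p in zip(diff, proj)))
    return d1 + theta * d2
```

A small d1 rewards convergence toward the ideal point along the direction, while the θ·d2 penalty keeps the solution close to its assigned reference line, which is exactly the property this stage exploits for diversity.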

## Experiments and discussion

### Parameter settings

To test the performance of the proposed algorithm, the DTLZ [28] and LSMOP [29] problems are chosen as test problems, because their numbers of decision variables and objective functions are scalable. In the experiments, the numbers of decision variables are set to 100, 200, and 500, while the numbers of objectives are set to 3, 5, and 10. To demonstrate the effectiveness of the proposed algorithm, several popular existing algorithms are employed for comparison: MOEA/D [5], NSGA-II [3], and NSGA-III [27]. In addition, we choose algorithms designed for MaOPs and large-scale MOPs: KnEA [11] and LMEA [20]. For the DTLZ problems, we also include S3CMAES and CMOPSO in the comparison. The settings of these algorithms follow the recommended values in their original papers, and their source codes are taken from PlatEMO [30].

For all the experiments, each algorithm runs 30 times independently on each test problem. We use the inverted generational distance (IGD) [5], a widely used performance indicator, to assess the compared algorithms. In short, the smaller the IGD is, the better the algorithm performs.
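For clarity, IGD averages, over a set of points sampled from the true PF, the Euclidean distance from each sampled point to its nearest obtained solution; a minimal sketch:

```python
import math

def igd(reference_front, obtained_set):
    """Inverted generational distance: mean distance from each reference
    point (sampled on the true PF) to its nearest obtained solution.
    Smaller values indicate better convergence and diversity."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(r, s) for s in obtained_set)
               for r in reference_front) / len(reference_front)
```

Because every reference point must be covered by some nearby obtained solution, IGD punishes both poor convergence and gaps in the front, which is why it is used as the single quality indicator here.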

The parameters of the proposed framework are set as follows. For the DTLZ problems with 100 decision variables, the population sizes for the 3- and 5-objective problems are 120 and 150, while that for the 10-objective problems is 200, with stopping criteria of 30,000, 50,000, and 50,000 function evaluations, respectively. With 200 decision variables, the population sizes are 200, 300, and 300, with 50,000, 60,000, and 80,000 function evaluations for the 3-, 5-, and 10-objective problems, respectively. With 500 decision variables, the population sizes are 300, 500, and 500, with 100,000, 200,000, and 200,000 function evaluations, respectively. For the LSMOP problems, the population size is set to 200 for all numbers of objectives; the stopping criterion is 50,000 function evaluations for the 3-objective problems and 100,000 for the 5- and 10-objective problems. Stage 1 consumes 60% of the function evaluations, and the number of directions is set to 10; for stage 2, the number of reference points is set to 11. TSMTF adopts simulated binary crossover (SBX) [31] for crossover and polynomial mutation [32] for mutation. The mutation probability is 1/D, where D is the number of decision variables, the crossover probability is 1, and both distribution indices are set as recommended in [31].

### Effects of stage 1

In this subsection, we discuss the effect of the first stage on the proposed algorithm. Stage 1 aims to improve the convergence of the population within a limited number of function evaluations, using the bi-directional method to search for individuals, combined with MFEA to improve efficiency. We compare TSMTF with a variant without MFEA in stage 1 (termed TSMTF1) on the 3-, 5-, and 10-objective DTLZ problems with 100, 200, and 500 decision variables. TSMTF1 optimizes the weights serially, handling the directions one by one: for each direction, it searches for the best scalar $$\lambda$$ instead of the weight vector $${\varvec{V}} = (\lambda _1,\lambda _2,\ldots ,\lambda _H)$$.

As shown in Tables 1, 2 and 3, TSMTF behaves better than TSMTF1 in most cases. The framework embedded with MFEA has only a slight advantage in terms of the IGD values on DTLZ1 and DTLZ3, while on the other problems the difference is easy to distinguish. The overall advantage of TSMTF is maintained as the dimension of the decision space increases; likewise, when the number of decision variables increases with the number of objectives unchanged, the experimental results show no degrading trend. The main reason for this is that MFEA reduces the number of function evaluations needed in this stage; in other words, the proposed algorithm is efficient.

### Comparative experiments

In this section, we compare the proposed algorithm with MOEA/D, NSGA-II, NSGA-III, KnEA, LMEA, S3CMAES, and CMOPSO on the large-scale DTLZ problems. To make the comparison clear, we use the Wilcoxon rank-sum test [33] with TSMTF as the control algorithm. The symbols “+”, “−”, and “$$\approx$$” indicate that a result is significantly better than, worse than, or statistically similar to that of TSMTF, respectively.

As shown in Tables 4, 5 and 6, the overall results indicate that the proposed framework TSMTF converges and performs markedly better than the other algorithms on DTLZ1 and DTLZ3. On DTLZ2 and DTLZ4, TSMTF also shows good convergence, but it does not outperform KnEA; we speculate that this is caused by the characteristics of DTLZ2 and DTLZ4. Relatively speaking, LMEA has no advantage in this experiment because of the large amount of computation it spends on classifying the decision variables; the evaluation budget in this experiment may be too small for it to show better performance. S3CMAES is in the same situation: its computation cost is too high to search each sub-population within the limited budget, and its clustering strategy also requires a large number of evaluations. The performance of CMOPSO is similar to that of the other compared algorithms, and it occasionally performs better than TSMTF on DTLZ2 and DTLZ4. Among the compared algorithms, MOEA/D and NSGA-III perform the best on these two problems, while the others also achieve small IGD values, which may be owing to their advantage in diversity: in NSGA-III and MOEA/D, the diversity of the population is always maintained well by uniform weight vectors. On DTLZ6, the results show that none of the algorithms performs well within the limited number of evaluations. From the above analysis and the results in Table 4, we conclude that TSMTF has a significant advantage on DTLZ1 and DTLZ3 among MaOPs.

At a glance, for all the algorithms, DTLZ2 and DTLZ4 are easier than DTLZ1 and DTLZ3. On DTLZ2 and DTLZ4 with 100 decision variables, S3CMAES converges and performs better than half of the compared algorithms, while it cannot converge on DTLZ1 and DTLZ3. As the number of decision variables increases, the results show that S3CMAES's performance degenerates under the limited number of function evaluations, because its grouping strategy and its search within sub-populations require a large number of them. As for CMOPSO, it shows little advantage on MaOPs in these experiments due to its particle learning strategy; like the other compared algorithms, it performs best on DTLZ2 and DTLZ4, while DTLZ1 and DTLZ3 are so complex that it is unable to converge. For LMEA, as shown in Tables 4, 5 and 6, it does not perform well even when the other algorithms have reached convergence; it appears to need extra function evaluations, and it behaves worse as the number of decision variables increases. NSGA-II has the same issue: within the limited number of evaluations, it cannot converge on most MaOPs, although it occasionally shows better performance, whereas TSMTF can converge. As for NSGA-III, it occasionally performs better than TSMTF on DTLZ2, and on DTLZ4 with 100 and 200 decision variables it performs better than TSMTF in nearly every case; their difference grows as the number of decision variables increases. The performance of MOEA/D is similar to that of NSGA-III: both occasionally perform better than TSMTF on DTLZ2 with 200 decision variables, while TSMTF gains an advantage on DTLZ4 with 100 decision variables. As for KnEA, its IGD results are better than TSMTF's in several experiments on DTLZ2 and DTLZ4. TSMTF shows the best performance on DTLZ1 and DTLZ3 in every experiment. On DTLZ6, all algorithms show similar performance.

From the above results on the DTLZ problems in Tables 4, 5 and 6, KnEA, LMEA, and TSMTF are good at solving LSMaOPs. Therefore, we further compare them on the large-scale benchmark problems LSMOP [34] with different numbers of decision variables and objective functions. The IGD results are shown in Tables 7, 8 and 9.

In Tables 7, 8 and 9, in most cases, the proposed framework performs better than the other two algorithms, especially on the 10-objective problems in Table 9, though TSMTF shows disadvantages in some situations. The 3-objective LSMOP4 is an easily converging problem: all three algorithms converge to the true PF, and TSMTF cannot outperform the other two. However, on the other 3-objective LSMOP problems, neither KnEA nor LMEA can converge, while TSMTF has better convergence ability, which makes TSMTF the best-performing algorithm.

KnEA performs the best on several 3-objective LSMOP problems: the results on LSMOP1, LSMOP2, and LSMOP4 show that it converges in almost all cases. As the number of objectives increases, Tables 7, 8 and 9 show that the advantage of KnEA decreases, although it still performs the best on the 10-objective LSMOP4 compared with the other two algorithms. As for LMEA, the results show that it converges on LSMOP2 and LSMOP4; LSMOP6 is too hard for it, and it obtains solutions near the true PF on the other four LSMOPs. As the number of decision variables increases, the performance of LMEA worsens. TSMTF converges to the true PF in almost all cases. When TSMTF is not better than the other two algorithms, the performance difference is tiny, whereas when TSMTF performs the best, the other two algorithms are far from the true PF.

The results in Tables 1, 2, 3, 4, 5, 6, 7, 8 and 9 show that TSMTF performs the best among all the algorithms. The good performance mainly comes from the contribution of stage 1, in which the high-dimensional search space is transformed into a low-dimensional one through a number of weight vectors, and the population in TSMTF evolves by optimizing those weight vectors. In this way, with far fewer decision variables, the search cost (the number of function evaluations) is greatly reduced. To further improve diversity, TSMTF specifically increases the diversity of a small number of individuals with good convergence; stage 2 can increase diversity and convergence simultaneously. After stage 2, NSGA-III is employed to make the population more evenly distributed. LMEA spends too many evaluations on decision variable grouping, and S3CMAES spends too many on searching within sub-populations, so their performance degenerates when the computation budget is limited. Since KnEA, NSGA-III, CMOPSO, and MOEA/D easily get trapped in the high-dimensional decision space without any dimension reduction, they cannot find the true PS for most large-scale MaOPs.

## Conclusions and remarks

In this work, we propose a two-stage framework combined with MFEA to address large-scale MaOPs. In stage 1, we use the bi-directional search strategy combined with MFEA; in other words, we transform the MaOP into a multi-tasking optimization problem. Stage 2 then aims to improve the diversity of the population by applying multi-tasking to a number of sub-problems derived from the multi-objective optimization problem. At last, the population is optimized by NSGA-III. As the experiments show, with a limited number of function evaluations, the proposed algorithm outperforms the compared algorithms on MaOPs, especially on complex problems such as DTLZ1 and DTLZ3. The results verify the effectiveness of the framework.

In general, TSMTF shows good performance, but both stages can still be improved. For example, for the transformed objective function in stage 1, we adopted the bi-directional strategy; other effective construction methods deserve investigation. In addition, the diversity maintenance in stage 2 still needs improvement.