Abstract
With the rising number of largescale multiobjective optimization problems from academia and industries, some evolutionary algorithms (EAs) with different decision variable handling strategies have been proposed in recent years. They mainly emphasize the balance between convergence enhancement and diversity maintenance for multiobjective optimization but ignore the local search tailored for largescale optimization. Consequently, most existing EAs can hardly obtain the global or local optima. To address this issue, we propose an efficient samplingbased offspring generation method for largescale multiobjective optimization, where convergence enhancement and diversity maintenance, together with ad hoc local search, are considered. First, the decision variables are dynamically classified into two types for solving largescale decision space in a divideandconquer manner. Then, a convergencerelated sampling strategy is designed to handle those decision variables related to convergence enhancement. Two additional sampling strategies are proposed for diversity maintenance and local search, respectively. Experimental results on problems with up to 5000 decision variables have indicated the effectiveness of the algorithm in largescale multiobjective optimization.
Introduction
Multiobjective optimization problems (MOPs) are commonly seen in realworld applications, and even many conventional singleobjective optimization problems can also be transferred/optimized into/as MOPs [1]. Specifically, multiple, often conflicting, objectives exist in an MOP, and the problem can be mathematically formulated as
Notably, m and d are the numbers of objectives and decision variables, respectively. The optima of MOPs is a set of nondominated solutions trading off between different objectives [2]. To be more specific, all the optima in the decision space form the Pareto optimal set (PS), and the projection of the PS in the objective space is the Pareto optimal front (PF). Thanks to the populationbased property of evolutionary algorithms (EAs), many researchers have turned to evolutionary multiobjective optimization for obtaining a solution set in a single run [3].
Multiobjective EAs (MOEAs) for solving MOPs with different properties have been designed in recent decades. Most MOEAs have emphasized simplicity since the mid1980s [4], such as nondominated sorting based genetic algorithms (multiobjective GA (MOGA [5]) and NSGA [6]). These MOEAs effectively solved simple MOPs with several decision variables (usually less than ten) and two–three objectives. Two decades ago, the strength Pareto EA (SPEA [7]), along with the publication of the elitist NSGA (NSGAII [8]), the decompositionbased MOEAs (MOEA/D [9]) and the indicatorbased EAs (IBEA [10]), emphasized on efficiency. These MOEAs have shown promising performance in obtaining evenly distributed solution sets on problems with less than 100 decision variables and two–three objectives [11].
As the increase in the number of objectives, MOPs with more than three objectives (known as manyobjective optimization problems or MaOPs) cause the loss of selection pressure for conventional MOEAs [12]. Thus, some environmental selection strategies were designed to distinguish the qualities of candidate solutions in the highdimensional objective space [13]. For instances, some manyobjective EAs have been proposed by modifying the dominance relationship [14], introducing additional reference information [15], or using effective performance indicators [16], e.g., knee pointbased EA (KnEA [17]), reference vector guided EA (RVEA [18]), and fast hypervolumebased EA (HypE [19]). MOEAs above have shown promising performance in solving problems with up to 15 objectives and less than 100 decision variables [20].
In recent years, MOPs with more than 100 decision variables (known as largescale MOP or LSMOP) [21] appeared more frequently in complex industries, e.g., community detection in complex social networks [22] and ratio error estimation of voltage transformer (TREE) [23]. Due to the rapid growth in the number of decision variables, the curse of dimensionality [24] brings more challenges for MOEAs to solve LSMOPs than to solve MaOPs [25]. Some MOEAs tailored for largescale multiobjective optimization are proposed to handle this challenge. Generally, they can be roughly classified into four categories [26]: (1) the cooperative coevolutionary (CC)based MOEAs group the decision variable without specific analysis and solve LSMOPs in a divideandconquer manner [27]; (2) the decision variable analysisbased MOEAs group the decision variables via linking the decision variables to convergence or diversity properties [28, 29], e.g., decision variable analysisbased MOEA (MOEA/DVA [30, 31]), EA with clusterbased grouping (LMEA [32]), and subpopulationsbased approach (S\(^3\)CMAES [33]); (3) the problem reformulationbased approaches form the third category, including the weighted optimizationbased framework (WOF [34]) and the largescale multiobjective optimization framework (LSMOF [35]); and (4) MOEAs of the last category use efficient offspring generation strategies for largescale multiobjective optimization, e.g., competitive swarm optimizerbased EA (LMOCSO [36]) and adaptive directionguided EA (DGEA [37, 38]).
The aforementioned largescale MOEAs have shown promising performance in solving LSMOPs, but they can be further improved by addressing their inherent drawbacks. For instance, the CCbased approaches may incorrectly group the decision variables; the decision variable analysisbased approaches could waste a huge number of function evaluations (FEs); those problem reformulation based approaches could ignore the diversity maintenance to some extent; the generation of candidate offspring solutions could ignore the local search in those efficient offspring generationbased algorithms. Thus, we propose a samplingbased MOEA by taking advantage of the ideas of decision variable analysis, problem reformulation, and efficient offspring generation. The main contributions of this study are summarized as follows:

1.
An efficient variable classification technique is proposed to guide the search in the decision space. Compared with existing exact decision variable analysis methods, the proposed classification approach is applicable to problems with different properties using significantly low computational costs.

2.
Local search is emphasized in the proposed SLSEA, which aims to enhance the capability of MOEAs in searching local/global optima, enabling the modification of conventional MOEAs for efficient largescale multiobjective optimization.
The rest of this paper is organized as follows. The details of the proposed sampling based largescale EA, termed SLSEA, are presented in section “Method”. Next, the settings of different algorithms, tested problems, and the adopted performance indicators are given in “Experimental settings”. Empirical results achieved by SLSEA in comparison with some stateoftheart largescale MOEAs are presented in “Comparative studies”, and conclusions are drawn in “Conclusion”.
Method
In this part, we first demonstrate the main idea of this study, followed by the principles of the proposed variable division approach. Then, the sampling strategies guided by the variable division are elaborated.
Main idea
Inspired by the decision variable analysisbased divideandconquer strategy and the efficient offspring generation approaches, we propose an optimizationbased variable classification approach to handle LSMOPs. Instead of categorizing decision variables into many groups, all the decision variables are divided into two groups in the proposed method. Generally, the first group consists of decision variables related to convergence and the second one related to diversity. All the convergencerelated/diversityrelated decision variables are optimized independently without considering the pairwise variable interactions. Accordingly, three different sampling strategies, following the idea of efficient offspring generation, for convergence enhancement, diversity maintenance, and local search are designed.^{Footnote 1} An illustrative example of the proposed offspring generation strategies on an MOP with five decision variables is presented in Fig. 1, which visually shows the principles of three involved sampling strategies in SLSEA. Assume binary vector \(\textbf{b}=(0,1,0,1,1)\) is a vector for indicating the division result of an MOP with five decision variables. The vector indicates that decision variables \(x_2,x_4,x_5\) are convergencerelated decision variables, while \(x_1,x_3\) are diversityrelated ones.

1.
In Fig. 1a, the convergencerelated sampling perturbs variables \(x_2\), \(x_4\), and \(x_5\) using a norm Gaussian distribution \(\mathcal {N}(0,1)\).

2.
The diversityrelated sampling perturbs the rest variables (i.e., \(x_1\) and \(x_3\), which may be related to diversity) and uses an uniform distribution \(\mathcal {U}(0,1)\), as displayed in Fig. 1b. An uniform distribution is used for sampling the diversityrelated decision variable(s) is expected to enhance the exploration capability of the population.

3.
As for the local searchrelated sampling, we use a Gaussian distribution to restrict the sampling density around the current population, aiming to encourage the exploitation of the population, as shown in Fig. 1c.
With these ad hoc sampling strategies, we are motivated to solve LSMOPs via efficient sampling under the guidance of variable classification. Note that the norm Gaussian distribution is used in two designed sampling strategies due to its popularity with symmetry, coverage, and a predefined center. The symmetry ensures unbiased sampling as we have no prior knowledge of the problem, and the coverage enables the sampling of all possible solutions. Moreover, the predefined center controls the sampling, which is expected to guide the sampling towards the Pareto optimal set. Other distributions with similar properties can also be used for perturbing the three sampling strategies. If the prior knowledge of the problem is given, the multivariate Gaussian distribution can be used for addressing this issue, which could be computationally expensive.
Schema of SLSEA
Algorithm 1 presents the schema of the proposed SLSEA. It mainly consists of five components, i.e., convergence sampling, variable classification, diversity sampling, local sampling, and environmental selection. First, a candidate solution set P of size n and a binary vector set B of size \(n_b\) are randomly initialized (Steps 1–2 in Algorithm 1). Next, B is used to sample a set of convergencerelated solutions \(P_c\) (Steps 5\(\sim \)6 in Algorithm 1). Notably, B is updated using the proposed variable classification approach, during which the newly generated convergencerelated solutions are merged into population \(P_c\). Then, the diversityrelated sampling strategy (Step 7 in Algorithm 1) is adopted to sample a set of diversityrelated candidate solutions \(P_d\) with the guidance of the updated B. Afterwards, a local searchrelated sampling is designed to sample local solutions \(P_l\) around the current population P. Finally, the combination of all the sampled candidate solutions is updated using environmental selection. Notably, different environmental selection strategies in conventional MOEAs can be used in SLSEA, and here, we use the environmental selection in NSGAII due to its efficiency and simplicity. The procedures above are repeated until the maximum number of iterations is met.
Convergencerelated sampling
The detailed procedures of the convergencerelated sampling are given in Algorithm 2 In B, the ith binary vector \(\textbf{b}_i\) denotes an approximation of the accurate variable classifications. It is used to sample some convergencerelated solutions via perturbing those convergencerelated variables only. A total number of \(n_s\times n_b\) candidate solutions are sampled during the sampling, where \(n_s\) candidate solutions are sampled for each binary vector.
In addition to the convergencerelated sampling in the main loop of SLSEA, this sampling is also conducted for updating the binary vectors (Step 3 in Algorithm 3). Thus, a total number of \(2n_s\times n_b\) convergencerelated solutions are sampled in one generation of SLSEA.
Variable classification
In the proposed SLSEA, variable classification is used to update the binary vectors for better approximating the accurate variable classification, refer to Algorithm 3. An illustrative example of an MOP with five decision variables is given in Fig. 2 to show the detailed procedures. During the variable classification, the binary vector set B of size \(n_b\) is regarded as the parent population, and the quality of each binary vector is assessed based on the quality of the sampled solutions using Algorithm 4. The mating selection strategy in NSGAII is applied to select the mating pool for generating offspring \(B'\). Notably, the singlepoint crossover and bitwise mutation in the canonical GA are used for the generation of offspring binary vectors, and readers can refer to literature [8, 39] for more details. Next, a set of convergencerelated solutions are sampled for each offspring binary vector using Algorithm 2, followed by the quality assessment of each offspring binary vector. Finally, \(n_d\) binary vectors are selected from \(B\cup B'\) using the environmental selection strategy in NSGAII. Besides, the convergencerelated solution set is also updated.
As a crucial part of the variable classification, the quality assessment determines the accuracy of variable classification results for guiding the sampling of convergence and diversityrelated solutions. Its detailed procedures are displayed in Algorithm 4 using two criteria:

(1)
Instead of using the average Euclidean distance to measure the convergence degree of a candidate solution set, the gridbased distance given in (2) is used. It could eliminate the effect of neighbor points [40] to tolerate the inaccuracy in quality assessment.
$$\begin{aligned}{} & {} d_i= \lfloor N_d*\frac{\left\Vert {\textbf{f}(\textbf{x}_i)  \textbf{l}}\right\Vert \left\Vert {\textbf{l}}\right\Vert }{\left\Vert {\textbf{u}}\right\Vert \left\Vert {\textbf{l}}\right\Vert }\rfloor ,\end{aligned}$$(2)$$\begin{aligned}{} & {} q_1 = \sum _{i=1}^{N_s}{\text {sign}(\min {(\textbf{f}(\textbf{x}_i)  \textbf{l}))\times d_i}}, \end{aligned}$$(3)where \(\textbf{l}\) and \(\textbf{u}\) are the lower and upper boundaries of the objective vectors, sign(\(*\)) is a sign function, \(\lfloor *\rfloor \) is a floor function, \(\left\Vert {*}\right\Vert \) obtains the length of a vector, and \(N_d\) is the number of solutions in the current solution set. If the ideal point dominates any solution, the gridbased distance will be positive; otherwise, its gridbased distance will be negative. Thus, this solution will reduce the average distance of the entire solution set to encourage the generation of convergencerelated binary vectors.

(2)
The use of \(q_2\) is inspired by the decision variable analysis method in LMEA [32], and a smaller \(q_2\) value is likely to indicate the higher correction to diversity, which can be calculated using the nondominated sorting method [2].
The design of \(q_1\) and \(q_2\) is intuitively designed, aiming to demonstrate the effectiveness of the reformulation process in a relatively straightforward manner. The first objective reflects the average convergence degree of the solution set, which is a simplified version of the convergence degree in LMEA. The second objective is expected to minimize the number of convergencerelated decision variables, aiming to reduce the dimensionality of the subproblem for accelerating largescale multiobjective optimization. Nonetheless, the framework is compatible with other tailored functions of similar purpose.
Diversityrelated sampling
During the variable classification, a set of binary vectors are optimized simultaneously. We take advantage of the binary vectors to sample some candidate solutions for diversity maintenance, as presented in Algorithm 5. We first obtain the bestconverged solution \(\textbf{p}\) \(\in \) \(P\) as the baseline to ensure the convergence of the candidate solutions to be sampled (Step 1 in Algorithm 5). f(p) denotes the Euclidean distance of solution \(\textbf{p}\) in the objective space, which reflects the convergence degree of solution \(\textbf{p}\). By selecting the solution with the best convergence degree, Step 1 of Algorithm 5 is expected to guide the sampling of diversityrelated solutions with good convergence. Then, we sample n candidate solutions for each binary vector by perturbing those diversityrelated decision variables (i.e., elements with zero value in a binary vector). Specifically, a uniform distribution \(\mathcal {U}(0,1)\) is used to generate a random number k for perturbing the decision vector (Step 3 in Algorithm 5), encouraging the sampling of diversityrelated candidate solutions. Finally, all the sampled candidate solutions are merged as population \(P_d\).
Local searchrelated sampling
While the variable classification and diversityrelated sampling aim to deal with convergence enhancement and diversity maintenance respectively, the local searchrelated sampling is expected to conduct local search around existing solutions. The detailed procedures of the local searchrelated sampling are given in Algorithm 6. Specifically, we conduct a local search around \(n_s\) solutions randomly selected from population P. Notably, this selection can also choose some dominated solutions to encourage diversity maintenance since they could be important in global optimization [40, 41]. Then, a value randomly selected from a norm Gaussian distribution is used to perturb each chosen solution. Mathematically, for a candidate solution \(\textbf{x}\), the ith variable of the perturbed solutions satisfies Gaussian distribution \(\mathcal {N}(x_i,1)\) (refer to Fig. 1c). Finally, a total number of \(n_s\times n\) candidate solutions are merged as population \(P_l\) during the local searchrelated sampling.
Experimental settings
We first briefly introduce the adopted performance indicators, followed by the details of the involved parameter settings. Each algorithm is run 20 times on each test problem independently for the Wilcoxon ranksum test [42] at a significance level of 0.05. We use symbols ‘−’, ‘\(+\)’, and ‘\(\approx \)’ to indicate SLSEA is statistically surpassed by, significantly worse than, and statistically tied with the compared algorithm. All the compared algorithms are implemented in PlatEMO v2.6 [43] on a PC with two 2.1 GHz Intel(R) Xeon(R) Gold 6130 CPUs (32 CPU Cores) and 128 GB RAM on the Windows 10 64bit operation system.
Performance indicator
To assess the performance of the compared algorithms on largescale multiobjective optimization, two widely used indicators, namely the IGD indicator [44] and the hypervolume (HV) indicator [45], are used. Both indicators can assess the convergence degree and distribution uniformity of a solution set in multiobjective optimization. Nevertheless, these two performance indicators are different due to the different requirements in reference points.
In the IGD indicator, a set of uniformly distributed points located precisely on the PF are required to give an accurate assessment of the quality of the solution set. By contrast, only a single point close to the nadir point of the PF is needed in HV calculation [46]. Generally, IGD calculation requires prior knowledge about the PF, which is suitable for benchmark problems with known PFs. The IGD\(+\) indicator can also be used for better assessment of the quality, which is weakly Pareto compliant, whereas IGD is Pareto incompliant [47]. Thus, it is suitable for mathematically formulated problems, e.g., DTLZ problems [48], WFG problems [49], and LSMOP problems [50]. However, for a realworld MOP such as a TREE problem [23], having a set of uniformly distributed points on its PF is impractical. Thus, we can adapt the HV indicator to assess the quality of a solution set using an approximation of the real nadir point.
Test problems
In the following experiments, some test instances are selected from three benchmark test suites, i.e., DTLZ [48], LSMOP [50], and TREE [23]. In all these test instances, the number of objectives m is set to two, and the detailed settings are given as follows.

(1)
Three representative DTLZ problems with fully separable variable interactions are tested for ablation studies, i.e., DTLZ1, DTLZ4, and DTLZ7 [48]. DTLZ1 is a multimodal MOP, where conventional MOEAs can hardly obtain converged solutions. DTLZ4 may hinder MOEAs from achieving a welldistributed set of solutions, and DTLZ7 will test the ability of an algorithm to maintain subpopulation in different Pareto optimal regions. Besides, the number of decision variables is set to 5000, and the maximum number of FEs is set to 1\(\times 10^6\).

(2)
Nine LSMOP problems are tested in this study, involving fully separable (LSMOP1, LSMOP5, and LSMOP9), partially separable (LSMOP2 and LSMOP6), or mixed (LSMOP3, LSMOP4, LSMOP7, and LSMOP8) variable interactions [50]. Generally, those LSMOP problems are tailored for examining the performance of EAs in largescale multiobjective optimization, and we set the maximum number of FEs to 200\(\times \) \(d\) for test instances with d decision variables (d ranges in \(\{1000,2000, 5000\}\)).
Parameter settings
We adopt the recommended parameter settings for the compared algorithms that have achieved the best performance reported in the literature for fair comparisons. Moreover, to give an insight into the performance of SLSEA in largescale multiobjective optimization, different variants of SLSEAs are designed as well.
Variants of SLSEA
Several ablation studies are conducted to investigate the effect of different sampling strategies in SLSEA on its performance in largescale multiobjective optimization. Five variants of SLSEA using different combinations of sampling strategies are tested on three DTLZ problems [48], three LSMOP problems [50], and two TREE problems [23]. The first variant without diversity sampling and local searchrelated sampling is called SLSEA\(\setminus \)DL, the second one with only local searchrelated sampling is called SLSEA\(\setminus \)CD, the third one without diversity sampling is called SLSEA\(\setminus \)D, the forth one without local searchrelated sampling is called SLSEA\(\setminus \)L, and the last one with all the functions is named SLSEA. In these five variants, the interaction vector set size \(n_b\) is set to 10 and the sampling size \(n_s\) is set to five according to our empirical results.
SOTA largescale MOEAs
The proposed SLSEA is compared with four representative largescale MOEAs, including LSMOF [35], MOEA/DVA [30], DGEA [37], and LMOCSO [36]. In LSMOF, NSGAII is embedded as the optimizer for the second stage using 50\(\%\) of FEs for fair comparisons, and the population size n is set to 100 for all the compared algorithms. To be more specific, the number of reference solutions r is set to ten. The population size \(n_s\) and the scale factor \(F_m\) of the singleobjective differential evolution algorithm are set to 30 and 0.8, respectively. For MOEA/DVA, the number of sampling solutions in control variable analysis NIC is set to 20, the maximum number of tries required to judge the interaction NIA is set to six, and the MOEA/D based on differential evolution (MOEA/DDE) [51] is used for uniformity optimization. The number of reference solutions r is set to ten in DGEA, and there is no specific parameter in LMOCSO. Regarding SLSEA, the population size for variable classification \(n_b\) is set to ten and the sampling size \(n_s\) is set to five.
Ablation studies
In our proposed SLSEA, two additional sampling strategies, i.e., diversityrelated sampling and local searchrelated sampling, are involved. It is essential to investigate the contribution of each strategy to the performance of SLSEA. We have conducted a series of ablation studies by comparing the performance of different variants of SLSEA in solving LSMOPs. Empirically, we compare the original algorithm (termed SLSEA) with its four variants (refer to “Experimental settings”.C) on eight test instances selected from three benchmark test suites, including three DTLZ problems from conventional artificial MOPs, three LSMOP problems from artificial LSMOPs, and two TREE problems extracted from realworld applications.
The convergence profiles of the five compared algorithms on DTLZ1, DTLZ4, and DTLZ7 with 5000 decision variables are displayed in Fig. 3 in terms of the IGD values. To be more specific, the mean IGD values are noted by different markers, and the corresponding lower and upper boundaries are given by the color bars. SLSEA\(\setminus \)CD has achieved the smallest mean IGD results on the DTLZ1 problem; SLSEA\(\setminus \)D has achieved the best on DTLZ4 problem; SLSEA\(\setminus \)DL performed the best on DTLZ7 problems. All in all, SLSEA is capable of achieving the average performance of these three test problems.
The convergence profiles on three representative LSMOP problems with 5000 decision variables are given in Fig. 4 in terms of the IGD values. Unlike the comparison results on DTLZ problems, SLSEA\(\setminus \)CD has performed the best on all three cases, followed by SLSEA\(\setminus \)DL; SLSEA\(\setminus \)D, SLSEA\(\setminus \)L, and SLSEA performed similarly. For LSMOP problems, the local searchbased sampling is capable of handling the LSMOPs with complex variable interactions.
Since TREE problems are constructed from realworld applications whose true PFs are unknown, we use HV values to show the convergence profiles on TREE1 with 15,000 decision variables and TREE4 with 30,000 decision variables (as shown in Fig. 5). Different from the results on DTLZ and LSMOP problems, SLSEA has performed the best while SLSEA\(\setminus \)DL and SLSEA\(\setminus \)L performed the worst.
In summary, the local searchrelated sampling has contributed significantly to the performance of SLSEA. Due to the specific property of each tested problem, different combinations of sampling strategies have led to different performances. For instance, in the three LSMOP problems, SLSEA without diversity and local search sampling strategies performed the second best, indicating that LSMOPs are not difficult for diversity maintenance. DTLZ1 is a multimodal MOP, where diversity maintenance is essential, and thus SLSEA\(\setminus \)DL has performed the worst due to the loss of diversity maintenance.
Different variants of SLSEAs have performed differently on artificial test instances, where local searchrelated sampling has contributed the most to their performance. Nevertheless, SLSEA has performed the best on two realworld test instances. It can be summarized that local searchrelated sampling is effective in largescale optimization. Besides, existing artificial benchmark problems may be too regular, involving biased preferences and missing realworld properties. Thus, we still use the original version of SLSEA in the following experiments due to its comprehensive consideration of problem properties and robustness.
Comparative studies
The general performance of SLSEA is compared to four SOTA largescale MOEAs on LSMOP and TREE test suites. The computational complexity of the proposed SLSEA and the computation time for the compared algorithms are also presented to indicate its efficiency.
Three different experimental results indicate the overall performance, dynamic behavior, and final results of all the compared algorithms, respectively.
Performance on LSMOP problems
The overall performance is presented in Tables 1 and 2. Specifically, these tables show IGD results and HV results achieved by the five compared MOEAs on nine LSMOP problems with 1000, 2000, and 5000 decision variables, respectively. It can be observed from the first table that SLSEA has achieved the most best results, followed by DGEA, LMOCSO, and LSMOF, while MOEA/DVA has failed to achieve any best result. Notably, MOEA/DVA has consumed four times the maximum number of FEs for decision variable analysis. Thus, it has no additional resources for further optimization, which could explain its unsatisfactory performance. To be more specific, SLSEA has achieved the best results mainly on LSMOP4, LSMOP5, LSMOP6, and LSMOP8; DGEA has performed the best on LSMOP1 and LSMOP2; LSMOF has superior performance mainly on LSMOP7; LMOCSO has shown its advantages on LSMOP3 and LSMOP9. For LSMOP3, LSMOP5, LSMOP7, and LSMOP8, the differences of the achieved IGD results are significantly (e.g., over 100 times on LSMOP3 and LSMOP7) since these problems are challenging for largescale MOEAs to converge to the PFs. For the rest test problems, the compared algorithms have achieved competitive results, since the difficulty of these problems is about the diversity maintenance to obtain evenly distributed results. As for Table 2, LSMOF has achieved the most best results, which can be attributed to the fact that LSMOF has achieved widespread solution sets in the objective space, leading to better HV values in comparison with SLSEA (refer to Fig. 7).
To visually show the dynamic behaviors of the compared algorithms on largescale multiobjective optimization, Fig. 6 presents the convergence profiles of the compared algorithms in terms of IGD values. The convergence profiles of LSMOF and SLSEA are similar on LSMOP1, LSMOP3, LSMOP5, LSMOP6, LSMOP7, and LSMOP8; DGEA converges similarly to SLSEA on LSMOP1, LSMOP2, LSMOP4, and LSMOP6; while SLSEA has shown overall superiority over LMOCSO and MOEA/DVA on all the test instances. It can be concluded that the search behavior of SLSEA is similar to DGEA and LSMOF since they all have effective strategies for dealing with a large number of decision variables. Nevertheless, due to the balanced consideration of convergence enhancement, diversity maintenance, and local search, SLSEA has shown better versatility in solving different LSMOPs. In contrast, LSMOF and DGEA have achieved biased performances.
Moreover, the final nondominated solutions achieved by the five compared algorithms on nine LSMOP problems in the run associated with the best IGD value are displayed in Fig. 7. It can be seen that SLSEA has behaved similarly to DGEA on LSMOP1, LSMOP2, LSMOP6, and LSMOP8, and it has behaved similarly to LSMOF on LSMOP2, LSMOP6, LSMOP 7, and LSMOP 8, which more or less agrees with the convergence profiles in Fig. 6. Both LSMOF and DGEA have used direction vectors in the decision space, while SLSEA has used the guidance of variable classifications. Thus they have shown effectiveness in largescale multiobjective optimization. Nevertheless, due to the differences in offspring generation, SLSEA trends take advantage of LSMOF and DGEA to achieve a balanced performance, leading to its superiority over LSMOF and DGEA.
In the Supplementary Materials, we also give the convergence profiles of the compared algorithms on five representative LSMOP problems with a maximum number of \(5\times 10^6\) (25\(\times \) of the original settings). The results indicate that SLSEA tends to converge fast at the first \(1\times 10^6\) and stagnate around \(4\times 10^6\). Thus, SLSEA is more suitable for solving LSMOPs with limited FEs.
Complexity analysis
As indicated in Algorithm 1, the computational complexity of SLSEA is mainly contributed by four functions, i.e., variable classification, diversityrelated sampling, local searchrelated sampling, and environmental selection. For the first component, the computational complexity in terms of big O notation, as indicated in Algorithm 3, is \(O(2nm)+2\times O(n_b\times (n_s+mn_s+mn^2+mn_b^2)+n_b^2+2\times (2n_b)^2)\), i.e., \(O(mn+mn^2n_b+mn_b^3+n_bn_s+mn_sn_b+n_b^2)\) by eliminating the constant factors, which can be further simplified as \(O(mn^2n_b+mn_b^3+mn_bn_s+n_b^2)\). Algorithms 5 and 6 are similar, and their total computational complexity is \(O(mn+nlgn+n_b(n+1))\) \(+\) \(O(n_s+n_s(n+1))\). As for the last function, the computational complexity of the environmental selection in NSGAII is \(O(m(n+nn_b+nn_s)^2)\) [8]. In summary, the computational complexity of SLSEA is \(O(mn^2n_b+mn_b^3+mn_bn_s+n_b^2)\) \(+\) \(O(mn+nlgn+n_b(n+1))\) \(+\) \(O(n_s+n_s(n+1))\) \(+\) \(O(m(n+nn_b+nn_s)^2)\). It can be simplified to \(O(m(n^2n_b+n_b^3+(n+nn_b+nn_s)^2)+n(lgn+n_b+n_s)+n_b^2)\), i.e., \(O(mn^2(n_b+(1+n_b+n_s)^2)+mn_b^3)\). The main computational complexity comes from the nondominated sorting based environmental selection. It can be improved by using some divideandconquer strategies, e.g., selecting the nondominated solutions from each subpopulation instead of selecting solutions from the combination of all solutions. Besides, if \(n_b\) and \(n_s\) are treated as constants, the final computational complexity of SLSEA will be \(O(mn^2)\) as that of NSGAII. Thus, the efficiency of SLSEA in terms of computational complexity is acceptable.
Table 3 further displays the computation time of the five compared algorithms on two LSMOP problems and two TREE problems. It can be observed that SLSEA has consumed acceptable computation time for largescale multiobjective optimization, which validates its efficiency in terms of computation time.
Conclusion
In this work, we have proposed to use efficient sampling strategies for largescale multiobjective optimization. Three different sampling strategies are designed for convergence enhancement, diversity maintenance, and local search, respectively. Generally, the sampling strategies are economically cheap, robust, and versatile for scaling up evolutionary multiobjective optimization mainly for three reasons. First, the sampling operation is efficient compared to generic crossover and mutation operations. Even when the number of decision variables increases rapidly, the cost for generating promising candidate solutions increases linearly. The computation time comparison has validated its efficiency on LSMOPs with a large number of decision variables. Second, two issues have ensured the robustness of SLSEA in scaling up largescale optimization. On the one hand, the comprehensive consideration of convergence enhancement, diversity maintenance, and local search enables SLSEA to have diverse behaviors of offspring generation. On the other hand, using probability distributions enhances the randomness of the generated offspring solutions for better global optimization. Third, due to the use of different probability distributions for different purposes, the behaviors of SLSEA can be easily adjusted for handling different types of LSMOPs. In other words, if an expert has prior knowledge about the landscape of an LSMOP, a specific sampling strategy can be preferred to achieve tailored behaviors for solving specific LSMOPs. In summary, efficient sampling strategies are promising for largescale multiobjective optimization, and more effective sample strategies are highly desirable.
In addition to the two convergence and diversityrelated sampling strategies, we also emphasize the role of local search in this work. Ablation studies using different variants of SLSEA have indicated the importance of local search, which is somehow ignored in most existing largescale MOEAs. Notably, the variant of SLSEA with local searchrelated sampling has behaved similarly to LSMOF. These two algorithms can converge to a certain degree at the early stage of evolution (refer to [35]). Though the local searchrelated sampling has contributed significantly to the performance of SLSEA, the combination of the three sampling strategies is expected to perform robustly for largescale multiobjective optimization. All in all, the experimental results on various LSMOPs have validated the effectiveness and efficiency of SLSEA.
Notes
The lower and upper boundaries of each decision variable are normalized to 0 and 1, respectively. If any sampled decision value exceeds the lower/upper boundary, it will be truncated within the boundaries.
References
Tian Y, Yang S, Zhang X (2020) An evolutionary multiobjective optimization based fuzzy method for overlapping community detection. IEEE Trans Fuzzy Syst 28(11):2841–2855
Zhang X, Tian Y, Cheng R, Jin Y (2015) An efficient approach to nondominated sorting for evolutionary multiobjective optimization. IEEE Trans Evol Comput 19(2):201–213
Coello CAC (2006) Evolutionary multiobjective optimization: a historical view of the field. IEEE Comput Intell Mag 1(1):28–36
Schaffer JD (1985) Multiple objective optimization with vector evaluated genetic algorithms. Lawrence Erlbaum Associates. Inc., Publishers
Fonseca CM, Fleming PJ et al (1993) Genetic algorithms for multiobjective optimization: formulation discussion and generalization. 5th International Conference on Genetic Algorithms. Citeseer, San Francisco, CA, pp 416–423
Srinivas N, Deb K (1994) Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evol Comput 2(3):221–248
Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evol Comput 3:257–271
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGAII. IEEE Trans Evol Comput 6(2):182–197
Zhang Q, Li H (2007) MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Comput 11:712–731
Zitzler E, Künzli S (2004) Indicatorbased selection in multiobjective search. In: Yao X, Burke EK, Lozano JA, Smith J, MereloGuervós JJ, Bullinaria JA, Rowe JE, Tiňo P, Kabán A, Schwefel HP (eds) Parallel problem solving from nature—PPSN VIII. Springer, Berlin, pp 832–842
Zhou A, Qu B, Li H, Zhao S, Suganthan PN, Zhang Q (2011) Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol Comput 1(1):32–49
Li B, Li J, Tang K, Yao X (2015) Manyobjective evolutionary algorithms: a survey. ACM Comput Surv 48(1)
Ishibuchi H, Hitotsuyanagi Y, Tsukamoto N, Nojima Y (2010) Manyobjective test problems to visually examine the behavior of multiobjective evolution in a decision space. In: Schaefer R, Cotta C, Kołodziej J, Rudolph G (eds) Parallel problem solving from nature, PPSN XI. Springer, Berlin, pp 91–100
Yang S, Li M, Liu X, Zheng J (2013) A gridbased evolutionary algorithm for manyobjective optimization. IEEE Trans Evol Comput 17:721–736
Deb K, Jain H (2014) An evolutionary manyobjective optimization algorithm using referencepointbased nondominated sorting approach, part I: Solving problems with box constraints. IEEE Trans Evol Comput 18:577–601
Hong W, Tang K, Zhou A, Ishibuchi H, Yao X (2018) A scalable indicatorbased evolutionary algorithm for largescale multiobjective optimization. IEEE Trans Evol Comput 23(3):525–537
Zhang X, Tian Y, Jin Y (2015) A knee point driven evolutionary algorithm for manyobjective optimization. IEEE Trans Evol Comput 19:761–776
Cheng R, Jin Y, Olhofer M, Sendhoff B (2016) A reference vector guided evolutionary algorithm for manyobjective optimization. IEEE Trans Evol Comput 20:773–791
Bader J, Zitzler E (2011) HypE: an algorithm for fast hypervolumebased manyobjective optimization. Evol Comput 19(1):45–76
Chen H, Tian Y, Pedrycz W, Wu G, Wang R, Wang L (2019) Hyperplane assisted evolutionary algorithm for manyobjective optimization problems. IEEE Trans Cybern 50(7):3367–3380
Liu S, Lin Q, Wong KC, Li Q, Tan KC (2021) Evolutionary largescale multiobjective optimization: benchmarks and algorithms. IEEE Trans Evol Comput. https://doi.org/10.1109/TEVC.2021.3099487
Tian Y, Zhang X, Wang C, Jin Y (2019) An evolutionary algorithm for largescale sparse multiobjective optimization problems. IEEE Trans Evol Comput 24(2):380–393
He C, Cheng R, Zhang C, Tian Y, Chen Q, Yao X (2020) Evolutionary largescale multiobjective optimization for ratio error estimation of voltage transformers. IEEE Trans Evol Comput 24(5):868–881
Wang H, Jin Y, Yao X (2016) Diversity assessment in manyobjective optimization. IEEE Trans Cybern 47(6):1510–1522
Qian H, Yu Y (2017) Solving highdimensional multiobjective optimization problems with low effective dimensions. AAAI, vol 31. AAAI Press, New York, pp 875–881
He C, Cheng R (2021) Population sizing of evolutionary largescale multiobjective optimization. In: Ishibuchi H, Zhang Q, Cheng R, Li K, Li H, Wang H, Zhou A (eds) Evolutionary multicriterion optimization. Springer International Publishing, Cham, pp 41–52
Antonio LM, Coello CAC (2017) Coevolutionary multiobjective evolutionary algorithms: a survey of the stateoftheart. IEEE Trans Evol Comput 22:851–865
Liu S, Lin Q, Tian Y, Tan KC (2021) A variable importancebased differential evolution for largescale multiobjective optimization. IEEE Trans Cybern 52(12):13048–13062
Yang X, Zou J, Yang S, Zheng J, Liu Y (2021) A fuzzy decision variables framework for largescale multiobjective optimization. IEEE Trans Evol Comput. https://doi.org/10.1109/TEVC.2021.3118593
Ma X, Liu F, Qi Y, Wang X, Li L, Jiao L, Yin M, Gong M (2016) A multiobjective evolutionary algorithm based on decision variable analyses for multiobjective optimization problems with large scale variables. IEEE Trans Evol Comput 20:275–298
Ma L, Huang M, Yang S, Wang R, Wang X (2021) An adaptive localized decision variable analysis approach to largescale multiobjective and manyobjective optimization. IEEE Trans Cybern 52(7):6684–6696
Zhang X, Tian Y, Jin Y, Cheng R (2016) A decision variable clusteringbased evolutionary algorithm for largescale manyobjective optimization. IEEE Trans Evol Comput 22:97–112
Chen H, Cheng R, Wen J, Li H, Weng J (2020) Solving largescale manyobjective optimization problems by covariance matrix adaptation evolution strategy with scalable small subpopulations. Inf Sci 509:457–469
Zille H, Ishibuchi H, Mostaghim S, Nojima Y (2018) A framework for largescale multiobjective optimization based on problem transformation. IEEE Trans Evol Comput 22:260–275
He C, Li L, Tian Y, Zhang X, Cheng R, Jin Y, Yao X (2019) Accelerating largescale multiobjective optimization via problem reformulation. IEEE Trans Evol Comput 23(6):949–961
Tian Y, Zheng X, Zhang X, Jin Y (2020) Efficient largescale multiobjective optimization based on a competitive swarm optimizer. IEEE Trans Cybern 50(8):3696–3708
He C, Cheng R, Yazdani D (2022) Adaptive offspring generation for evolutionary largescale multiobjective optimization. IEEE Trans Syst Man Cybern 52(2):786–798
Qin S, Sun C, Jin Y, Tan Y, Fieldsend J (2021) Largescale evolutionary multiobjective optimization assisted by directed sampling. IEEE Trans Evol Comput 25(4):724–738
Whitley D (1994) A genetic algorithm tutorial. Stat Comput 4(2):65–85
He C, Tian Y, Jin Y, Zhang X, Pan L (2017) A radial space division based manyobjective optimization evolutionary algorithm. Appl Soft Comput 61:603–621
Pan L, He C, Ye T, Su Y, Zhang X (2017) A region division based diversity maintaining approach for manyobjective optimization. Integrat ComputAided Eng 24(3):279–296
Haynes W (2013) Wilcoxon rank sum test. In Encyclopedia of systems biology. Springer, 2354–2355
Tian Y, Cheng R, Zhang X, Jin Y (2017) PlatEMO: a MATLAB platform for evolutionary multiobjective optimization [educational forum]. IEEE Comput Intell Mag 12:73–87
Coello CAC, Cortés NC (2005) Solving multiobjective optimization problems using an artificial immune system. Genet Program Evolvable Mach 6(2):163–190
While L, Hingston P, Barone L, Huband S (2006) A faster algorithm for calculating hypervolume. IEEE Trans Evol Comput 10:29–38
Ishibuchi H, Imada R, Setoguchi Y, Nojima Y (2018) How to specify a reference point in hypervolume calculation for fair performance comparison. Evol Comput 26(3):411–440
Ishibuchi H, Imada R, Masuyama N, Nojima Y (2019) Comparison of hypervolume, igd and igd+ from the viewpoint of optimal distributions of solutions. Evolutionary multicriterion optimization. Springer International Publishing, Cham, pp 332–345
Deb K, Thiele L, Laumanns M, Zitzler E (2005) Scalable test problems for evolutionary multiobjective optimization. Springer, London, pp 105–145
Simon Huband LB, Hingston P, While L (2006) A review of multiobjective test problems and a scalable test problem toolkit. IEEE Trans Evol Comput 10(5):477–506
Cheng R, Jin Y, Olhofer M, Sendhoff B (2017) Test problems for largescale multiobjective and manyobjective optimization. IEEE Trans Cybern 47:4108–4121. https://doi.org/10.1109/TCYB.2016.2600577
Li H, Zhang Q (2009) Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGAII. IEEE Trans Evol Comput 13:284–302
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. U20A20306 and 61906081), and the National Key Research and Development Program of China (No. 2022YFB2403803). Y. Jin is supported by an Alexander von Humboldt Professorship for AI endowed by the German Federal Ministry of Education and Research.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
He, C., Li, L., Cheng, R. et al. Evolutionary multiobjective optimization via efficient samplingbased offspring generation. Complex Intell. Syst. (2023). https://doi.org/10.1007/s4074702300990z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s4074702300990z
Keywords
 Largescale optimization
 Multiobjective optimization
 Evolutionary algorithm
 Sampling