Improved SparseEA for sparse large-scale multi-objective optimization problems

Sparse large-scale multi-objective optimization problems (LSMOPs) widely exist in real-world applications. They involve a large number of decision variables and have sparse Pareto optimal solutions, i.e., most decision variables of these solutions are zero. In recent years, sparse LSMOPs have attracted increasing attention in the evolutionary computation community. However, all the recently tailored algorithms for sparse LSMOPs give top priority to sparsity detection and maintenance, so the nonzero variables can hardly be optimized sufficiently within a limited budget of function evaluations. To address this issue, this paper proposes to enhance the connection between real variables and binary variables within the two-layer encoding scheme with the assistance of variable grouping techniques. In this way, more effort can be devoted to the real part of nonzero variables, achieving a balance between sparsity maintenance and variable optimization. Experimental results on eight benchmark problems and three real-world applications show that the proposed algorithm is superior to existing state-of-the-art evolutionary algorithms for sparse LSMOPs.


Introduction
Large-scale multi-objective optimization problems (LSMOPs), which widely exist in scientific and engineering areas, are problems involving a large number of decision variables and multiple conflicting objectives. For example, a vehicle routing problem (VRP) usually involves hundreds of customers [1], and the number of decision variables in the ratio error estimation of voltage transformers (TREE) can vary from hundreds to millions [2]. Evolutionary algorithms (EAs), as population-based optimization methods, are capable of obtaining a set of trade-off solutions in a single run and are less likely to be trapped in local optima. A variety of EAs showing promising performance on different kinds of multi-objective optimization problems (MOPs) have been proposed during the past two decades [3][4][5]; however, their performance usually degrades when they are adopted to tackle LSMOPs. One reason is that the search space expands exponentially as the number of decision variables increases, which is known as the curse of dimensionality [6]. To resolve this dilemma, various approaches have been tailored for solving LSMOPs during the past ten years [7], which can be roughly categorized into four types: EAs based on decision variable grouping [8], decision variable analysis [9,10], problem reformulation [11,12], and special offspring generation strategies [13,14,16,17].

This paper is an expanded version of our conference paper submitted to EMO2021. Contact: Xingyi Zhang (corresponding author, xyzhanghust@gmail.com), Yajie Zhang (yjzhang17719490727@163.com), Ye Tian (field910921@gmail.com).
Within LSMOPs, there is a special kind of problem that is becoming increasingly important in real-world applications and scientific research, known as sparse LSMOPs. The Pareto optimal solutions of such problems are sparse, i.e., most decision variables of the solutions are zero. For example, the portfolio optimization problem aims to maximize the expected return and minimize the potential risk, and the invested products usually account for only a small portion of all candidate products [19]. In the neural network training problem, which minimizes the complexity and error of a network model, many weights should be set to zero [18]. When existing approaches customized for general LSMOPs are employed to solve sparse LSMOPs, few of them can obtain a satisfactory solution set within a limited budget of function evaluations, since they do not consider the sparse nature of Pareto optimal solutions when evolving the population and thus converge slowly on sparse LSMOPs [20].
To fill the gap mentioned above, several multi-objective evolutionary algorithms (MOEAs) have been tailored for sparse LSMOPs in recent years. In SparseEA [21], a novel population initialization strategy and novel genetic operators are proposed to maintain the sparsity of generated solutions. In MOEA/PSL [22], two unsupervised neural networks are used to learn a sparse distribution and a compact representation of the decision variables, thus approximating the Pareto optimal subspace. In PM-MOEA [25], pattern mining techniques are utilized to mine the sparse distribution of Pareto optimal solutions and thus considerably reduce the search space. In MDR-SAEA [26], the authors use feature selection to achieve dimensionality reduction and apply Kriging-assisted evolutionary algorithms to solve expensive sparse LSMOPs. In MP-MMEA [27], a multi-population MOEA that guides the search behavior of populations via adaptively updated vectors is proposed to deal with sparse large-scale multi-modal MOPs, where the guiding vectors not only accelerate convergence in the huge search space but also differentiate the search direction of each population.
Note that all the algorithms enumerated above adopt a two-layer encoding scheme, i.e., each decision variable is represented by $x_i = \mathit{mask}_i \times \mathit{dec}_i$, where $\mathit{mask}_i \in \{0, 1\}$ is a binary variable, $\mathit{dec}_i$ is a real variable, $i = 1, \dots, D$, and $D$ is the number of decision variables. One main purpose of adopting such an encoding strategy is to facilitate the detection of the positions of nonzero variables. The algorithms in [22,25,26] first attempt to find the sparse distribution of decision variables via different dimensionality reduction techniques, and put the optimization of nonzero real variables in second place. However, paying too much attention to sparsity detection may hinder the optimization of nonzero variables. Figure 1 shows the parallel coordinates plot of the decision variables of solutions obtained by SparseEA, MOEA/PSL, and PM-MOEA on SMOP1 and SMOP3 with 1000 decision variables, where the sparsity of both problems is set to 0.1, i.e., the last 900 decision variables in the Pareto optimal solutions are zero. For SMOP1, even though PM-MOEA detects the positions of nonzero variables more precisely than the other two algorithms, it does not obtain the best IGD value among the three algorithms as expected, since MOEA/PSL optimizes the nonzero real variables better. For SMOP3, where all three algorithms detect the positions of nonzero variables precisely, PM-MOEA obtains the best IGD value because it optimizes the key variables, which affect the objective values, more sufficiently.
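To make the two-layer encoding concrete, the following minimal Python sketch decodes a (mask, dec) pair into a solution x. The helper names `decode` and `nonzero_positions` are hypothetical and are not taken from any of the cited implementations.

```python
from typing import List, Sequence


def decode(mask: Sequence[int], dec: Sequence[float]) -> List[float]:
    """Two-layer encoding: x_i = mask_i * dec_i for i = 1..D."""
    if len(mask) != len(dec):
        raise ValueError("mask and dec must both have length D")
    return [m * d for m, d in zip(mask, dec)]


def nonzero_positions(mask: Sequence[int]) -> List[int]:
    """Indices whose binary part is 1, i.e. the candidate nonzero variables."""
    return [i for i, m in enumerate(mask) if m == 1]


# Toy example with D = 6: only variables 0 and 3 are switched on.
mask = [1, 0, 0, 1, 0, 0]
dec = [0.7, 0.2, 0.9, 0.4, 0.1, 0.5]
x = decode(mask, dec)
print(x)                        # [0.7, 0.0, 0.0, 0.4, 0.0, 0.0]
print(nonzero_positions(mask))  # [0, 3]
```

Note that dec still stores values (such as 0.9 at position 2) that have no effect on x until the corresponding mask bit is flipped to 1.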
Based on the above observations, the optimization of nonzero variables is as important as the detection of sparsity. In this paper, we propose to enhance the connection between mask and dec with the assistance of variable grouping techniques. We no longer give top priority to sparsity maintenance; instead, we aim to optimize key variables more sufficiently without sacrificing the effect of sparsity maintenance. In this way, a better balance between sparsity maintenance and variable optimization can be achieved. The main contributions of this paper are summarized as follows:
• A new algorithm equipped with customized genetic operators for sparse LSMOPs is proposed, in which the connection between real variables and binary variables is enhanced with the assistance of variable grouping techniques. This ensures that a real variable is optimized whenever its corresponding binary variable is flipped, improving the efficiency of producing sparse Pareto optimal solutions.
• Building on the performance evaluation and empirical study in the conference version of this paper [20], a more comprehensive comparison study is conducted to reveal the merits and drawbacks of existing state-of-the-art MOEAs and the newly proposed algorithm on benchmark and real-world LSMOPs with sparse Pareto optimal solutions.
The remainder of this paper is organized as follows. We first introduce the existing MOEAs for general LSMOPs and sparse LSMOPs, and then elaborate on our proposed MOEA for sparse LSMOPs. Next, we present the experimental studies on eight benchmark problems and three real-world applications. Finally, we draw the conclusions and outline some future research directions.

Related work
As mentioned in the last section, existing MOEAs for general LSMOPs can be roughly categorized into four groups, based on decision variable grouping [8], decision variable analysis [9,10], problem reformulation [11,12], and special offspring generation strategies [13,14,16,17].

Fig. 1 Parallel coordinates plot of the decision variables of solutions obtained by SparseEA, MOEA/PSL, and PM-MOEA on SMOP1 and SMOP3 with 1000 variables; the sparsity is set to 0.1. The value in the top right corner of each sub-figure represents the median IGD obtained by the corresponding algorithm over 30 runs.
The main idea of the algorithms based on decision variable grouping is to divide the decision variables into groups that are optimized by independent sub-populations in a divide-and-conquer manner. However, their performance can be greatly affected by the adopted variable grouping technique. For example, random grouping [28] is employed in the cooperative coevolutionary generalized differential evolution 3 (CCGDE3) [8], on the grounds that dividing variables into random groups provides better results than a deterministic division scheme when dealing with nonseparable functions. Other variable grouping techniques (e.g., ordered grouping [29], linear grouping [30], and differential grouping [31]) have also shown effectiveness in solving specific LSMOPs.
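The grouping techniques mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the exact definitions in [28-30]: we assume random grouping permutes the variable indices, linear grouping keeps consecutive indices together, and ordered grouping (as we understand its use in GLMO) sorts variables by their current values; all three then cut the ordering into near-equal groups. Differential grouping is omitted because it requires function evaluations. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def _split(order: Sequence[int], k: int) -> List[List[int]]:
    """Cut an ordering of variable indices into k contiguous, near-equal groups."""
    n = len(order)
    bounds = [g * n // k for g in range(k + 1)]
    return [list(order[bounds[g]:bounds[g + 1]]) for g in range(k)]


def random_grouping(D: int, k: int, rng: random.Random) -> List[List[int]]:
    """Assign variables to k groups after a random permutation of indices."""
    order = list(range(D))
    rng.shuffle(order)
    return _split(order, k)


def linear_grouping(D: int, k: int) -> List[List[int]]:
    """Consecutive variable indices form a group."""
    return _split(list(range(D)), k)


def ordered_grouping(x: Sequence[float], k: int) -> List[List[int]]:
    """Variables with similar current values form a group (ascending order)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    return _split(order, k)


print(linear_grouping(8, 4))                      # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(ordered_grouping([0.9, 0.1, 0.5, 0.3], 2))  # [[1, 3], [2, 0]]
print(random_grouping(8, 4, random.Random(0)))
```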
MOEA/DVA [9] and LMEA [10] are two well-known algorithms based on decision variable analysis. The key components of MOEA/DVA are control property analysis and variable linkage analysis: the former divides the decision variables into position-related variables, distance-related variables, and mixed variables, while the latter further divides the distance-related variables into smaller subgroups of interacting variables.
Afterwards, the variables in each subgroup are optimized independently by a differential evolution based optimizer. In contrast to MOEA/DVA, which treats mixed variables as diversity-related variables, LMEA precisely clusters each decision variable as either a convergence-related or a diversity-related variable. Subsequently, a convergence optimization strategy and a diversity optimization strategy are employed alternately to optimize the corresponding variables.
For the methods based on problem reformulation, two representative algorithms are WOF [11] and LSMOF [12]. WOF divides the decision variables into groups and assigns a weight variable to each group, so the dimensionality of the problem is greatly reduced by altering all variables in the same group at the same time. LSMOF, on the other hand, defines a set of reference directions in the decision space and associates them with a number of weight variables to reformulate the original problem into a low-dimensional single-objective optimization problem. After obtaining enough quasi-optimal solutions near the Pareto set, LSMOF spreads them evenly over the approximated Pareto set via an embedded differential evolution algorithm.
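The weight-variable idea behind WOF can be illustrated with a product-style transformation, which is one of the transformation functions we understand WOF to support; the sketch below is an assumption for illustration, and the function name and signature are hypothetical. Every variable in group g is scaled by the weight of that group relative to a fixed pivot solution and clipped to its bounds, so an optimizer only needs to search over as many weights as there are groups.

```python
from typing import List, Sequence


def product_transform(pivot: Sequence[float],
                      groups: Sequence[Sequence[int]],
                      weights: Sequence[float],
                      lower: Sequence[float],
                      upper: Sequence[float]) -> List[float]:
    """Map one weight per group back to a full D-dimensional solution.

    Each variable i in group g becomes weights[g] * pivot[i], clipped to
    [lower[i], upper[i]], so all variables of a group move together.
    """
    if len(weights) != len(groups):
        raise ValueError("need exactly one weight per group")
    x = list(pivot)
    for g, members in enumerate(groups):
        for i in members:
            x[i] = min(max(weights[g] * pivot[i], lower[i]), upper[i])
    return x


pivot = [0.2, 0.4, 0.6, 0.8]
groups = [[0, 1], [2, 3]]
x = product_transform(pivot, groups, [2.0, 0.5], [0.0] * 4, [1.0] * 4)
print(x)  # [0.4, 0.8, 0.3, 0.4]
```

Here the transformed problem has two weight variables instead of four decision variables; in WOF the pivot solutions are taken from the current population.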
The last category employs special offspring generation strategies. To improve the search efficiency, LMOCSO [14] suggests a new reproduction operator based on the competitive swarm optimizer [15], in which an acceleration term is added to the position update mechanism to speed up convergence. Instead of optimizing decision variables directly, LCSA [16] evolves a population of coefficient vectors; by taking advantage of the inherent knowledge of the population, offspring are obtained as linear combinations of existing individuals. GLMO [17] embeds variable grouping into mutation operators to improve the quality of generated offspring and presents three new mutation operators: Linked Polynomial Mutation, Grouped Polynomial Mutation, and Grouped and Linked Polynomial Mutation. DGEA [13] generates promising solutions by constructing direction vectors in the decision space. Specifically, in each iteration, two kinds of direction vectors related to convergence and diversity are constructed adaptively, and offspring are then produced along each direction vector by sampling from the constructed Gaussian distribution.
Although the above elaborate approaches work well on general LSMOPs, their performance usually degrades when they are applied to sparse LSMOPs. One main reason is that few of them consider the sparse nature of Pareto optimal solutions when evolving the population, and they therefore converge slowly on sparse LSMOPs. To fill this gap, several MOEAs customized for sparse LSMOPs have been proposed, which make full use of the sparsity of problems to speed up convergence to the Pareto optimal sets. In this paper, we divide them into two categories according to whether dimensionality reduction techniques are used.
Among the sparse MOEAs without dimensionality reduction techniques, SparseEA [21] has a framework similar to that of NSGA-II, while its novelties lie in its population initialization strategy and genetic operators, which ensure the sparsity of generated individuals. Specifically, in the population initialization strategy, SparseEA first calculates a fitness score for each decision variable based on non-dominated sorting [3], and then generates the initial population based on the obtained scores. In the genetic operators, SparseEA flips zero or nonzero binary variables with the same probability on the basis of the fitness scores; for the real part of the decision variables, however, SparseEA simply executes conventional genetic operators. Recently, a multi-population evolutionary algorithm termed MP-MMEA [27] has been proposed for solving sparse large-scale multi-modal multi-objective optimization problems (MMOPs). MP-MMEA adopts adaptively adjusted guiding vectors to improve both the convergence and diversity of each population; the guiding vectors not only lead the sub-populations to evolve towards sparse Pareto sets efficiently, but also diversify the search direction of each sub-population in the decision space.
Among the sparse MOEAs based on dimensionality reduction techniques, MOEA/PSL [22] adopts a restricted Boltzmann machine (RBM) [23] and a denoising autoencoder (DAE) [24] to learn the sparse distribution and compact representation of the decision variables, and regards the combination of the learnt sparse distribution and compact representation as an approximation of the Pareto optimal subspace. Genetic operators are then conducted in the reduced subspace instead of the original search space, so the huge search space is greatly reduced. Similarly, PM-MOEA [25] utilizes data mining techniques to mine the maximum and minimum candidate sets of the nonzero variables in Pareto optimal solutions, and then executes genetic operators on the dimensions determined by these candidate sets, which also greatly reduces the high-dimensional decision space. To address the curse of dimensionality encountered in sparse LSMOPs with expensive objective functions, MDR-SAEA [26] executes non-dominated sorting based feature selection and mask evolving based feature selection within a multi-stage framework to reduce the search space, and then performs surrogate-assisted optimization on the dimension-reduced problems.

Proposed MOEA for sparse LSMOPs
Up to now, existing MOEAs tailored for sparse LSMOPs give top priority to sparsity maintenance, so the real part of nonzero variables can hardly be optimized sufficiently within a limited budget of function evaluations. Therefore, this paper proposes an improved version of SparseEA, termed SparseEA2, in which the connection between real variables and binary variables is enhanced with the assistance of variable grouping techniques. This ensures that the real part of nonzero variables receives more attention and is optimized more sufficiently, without sacrificing the effect of sparsity maintenance. In this section, we first introduce SparseEA and then elaborate on the proposed SparseEA2. Figure 2 shows the procedure of SparseEA, which is very similar to that of NSGA-II [3]. The mating pool selection and environmental selection of SparseEA are the same as those of NSGA-II, while its novelties lie in its population initialization strategy and genetic operators.

SparseEA
Algorithm 1 presents the population initialization strategy of SparseEA, which consists of two steps, i.e., calculating the fitness of decision variables and generating the initial population. In the first step, the real matrix Dec is set to a uniformly randomly generated D × D matrix or a D × D matrix of ones according to the type of decision variables, and the binary matrix Mask is set to a D × D identity matrix. Here, Dec holds the real parts (dec) and Mask holds the binary parts (mask) of the solutions. Thereafter, a population Q with D solutions is generated by multiplying Dec by Mask element-wise. Then, non-dominated sorting is executed on Q, and the front number of the i-th individual is regarded as the fitness of the i-th decision variable. The fitness of a decision variable measures its contribution to the objective values, i.e., a smaller fitness indicates a lower probability that the decision variable should be set to zero. In the second step, Dec is set to an N × D matrix in the same way as in the first step, and Mask is set to an N × D matrix of zeros. Then, for each solution, a random number of elements in its mask are set to 1 according to the fitness of the corresponding decision variables. Finally, the initial population P with N solutions is generated by multiplying Dec by Mask element-wise.

Fig. 2 Procedure of SparseEA
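The two initialization steps can be sketched in Python as follows. This is a minimal sketch under several assumptions: a naive O(N²) non-dominated sorting stands in for the fast sorting of NSGA-II, the number of binary tournaments per solution is drawn from 1..D following our reading of SparseEA [21], ties go to the first contestant, and `toy_problem` is a hypothetical bi-objective function in which only the first two variables matter.

```python
import random
from typing import Callable, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


def front_numbers(objs: Sequence[Sequence[float]]) -> List[int]:
    """Naive non-dominated sorting: 1-based front index of every solution."""
    fronts = [0] * len(objs)
    remaining = set(range(len(objs)))
    level = 1
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in current:
            fronts[i] = level
        remaining -= current
        level += 1
    return fronts


def variable_fitness(evaluate: Callable[[List[float]], Sequence[float]], D: int,
                     rng: random.Random, real: bool = True) -> List[int]:
    """Step 1: solution i has only variable i switched on (Mask = identity)."""
    objs = []
    for i in range(D):
        dec = [rng.random() if real else 1.0 for _ in range(D)]
        mask = [1 if j == i else 0 for j in range(D)]
        objs.append(evaluate([m * d for m, d in zip(mask, dec)]))
    return front_numbers(objs)


def initial_masks(fitness: Sequence[int], N: int, rng: random.Random) -> List[List[int]]:
    """Step 2: switch on variables chosen by binary tournaments on fitness."""
    D = len(fitness)
    masks = []
    for _ in range(N):
        mask = [0] * D
        for _ in range(rng.randint(1, D)):  # random number of tournaments (assumed)
            a, b = rng.randrange(D), rng.randrange(D)
            mask[a if fitness[a] <= fitness[b] else b] = 1  # smaller fitness wins
        masks.append(mask)
    return masks


def toy_problem(x: List[float], k: int = 2) -> Tuple[float, float]:
    """Hypothetical bi-objective problem whose first k variables are 'key'."""
    return sum(abs(v) for v in x), sum(1.0 - x[i] for i in range(k))


rng = random.Random(42)
fit = variable_fitness(toy_problem, 5, rng, real=False)
print(fit)  # [1, 1, 2, 2, 2]
masks = initial_masks(fit, N=4, rng=rng)
decs = [[rng.random() for _ in range(5)] for _ in range(4)]
population = [[m * d for m, d in zip(mk, dc)] for mk, dc in zip(masks, decs)]
```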
The other key component of SparseEA, i.e., the genetic operator, is presented in Algorithm 2. It consists of generating the mask of an offspring and generating its dec. Specifically, two parents p and q are randomly selected from P to generate an offspring o in each turn. To generate the binary vector mask of o, a uniformly distributed random number in [0, 1] is first generated. If it is smaller than 0.5, two decision variables are randomly selected from the nonzero elements in p.mask ∩ ¬q.mask, and the element with the bigger fitness in o.mask is set to 0. Otherwise, two decision variables are randomly selected from the nonzero elements in ¬p.mask ∩ q.mask, and the element with the smaller fitness in o.mask is set to 1. Afterwards, o.mask is mutated by one of the following two operations with the same probability: either two decision variables are randomly selected from the nonzero elements in o.mask and the one with the bigger fitness is set to 0, or two decision variables are randomly selected from the zero elements in o.mask and the one with the smaller fitness is set to 1. In contrast, little attention has been paid to the generation of dec, i.e., simulated binary crossover and polynomial mutation are simply performed on p.dec and q.dec. This design raises two concerns. To explain them clearly, Fig. 3 presents the mutation process of SparseEA. When the uniformly distributed random number is not smaller than 0.5, SparseEA randomly selects two decision variables from the zero elements in o.mask and sets the one with the smaller fitness to 1. Supposing that the two randomly selected decision variables are the first and the third ones, then, according to the fitness scores of the decision variables, which remain unchanged during the whole optimization process, the third element, marked by a star, is flipped to 1, yielding the binary vector mask′ of the offspring x′.
As for the real vector dec, each variable has an equal probability of 1/D of being mutated; in this example, the mutated real variables are shown in the highlighted cells.
It can be observed that, since the positions of the elements mutated in the binary vector mask and in the real vector dec are not consistent with each other, even though a binary variable with a smaller fitness (i.e., a higher probability that the decision variable should be nonzero in the Pareto optimal solutions) is selected and flipped, its corresponding real variable remains unchanged in the current reproduction. That is, for solution x, its key nonzero real variable is not optimized at all in the current iteration, even if the position of the nonzero variable has been found precisely. Moreover, although many real variables in dec have been mutated, the effort spent on those that are not related to the nonzero elements of Pareto optimal solutions is in fact wasted.
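The following sketch reproduces the SparseEA mask operators described above and shows why the mask and dec updates are decoupled: the flipped mask bit and the positions touched by polynomial mutation are drawn independently. It is a simplified illustration, not the reference implementation; we assume the offspring mask starts as a copy of p.mask, the two candidates are drawn with replacement, and smaller fitness is better. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def flip_by_fitness(mask: List[int], fitness: Sequence[int], candidates: Sequence[int],
                    to_value: int, rng: random.Random) -> Optional[int]:
    """Pick two candidates at random; switch off the one with bigger fitness
    (to_value=0) or switch on the one with smaller fitness (to_value=1).
    Returns the flipped index, or None when there is no candidate."""
    if not candidates:
        return None
    a, b = rng.choice(candidates), rng.choice(candidates)  # with replacement (assumed)
    if to_value == 0:
        chosen = a if fitness[a] >= fitness[b] else b
    else:
        chosen = a if fitness[a] <= fitness[b] else b
    mask[chosen] = to_value
    return chosen


def sparseea_mask_crossover(p_mask: Sequence[int], q_mask: Sequence[int],
                            fitness: Sequence[int], rng: random.Random) -> List[int]:
    o = list(p_mask)  # offspring mask starts as a copy of p.mask (assumed)
    D = len(o)
    if rng.random() < 0.5:
        cand = [i for i in range(D) if p_mask[i] == 1 and q_mask[i] == 0]  # p ∩ ¬q
        flip_by_fitness(o, fitness, cand, 0, rng)
    else:
        cand = [i for i in range(D) if p_mask[i] == 0 and q_mask[i] == 1]  # ¬p ∩ q
        flip_by_fitness(o, fitness, cand, 1, rng)
    return o


def sparseea_mutation(o_mask: Sequence[int], fitness: Sequence[int],
                      rng: random.Random) -> Tuple[List[int], Optional[int], List[int]]:
    """Mask and dec are mutated independently of each other."""
    D = len(o_mask)
    mask = list(o_mask)
    if rng.random() < 0.5:
        flipped = flip_by_fitness(mask, fitness, [i for i in range(D) if mask[i] == 1], 0, rng)
    else:
        flipped = flip_by_fitness(mask, fitness, [i for i in range(D) if mask[i] == 0], 1, rng)
    # Polynomial mutation picks each real variable independently with probability 1/D.
    dec_positions = [i for i in range(D) if rng.random() < 1.0 / D]
    return mask, flipped, dec_positions


rng = random.Random(3)
fitness = [3, 1, 2, 4, 5, 2, 6, 1]
parent_p = [1, 1, 0, 0, 1, 0, 0, 0]
parent_q = [0, 1, 1, 0, 0, 0, 0, 1]
child = sparseea_mask_crossover(parent_p, parent_q, fitness, rng)
mutated, flipped, dec_positions = sparseea_mutation(child, fitness, rng)
print(flipped, dec_positions)
```

With D = 8, the real variable behind the flipped bit is mutated only with probability 1/8, so in most runs `flipped` does not appear in `dec_positions`.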
To address this dilemma, variable grouping techniques are utilized to enhance the connection between mask and dec, ensuring that when a binary variable is flipped, its corresponding real variable is optimized at the same time. Specifically, instead of performing crossover and mutation on the binary vector first, simulated binary crossover is executed on the real variables; we then divide the resulting variables into groups and randomly select one group for mutation, based on the variable grouping techniques in [17]. Figure 4 shows the mutation process of the proposed SparseEA2. We first divide the real variables obtained after simulated binary crossover into groups (two groups in this example) via ordered grouping [29], and then randomly select one group. Supposing that the second group is selected, after those variables are changed by the same mutation amount, the binary variables at the same positions as the mutated real variables are picked out; we call the set of these binary variables PreMask. Afterwards, a uniformly distributed random number in [0, 1] is generated. If it is smaller than 0.5, two variables are randomly selected from the nonzero elements in PreMask, and the one with the bigger fitness is set to 0. Otherwise, two variables are randomly selected from the zero elements in PreMask, and the one with the smaller fitness is set to 1. Supposing that the variables whose fitness values are 5 and 12 are selected, the former, marked by a star, is flipped from 0 to 1, as its fitness is smaller.

Fig. 4 An example of the mutation process of SparseEA2, in which x and x′ are the parent individual and offspring individual, and mask, dec, mask′, and dec′ are the binary vectors and real vectors of x and x′, respectively. The light gray numbers above the table represent the fitness of each decision variable, and the number with smaller font size in each cell denotes the grouping index.
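The SparseEA2 mutation step can be sketched as follows. This is a hedged illustration, not the authors' code: we assume decision variables lie in [0, 1], simulated binary crossover has already been applied to dec, ordered grouping sorts indices by their current values, and "the same mutation amount" means a single polynomial-mutation step shared by the whole group (in the spirit of GLMO's linked mutation). The names `ordered_groups` and `sparseea2_mutation` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def ordered_groups(dec: Sequence[float], k: int) -> List[List[int]]:
    """Ordered grouping (assumed form): sort indices by value, cut into k groups."""
    order = sorted(range(len(dec)), key=lambda i: dec[i])
    n = len(order)
    return [order[g * n // k:(g + 1) * n // k] for g in range(k)]


def sparseea2_mutation(mask: Sequence[int], dec: Sequence[float], fitness: Sequence[int],
                       k: int, rng: random.Random, eta: float = 20.0,
                       lower: float = 0.0, upper: float = 1.0
                       ) -> Tuple[List[int], List[float], List[int], Optional[int]]:
    new_mask, new_dec = list(mask), list(dec)
    group = rng.choice(ordered_groups(dec, k))  # mutate one randomly chosen group only
    # One polynomial-mutation step shared by the whole group (assumed interpretation).
    u = rng.random()
    if u < 0.5:
        delta = (2 * u) ** (1 / (eta + 1)) - 1
    else:
        delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
    for i in group:
        new_dec[i] = min(max(dec[i] + delta * (upper - lower), lower), upper)
    # PreMask: the binary variables at the positions of the mutated real variables.
    if rng.random() < 0.5:
        cand, to_value = [i for i in group if mask[i] == 1], 0
    else:
        cand, to_value = [i for i in group if mask[i] == 0], 1
    flipped = None
    if cand:
        a, b = rng.choice(cand), rng.choice(cand)
        if to_value == 0:  # switch off the variable with the bigger fitness
            flipped = a if fitness[a] >= fitness[b] else b
        else:              # switch on the variable with the smaller fitness
            flipped = a if fitness[a] <= fitness[b] else b
        new_mask[flipped] = to_value
    return new_mask, new_dec, group, flipped


mask = [1, 0, 0, 1, 0, 0, 1, 0]
dec = [0.62, 0.15, 0.48, 0.91, 0.33, 0.07, 0.75, 0.26]
fitness = [2, 5, 9, 1, 12, 7, 3, 10]
new_mask, new_dec, group, flipped = sparseea2_mutation(mask, dec, fitness, k=2,
                                                       rng=random.Random(7))
print(group, flipped)
```

Unlike the SparseEA sketch, a flipped bit here always lies inside the mutated group, so its real variable is changed in the same reproduction step.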
The operations elaborated above indeed enhance the connection between mask and dec. As a result, without sacrificing the effect of sparsity maintenance, whenever a binary variable is flipped, its corresponding real variable is optimized at the same time. Besides, since only one group of variables is mutated each time, the effort devoted to the mutation process is also reduced, which is very meaningful given that only one binary variable in mask is flipped in each iteration. Replacing the genetic operators of SparseEA with the ones elaborated above yields the new algorithm, which we call SparseEA2. To validate the effectiveness of SparseEA2, we run it on SMOP1 and SMOP3 with 1000 decision variables. Figure 5 shows the parallel coordinates plot of the decision variables of solutions obtained by SparseEA2. Compared to the results in Fig. 1, the first 100 key variables, which should be nonzero in the Pareto optimal solutions, are optimized more sufficiently, and the IGD values in the top right corner of each sub-figure are also much smaller than those obtained by SparseEA, MOEA/PSL, and PM-MOEA.

Experimental settings for benchmark problems
Algorithms: For CCGDE3, the number of species is set to 2, and random grouping is adopted. For LMEA, the number of selected solutions for decision variable clustering is set to 2, the number of perturbations on each solution is set to 4, and the number of selected solutions for decision variable interaction analysis is set to 5. For WOF-SMPSO, the number of function evaluations for each optimization of the original problem is set to 1000, the number of evaluations for each optimization of the transformed problem is set to 500, the number of chosen solutions is set to 3, the number of groups is set to 4, and ordered grouping is adopted. For GLMO, the number of groups is set to 4, NSGA-II is adopted as the underlying optimizer, and ordered grouping is employed. For PM-MOEA, the population size and the number of generations of its evolutionary pattern mining approach are set to 20 and 10, respectively. For SparseEA2, the number of groups is set to 4, and ordered grouping is used. In LMEA, GLMO, SparseEA, MOEA/PSL, PM-MOEA, and SparseEA2, simulated binary crossover and polynomial mutation are employed to produce offspring; the probabilities of crossover and mutation are set to 1 and 1/D, respectively, where D is the number of decision variables, and the distribution indices of both crossover and mutation are set to 20. In CCGDE3, the DE operator and polynomial mutation are used for offspring generation, with control parameters CR = 1, F = 0.5, pm = 1/D, and η = 20. In WOF-SMPSO, the PSO operator and polynomial mutation are employed.

Problems: For SMOP1-SMOP8, the number of objectives is set to 2, the number of decision variables is set to 1000, 2000, and 5000, and the sparsity of Pareto optimal solutions, i.e., the ratio of nonzero elements in the decision variables, is set to 0.1.
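For reference, simulated binary crossover and polynomial mutation with the settings above (crossover probability 1, mutation probability 1/D, distribution index 20) can be sketched as below. These are the basic textbook forms; the bounded variants used in PlatEMO differ in details such as boundary handling and per-variable swapping, so this is a simplified illustration rather than the exact operators used in the experiments.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sbx(p1: Sequence[float], p2: Sequence[float], eta: float = 20.0,
        rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Basic simulated binary crossover applied to every variable (probability 1)."""
    rng = rng if rng is not None else random.Random()
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        u = rng.random()
        if u <= 0.5:
            beta = (2 * u) ** (1 / (eta + 1))
        else:
            beta = (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        c1.append(0.5 * ((1 + beta) * a + (1 - beta) * b))
        c2.append(0.5 * ((1 - beta) * a + (1 + beta) * b))
    return c1, c2


def polynomial_mutation(x: Sequence[float], lower: Sequence[float], upper: Sequence[float],
                        eta: float = 20.0, rng: Optional[random.Random] = None) -> List[float]:
    """Basic polynomial mutation: each variable mutated with probability 1/D, then clipped."""
    rng = rng if rng is not None else random.Random()
    D = len(x)
    y = list(x)
    for i in range(D):
        if rng.random() < 1.0 / D:
            u = rng.random()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            y[i] = min(max(y[i] + delta * (upper[i] - lower[i]), lower[i]), upper[i])
    return y


c1, c2 = sbx([0.2, 0.8, 0.5], [0.6, 0.1, 0.5], rng=random.Random(0))
y = polynomial_mutation([0.5] * 100, [0.0] * 100, [1.0] * 100, rng=random.Random(1))
```

SBX keeps the midpoint of each parent pair (c1 + c2 = p1 + p2), and a large distribution index such as 20 keeps children close to their parents.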

Stopping criteria and population size:
The maximum number of function evaluations is adopted as the stopping criterion and is set to 100 × D for each MOEA. The population size is set to 100.

Performance metrics:
The inverted generational distance (IGD) [36] is adopted to evaluate each obtained solution set, and roughly 10000 reference points are sampled on each Pareto front to calculate the IGD value. We perform 30 independent runs of each MOEA on each problem, and the Wilcoxon rank-sum test with a significance level of 0.05 is adopted for statistical analysis. All the experimental studies in this paper are conducted on PlatEMO¹ [34].

Experimental results on benchmark problems
Table 1 shows the IGD values obtained by CCGDE3, LMEA, WOF-SMPSO, GLMO, SparseEA, MOEA/PSL, PM-MOEA, and the proposed SparseEA2 on SMOP1-SMOP8 with 1000, 2000, and 5000 decision variables over 30 runs. Of the 24 benchmark instances, MOEA/PSL obtains the best results on 4, while PM-MOEA and SparseEA2 each perform the best on 10. Based on the Wilcoxon rank-sum test with a significance level of 0.05, the statistical results of the other seven algorithms compared to SparseEA2 are 0/24/0, 0/24/0, 0/23/1, 0/24/0, 1/21/2, 4/17/3, and 9/14/1, respectively. Thus, SparseEA2 exhibits the best overall performance among the eight algorithms on the eight sparse benchmark LSMOPs.
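IGD averages, over the sampled reference points, the Euclidean distance from each reference point to its nearest obtained solution in objective space; smaller values are better. A minimal sketch (it needs Python 3.8+ for `math.dist`); the linear front f1 + f2 = 1 below is a hypothetical example, not the front of any SMOP problem.

```python
import math
from typing import Sequence


def igd(reference: Sequence[Sequence[float]], obtained: Sequence[Sequence[float]]) -> float:
    """Mean distance from each reference point to its nearest obtained point."""
    return sum(min(math.dist(r, s) for s in obtained) for r in reference) / len(reference)


# Hypothetical linear front f1 + f2 = 1, sampled with 10000 reference points.
n = 10000
reference = [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]
print(igd(reference, [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]))
```

Because every reference point must be close to some obtained point, IGD penalizes both poor convergence and poor coverage of the front.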
It is also evident that the four MOEAs customized for sparse LSMOPs perform clearly better than the four MOEAs tailored for general LSMOPs on these 24 benchmark instances. One may argue that sparse optimization problems are merely a special kind of optimization problem, so algorithms that can solve general LSMOPs should also be able to solve sparse LSMOPs. This argument can be answered from the following three viewpoints. First, existing MOEAs for general LSMOPs mostly generate the initial population randomly within the large search space, so the initial population is usually far from the sparse Pareto optimal sets. Second, without customized genetic operators, existing MOEAs for general LSMOPs usually traverse each legal value of each decision variable with the same probability, which is very inefficient. Third, the computational budget is usually limited, e.g., 100 × D function evaluations for the benchmark problems in this paper. Under these three conditions, existing MOEAs for general LSMOPs can hardly converge to the sparse Pareto optimal fronts within the limited computational budget. Figure 6 shows the Pareto optimal fronts with median IGD values obtained by the eight compared algorithms on SMOP5 and SMOP8 with 5000 decision variables over 30 runs. For SMOP5, SparseEA, MOEA/PSL, PM-MOEA, and SparseEA2 exhibit the best results with no obvious differences among them, WOF-SMPSO and GLMO obtain similar results that are worse than those of the four sparse MOEAs, while LMEA performs the worst. For SMOP8, first, WOF-SMPSO, GLMO, CCGDE3, and LMEA are significantly outperformed by the four MOEAs customized for sparse LSMOPs. Second, among the four sparse MOEAs, MOEA/PSL and SparseEA2 obtain similar results that are better than those of the other two, while SparseEA performs the worst.

¹ The source code of the algorithms and problems used in this paper can be found at https://github.com/BIMK/PlatEMO.
Figure 7 shows the Pareto optimal sets with median IGD values obtained by the eight compared algorithms on SMOP8 with 2000 decision variables over 30 runs. First, the decision variables of solutions obtained by CCGDE3 and LMEA cover the whole search space, which explains why these two algorithms obtain the worst results. Second, for WOF-SMPSO and GLMO, which exhibit slightly worse results than the four sparse MOEAs, even though the decision variables of their solutions are very close to zero, there are still large differences from the sparse Pareto optimal solution set. Third, for the four sparse MOEAs, which perform obviously better than the other four algorithms, most of the obtained decision variables are very sparse. Most importantly, the key nonzero variables that affect the objective values are optimized more sufficiently by SparseEA2 than by the other three sparse MOEAs without sacrificing sparsity maintenance, which explains why SparseEA2 obtains better results than SparseEA, MOEA/PSL, and PM-MOEA on SMOP8.

Table 1 IGD values obtained
Here, it is necessary to analyze why the decision variables of solutions obtained by CCGDE3 and LMEA cover the whole search space. On the one hand, the performance of CCGDE3 and LMEA depends heavily on the precision of decision variable grouping or classification, and they work well on problems with separable variables. However, the landscapes of the SMOPs are complicated: a solution can be Pareto optimal when a fixed number of leading decision variables are nonzero and the remaining variables are all zero, and there exist strong interactions between these variables. On the other hand, it is nontrivial for CCGDE3 to select proper cooperators for function evaluations, and LMEA consumes too many function evaluations on decision variable analysis. Under the three conditions analyzed previously, the decision variables obtained by these two algorithms are, unsurprisingly, far from the sparse optimal sets.

Experimental studies on real-world applications
To further validate the superiority of SparseEA2 over the compared MOEAs in solving sparse LSMOPs, in this section, three real-world applications, namely, the neural network training problem [18], the portfolio optimization problem [19], and the sparse signal reconstruction problem [35] are selected to conduct a deeper experimental study.

Experimental settings for real-world problems
Algorithms: The compared algorithms and their parameter settings are the same as in the previous section.

Problems: Three datasets are used for each real-world application, giving nine test problems in total. Table 2 presents the details of each problem, where NN, PO, and SR denote the neural network training problem, the portfolio optimization problem, and the sparse signal reconstruction problem, respectively.

Stopping criterion and population size: The maximum number of function evaluations is adopted as the stopping criterion, and it is set to 2.0 × 10^4, 4.0 × 10^4, and 1.0 × 10^5 for problems with approximately 1000, 2000, and 5000 decision variables, respectively. The population size is set to 50.

Performance metrics: The HV [37] indicator with the reference point (1, 1) is employed to evaluate the results on the real-world applications, and the Wilcoxon rank-sum test with a significance level of 0.05 is adopted for the statistical analysis.

Table 3 shows the HV values obtained by CCGDE3, LMEA, WOF-SMPSO, GLMO, SparseEA, MOEA/PSL, PM-MOEA, and the proposed SparseEA2 on the three real-world applications. SparseEA2 achieves the best results on all nine test instances. Compared with SparseEA2, the statistical results of the other seven algorithms are 0/9/0, 0/9/0, 0/9/0, 0/9/0, 0/8/1, 0/7/2, and 0/8/1, respectively. Besides, the HV values obtained by the sparse MOEAs are mostly on the order of 10^-1, whereas those obtained by the MOEAs tailored for general LSMOPs are on the order of 10^-2, which indicates a large gap between general MOEAs and sparse MOEAs in solving real-world applications with sparse optimal solutions. Since the reasons why algorithms that perform well on general LSMOPs fail to solve sparse LSMOPs have been analyzed in the previous section, we do not repeat them here.
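The two performance metrics above can be illustrated with a short Python sketch. It assumes minimisation of two objectives normalised so that the reference point (1, 1) bounds the front; the normalisation actually used in the experiments is not stated here. The rank-sum test uses the usual normal approximation without tie correction, which is a simplification and not necessarily the exact routine behind Table 3. The names `hv_2d` and `rank_sum_test` are illustrative.

```python
import math
from typing import Sequence, Tuple


def hv_2d(points: Sequence[Tuple[float, float]],
          ref: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Hypervolume of a bi-objective minimisation front w.r.t. `ref`."""
    # Keep only points that strictly dominate the reference point.
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    prev_f2 = ref[1]
    for f1, f2 in inside:  # sweep in ascending order of the first objective
        if f2 < prev_f2:  # points not improving f2 are dominated: skip them
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie
    correction). Returns (z, p_value); z > 0 means x tends to be larger."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    order = sorted(range(n1 + n2), key=lambda i: pooled[i])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank for ties
        i = j + 1
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / std
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(hv_2d([(0.2, 0.6), (0.5, 0.3)]))
```

With 30 independent runs per algorithm, two HV samples would be compared with `rank_sum_test` and the difference treated as significant when the p-value is below 0.05.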
Figure 8 shows the Pareto optimal fronts with the median HV values obtained by the eight compared algorithms on the neural network training problem and the sparse signal reconstruction problem with approximately 5000 decision variables over 30 runs. For the neural network training problem, the four sparse MOEAs, i.e., SparseEA, MOEA/PSL, PM-MOEA, and SparseEA2, exhibit clearly better results than the other four algorithms tailored for general LSMOPs, and SparseEA2 obtains the best result among the four sparse MOEAs. For the sparse signal reconstruction problem, CCGDE3, GLMO, MOEA/PSL, LMEA, and WOF-SMPSO are outperformed by the remaining three algorithms, among which SparseEA2 obtains the best result, i.e., it finds the sparsest signals with the lowest loss. Figure 9 shows the Pareto optimal sets with the median HV values obtained by the eight compared algorithms on the portfolio optimization problem with 5000 decision variables over 30 runs. First, the Pareto optimal sets obtained by the four MOEAs tailored for general LSMOPs are not sparse at all. Second, among the four MOEAs customized for sparse LSMOPs, the Pareto optimal set obtained by PM-MOEA is much less sparse than those of the other three, while SparseEA2 obtains the sparsest Pareto optimal set.

Table 4 Runtime (in seconds) of CCGDE3, LMEA, WOF-SMPSO, GLMO, SparseEA, MOEA/PSL, PM-MOEA, and SparseEA2 on SMOP1-SMOP8, neural network training, portfolio optimization, and sparse signal reconstruction. The least runtime in each row is highlighted, the longest runtime in each row is shown in bold italic, and the number in each bracket denotes the rank of each algorithm on the same test instance.
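Figure 9 compares the sparsity of the obtained sets visually. One simple way to put a number on it is the fraction of decision variables that are nonzero across a solution set, where a lower value means a sparser set. The function `nonzero_ratio` and its tolerance `tol` are hypothetical illustrative choices, not a metric defined in this paper.

```python
from typing import Sequence


def nonzero_ratio(solutions: Sequence[Sequence[float]], tol: float = 1e-12) -> float:
    """Fraction of decision variables with |value| > tol, pooled over all
    solutions in the set (lower means sparser)."""
    if not solutions:
        raise ValueError("empty solution set")
    total = sum(len(x) for x in solutions)
    nonzero = sum(1 for x in solutions for v in x if abs(v) > tol)
    return nonzero / total


if __name__ == "__main__":
    sparse_set = [[0.0, 0.0, 0.3, 0.0], [0.0, 0.7, 0.0, 0.0]]
    dense_set = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.0, 0.6, 0.7]]
    print(nonzero_ratio(sparse_set), nonzero_ratio(dense_set))
```

A small positive `tol` treats values that are numerically close to zero as zero, which matters for algorithms such as WOF-SMPSO and GLMO whose variables approach, but do not reach, zero.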

Computational efficiency of SparseEA2
Lastly, the computational efficiency of SparseEA2 is compared with that of the other seven MOEAs. Table 4 lists the runtime (in seconds) of the eight MOEAs on the benchmark SMOPs and the real-world applications. SparseEA2 is less efficient than the compared algorithms on the neural network training and portfolio optimization problems, and competitive with them on the benchmark problems with 2000 decision variables and on the sparse signal reconstruction problems. Since the initialization strategy of SparseEA2 has a time complexity of O(MD^2), where M and D denote the numbers of objectives and decision variables, respectively, the average rank of SparseEA2 is understandably larger (i.e., worse) than those of CCGDE3, WOF-SMPSO, and GLMO. Moreover, compared with SparseEA, SparseEA2 additionally performs the ordered grouping with a time complexity of O(D^2) in each generation, so its runtime is longer than that of SparseEA on most test problems. On some problems, such as the sparse signal reconstruction problems, however, SparseEA2 becomes more efficient, since its genetic operators generate sparse solutions more effectively and a sparse solution usually corresponds to cheap objective evaluations. Nevertheless, the average rank of SparseEA2 is similar to that of MOEA/PSL and smaller (i.e., better) than those of PM-MOEA and LMEA. In short, the computational cost of SparseEA2 is affordable.
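The O(MD^2) initialization cost can be made concrete with a sketch. It assumes, for illustration, that SparseEA2 keeps the scoring-based initialization of SparseEA as we understand it: D solutions are generated so that the i-th one has only its i-th mask bit set, and the non-dominated front number of each solution serves as the score of the corresponding decision variable. Sorting these D solutions with fast non-dominated sorting takes O(D^2) pairwise dominance checks of O(M) each. The function names and the toy evaluator are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimisation; costs O(M)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def front_numbers(objs: List[Sequence[float]]) -> List[int]:
    """Fast non-dominated sorting: O(M * N^2) for N objective vectors."""
    n = len(objs)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # who i dominates
    count = [0] * n  # how many solutions dominate i
    for i in range(n):
        for j in range(i + 1, n):
            if dominates(objs[i], objs[j]):
                dominated_by[i].append(j)
                count[j] += 1
            elif dominates(objs[j], objs[i]):
                dominated_by[j].append(i)
                count[i] += 1
    front = [0] * n
    current = [i for i in range(n) if count[i] == 0]
    level = 1
    while current:  # peel off one front at a time
        nxt = []
        for i in current:
            front[i] = level
            for j in dominated_by[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        current = nxt
        level += 1
    return front


def variable_scores(evaluate: Callable[[List[float]], Sequence[float]], d: int,
                    lower: float, upper: float, rng: random.Random) -> List[int]:
    """Score each variable by the front number of a one-hot-mask solution."""
    objs = []
    for i in range(d):
        dec = [rng.uniform(lower, upper) for _ in range(d)]  # real layer
        mask = [0] * d
        mask[i] = 1  # binary layer: only variable i is nonzero
        objs.append(evaluate([m * v for m, v in zip(mask, dec)]))
    return front_numbers(objs)  # lower score = more promising variable
```

The D function evaluations are cheap to schedule, but the pairwise comparisons grow quadratically with D, which is consistent with the runtime gap reported in Table 4 for the largest instances.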

Conclusions and future work
In this paper, we have proposed an improved version of SparseEA for solving sparse LSMOPs. The core idea of the proposed algorithm is to enhance the connection between mask and dec with the assistance of variable grouping techniques, thus ensuring that the real part of a decision variable is optimized at the same time as its binary part is flipped. Besides, since only one binary variable in mask is flipped each time, there is no need to mutate every real variable with the same mutation probability. By enhancing the connection between mask and dec, SparseEA2 also avoids wasting effort on variables that may not be related to the nonzero elements of the Pareto optimal solutions. In the experimental studies, the proposed algorithm has been compared with four MOEAs tailored for general LSMOPs and three MOEAs customized for sparse LSMOPs on eight benchmark problems and three real-world applications with sparse Pareto optimal solutions. The statistical results on the 33 test instances in total have demonstrated the superiority of the proposed algorithm over the other MOEAs in solving sparse LSMOPs.
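The coupling between mask and dec summarized above can be illustrated with a deliberately simplified, hypothetical operator. A solution in the two-layer encoding is decoded as the element-wise product of mask and dec; the operator flips exactly one mask bit and perturbs only the real variables that belong to the same group as that bit. The uniform choice of the bit, the Gaussian perturbation with clipping, and the caller-supplied `groups` are our own assumptions; SparseEA2's actual operators and its ordered grouping are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def coupled_flip(mask: Sequence[int], dec: Sequence[float],
                 groups: Sequence[Sequence[int]], lower: float, upper: float,
                 sigma: float, rng: random.Random) -> Tuple[List[int], List[float], int]:
    """Hypothetical operator: flip one mask bit, then perturb only the real
    variables sharing its group. `groups` must partition range(len(mask))."""
    new_mask, new_dec = list(mask), list(dec)  # copies; inputs stay untouched
    i = rng.randrange(len(new_mask))  # uniform choice of the bit (assumption)
    new_mask[i] = 1 - new_mask[i]  # exactly one binary variable is flipped
    group = next(g for g in groups if i in g)
    for j in group:  # only the coupled real variables are mutated
        perturbed = new_dec[j] + rng.gauss(0.0, sigma)
        new_dec[j] = min(upper, max(lower, perturbed))  # clip to the bounds
    return new_mask, new_dec, i


def decode(mask: Sequence[int], dec: Sequence[float]) -> List[float]:
    """Two-layer encoding: x = mask * dec, element by element."""
    return [m * v for m, v in zip(mask, dec)]


if __name__ == "__main__":
    rng = random.Random(42)
    mask, dec = [0, 1, 0, 0, 1, 0], [0.5] * 6
    new_mask, new_dec, flipped = coupled_flip(
        mask, dec, [[0, 1, 2], [3, 4, 5]], 0.0, 1.0, 0.1, rng)
    print(flipped, decode(new_mask, new_dec))
```

Restricting the real-valued mutation to the flipped variable's group is what keeps effort away from variables that stay zero, in contrast to mutating every real variable with the same probability.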
To the best of our knowledge, only five MOEAs have been specially designed for sparse LSMOPs so far, and the study of large-scale sparse MOEAs is still in its infancy. Therefore, it deserves more attention in the evolutionary computation community. For example, we can customize MOEAs for sparse LSMOPs without using the two-layer encoding scheme. Besides, we can adopt more effective environmental selection strategies in SparseEA2 for sparse LSMOPs with many objectives (e.g., many-objective knapsack problems [38]), and we can also combine the proposed algorithm with effective constraint handling techniques for solving sparse constrained LSMOPs (e.g., optimal software product selection problems [39]).

Consent for publication: Not applicable.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.