Evolutionary Computation for Large-scale Multi-objective Optimization: A Decade of Progresses

Large-scale multi-objective optimization problems (MOPs) that involve a large number of decision variables, have emerged from many real-world applications. While evolutionary algorithms (EAs) have been widely acknowledged as a mainstream method for MOPs, most research progress and successful applications of EAs have been restricted to MOPs with small-scale decision variables. More recently, it has been reported that traditional multi-objective EAs (MOEAs) suffer severe deterioration with the increase of decision variables. As a result, and motivated by the emergence of real-world large-scale MOPs, investigation of MOEAs in this aspect has attracted much more attention in the past decade. This paper reviews the progress of evolutionary computation for large-scale multi-objective optimization from two angles. From the key difficulties of the large-scale MOPs, the scalability analysis is discussed by focusing on the performance of existing MOEAs and the challenges induced by the increase of the number of decision variables. From the perspective of methodology, the large-scale MOEAs are categorized into three classes and introduced respectively: divide and conquer based, dimensionality reduction based and enhanced search-based approaches. Several future research directions are also discussed.


Introduction
Many real-world optimization problems involve multiple objectives, which are called multi-objective optimization problems (MOPs) [1] . The conflict between the multiple objectives makes it impossible to find a single solution that can optimize all objectives simultaneously. Searching for the best trade-off solutions, called Paretooptimal solutions [2] , for MOPs is thus critical to a decision maker. Evolutionary algorithms (EAs) are able to approximate the whole set of Pareto-optimal solutions in a single run due to their population-based nature and do not make particular assumptions about problems like continuity or differentiability. Thus, the use of EAs to deal with MOPs have been extensively studied [3−5] and plenty of powerful multi-objective EAs (MOEA) have been proposed [6−11] .
Scalability of MOEAs, which characterizes the changing trend of algorithm performance with the problem size, is not only a long-standing concern in the evolutionary computation community [12] , but also is critical to whether MOEAs can be applied to broader real-world problems. In the context of MOPs, the number of decision variables and the number of objectives are two key factors for the problem size. Scalability with respect to the number of objectives, referred to as many-objective optimization, has drawn steady attention in past decades and a number of surveys are available [13,14] . On the other hand, although scalability with respect to the number of decision variables has attracted wide attention, there is no review paper in this regard. In recent years, there has been an increasing interest in the scalability of MOEAs with respect to decision variables, as large-scale MOPs (i.e., MOPs with a large number of decision variables) appear widely in various real-world applications. For example, the training of deep neural networks (DNN) needs to take into account both empirical risks (such as training errors) and structural risks (such as network sparseness), while typical industrial DNNs often contain millions of weights to be trained [15−17] . Commercial promo-tion on social networks requires both influence maximization and cost minimization, while the number of nodes involved in real networks (such as Weibo, Facebook, twitter, etc.) can be in the order of millions or even hundreds of millions [18,19] . Problems recently faced in logistics scheduling [20,21] , software engineering [22] , pattern mining [23] , and many other fields have also shown the characteristics of large-scale multi-objective optimization problems, leading to an urgent need for efficient and effective approaches. Nonetheless, most current practice of MOEAs has been restricted in MOPs with small-scale decision variables (normally, no more than 30). More recently, empirical studies have shown a severe decrease in efficiency of traditional MOEAs when increasing the number of decision variables [24,25] . As a result, evolutionary computation for large-scale MOPs has attracted a lot of research effort in the past decade, which mainly focuses on two major problems: 1) What are the difficulties associated with large-scale multi-objective optimization? The scalability of MOEAs is determined jointly by their search mechanisms and the problem properties. An in-depth analysis of the specific difficulties associated with different large-scale MOPs when adopting different MOEAs can benefit the evaluation of algorithms, and thereby facilitate the development of new and improved large-scale MOEAs. To this end, there has been some work to systematically explore the scalability of traditional MOEAs and put forward some discoveries about the challenges faced by the algorithms [26−28] . This paper briefly reviews recent efforts in this aspect.
2) How to enhance the scalability of an MOEA? As mentioned above, research on designing MOEAs with high scalability originates from practical needs. To this end, basic ideas include simplifying large-scale MOPs via divide-and-conquer methodology [29] and dimensionality reduction [30] and improving the search ability of MOEAs by rebalancing the exploration and exploitation [28] . From these three aspects, this paper summarizes the advances in scalable algorithm designs. The main focus is on how characteristics of MOPs and scalability challenges are integrated into the design of highly scalable MOEAs.
To summarize, this paper focuses on the scalability of MOEAs with respect to the number of decision variables and presents an extensive review of recent progresses on evolutionary computation for large-scale multi-objective optimization in the last decade. We first introduce the study on the capabilities of traditional MOEAs to properly scale when increasing the number of decision variables as well as the difficulties encountered by these algorithms, and then categorize recent developments on scalable MOEAs into three classes based on the key idea used: the divide-and-conquer, dimensionality reduction and enhanced search-based approaches.
The remainder of this paper is organized as follows. Section 2 introduces the background of large-scale multiobjective optimization. Section 3 briefly discusses the scalability of traditional MOEAs and challenges associated with large-scale multi-objective optimization. Section 4 summarizes the advances in scalable MOEA designs. Finally, the paper is concluded in Section 5 with some potential directions for future research.

Background of large-scale multiobjective optimization
In this section, the background of large-scale multi-objective optimization is presented, including the problem definition, the Pareto optimality, the scalability measurements, benchmark problems and the concepts of variable dependencies.
An MOP can be mathematically formulated as where the decision vector consists of decision variables, the objective function vector consists of objective functions, and and denote the decision and objective spaces, respectively. Largescale MOPs refer to the subclass of MOPs with a large number of decision variables. In this paper, we define a large-scale MOP as an MOP with and [29] . This is done to highlight the challenges posed by a large number of decision variables to existing MOEAs [24,25] . As stronger scalable MOEAs are developed in the future, the number would increase accordingly. Given an MOP, the concept of Pareto optimality [1,2] is defined as follows. Given two solutions  and their corresponding objective vectors  ,  dominates  (denoted as  ) if and only if  ,  and , Definition 2. A solution is Pareto-optimal if and only if there exists no such that . The set of all Pareto-optimal solutions is called the Pareto set. The corresponding objective vector set of the Pareto set is called the Pareto front. The goal of optimizing a large-scale MOP is to obtain an approximation set to the Pareto front (PF) with two characteristics: 1) High convergence: All solutions in are as close as possible to the Pareto front; 2) High diversity: All solutions in are as diverse as possible in the objective space [31] . To achieve such optimization goal, the hypervolume indicator [32,33] and inverted generational distance (IGD) [34] which consider both convergence and diversity comprehensively are commonly adopted to measure the quality of an approximation set.
The scalability of an MOEA stands for the trend of its solution performance with the number of decision vari-ables. It can be measured by identifying how the performance of the given algorithm changes as the number of decision variables increases and the performance can be described in terms of three aspects: the quality of the obtained approximation set, the search efficiency, and the search behaviors of the given algorithm in the objective space. One way to evaluate the search efficiency is to calculate the computational effort (e.g., the number of objective function evaluations) required by the given algorithm to achieve the desired goal (e.g., an approximation set with expected quality). For example, the study in [24] employs the number of objective function evaluations needed by algorithms to produce a solution set with a hypervolume value which is larger than 90% of the hypervolume of the Pareto front. To study the search behaviors, one can plot the approximation sets obtained at different evolutionary stages in the objective space or the evolutionary curves of quantified quality indicators.
Benchmark problems for assessing the scalability of MOEAs and designing large-scale MOEAs (i.e., MOEAs for large-scale multi-objective optimization) are often expected to be scalable in terms of the number of decision variables while keeping an invariant Pareto front. The Zizler-Deb-Thiele (ZDT) [35] , Deb-Thiele-Laumanns-Zizler (DTLZ) [36] , walking-fish-group (WFG) [37] , CEC 2009 competition [38] and large-scale multi-objective and many-objective optimization problem (LSMOP) [39,40] test function family are commonly used in the specialized literature. These problems are characterized by multimodality, complicated Pareto sets and Pareto fronts, separability between decision variables and correlation between decision variables and objectives, deception and epistasis [12] , posing various difficulties for MOEAs to scale up to largescale MOPs.
Variable dependencies, including separability and control properties that characterize how a decision variable is correlated to the convergence and diversity, are important aspects when considering scalability. A large-scale MOP is separable if and only if each variable can be optimized independently, while there exist correlations between at least two decision variables for non-separable problems [41] . Thus, it is beneficial to solve a large-scale MOP with high separability by decomposing it into several small-scale problems. The separability of some popular benchmarks has been studied recently [42,43] . On the other hand, the decision variables of an MOP can be categorized into three types in terms of how they are mapped to the objective space, i.e., the position variables, the distance variables and the mixed variables [41] . If modifying a variable on its own only results in non-dominated objective function vectors, it is called a position variable. If modifying a variable on its own only results in dominating or dominated objective function vectors, it is called a distance variable. The other decision variables are called mixed variables. Thus, this property allows one to separate the convergence and diversity aspects of sets of solu-tions for an MOP.
It should be noted that although many-objective optimization ( ) and large-scale single-objective optimization ( ) [44−46] are related to large-scale multi-objective optimization, current studies in these two directions do not yield effective MOEAs for large-scale MOPs. On one hand, effective methods for high-dimensional objective space (i.e., many-objective optimization) may encounter degradation in high-dimensional decision space (i.e., large-scale multi-objective optimization) because the mappings between decision variables and objectives can be quite complicated [38,47] . On the other hand, instead of searching for a single optimal solution as that has been done in the large-scale single-objective optimization, MOPs aim at the Pareto set, implying a fundamental difference.

Scalability analysis
Generally, the main difficulty of MOEAs on largescale MOPs lies in that the search space of a problem expands exponentially with the dimensionality. Correspondingly, the landscape of the search space may become more complicated and the conflicts between multiple objectives may be more serious. Such an expanded and complicated search space may quickly exceed the search ability of the existing MOEAs, and thus cannot be fully explored within a reasonable time budget. Therefore, it is critical to consider the capabilities of MOEAs to properly scale when increasing the number of decision variables.
In the multi-objective research community, scalability with respect to the number of decision variables has rarely been considered before 2008. To our knowledge, the only two related works in this direction before that are the comparative study presented in [48] in which the performance of four MOEAs (strength Pareto evolutionary algorthm (SPEA) [49] , memetic-Pareto archive evolution strategy (M-PAES) [50] , Ishibuchi's and Murata's multipleobjective genetic local search (IMMOGLS) [5] and multiple-objective genetic local search (MOGLS) [51] ) are investigated on multi-objective 0/1 knapsack problems with up to 750 decision variables, and a study on the scalability of multi-objective estimation of distribution algorithms (MOEDAs) [26] . An increasing interest in studying the scalability of MOEAs has begun mainly since the systematic experimental studies were presented in [24,25], where eight representative MOEAs (including nondominated sorting genetic algorithm II (NSGA-II) [8] , SPEA2 [52] Pareto envelop based search algorithm II (PESA-II) [53] , Pareto archived evolution strategy (PAES) [54] , one multiobjective particle swarm optimizer (OMOPSO) [55] , multiobjective cellular genetic algorithm (MOCel)l [56] , generalized differential evolution 3 (GDE3) [57] and archive-based hybrid scatter search (AbYSS) [58] ) are examined on largescale benchmark problems with up to 2 048 decision vari-ables. By using the number of objective function evaluations required by an algorithm to reach an acceptable approximation of the Pareto front (i.e., an approximate set with its hypervolume larger than 98% of the hypervolume of the Pareto front) as the evaluation criterion of scalability, this work reveals the severe performance deterioration of the above eight MOEAs when increasing the number of decision variables from 8 to 2 048. The scalability of multi-objective evolutionary algorithm based on decomposition (MOEA/D) is examined in [59] by measuring the closeness of the obtained approximation set to the Pareto front under the same number of objective function evaluations. Unfortunately, the experiment results show that the efficacy of the algorithms examined above decreases as the number of decision variables increases. Although OMOPSO generally performs the best among the eight algorithms examined in terms of the efficiency in reaching the Pareto front, the number of objective function evaluations it requires can increase by more than 1 000 times, when the number of decision variables increases from 8 to 2 048. Thus, these algorithms could hardly scale to large-scale MOPs efficiently.
The deficiency of these traditional MOEAs has thus inspired further exploration of possible challenges faced by these algorithms. For an in-depth analysis of the reasons behind the inadequate scalability of these algorithms, a systematic and comprehensive analysis is presented in [28]. It empirically studies the performance of three traditional MOEAs (NSGA-II, MOEA/D [9] , S metric selection evolutionary multi-objective optimization algorithm (SMS-EMOA) [10] ) and two MOEAs for largescale MOPs (weighted optimization framework-enhanced speed-constrained multi-objective particle swarm optimization (WOF-SMPSO) [60] and random embedding NSGA-II (Re-NSGA)-II [30] ) when applied to solve problems with up to 8 192 decision variables from the ZDT, DTLZ, WFG and CEC 2009 competition benchmark sets. The experiment results suggest two important phenomena. First, it is found that the behaviors of the three traditional MOEAs can be roughly used as a classification indicator for the tested problems. More specifically, the problems can be categorized into three groups based on the analysis: convergence-focused, diversity-type I and diversity-type II problems. The detailed results can be found in [28]. Convergence-focused problems only require MOEAs to have stronger convergence as a set of well-distributed solutions can be obtained relatively easily once several good solutions have been found. Such problems can thus be mitigated using techniques employed in largescale single-objective optimization. Diversity-type I problems require MOEAs to have stronger diversification but ignore the correlation between position and distance functions. Diversity-type II problems pose a great challenge to the balance between diversification and convergence by considering a correlation between position and distance functions. The position and distance functions define the convergence and diversity aspects of sets of solutions for a problem [41] . For diversity-type problems, when the number of decision variables increases, achieving a good spread along the Pareto front becomes more difficult and the diversity of the algorithm deteriorates severely, which is called the diversity loss. Such diversity loss associated with diversity-type I problems is shown to be manageable by adopting NSGA-II, while the diversity loss associated with diversity-type II problems poses a great challenge for the three traditional MOEAs. A similar diversity loss is also reported for large-scale multi-objective distance minimization problems in [27,61] by examining NSGA-II, SPEA2 and MOEA/D. Second, when applying the two recent large-scale MOEAs on these problems, it is found that their performance deteriorates rapidly with the increase of the number of decision variables especially on diversity-type II problems, which highlights the need for further attention to these problems.
The separability of decision variables is a critical property for large-scale multi-objective optimization. It has been widely believed that fully-separable MOPs can be solved simply by optimizing each variable in a fully independent manner. Thus, some knowledge about separability can be conducive to algorithm design and significantly alleviates the curse of dimensionality. Motivated by this, a simple but efficient algorithm called dimension-wise MOEA (DW-MOEA) is presented in [28] so that one can empirically analyze the separability property of a blackbox problem. It first divides the original D-dimensional MOPs into D exclusive 1-dimensional sub-MOPs, and then optimizes each sub-MOP separately. Indeed, experimental studies on popular benchmarks including ZDT, DTLZ and WFG show that DW-MOEA is able to achieve significantly better efficiency than the representative NSGA-II, MOEA/D and SMS-EMOA when applied to most separable MOPs. Thus, DW-MOEA can be easily implemented and employed as a quick attempt to check whether it is worthy of developing more sophisticated algorithms. It should be noted that this does not mean separable MOPs are unworthy of studying. In fact, some of them can be difficult due to other problem characteristics such as multimodality [62] . In [26], the scalability of MOEDAs are examined on a set of boundedly-difficult additively-separable MOPs. Experiment results imply that even when the separability can be identified preliminarily, the examined MOEDAs can scale up exponentially with the number of decision variables due to the combinatorial growth in the number of Pareto-optimal solutions.
To summarize, this section reviews recent studies on scalability analysis and challenge analysis with respect to large-scale multi-objective optimization by taking into account different problem properties. It is expected that such investigations can facilitate the development of new and improved large-scale MOEAs, and can benefit the design and evaluation of algorithms for specific aspects.

Large-scale multi-objective evolutionary algorithms
To improve the scalability of MOEAs, basic ideas include simplifying large-scale MOPs via divide-and-conquer methodology and dimensionality reduction and improving the search ability of MOEAs by rebalancing the exploration and exploitation. Among them, divide-andconquer and dimensionality reduction approaches aim to divide or transform the original search space of an MOP, so that an algorithm only needs to search in one or more subspaces. Since the subspaces are usually low-dimensional, such methods can alleviate the curse of dimensionality caused by the increase in the number of decision variables. On the other hand, to avoid the issue that some Pareto-optimal solutions fall outside the subspaces constructed via space division or transformation, which would result in performance deficiencies, the enhanced search-based directly explore the original search space, but being equipped with enhanced search ability. These strategies have also been adopted to solve some applications of large-scale MOPs, as presented in Table 1. In this section, the progresses of large-scale MOEAs are reviewed from these three classes respectively. Vehicle routing problem [63] 288 Divide and conquer Capacitated arc routing problem [64] 375 Divide and conquer Feature selection [65] 512 Divide and conquer Flight assignment [66] 1 664 Divide and conquer Public transport network design [67] 2 000 Divide and conquer Resource allocation [68] 1 000 000 Divide and conquer Sparse regression [69] 1 080 000 Divide and conquer Engineering design problem [70] 145 Enhanced search Emergency logistics scheduling [71] 250 Enhanced search Software module clustering [72] 401 Enhanced search Service restoration in large-scale distribution systems [73] 15 440 Enhanced search

Large-scale MOEAs based on divideand-conquer methodology
In this type of large-scale MOEAs, cooperative coe-volution [29,74,75] has been one of the most popular approaches. Its main idea is to decompose the original highdimensional decision vector of the MOP into multiple low-dimensional sub-components, which are expected to be solvable more easily. More specifically, each decision variable of the problem is assigned to a species population based on some decomposition strategy and thus each species population optimizes a particular part of the large-scale MOP. After that, to assemble a full solution to the original MOP, the objective function evaluations of any individual from a species population is assigned based on its interactions with individuals from other species populations. Without prior knowledge about the objective functions, a key challenge for this kind of algorithm is how to decompose the problem while keeping the correlations among different species population minimal. If two or more decision variables with high correlation are assigned to different species populations and optimized separately, the search direction of the algorithm may be misleading, resulting in performance degradation. For this reason, various cooperative coevolutionary MOEAs with different decomposition strategies are proposed for largescale multi-objective optimization. Generally, the currently available decomposition approaches for large-scale multi-objective optimization can be divided into two groups: those that decompose a decision vector in a random way, and those that decompose a decision vector based on an in-depth analysis or learning of problem properties. Random decompositionbased methods are mainly proposed for the reason that dividing the decision vector into random groups could provide better results than applying a deterministic scheme, when dealing with non-separable problems [76] . Moreover, this kind of method can be further categorized into static and dynamic methods. Dynamic methods are proposed to further increase the chance of optimizing highly-correlated decision variables together. The pioneering work in this direction starts the research on largescale MOEAs by introducing the first framework of cooperative coevolution adopted within GDE3, namely cooperative coevolutionary GDE3 (CCGDE3) [29] . The decision variables are divided into multiple groups in a random way and the cooperation among the groups takes place in the objective function evaluation, where a random individual is obtained from each species population and then they are combined to form a complete set of solutions. Results on ZDT problems with up to 5 000 decision variables verify the efficiency of CCGDE3. A similar random decomposition-based method adopted within MOEA/D is proposed in [59] and verified on DTLZ problems with up to 1 200 decision variables. However, since it has been shown that ZDT and DTLZ problems are mainly separable, their effectiveness when applied to nonseparable MOPs needs further analysis. A random-based dynamic grouping strategy combined with MOEA/D for large-scale multi-objective optimization is presented in [77]. It contains a random group size selection strategy where a group size is dynamically selected with probability in the evolution process. The efficacy of the resultant algorithm is examined on WFG and CEC 2009 competition problems with up to 1 000 decision variables comparing with two MOEA/D variants.
Analysis-based decomposition approaches aim at extracting some useful knowledge for the design of a suitable decomposition. Two types of variable dependencies are commonly considered in the specialized literature [41] . One is the separability which considers correlations among different decision variables and the other is the control property towards convergence and diversity which considers correlations between decision variables and objective functions. As these properties are often hard to obtain in real-world problems, an empirical test method based on decision variable analyses is integrated into MOEAs, namely MOEA/DVA [78] . It first decomposes the decision vector into two groups based on control properties of decision variables, and then further divides the group consisting of distance variables into several groups via an interdependence variable analysis. In this way, a large-scale MOP can be effectively decomposed into smaller and simpler problems and be conducive for solving separable and partially separable MOPs. However, the analyses are conducted by studying dominance-based relationships between sampled solutions. Thus, it suffers from prohibitively high computation costs and its benefit deteriorates rapidly as the number of decision variables increases. Concretely, the total number of objective function evaluations required for the variable analyses is , where NCA indicates the evaluation number for control analysis, NIA indicates that for interdependence analysis. This makes MOEA/DVA difficult to scale up to large-scale MOPs within a limited computation budget. A new variable analysis method based on clustering to study the control property towards convergence and diversity is presented in [79]. It adopts the k-means method with features measured by the angles between sample solutions and the direction of convergence, where smaller angles indicate more contributions to convergence and larger angles to diversity. This angle-based clustering method is shown to be able to reduce the computation costs consumed for variable analysis and the corresponding large-scale manyobjective evolutionary optimization (LMEA) could provide better scalability. The total number of objective function evaluations required for the variable analyses is , where nSel indicates the number of selected solutions for variable clustering, nPel indicates the number of perturbations on each solution for variable clustering and nCor indicates the number of selected solutions for variable interaction analysis. Nevertheless, the computation costs required for variable analysis also makes it difficult for LMEA to scale up to large-scale MOPs.
Unlike the above cooperative coevolutionary approaches that optimize different groups of decision variables independently, WOF that optimizes the groups in a different way is presented in [60,80]. The main idea of WOF is to identify one promising reduced subspace based on variable grouping and problem transformation. It first decomposes decision variables into several groups, then assigns a weight for each group by applying the same weight value (in terms of multiplication) to every variable in the same group. After that, it adopts a problem transformation scheme to extend the concept of weighting the decision variables [81] to multiple objectives. Therefore, given well-defined grouping and transformation functions, the above weighting scheme is able to find a desirable reduced subspace. This is non-trivial, since such information can be hardly available in advance for realworld problems. To solve this issue, WOF adopts an optimization-based method to search for the best configurations. It formulates the above process as a weighting optimization problem to search for a best weight vector. Due to the variable grouping, the weighting optimization problem can be with low dimensionality and thus the sub-problems face by WOF can be small-scale. It is worth mentioning that instead of optimizing each group of decision variables independently as was done in cooperative coevolution based large-scale MOEAs, the grouping in WOF is to find a set of weight variables. Based on the weighting scheme, WOF optimizes the decision variables as a whole and thus can be applicable to non-separable MOPs. To overcome the drawback of limiting the search space due to the weighting scheme, WOF alternates two different phases of optimization: the weighting optimization step and the normal multi-objective optimization step. The performance of WOF is comprehensively studied by considering different grouping mechanisms, problem transformation functions and basic multi-objective optimizers. More concretely, four grouping mechanisms (random grouping, linear grouping, ordered grouping and differential grouping [82] ), three transformation functions (product transformation, p-value transformation, and interval-intersection transformation) and thee classical MOEAs (SMPSO, NSGA-II, GDE3) are examined to verify the effectiveness of WOF. These WOF-based algorithms are applied to solve ZDT, DTLZ, CEC 2009 competition and WFG problems with 1 000 decision variables and show superiority to state-of-the-art algorithms especially in terms of efficiency. After WOF, the problem transformation scheme is further studied and a novel problem reformulation method by extending the unidirectional search to a bi-directional search is proposed in [83].
There is also some work studying the influence of variable grouping inside genetic operators for large-scale multi-objective optimization. The work shown in [84] investigates the effect of grouping mechanisms inside the mutation operators based on the well-known polynomial mutation [85] . Three types of mutation operators are pro-posed: the linked polynomial, the grouped polynomial and the grouped linked polynomial mutations. The linked polynomial mutation binds together the amount of changes of the mutated variables per individual, assuming that it would be beneficial to alter them by the same amount. The grouped polynomial mutation separates variables into distinct groups and then applies mutation to a group as a whole, assuming that highly correlated decision variables should usually be altered together. The grouped linked polynomial mutation combines the above two concepts. Two popular grouping mechanisms, i.e., the ordered grouping and the differential grouping, are examined and the resultant mutation operators are adopted within NSGA-II and SMPSO. Experiment results conducted on WFG problems with 1 000 decision variables indicate that the proposed mutation operators can significantly improve the performance of existing MOEAs for large-scale multi-objective optimization.
In recent years, with the popularization of computing resources such as cloud computing platforms and highperformance computing servers, some large-scale MOEAs based on distributed or parallel mechanisms have also been proposed. By using distributed resources or parallel approaches to deal with multiple sub-problems obtained by decomposition, the computational efficiency of the algorithm can be increased significantly [86] . A message passing interface-based distributed parallel cooperative coevolutionary MOEA (DPCCMOEA) is presented in [87]. It first decomposes decision variables into several groups based on decision variable analyses similar to that in [78] and then optimizes each group by a species subpopulation. The optimization of these groups is carried out in a distributed platform. Within each species subpopulation, the individuals are further separated to several sets, each of which is assigned to a CPU, for a better parallelism degree. Experimental studies on DTLZ and WFG with 1 000 decision variables show that DP-CCMOEA could achieve a significant improvement in terms of computation time with more than 200× speedups compared with CCGDE3 and MOEA/DVA, when using the Tianhe-2 supercomputer with 360 cores. The work in [88] addresses large-scale MOPs by coevolutionary islands with overlapping search spaces. It follows the basic framework of cooperative coevolutionary MOEAs and each species subpopulation corresponds to an island. Two kinds of coevolution among islands are investigated: coevolution with disjoint islands where individuals in different subpopulations explore disjoint subsets of decision variables and coevolution with overlapping islands where overlapping of optimized decision variables is allowed. Experiments on ZDT problems with up to 2 048 decision variables show the efficiency of the proposed methods, and also show that when increasing the number of islands, the overlapping method significantly outperforms the disjoint one.
Generally, the divide-and-conquer based large-scale MOEAs have shown to be effective when applied to solve most large-scale MOPs where no or weak correlations are involved among decision variables. The problem is that many real-world MOPs have highly complicated correlations among decision variables, and thus the decomposition becomes harder to perform and the benefit of consuming large amounts of computing resources to analyze correlations may be marginal.

Large-scale MOEAs based on dimensionality reduction
The main idea of large-scale MOEAs based on dimensionality reduction is to transform the original decision space into a low-dimensional subspace, so that the original high-dimensional problem can be solved in the low-dimensional space. Two main difficulties in developing this kind of approach are: 1) How to ensure the consistency between the Pareto set of the transformed problem and the Pareto set of the original problem; 2) How to get the solution of the original problem after it is optimized in the transformed space. In this section, some recent dimensionality reduction approaches for large-scale multiobjective optimization are reviewed.
The intrinsic dimension of a problem which characterizes the number of decision variables that would affect objective functions significantly is a critical factor of a large-scale MOP. If most decision variables of a given large-scale MOP do not change objective functions markedly, such problem is said to have low effective dimensionality [30] . For this kind of problem, if an MOEA is able to find their low-dimensional effective subspaces, it is possible to improve the efficiency significantly. Random embedding is a dimensionality reduction technique which can exploit low effective dimensionality without knowing which dimensions are important in advance and its effectiveness and efficiency has been shown when applied to solve large-scale single-objective optimization problems with low intrinsic dimensionality [89,90] . Inspired by this, the random embedding technique is extended to largescale multi-objective optimization in [30]. A general, theoretically-grounded yet simple approach named multi-objective optimization via random embedding (ReMO) is presented. ReMO can scale any derivative-free MOEA to large-scale non-convex MOPs with low effective dimensions using random embedding. It performs the optimization in the low-dimensional effective subspace and evaluates the objective functions of a solution through embedding it into the original high-dimensional search space. Theoretical and experimental results on modified ZDT problems with 30 effective dimensions verify the scalability of ReMO for large-scale MOPs with low intrinsic dimensionality. The drawback is that its performance may deteriorate rapidly as the effective dimensionality becomes higher.
According to the Karush-Kuhn-Tucker condition [91] , the Pareto-optimal solutions of a continuous M-objective optimization problem form a (M-1)-dimensional piecewise continuous manifold under some mild conditions, e.g., the problem is differentiable [92] . That is, the original search space is reducible under some mild conditions since M is always much smaller than the number of decision variables D. Based on this regularity property, an unsupervised learning method to produce a low-dimensional representation of training points from a high-dimensional input space is presented in [93], namely Pareto-optimal subspace learning-based MOEA (MOEA/PSL). It focuses on sparse large-scale MOPs in which most decision variables of the Pareto-optimal solutions are zero. Unsupervised neural networks are adopted to solve sparse large-scale MOPs by learning the Pareto-optimal subspace. A sparse distribution of the decision variables is learned by a restricted Boltzmann machine based on the decision variables of the non-dominated solutions and a compact representation is learned by a denoising autoencoder. The combination of the sparse distribution and compact representation is regarded as an approximation of the Pareto-optimal subspace, which allows the optimization to be performed in the learnt low-dimensional subspace and the offspring solutions to be mapped back to the original search space by the two neural networks. Experiments on sparse large-scale MOPs with 0.1 sparsity and 10 000 decision variables verify the efficiency of MOEA/PSL on this particular kind of large-scale MOPs. Generally, the dimensionality reduction technique can improve the scalability of an MOEA with a suitable transformation. However, this is non-trivial since the problem properties of real-world problems are usually unknown. Thus, when applied to problems that are not consistent to the assumption of the employed transformation, the effectiveness of this kind of algorithm is not clear.

Enhanced search-based large-scale MOEAs
Different from the divide-and-conquer and dimensionality reduction-based approaches that improve the scalability by constructing low-dimensional space via division or transformation, the enhanced search-based large-scale MOEAs aim to improve the search ability of MOEAs by rebalancing the diversification and convergence. This kind of algorithms is suitable for dealing with large-scale MOPs or sub-problems appearing in the division and transformation that are difficult to divide and to reduce dimensionality. For example, even additively-separable MOPs can bring great challenges to the scalability of MOEAs [26] . Thus, an enhanced search ability is critical for large-scale multi-objective optimization.
As shown in Section 2, it is found that existing MOEAs deteriorate severely and suffer a diversity loss with the increase of the number of decision variables especially on a type of MOPs that have highly-correlated relationships among different decision variables. Motivated by this observation, a scalable MOEA with an enhanced diversification mechanism is developed in [28]. Its main idea is to encourage diversity by forcing the search toward different parts of the Pareto front. This is non-trivial because the search space increases rapidly and the mappings between the Pareto sets and Pareto front are complicated. A new solution generator based on a novel dual local search (DLS) mechanism is employed to solve this issue. In addition to the normal population which considers the comprehensive performance of the algorithm on both diversity and convergence, an external archive is provided for the new solution generator to emphasize the diversification ability. DLS is conducted on the archive to generate a set of diverse new solutions. Two key points shape the DLS. One is to search efficiently in the high-dimensional decision space, and the other is to keep diversity in the objective space simultaneously. First, considering that frequently changing the search direction of an individual may reduce its search efficiency, an indicator-based local search that limits the feasible search space of an individual based on the local improvement of a certain indicator is employed. Second, in order to ensure that different areas of the Pareto front can be fully explored, a local search is employed in the objective space. That is, the individuals in the archive are explored in a roll-polling way to avoid the diversity loss caused by local convergence. A synergy of the two parts constitutes the DLS mechanism. After that, DLS is integrated with the classical indicator-based SMS-EMOA to achieve a better balance between convergence and diversity, leading to the resultant DLS-MOEA. Experimental studies on ZDT, DTLZ, WFG and CEC 2009 competition problems with up to 8 192 decision variables verify the competitiveness of DLS-MOEA compared with a number of state-ofthe-art algorithms and its advantage in the balance between diversification and convergence. In particular, the results on CEC 2009 competition show the superiority of DLS-MOEA in terms of diversification ability. Gradient-based methods are possible choices for designing local search operators, which can make the search more directional but require the objective functions to be differentiable. To utilize this advantage, some gradient-based information is employed in [94] for better scalability. It defines descent direction as a unit vector that intuitively represents a decrement in all objectives or in most of them, and keeps the same value in the others. The gradient-based information is integrated with MOEAs and results in a two-stage algorithm named gradientbased multi-objective evolutionary strategy (GBMES). The first stage is to find a small set of Pareto-optimal solutions and the second is to reconstruct the entire front departing from a few points from such front. The gradient-based method is employed to perform a local search for a small number of superior solutions with the aim of accelerating the search toward the Pareto front. Experi-mental studies on DTLZ2 with up to 100 decision variables verify the scalability of GBMES.
Unlike most works that obtain a point in the objective space by calculating the objective function values of the decision vector, a model-based method for representing and searching non-dominated solutions is presented in [95], so that a solution can be obtained through direct sampling in the objective space. Its main idea is to construct Gaussian process-based inverse models that map all found non-dominated solutions from the objective space to the decision space. These inverse models are then used to generate new solutions by directly sampling in the objective space. Experiment results on WFG and modified ZDT and DTLZ problems with up to 104 decision variables show the robust search performance of this method. The drawback is that since the problem is simplified severely to facilitate inverse modeling, its performance on more complicated problems and problems with larger scales may not be optimistic.
There are also some attempts to customize search operators to improve the scalability of certain MOEAs [96,97] . The work in [98] focuses on a specific type of large-scale problems named large-scale sparse MOPs, where most decision variables of the optimal solutions are zero. To deal with this kind of problem and improve the sparsity of the generated solutions, a new population initialization strategy and genetic operators by taking the sparse nature of the Pareto optimal solutions into consideration are proposed. A test suite for large-scale sparse multi-objective optimization is also presented. Experiments on the test suite with up to 1 000 decision variables and four application examples (feature selection, pattern mining, critical node detection and neural network training problems) with up to 1 241 decision variables show its superiority in solving large-scale sparse MOPs.
Parallel large-scale MOEAs have also been studied within this category. Environment selection operators of MOEAs play an important role in the balance between convergence and diversity. Usually, environment selection operators are difficult to parallelize as the selection is mainly performed after all candidates are evaluated. Thus, this process can be computationally expensive when faced with large-scale multi-objective optimization. To accelerate the efficiency, a novel parallel framework, namely parallel EA (PEA), that separates the environmental selection operator from the entire evolutionary process is proposed in [99]. The main idea is to separate convergence and diversity aspects of the environmental selection. More specifically, a series of independent subpopulations are employed and optimized in parallel to pursue a high convergence ability, and only a few representative solutions from different sub-populations are selected as an archive to emphasize the diversification ability. The control properties including the concept of distance and position variables are used to remove the dependencies among sub-processes (sub-populations) and improve the parallelization of an algorithm. Experimental results on the LSMOP benchmark problems with up to 1 039 decision variables show the superiority of PEA in terms of the convergence, diversity and speedup.

Conclusions and future directions
In this paper, the evolutionary computation for largescale multi-objective optimization during the past decade of progress is reviewed. First, we summarize the scalability analysis and challenges of traditional MOEAs when applied to MOPs with a large number of decision variables. Second, the MOEAs for large-scale MOPs are divided into three categories based on the mechanism of improving the scalability of an algorithm: large-scale MOEAs based on divide-and-conquer, large-scale MOEAs based on dimension reduction and enhanced search-based large-scale MOEAs. The research progress within each category is reviewed and discussed respectively. In general, large-scale MOEAs, as an emerging research direction, are still in their initial stage of development. Even though a number of large-scale MOEAs have been proposed in recent years, there are still many open problems that need to be solved. Some directions that we believe are worth investigating within this area are provided in the following.
Large-scale MOP analysis system. The abovementioned scalability analysis and algorithmic characteristic analysis imply that different types of large-scale MOPs require different types of large-scale MOEAs. A system that can analyze the characteristics of a largescale MOP systematically and comprehensively would be conducive to providing fast algorithm design recommendations for large-scale black-box MOPs and achieving the best efficiency for a given problem. Although there are some preliminary attempts in [25,28], it still lacks a systematic analysis, and thus it is worthy of further research. On one hand, although characteristics of MOPs, such as whether an evenly distributed sample of parameter vectors in the search space maps to an evenly distributed set of objective vectors in fitness space (i.e., bias), many-to-one mappings, and dissimilar parameter domains, have been widely studied [36−38] , they are rarely studied from the perspective of algorithm scalability. Their respective impacts on the scalability of an algorithm are worthy of in-depth analysis. On the other hand, in the case that some characteristics are known to have a key impact on algorithm scalability, a tool that can detect these characteristics can help to provide some prior knowledge for a black-box problem. Therefore, research on such detection methods can be beneficial to the design of algorithms.
In-depth analyses and understandings of the effectiveness of existing methods. The current practice of validating large-scale MOEAs is mainly based on empirical studies on artificially constructed test problems.
Although they can reflect the difficulties of MOPs to some extent, they are not specifically designed to study the scalability of algorithms. Therefore, a test problem set that can well reflect the characteristics of algorithms in this respect is important for the research on algorithm scalability. In spite of this, without a clear understanding of the difficulties a real-world large-scale MOP presents to an MOEA, the characteristics of the existing test problems in the context of large-scale multi-objective optimization are not well understood. Thus, it is difficult to assert the veracity of empirical studies conducted on these test problems. In order to draw accurate conclusions, it is imperative that the test problems employed be well understood and be sufficiently representative of realworld problems. On the other hand, the theoretical analysis of MOEAs has only been scarcely studied, and most of them focus on the analysis of some relatively simple discrete optimization problems such as pseudo-boolean functions [100,101] . Therefore, more in-depth analyses and understandings of large-scale MOEAs are needed. To be specific, three aspects could be studied progressively. First, in addition to MOP benchmark problems, further studies of the performance of existing algorithms on realworld large-scale MOPs are needed. Results observed from these studies as well as prior knowledge of these tested problems can be studied to reveal some new challenges induced by the increase of the number of decision variables. Second, an in-depth analysis of these challenges should be conducted to discover new characteristics that are important to the scalability of an algorithm but are currently unknown. Such analyses would lead to a test problem set tailed for scalability study. Third, on the basis of the analyses as well as the newly constructed test problems, the scalability of existing algorithms can be reassessed and studied, resulting in a systematic inference about the advantages and disadvantages of existing methods.
Multi-method fusion. Many existing large-scale MOEAs are not mutually exclusive. For example, many enhanced search-based approaches can be used as an optimizer for sub-problems in decomposition-based approaches. The way to utilize the synergy of different methods and integrate them to achieve a better scalability has rarely been studied. Given a large-scale MOP, the difficulties involved in the problem can be analyzed systematically through the above analysis system. Multiple solution strategies could then be selected accordingly from existing approaches, or designed from scratch. Finally, an algorithm that integrates these strategies adaptively can be developed specifically for the target problem.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.