Objective reduction based on nonlinear correlation information entropy

It is hard to obtain the entire solution set of a many-objective optimization problem (MaOP) with multi-objective evolutionary algorithms (MOEAs) because of the difficulties brought by the large number of objectives. However, some problems contain redundant objectives because their objectives are correlated (linearly or nonlinearly). Objective reduction can be used to decrease the difficulty of such MaOPs. In this paper, we propose a novel objective reduction approach based on nonlinear correlation information entropy (NCIE). It uses the NCIE matrix to measure the linear and nonlinear correlation between objectives and a simple method to select the most conflicting objectives during the execution of MOEAs. We embed our approach into both Pareto-based and indicator-based MOEAs to analyze the impact of our reduction method on the performance of these algorithms. The results show that our approach significantly improves the performance of Pareto-based MOEAs on both reducible and irreducible MaOPs, but does little to improve the performance of indicator-based MOEAs.


Introduction
Loosely speaking, a many-objective optimization problem (MaOP) (Khare et al. 2003; Praditwong and Yao 2007) is a special kind of multi-objective optimization problem (MOP) with more than three objectives (Fleming et al. 2005; Hughes 2007). The large number of objectives in many-objective optimization brings many challenges to multi-objective evolutionary algorithms (MOEAs). Most solutions in the population of an MaOP are non-dominated (Ishibuchi et al. 2008); thus, the selection mechanism based on the Pareto dominance is less effective. Pareto-based MOEAs such as NSGA-II (Deb et al. 2002a) fail to solve MaOPs (Purshouse and Fleming 2003; Hughes 2005; Wagner et al. 2007; Khare et al. 2003). Without an effective dominance relation, MOEAs are unable to provide promising search directions (Deb and Jain 2014). Moreover, the growing number of objectives increases the computational complexity of Pareto-based MOEAs. Although the non-dominated rank sort (Deb and Tiwari 2005), deductive sort (McClymont and Keedwell 2012), and corner sort (Wang and Yao 2014) have been proposed to reduce that complexity, the progress is still unsatisfactory.
To overcome the difficulty of MaOPs, the existing research can be divided into five classes:
- Dominance relation modification: The Pareto dominance is ineffective for MaOPs. Much work aims to modify the original dominance relation (Köppen et al. 2005; Kukkonen and Lampinen 2007; Sato et al. 2007; Farina and Amato 2002; Dai et al. 2015), but the performance of these modifications is still less than satisfactory.
- Decomposition-based MOEAs: The main idea is to solve MaOPs by aggregation functions with a series of weight vectors to obtain several single-objective optimization problems (Zhang and Li 2007; Ma et al. 2014), but this approach suffers from poor performance on MaOPs with highly correlated objectives (Ishibuchi et al. 2009) because of the unsuitable arrangement of weight vectors (Ishibuchi et al. 2011a, b).
- Indicator-based MOEAs: With a metric as a single objective, indicator-based MOEAs can avoid employing the Pareto dominance (Zitzler and Künzli 2004; Bader and Zitzler 2011; Wagner et al. 2007; Gong et al. 2014). However, I_ε+ in IBEA (Zitzler and Künzli 2004) and I_H in HypE (Bader and Zitzler 2011) provide unsatisfactory diversity on the Pareto front (PF) (Hadka and Reed 2012).
- Incorporation with decision makers: Usually, decision makers do not need all the optimal solutions of MaOPs (Cvetkovic and Parmee 2002). They can input their regions of interest or preferences to obtain parts of the nondominated solution set (Sindhya et al. 2011; Ben Said et al. 2010; Koksalan and Karahan 2010; Wang et al. 2013a; Kim et al. 2012; Karahan and Koksalan 2010; Giagkiozis and Fleming 2014). Additionally, decision makers have different targets for different objectives, and multi-target search has been employed (Wang et al. 2013b).
- Objective reduction: For some MOPs, unnecessary objectives can be ignored without changing their Pareto sets (Gal and Hanne 1999). Thus, the difficulty caused by a large number of objectives of a MaOP can be reduced (Fonseca and Fleming 1995; Coello Coello 2005; Deb 2001), and the existing Pareto-based MOEAs for MOPs with low-dimensional objectives can be used.
Objective reduction aims to make problems with redundant objectives easier to solve by the existing MOEAs. The basic goal of objective reduction is to select the smallest set of objectives without changing the Pareto set of the original problem (Gal and Hanne 1999). In Brockhoff and Zitzler (2009), there is a related, but different explanation of this aim through the correlation among objectives, i.e., trying to obtain the smallest set of conflicting objectives. Based on different understandings of objective reduction, the existing objective reduction techniques can be divided into three classes:
- Dominance relation preservation-based objective reduction: It is based on a measure for the changes of the dominance structure, which obtains a minimum subset of objectives with the preserved dominance relation (Brockhoff and Zitzler 2006). An additional term δ is adopted to measure the difference between the dominance structures of two subsets. However, the technique can only be applied to the linear objective reduction.
- Pareto corner search: The Pareto corner search evolutionary algorithm (PCSEA) (Singh et al. 2011) is a newly proposed objective reduction approach. It only searches the corners of PFs. Then, it uses the obtained solutions to analyze the relation among objectives. Finally, it outputs a subset of non-correlated objectives. PCSEA is an off-line method.
- Machine learning-based objective reduction: As the process of objective reduction can be seen as feature selection, this method focuses on the objectives with negative correlation and uses an improved correlation matrix of objectives to measure the conflict degree of two objectives (López Jaimes et al. 2008). With the obtained correlation matrix as distances, the method divides those objectives into neighborhoods. Then, it adopts a q-neighbor structure to select objectives. However, q has to be set in advance.
Other machine learning techniques for dimension reduction, such as principal component analysis (PCA) and maximum variance unfolding (MVU), have also been applied to objective reduction (Saxena and Deb 2007;Deb and Saxena 2005). These objective reduction methods use machine learning techniques to select conflicting objectives according to the correlation information (the correlation matrix and correntropy matrix, for instance).
The aforementioned objective reduction approaches have their disadvantages. For example, both approaches in Brockhoff and Zitzler (2006) and PCSEA are off-line approaches for supporting decision makers after running MOEAs. This paper focuses on online objective reduction approaches. Although interpreting objective reduction through the correlation (Brockhoff and Zitzler 2009) is not exactly the same as the original definition (Gal and Hanne 1999), it covers the majority of cases in practice and is easy to apply to online objective reduction approaches. Therefore, we follow this interpretation of objective reduction and use the nondominated population in every generation as the learning dataset to identify the redundant objectives in this paper.
Objectives that can be reduced are either linearly or nonlinearly correlated, mostly nonlinearly correlated (Saxena et al. 2013). However, the majority of the existing approaches use linear statistical tools to measure both linear and nonlinear correlation. In such cases, nonlinear correlation would be weakened by the linear description, which misleads the reduction.
In this paper, we use the same measurement for both linear and nonlinear correlation; thus, the performance of online objective reduction approaches can be improved. We find NCIE (Wang et al. 2005) to be a very robust measure for both linearly and nonlinearly correlated datasets, which has been applied to the analysis of neurophysiological signals (Pereda et al. 2005), the quantification of the dependence among noisy data (Khan et al. 2007), etc. Therefore, we adopt NCIE as a correlation measure in objective reduction and study its impact on online objective reduction approaches.
The rest of the paper is organized as follows. We first show different cases of redundant objectives in MOPs in Sect. 2. In Sect. 3, NCIE is introduced. In Sect. 4, our approach will be described in detail. Section 5 reports the experimental results, in which the behavior of our approach is analyzed and discussed. Finally, Sect. 6 gives the conclusion and points out the future work.

Conflicting and redundant objectives

Conflicting objectives
Simply put, the conflict between two objectives means that an improvement on one objective deteriorates the other objective. The conflict might be global or local (the range of conflict) (Freitas et al. 2013), and linear or nonlinear (the structure of correlation) (Saxena et al. 2013).

Redundant objectives
If there is no conflict between two objectives, one of them can be viewed as a redundant objective for this MOP. Generally, the redundant objectives in an MOP are defined as the objectives that can be ignored without changing the structure of its original PF (Gal and Hanne 1999).

Reducible MOPs
Multi-objective optimization problems (MOPs) with redundant objectives are reducible MOPs, to which objective reduction techniques can be applied. If such MOPs can be reduced to MOPs with low-dimensional objectives, existing MOEAs can be used.
The above definition is not strictly mathematical. However, as Brockhoff and Zitzler (2006) mentioned, the existing literature has not clarified two main problems for objective reduction. One is the effect of objective reduction on dominance, and the other is the evaluation of the subset of objectives after reduction.
To model objective reduction mathematically, the redundant objectives are considered as the objectives positively correlated to some other objectives in the MOP (Brockhoff and Zitzler 2009). Actually, this transformation is not strictly equivalent. Table 1 shows some MOPs with a redundant objective f_3. Their parallel coordinate graphs are shown in Fig. 1. In Cases 1 and 2, f_3 is positively correlated to f_1 and can therefore be reduced. In Case 3, f_3 is constant and not correlated to any objective. In Case 4, f_3 is in conflict with f_1 and f_2 in most parts (locally), but it does not contribute to the PF structure, because f_3 is constant and not correlated to any objective on the PF. However, Cases 3 and 4 are not covered by the above definition. Case 3 is rare in the real world, and Case 4 is hard to detect during the search. Therefore, we only focus on Cases 1 and 2 in this paper.
In Cases 1 and 2, the linear and nonlinear correlation are both important in objective reduction. However, the majority of the existing approaches employ linear tools to describe all the scenarios (Deb and Saxena 2005), which results in poor performance for nonlinear correlation. In both Cases 1 and 2, f_3 is a redundant objective for f_1. In Case 1, f_3 is linearly correlated to f_1, whereas in Case 2 it is nonlinearly correlated to f_1. If we use linear tools to evaluate the correlation degree, the value obtained in Case 2 is smaller than that in Case 1, although f_3 is redundant with respect to f_1 in both cases. This is clearly unreasonable, and it is the reason why we use NCIE to capture a more general correlation for objective reduction.
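The weakening effect described above can be illustrated with a minimal sketch. The two redundant objectives below are hypothetical stand-ins (they are not the functions of Table 1): one is a linear function of f_1, the other a monotone but strongly nonlinear function of f_1. The helper `pearson` is written out only for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical objective values sampled on 100 solutions.
f1 = [i / 99 for i in range(100)]
f3_linear = [2.0 * v for v in f1]      # assumed linear redundancy
f3_nonlinear = [v ** 8 for v in f1]    # assumed nonlinear redundancy, still monotone in f1

r_linear = pearson(f1, f3_linear)
r_nonlinear = pearson(f1, f3_nonlinear)
print(r_linear, r_nonlinear)
```

In both cases f_3 never conflicts with f_1, yet the linear coefficient stays at its maximum only for the linear case and drops noticeably below 1 for the nonlinear one.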

Nonlinear correlation information entropy
Mutual information entropy is a kind of generalized correlation that is sensitive to different kinds of relations. It is shown in Eq. (1) (Maes et al. 1997), where X (with a domain of L possible values) and Y (with a domain of M possible values) are two discrete random variables. H(X) is the information entropy of X, defined in Eq. (2), where p_i is the probability that X takes its ith value. H(X, Y) is the joint entropy of X and Y, shown in Eq. (3), where p_ij is the probability that X takes its ith value and Y takes its jth value.
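For reference, the standard textbook forms of mutual information, entropy, and joint entropy are written out below, numbered to match the references to Eqs. (1)-(3) in the text. The base of the logarithm is left open here, since it is not stated in the text, and the symbol I(X; Y) for mutual information is an assumed notation.

```latex
\begin{align}
I(X;Y) &= H(X) + H(Y) - H(X,Y) \tag{1}\\
H(X)   &= -\sum_{i=1}^{L} p_i \log p_i \tag{2}\\
H(X,Y) &= -\sum_{i=1}^{L}\sum_{j=1}^{M} p_{ij} \log p_{ij} \tag{3}
\end{align}
```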
The authors in Wang et al. (2005) proposed a new nonlinear correlation information entropy for multi-variable analysis. The results show that the new entropy quantizes the correlation in [0,1] for both linear and nonlinear cases (Wang et al. 2005).
Nonlinear correlation information entropy (NCIE) first divides variables X and Y into b rank grids. Then, the probabilities can be estimated from the counts in those grids. Thus, p_ij in the ijth grid can be calculated. The joint entropy is shown in Eq. (4), where N is the size of the dataset, n_ij is the number of samples distributed in the ijth rank grid, and b is set to √N. NCIE is shown in Eq. (5), where H_r(X) is the revised entropy of X as in Eq. (6). Thus, the only parameter b is set self-adaptively, which makes NCIE free of user-defined parameters. NCIE can also be calculated by a simple formula as in Eq. (7).
Based on the NCIE matrix, the relation among K variables can be analyzed.
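The rank-grid computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: logarithms are taken to base b (following the formulation of Wang et al. 2005 as we understand it), b is rounded to the nearest integer, ties are broken by input order, and the function names are hypothetical.

```python
import math

def rank_grid(values, b):
    """Map each sample to one of b equal-frequency rank grids (0 .. b-1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable: ties keep input order
    grid = [0] * n
    for rank, i in enumerate(order):
        grid[i] = rank * b // n
    return grid

def revised_entropy(counts, n, b):
    """Entropy of grid counts with logarithm base b (assumed base)."""
    return -sum((c / n) * math.log(c / n, b) for c in counts if c > 0)

def ncie(x, y):
    """Pairwise value H_r(X) + H_r(Y) - H_r(X, Y) on a b-by-b rank grid."""
    n = len(x)
    b = max(2, round(math.sqrt(n)))
    gx, gy = rank_grid(x, b), rank_grid(y, b)
    joint = {}
    for cell in zip(gx, gy):
        joint[cell] = joint.get(cell, 0) + 1
    cx = [gx.count(k) for k in range(b)]
    cy = [gy.count(k) for k in range(b)]
    return (revised_entropy(cx, n, b) + revised_entropy(cy, n, b)
            - revised_entropy(joint.values(), n, b))

x = [i / 99 for i in range(100)]
y_monotone = [v ** 8 for v in x]                          # nonlinear but monotone in x
y_unrelated = [(37 * i % 100) / 99 for i in range(100)]   # deterministic shuffle of x
print(ncie(x, y_monotone), ncie(x, y_unrelated))
```

Because the measure works on ranks, the monotone nonlinear series is scored as strongly as a linear one, while the shuffled series receives a value close to zero.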

Objective reduction based on nonlinear correlation information entropy

Basic idea
Our proposed method uses NCIE as a metric to reduce redundant objectives; its flowchart is shown in Fig. 2. The proposed method first analyzes the correlation of objectives using the non-dominated population as its dataset. Based on the correlation of objectives, the method obtains a subset of conflicting objectives for the MOEA. Then, the MOEA focuses only on this subset of objectives, which is updated by the objective reduction approach in every generation. The correlation analysis and objective selection are two key steps in an objective reduction approach. For correlation analysis, a majority of the existing approaches are based on the correlation matrix, which only measures linear correlation. As NCIE can handle both the linear and nonlinear correlation, we adopt it to measure the correlation in our approach. For objective selection, we abandon the common techniques in the existing approaches (such as PCA and feature selection) and design a straightforward method to select conflicting objectives (explained in Sect. 4.3).
The NCIE-based correlation analysis uses the nondominated population in every generation; thus, the conflict between objectives that it measures is local rather than global (López Jaimes et al. 2014). As Sect. 2 shows, the conflicts might be local in some cases. Thus, our proposed method can remove objectives that are locally redundant even if they are not globally redundant. During the execution of MOEAs, the conflict degree is updated by the value of NCIE. In short, the basic idea of our approach is to keep the most conflicting objectives and omit the most positively correlated objectives in the NCIE matrix during run time.

Correlation analysis
Although NCIE can describe both the linear and nonlinear correlation between objectives, it cannot describe whether they are in conflict, so it cannot be used directly in its original version for our aim. In view of this, NCIE is modified by adding the sign information of covariance. The normalized covariance (i.e., the correlation coefficient) takes values in [−1, 1], and its sign describes whether two variables are in conflict. The modified NCIE is shown in Eq. (8), where cov_ij is the ijth element of the correlation matrix.
In the modified NCIE, the sign comes from the covariance, and its role is to show whether two objectives are in conflict. The modified NCIE can therefore describe the conflict degree. If the modified NCIE of two objectives is a large positive value, the two objectives are highly positively correlated. If the modified NCIE of two objectives is a large negative value, the two objectives are in strong conflict. If two objectives have a modified NCIE around zero, they are not correlated. In this case, the sign of the modified NCIE is not very important, because the difference between the values with different signs is small. With the modified matrix, we can use either a threshold or a classification method to determine the correlation degree of two objectives.
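A minimal sketch of this sign transfer is given below. It assumes that Eq. (8) multiplies each unsigned NCIE entry by the sign of the corresponding covariance, which is how the text describes it; the function names and the unsigned matrix used in the example are hypothetical.

```python
def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def modified_ncie_matrix(ncie_matrix, objectives):
    """Attach the sign of cov(f_i, f_j) to each unsigned NCIE entry."""
    m = len(objectives)
    return [[sign(covariance(objectives[i], objectives[j])) * ncie_matrix[i][j]
             for j in range(m)] for i in range(m)]

# Hypothetical objective values on 50 solutions: f2 conflicts with f1, f3 follows f1.
xs = [i / 49 for i in range(50)]
objectives = [xs, [1.0 - v for v in xs], [v ** 2 for v in xs]]
unsigned = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # assumed NCIE values
signed = modified_ncie_matrix(unsigned, objectives)
print(signed)
```

The magnitude still comes from NCIE alone; the covariance contributes only its sign, so a weak linear coefficient on a nonlinear relation does not shrink the entry.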

Objective selection
With the modified NCIE matrix, our approach selects the most conflicting objectives for MOEAs. Our approach is applied in every generation of MOEAs to update the correlation information among objectives. The details are shown in Algorithm 1, where S_r is the selected objective set and S_t is a temporary set. After the calculation of the modified NCIE matrix, our approach selects the most conflicting objective, which is the objective with the largest absolute sum of its negative NCIEs to other objectives. Then, it omits the objectives that are positively correlated to the selected objective. Finally, our approach outputs the selected objectives. In the process of omitting objectives, a threshold T is applied to determine whether two objectives are positively correlated. The effect of T is analyzed in Sect. 5.2.3.
Algorithm 1 Pseudo code of the objective selection in our approach.
1-5: [...]
6: If all the elements in R_N are positive,
7: J = argmax(sum(R_N(1 : m, j))). (Find the most representative objective)
8-9: [...] (Find the most conflicting objective with the remaining objectives)
10: End
11: Move f_J from S_t to S_r.
12: Find set F with the objectives correlated to f_J.
13: Delete set F from S_t.
14: End
To show the process of our objective selection method, we take a modified NCIE matrix on DTLZ5(2,5) (Deb et al. 2002b) in Table 2 as an example (T is set as 0 [...]). First, f_5 is selected, because it has the largest absolute sum of its negative NCIEs to other objectives (f_5 has the highest conflict degree with other objectives). There is no objective positively correlated to f_5; thus, there is no objective redundant with f_5 among the remaining objectives. Then, f_4 is selected, because it has the largest absolute sum of NCIEs to other objectives. Objectives f_1, f_2, and f_3 are omitted (S_t = ∅); they are all positively correlated to f_4 (not in conflict with f_4) and are therefore redundant, because R_N(4,1) > T, R_N(4,2) > T, and R_N(4,3) > T. Finally, our approach obtains the reduced objective set {f_5, f_4}, which represents the main conflict in DTLZ5(2,5).
Our approach is different from the approach that outputs a fixed number of objectives (Deb and Saxena 2005). It selects different numbers of objectives according to the situation of the current population, which is more robust for different problems.
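The selection loop can be sketched as follows. Since parts of the pseudo code are not available in the text, this is a reading of the prose description under stated assumptions: the positivity test is applied to the entries among the remaining objectives, the diagonal is ignored, and an objective is omitted when its modified NCIE to the selected objective exceeds T. The function name and the example matrix (which is not Table 2) are hypothetical.

```python
def select_objectives(R, T):
    """Greedy selection on a modified NCIE matrix R (list of lists), threshold T."""
    remaining = list(range(len(R)))   # S_t, indices of objectives still to process
    selected = []                     # S_r, indices of retained objectives
    while remaining:
        def others(j):
            # Modified NCIE values between objective j and the other remaining ones.
            return [R[k][j] for k in remaining if k != j]
        if all(v >= 0 for j in remaining for v in others(j)):
            # No conflict left: take the most representative objective.
            score = lambda j: sum(others(j))
        else:
            # Take the objective with the largest absolute sum of negative entries.
            score = lambda j: -sum(v for v in others(j) if v < 0)
        J = max(remaining, key=score)
        selected.append(J)
        remaining.remove(J)
        # Omit the objectives positively correlated to f_J (value above T).
        remaining = [k for k in remaining if R[J][k] <= T]
    return selected

# Hypothetical 5-objective matrix: f1..f4 mutually correlated, f5 in conflict with all.
R = [
    [1.0, 0.6, 0.6, 0.7, -0.30],
    [0.6, 1.0, 0.6, 0.8, -0.35],
    [0.6, 0.6, 1.0, 0.9, -0.40],
    [0.7, 0.8, 0.9, 1.0, -0.50],
    [-0.30, -0.35, -0.40, -0.50, 1.0],
]
print(select_objectives(R, T=0.5))  # zero-based indices of the retained objectives
```

On this made-up matrix the loop first keeps the objective that conflicts with all others and then one representative of the correlated group, which mirrors the two-step behaviour described for the DTLZ5(2,5) example. The number of retained objectives is not fixed in advance; it follows from the matrix and T.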

Classification for correlated and non-correlated objectives
Parameter T (T ∈ [0, 1]) is the threshold to determine the correlation degree between objectives. It is important to use a suitable T: if T is too large, some redundant objectives may be regarded as non-reducible, and if T is too small, some conflicting objectives may not be retained. It is difficult to set T manually in advance without any knowledge of the optimization problem. Actually, the whole issue should be regarded as a classification problem that separates the objectives correlated to objective f_i from those non-correlated to objective f_i. We avoid the manual setting of T by Algorithm 2. As the clustering problem is a one-dimensional problem of a small size, any clustering technique can fulfill the task of cutting across the least dense area between the two clusters. In this paper, we use K-means for clustering. In practice, it can happen that all the objectives are non-correlated or all are correlated to f_i. Therefore, we add two virtual values 0 and 1 during the clustering to handle such cases.
Algorithm 2 Pseudo code of classifying objectives.
1: Input: R, the modified NCIE of the remaining objectives to objective f_i (R = R_N(i, j), f_j ∈ S_t and f_i ∈ S_r); m, the number of objectives.
2: Output: R with classification for correlated and non-correlated objectives.
3: Add values 0 and 1 (the boundaries of non-correlated and correlated) to R as R*.
4: Sort R* into R' in ascending order.
5: Classify R' into two clusters (R'[1 : k] and R'[k + 1 : end]). In other words, cut across the least dense area between the two clusters.
6: R'[1 : k] without 0 is the cluster of objectives non-correlated to f_i; R'[k + 1 : end] without 1 is the cluster of objectives correlated to f_i.
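The classification step can be sketched as below. Note the substitution: instead of iterating K-means, this sketch finds the two-cluster split of the sorted one-dimensional values by exhaustive search over the cut position, minimizing the within-cluster squared error, which is the objective K-means optimizes. The function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def classify_correlated(r):
    """Return a list of booleans: True where the objective is correlated to f_i.

    r holds the modified NCIE values of the remaining objectives to f_i.
    """
    s = sorted(list(r) + [0.0, 1.0])   # add the virtual values 0 and 1, then sort
    # Try every cut position k and keep the one with the smallest total error.
    best_k = min(range(1, len(s)), key=lambda k: sse(s[:k]) + sse(s[k:]))
    cut = 0.5 * (s[best_k - 1] + s[best_k])
    return [v > cut for v in r]

print(classify_correlated([0.9, 0.85, 0.1, -0.4]))
print(classify_correlated([0.9, 0.95]))    # all correlated: only the virtual 0 is cut off
print(classify_correlated([0.05, 0.1]))    # none correlated: only the virtual 1 is cut off
```

The last two calls show why the virtual values matter: without them, a set of uniformly high (or uniformly low) values would still be forced into two non-empty groups.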

Computational complexity
For an m-objective problem with a solution set of size N (in most cases, N is larger than m), the NCIE matrix calculation in the correlation analysis has O(m^2 N) complexity, and the objective selection has O(m^2) complexity. Therefore, the total complexity of our method is O(m^2 N) per generation.

Test problems, metrics, and settings
As the DTLZ problems (Deb et al. 2002b) and WFG3 (Huband et al. 2006) are MOPs that can be defined with different numbers of objectives, we adopt them as the test problems in our experiments. Among these test problems, DTLZ1-4 are irreducible, and DTLZ5 (Deb and Saxena 2005) and WFG3 are reducible. DTLZ5(I,M) is an M-objective problem with I conflicting objectives. We use the inverted generational distance (IGD, the average distance from the true PF to the obtained PF) (Van Veldhuizen and Lamont 1998; Zhang et al. 2008) to evaluate both convergence and diversity in our experiments. The calculation of IGD is shown in Eq. (9), where PF_true is a reference set that is uniformly sampled from the true PF. Most of the test problems in this paper can be reduced, so their PFs are degenerate, and we uniformly sample PF_true in the objective-reduced space to guarantee the accuracy of IGD. Taking DTLZ5(2,M) as an example, it can be reduced to {f_M−1, f_M}; we sample PF_true in the space of f_M−1 and f_M, and then the values of the other objectives can be calculated. It is worth noting that, to guarantee a fair comparison, PF_true and PF_obtained in IGD both have their full dimensions rather than the dimensions after objective reduction.
All the algorithms in the following subsections are repeated for 30 independent runs and stop after 200 generations (SBX crossover (η = 15) with probability 1 and polynomial mutation (η = 15) with probability 0.1). The population size is 200 (due to the larger numbers of objectives in the test problems of the following subsections, we will enlarge the size of the population later). The experiments were run on a machine with a T5470 1.6-GHz CPU and 1 GB of RAM.
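As an illustration of the metric, the sketch below computes IGD in its common form, the mean over the reference points of the Euclidean distance to the nearest obtained point. That this matches Eq. (9) exactly is an assumption (some variants take a root of summed squared distances instead); the function name and the sample points are hypothetical.

```python
import math

def igd(pf_true, pf_obtained):
    """Mean distance from each reference point to its nearest obtained point."""
    return sum(min(math.dist(v, s) for s in pf_obtained)
               for v in pf_true) / len(pf_true)

# Hypothetical two-objective reference set and two obtained sets.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
well_spread = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
one_corner = [(0.0, 1.0)]
print(igd(reference, well_spread), igd(reference, one_corner))
```

A set that converges to a single corner of the front is penalized because the distant reference points have no nearby obtained point, which is why IGD reflects diversity as well as convergence.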

Characteristics of DTLZ5
As we know, DTLZ5(2,M) can be reduced to a two-objective problem with only two selected objectives. To demonstrate this, we compare NSGA-II runs on DTLZ5(2,10) restricted to only two selected objectives {f_i, f_10}.
The average IGD values of 30 independent runs are shown in Fig. 3. Although every DTLZ5(2,10) instance restricted to {f_i, f_10} has two objectives, the difficulties are not at the same level. In Fig. 3, we find that increasing i decreases the difficulty of DTLZ5(2,10), which can be proved theoretically (Deb et al. 2002b). The reduction results by different approaches are shown in the following subsections.

Correlation analysis
The modified NCIE matrix plays an important role in the correlation analysis of our approach. The correlation matrix is another popular metric of correlation of multiple variables. We embed these two different matrices separately into our objective reduction approach to show the behavior of the modified NCIE matrix. The two objective reduction approaches are both embedded in NSGA-II. The differences in reduction performance and execution time are summarized below. The number of objectives after reduction over 200 generations is shown in Fig. 4. For the five-objective DTLZ5, both approaches based on the modified NCIE and correlation matrices reduce the problem to a three-objective optimization problem. However, when the number of objectives increases, the approach based on the modified NCIE matrix reduces more redundant objectives than that based on the correlation matrix. For the 50-objective problem, the approach based on the modified NCIE matrix reduces it into a three-objective problem, while the correlation matrix-based approach can only reduce the number of objectives to 14.
To investigate the differences between the modified NCIE and correlation matrices further, we show the number of times each objective is retained over 200 generations after objective reduction in Fig. 5. The approach based on the correlation matrix cannot reduce DTLZ5 to a two-objective problem except for the five-objective DTLZ5. For the five-objective DTLZ5, the approach based on the modified NCIE matrix retains {f_1, f_5}, which is not the best reduction result. However, when the number of objectives increases to 50, the chance of retaining {f_1, f_M} by the approach based on the modified NCIE matrix decreases, while that of retaining {f_M−1, f_M} increases. According to Fig. 5, the modified NCIE matrix performs better than the correlation matrix in choosing which objectives to keep when the number of objectives is large, e.g., 50. Figure 6 shows the execution time of NSGA-II with our objective reduction approaches based on the modified NCIE and correlation matrices on DTLZ5(2,M). With the increasing number of objectives, the execution time of both approaches increases. However, the execution time of the approach based on the modified NCIE matrix increases much more slowly than that of the approach based on the correlation matrix.
From the above results, we find that the approach based on the modified NCIE matrix performs better than the approach based on the correlation matrix for the problems with a large number (e.g., 50) of objectives, because the correlation matrix cannot provide clear correlation information for the subsequent objective selection.

Classification for objectives
The classification of objectives plays an important role in our approach. We compare our approach with different values of T and with the classification method. The experiment is conducted on reducible and irreducible problems (DTLZ5(2,10) and DTLZ2 with 10 objectives). Our objective reduction approach is embedded in NSGA-II. For the reducible problems, our objective reduction approach aims to remove as many redundant objectives as possible. For the irreducible problems, our objective reduction approach aims to keep the right correlation of objectives. Therefore, we adopt the median number of objectives after reduction over 200 generations to evaluate the behavior of our approach. Figure 7 shows the number of objectives after reduction over 200 generations in 30 independent runs with different values of T. As the classification method has no T, it cannot be placed on the horizontal axis like the fixed values of T; hence, we show it at a special position outside the interval [0, 1] on the horizontal axis. Comparing the two sub-figures, we find that the size of T affects the behavior of our approach on reducible problems more than on irreducible problems. The performance of our approach decreases with increasing T. The classification method obtains the best performance of objective reduction. For the globally irreducible problem DTLZ2, there are still some locally redundant objectives, and our approach reduces three objectives. If T is set too small, some conflicting objectives are considered as redundant objectives, which would lead to the wrong dominance structure. If T is set too large, some redundant objectives would not be removed, which would waste computational effort. The suitable value of T varies across problems. Therefore, a robust classification is very important to our approach.

Objective selection
We adopt a direct method to select the most conflicting objectives according to the obtained NCIE matrix from correlation analysis. The objective reduction approach based on PCA (Saxena et al. 2013) is a well-known one. Therefore, we compare our objective selection method with the PCA method (using the same setting as in Saxena et al. 2013). As NCIE is a nonlinear metric, we also compare it with the kernel PCA (KPCA) method (with Gaussian kernel function). DTLZ5(2,M) (M = 5, 10, 20, 30, 50) is chosen as the test problem in this subsection. We embed all the approaches in NSGA-II. The differences in reduction performance and execution time are summarized below.
The number of objectives after reduction over 200 generations is shown in Fig. 8. DTLZ5(2,M) can be reduced to a two-objective problem. All the approaches obtain two objectives for the DTLZ5 with five objectives. The KPCA-based method cannot reduce any objectives for the DTLZ5 with 10 and 20 objectives, but reduces the DTLZ5 with 30 and 50 objectives to 25 objectives. In contrast, our approach and the approach based on PCA have better performance. When the number of objectives increases, our approach reduces more objectives than the approach based on PCA. For example, our approach reduces DTLZ5(2,50) to a three-objective problem, which is much better than the approach based on PCA.
In addition to the number of objectives after reduction, the obtained objectives affect the final results. We show the number of times each objective is retained after objective reduction in Fig. 9, from which we can find the different strategies of the approaches for reducing objectives. As the approach based on KPCA does not have a good reduction behavior, the discussion is now focused on our approach and the approach based on PCA. For DTLZ5 with 5, 10, 20, and 30 objectives, the approach based on PCA reduces the problems to {f_1, f_M}. Our approach reduces them to {f_M−1, f_M} more often than the other compared approaches. For the 50-objective DTLZ5, the approach based on PCA obtains a three-objective set {f_1, f_49, f_50}, while our approach obtains {f_49, f_50}.
Our proposed approach can reduce objectives more efficiently than the approach based on PCA. This is reflected in two aspects: one is the number of objectives after reduction, and the other is the selected objectives. Figure 10 shows the execution time of these methods in 30 independent runs. For DTLZ5 with five and ten objectives, our approach and the approach based on PCA use almost the same time. With the increasing number of objectives, the approach based on PCA requires longer execution time than our approach. Because of the poor performance of KPCA, few objectives can be reduced, and its execution time is much longer than that of the other two methods.
We find that the objective selection in our approach performs better than the approach based on PCA in different aspects (objective reduction performance and execution time). The main reason why our objective selection approach outperforms the approach based on PCA is that our approach reduces more objectives than the approach based on PCA. Because of the larger number of objectives obtained by the approach based on PCA, it cannot reduce the difficulty of the original problem.
KPCA, as a nonlinear method, maps the data to a high-dimensional space to keep its nonlinear characteristic. However, its kernel function has to be chosen in advance, which affects its performance significantly. From the results, we can find that the Gaussian kernel function is not suitable for DTLZ5.

Population size
Our approach uses NCIE to measure the correlation among objectives based on the population during the execution of MOEAs. To study the effect of the population size on the performance of NCIE, we embed our approach in NSGA-II with different population sizes (100 and 200). DTLZ5(2,M) (M = 5, 10, 20, 30, 50) is chosen as the test problem in this subsection. We show the probability of each objective retained for 30 independent runs in Fig. 11. In the cases of 100 and 200 solutions, the retained objectives are very similar on all the tested DTLZ5 problems. In other words, the influence of the population size on the performance of NCIE is small.

Experiments on performance
In this subsection, we apply our approach to both Pareto-based and indicator-based MOEAs, namely NSGA-II (Deb et al. 2002a) and the I_ε+-based IBEA (Zitzler and Künzli 2004). Both the reducible and irreducible problems are tested in the following experiments.

Pareto-based MOEAs
We embed our NCIE-based approach in NSGA-II on DTLZ1-5. The results are analyzed by the Mann-Whitney U test (Hollander and Wolfe 1999). The significant ones are in boldface (the significance level is 0.05). Table 3 shows the IGD values of the NSGA-II with our approach and the original NSGA-II on DTLZ5. DTLZ5 is a reducible problem. We find that the original NSGA-II cannot solve the MaOPs with more than 10 objectives, whereas the NSGA-II with our approach can solve the MaOPs with 50 objectives. Our objective reduction approach reduces these problems into easier problems. As a result, the NSGA-II with our approach performs better than the original NSGA-II in most cases. However, the NSGA-II with our approach performs worse than the original NSGA-II on DTLZ5(7,10) because DTLZ5(7,10) has few objectives that can be reduced.
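For readers unfamiliar with the test, the comparison of two sets of 30 IGD values can be sketched as follows. This is a simplified two-sided Mann-Whitney U test using the normal approximation without tie correction, which is an assumption made for brevity; a statistics library would normally be used instead. The function name and the sample values are hypothetical and are not results from the tables.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p-value) using the normal approximation, no tie correction."""
    n1, n2 = len(a), len(b)
    # U counts the pairs in which the value from a exceeds the value from b.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical IGD samples from 30 runs of two algorithms.
clearly_better = [0.010 + 0.001 * i for i in range(30)]
clearly_worse = [0.100 + 0.001 * i for i in range(30)]
u, p = mann_whitney_u(clearly_better, clearly_worse)
print(u, p, p < 0.05)
```

With 30 runs per algorithm the normal approximation is generally considered adequate; a difference is reported as significant when the p-value falls below the 0.05 level used in the text.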
There are no objectives that can be ignored globally in the irreducible problems DTLZ1-4. To show the behavior of our approach clearly, the NSGA-II with randomly reduced objectives is also compared with the NSGA-II with our approach and the original NSGA-II on DTLZ1-4 in Table 4. After objective reduction, the NSGA-II with our approach can handle problems with more objectives than the original NSGA-II. Comparing the NSGA-II with our approach and the original NSGA-II on the five-objective DTLZ problems, the IGD values are not greatly improved by our approach except on the hard-to-converge problem DTLZ1. Our objective reduction method therefore does not seem to be very effective on irreducible problems. Comparing the IGD values of the NSGA-II with our approach and the NSGA-II with randomly reduced objectives, the former significantly outperforms the latter. However, some objectives can be reduced locally; thus objective reduction to some extent promotes better convergence of the population. In short, objective reduction can still make some irreducible problems easier for MOEAs by exploiting local correlation among objectives.

Indicator-based MOEAs
IBEA (Zitzler and Künzli 2004) is an indicator-based MOEA that is well known for its ability to handle MaOPs. We embed our objective reduction approach in IBEA to analyze its effects on indicator-based MOEAs. The results are analyzed with the Mann-Whitney U test (Hollander and Wolfe 1999), and significant results are shown in boldface (significance level 0.05). Table 5 shows the results of the IBEA with our approach and the original IBEA on DTLZ5. For the reducible problem DTLZ5, no statistically significant difference is detected between the IBEA with our approach and the original IBEA, although our objective reduction approach seems to reduce the performance of IBEA.
As DTLZ1-4 are irreducible, an IBEA with randomly reduced objectives, constructed in the same way as in Sect. 5.3.1, is compared with the IBEA with our approach and the original IBEA. The results are shown in Table 6. There is no statistically significant difference among the three IBEAs.
The number of objectives appears to have little influence on the behavior of IBEA. In other words, the effect of our approach is not significant on IBEA for either reducible or irreducible problems.

WFG problems
WFG3 (Huband et al. 2006) can be reduced to a two-objective optimization problem, which is a different reducible problem from DTLZ5. We embed our approach in both NSGA-II and IBEA on WFG3 with 5-50 objectives. Table 7 shows the results of the MOEAs (NSGA-II and IBEA) with our approach and the original MOEAs in terms of IGD. The results are similar to those on DTLZ5 in Sects. 5.3.1 and 5.3.2. Our approach significantly improves Pareto-based MOEAs, but not indicator-based MOEAs.

Discussion
Generally, the advantage of IBEA is its good convergence ability on MaOPs, which NSGA-II cannot achieve. However, IBEA cannot obtain results with good diversity because of the poor performance of the I ε+ indicator in this respect (Hadka and Reed 2012), whereas NSGA-II pays more attention to diversity. Our experimental results support these two points. The aim of our approach is to improve the convergence ability of NSGA-II while maintaining its good diversity.
Our approach works well with Pareto-based MOEAs, but not with indicator-based MOEAs. Pareto-based MOEAs have difficulties on MaOPs because Pareto dominance becomes ineffective for large numbers of objectives. Our approach removes redundant objectives so that Pareto dominance works again. Indicator-based MOEAs, in contrast, do not suffer from the Pareto dominance problem, even though large numbers of objectives decrease their performance too.
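The loss of selection pressure under Pareto dominance can be illustrated with a small experiment on random points. This is a minimal sketch, assuming uniformly random objective vectors (minimization) rather than an actual MOEA population; the helper names and the seed are arbitrary choices of ours.

```python
import random

def dominates(a, b):
    """True if a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fraction(points):
    """Share of points not dominated by any other point in the list."""
    count = sum(1 for p in points if not any(dominates(q, p) for q in points))
    return count / len(points)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
fractions = {}
for m in (2, 5, 10, 20):
    # 100 random objective vectors with m objectives each
    pop = [tuple(rng.random() for _ in range(m)) for _ in range(100)]
    fractions[m] = nondominated_fraction(pop)
    print(m, fractions[m])
```

With few objectives only a small share of the points is non-dominated, while with many objectives almost every point is, so dominance-based ranking can no longer distinguish between solutions. Reducing the number of objectives moves the comparison back toward the low-dimensional regime.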
For reducible problems such as DTLZ5 and WFG3, our approach selects the most conflicting objectives, which greatly decreases the computational cost. Thus, the NSGA-II with our approach can solve problems with more objectives, which cannot be solved by the original NSGA-II. For example, DTLZ5(2,50) can be reduced to a two-objective problem, and the computational cost is decreased to 4% of that for a 50-objective problem. However, as I increases in DTLZ5(I, M), the convergence ability of the NSGA-II with our approach decreases. This is because the difficulties of those problems are still high after objective reduction. For irreducible problems such as DTLZ1-4, the NSGA-II with our approach can solve those with 25 objectives, whereas the original NSGA-II can only solve the problems with five objectives. Furthermore, the NSGA-II with our approach can also obtain slightly better results than the original NSGA-II, because our approach is able to capture and exploit local objective interactions.

Table note: "-" means that the algorithm cannot obtain the solution set within the limited computational time. Significant results are in boldface (significance level = 0.05).
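The 4% figure quoted for DTLZ5(2,50) follows if the cost is taken to grow linearly with the number of objectives M, as in the O(MN^2) fast non-dominated sort of NSGA-II for population size N. Under that assumption (our reading; the cost model is not stated explicitly in this section), the ratio works out as:

\[
\frac{C(M = 2)}{C(M = 50)} = \frac{2N^2}{50N^2} = \frac{1}{25} = 4\%
\]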

Conclusion
Since the correlation among redundant objectives might be either linear or nonlinear, the existing linear objective reduction approaches have limitations. We have proposed a novel objective reduction approach based on NCIE, which can handle both linear and nonlinear correlations. In our approach, we employ NCIE, a nonlinear metric, to measure the correlation among objectives. In addition, we use a simple objective selection method without any pre-defined parameter, which contributes to the robustness of our approach. The experiments on DTLZ5 in Sect. 5.2.4 show that our approach can select the most conflicting objectives for reduction. Our approach can be embedded in any MOEA to reduce the number of objectives, as demonstrated by the experiments on NSGA-II and IBEA. The experimental results show that our approach improves Pareto-based MOEAs (NSGA-II) on reducible problems (DTLZ5 and WFG3), but cannot improve the performance of indicator-based MOEAs (IBEA). At the same time, our approach also slightly improves the performance of Pareto-based MOEAs on the irreducible problems (DTLZ1-4), because the difficulty of the original problems decreases locally, which promotes convergence.
However, the NCIE approach has some disadvantages and open directions that we will address in our future work. (1) The reduction performance of our approach on problems with more than 20 objectives is still not ideal. (2) The improvement of our approach on indicator-based MOEAs needs to be strengthened. (3) It will be interesting to evaluate our techniques on hypervolume-based IBEAs. (4) It will be useful to apply our approach to search knee areas of MaOPs (Bechikh et al. 2011).