On the effect of normalization in MOEA/D for multi-objective and many-objective optimization

The frequently used basic version of MOEA/D (multi-objective evolutionary algorithm based on decomposition) has no normalization mechanism of the objective space, whereas the normalization was discussed in the original MOEA/D paper. As a result, MOEA/D shows difficulties in finding a set of uniformly distributed solutions over the entire Pareto front when each objective has a totally different range of objective values. Recent variants of MOEA/D have normalization mechanisms for handling such a scaling issue. In this paper, we examine the effect of the normalization of the objective space on the performance of MOEA/D through computational experiments. A simple normalization mechanism is used to examine the performance of MOEA/D with and without normalization. These two types of MOEA/D are also compared with recently proposed many-objective algorithms: NSGA-III, MOEA/DD, and θ-DEA. In addition to the frequently used many-objective test problems DTLZ and WFG, we use their minus versions. We also propose two variants of the DTLZ test problems for examining the effect of the normalization in MOEA/D. Test problems in one variant have objective functions with totally different ranges. The other variant has a kind of deceptive nature, where the range of each objective is the same on the Pareto front but totally different over the entire feasible region. Computational experiments on those test problems clearly show the necessity of the normalization. It is also shown that the normalization has both positive and negative effects on the performance of MOEA/D. These observations suggest that the influence of the normalization is strongly problem dependent.


Introduction
Recently, many-objective optimization has received a lot of attention in the evolutionary multi-objective optimization (EMO) community, where optimization of four or more objectives is called many-objective optimization [14,15,23]. Many-objective problems present a number of challenges [10,19] to the EMO community such as the deterioration in search ability of Pareto dominance-based algorithms [6,29] and the increase in computation time of hypervolume-based algorithms [2,3]. For many-objective problems, it has been demonstrated in the literature [10,12] that MOEA/D [27] works well in comparison with Pareto dominance-based and hypervolume-based algorithms in terms of their search ability and computation time. As a result, a number of EMO algorithms have been proposed for many-objective problems based on the same or similar framework as MOEA/D (e.g., NSGA-III [5], MOEA/DD [16], I-DBEA [1], and θ -DEA [26]).
Various approaches to the improvement of MOEA/D have also been proposed in the literature (for details, see the review in [22]). One important research issue is the specification of a scalarizing function [11,21,24]. The specification of weight vectors [8] and their adjustment [20] are also important. Other important issues include solution-subproblem matching [17,18] and resource allocation to subproblems [28]. Although the normalization of the objective space is also important, this issue has not been actively studied in the literature [4]. One exception is Bhattacharjee et al. [4], where a six-sigma-based method was proposed for removing the influence of dominance-resistant solutions on nadir point calculation.
In this paper, we examine the effect of the normalization on the performance of MOEA/D. The frequently used basic version of MOEA/D [27] has no normalization mechanism, although the normalization was discussed in the original MOEA/D paper [27]. As a result, MOEA/D is often outperformed by other EMO algorithms with normalization mechanisms on test problems where each objective has a totally different range of objective values. For example, in the WFG4-9 test problems [9], the range of the Pareto front on the ith objective is [0, 2i]. This means that the tenth objective $f_{10}$ has a ten times wider range than the first objective $f_1$. It is difficult for MOEA/D without normalization to find a set of uniformly distributed solutions over the entire Pareto front for such a many-objective test problem. In this paper, we combine a simple normalization mechanism with MOEA/D in order to compare the performance of MOEA/D with and without normalization.
This paper is organized as follows. First, we briefly explain MOEA/D and a simple normalization mechanism. Next, we demonstrate that the combined normalization mechanism has positive and negative effects on the performance of MOEA/D through computational experiments on the DTLZ1-4 [7] and WFG4-9 [9] test problems with three to ten objectives. We also discuss why the normalization deteriorates the performance of MOEA/D on some test problems. Then, we create two variants of the DTLZ1-4 test problems to further examine the effect of the normalization in MOEA/D. Each test problem in one variant has objective functions with totally different ranges. More specifically, the range of the Pareto front on the ith objective is $[0, \alpha^{i-1}]$, where α is a positive parameter (e.g., 2 and 10). The other variant has a kind of deceptive nature. Whereas the range of each objective is totally different over the entire feasible region, the Pareto front has the same range for each objective. In other words, each objective has totally different values in early generations and similar values after enough generations. Experimental results on the new test problems clearly show the necessity of the normalization as well as its negative effects. Finally, we report experimental results on minus versions [13] of the DTLZ1-4 and WFG4-9 test problems. Among MOEA/D, NSGA-III, MOEA/DD, and θ-DEA, the best results on almost all minus test problems are obtained by MOEA/D with normalization.

MOEA/D and normalization
We explain MOEA/D and the normalization of the objective space for the following m-objective minimization problem:
$$\text{Minimize } f(x) = (f_1(x), f_2(x), \ldots, f_m(x)) \quad \text{subject to } x \in X,$$
where $f_i(x)$ is the ith objective to be minimized (i = 1, 2, . . . , m), x is a decision vector, and X is the feasible region of x in the decision space.
In MOEA/D, a multi-objective problem is handled as a set of single-objective problems. Each single-objective problem is defined by a scalarizing function with a different weight vector $w = (w_1, w_2, \ldots, w_m)$. A set of uniformly distributed weight vectors is generated based on the following relations [27]:
$$w_1 + w_2 + \cdots + w_m = 1, \qquad w_i \in \left\{0, \frac{1}{H}, \frac{2}{H}, \ldots, \frac{H}{H}\right\}, \quad i = 1, 2, \ldots, m,$$
where H is a positive integer that determines the resolution of the generated weight vectors. MOEA/D generates all weight vectors satisfying these relations. The population size is the same as the total number of the generated weight vectors. This is because each single-objective problem with a different weight vector has a single solution.
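The generation of the weight vectors can be illustrated by a minimal Python sketch. It assumes the usual simplex-lattice relations from [27] (each $w_i$ is a multiple of 1/H and the components sum to 1); the function name `simplex_lattice_weights` is hypothetical and used only for illustration.

```python
from itertools import combinations

def simplex_lattice_weights(m, H):
    """Enumerate all weight vectors with w_i in {0, 1/H, ..., H/H} and sum(w) = 1.

    Stars-and-bars enumeration: choose m-1 bar positions among H+m-1 slots;
    the numbers of stars between consecutive bars give the integer parts H*w_i.
    """
    weights = []
    for bars in combinations(range(H + m - 1), m - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)   # stars between two consecutive bars
            prev = b
        counts.append(H + m - 2 - prev)   # stars after the last bar
        weights.append([c / H for c in counts])
    return weights

if __name__ == "__main__":
    # Under this assumption, m = 3 with H = 12 gives 91 vectors and
    # m = 5 with H = 6 gives 210 vectors.
    print(len(simplex_lattice_weights(3, 12)), len(simplex_lattice_weights(5, 6)))
```

These two counts agree with the numbers of weight vectors reported for three and five objectives in this paper. The values of H are our assumption, and the settings for eight and ten objectives are not reproduced by this single-layer sketch.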
The basic idea of MOEA/D is to search for the best solution along each weight vector, as shown in Fig. 1. The starting point of all weight vectors, which is called a reference point, is specified by the best value of each objective. When the best value of each objective is unknown, the best value among all the examined solutions is used for each objective.
A scalarizing function is used in MOEA/D for combining multiple objectives into a single objective. The original paper on MOEA/D by Zhang and Li [27] examined the weighted sum, Tchebycheff, and PBI (penalty-based boundary intersection) functions as scalarizing functions. In this paper, we use the PBI function, since it has been frequently used in the literature. The idea of the PBI function has also been used in recently proposed many-objective algorithms such as I-DBEA [1] and θ-DEA [26]. The PBI function is defined as
$$\text{Minimize } f^{PBI}(x \mid w, z^*) = d_1 + \theta d_2,$$
where θ is a penalty parameter, and $d_1$ and $d_2$ are defined using the reference point $z^*$ (which is the starting point of the weight vectors in Figs. 1 and 2):
$$d_1 = \frac{\left| (f(x) - z^*)^T w \right|}{\| w \|}, \qquad d_2 = \left\| f(x) - z^* - d_1 \frac{w}{\| w \|} \right\|.$$
Fig. 1 Basic idea of MOEA/D. The best solution for each weight vector is searched for approximating the entire Pareto front
These two distances $d_1$ and $d_2$ are explained in Fig. 2a. The idea behind the use of the PBI function is to search for a solution on the Pareto front along each weight vector, as shown in Fig. 1, by minimizing both $d_1$ and $d_2$. The value of θ specifies the importance of each distance. Its effect is explained in Fig. 2b, c using a contour line for the current solution (open circle). When θ is large, the search along the weight vector is emphasized, as shown in Fig. 2b. When θ is small, the search toward the Pareto front is emphasized, as shown in Fig. 2c. In this paper, θ = 5.0 is used as in the original MOEA/D paper.
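As an illustration of the two distances, the following Python sketch evaluates the PBI function for a single objective vector. It assumes the commonly used form of the PBI function ($d_1$ is the length of the projection of $f(x) - z^*$ onto the weight vector and $d_2$ is the distance from $f(x)$ to that projection); the function name `pbi` and the sample values are hypothetical.

```python
import math

def pbi(f, w, z_star, theta=5.0):
    """Return (PBI value, d1, d2) for objective vector f, weight vector w,
    and reference point z_star, assuming PBI = d1 + theta * d2."""
    diff = [fi - zi for fi, zi in zip(f, z_star)]
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    unit = [wi / norm_w for wi in w]                      # direction of the search line
    d1 = abs(sum(di * ui for di, ui in zip(diff, unit)))  # distance along the line
    d2 = math.sqrt(sum((di - d1 * ui) ** 2                # distance from the line
                       for di, ui in zip(diff, unit)))
    return d1 + theta * d2, d1, d2

if __name__ == "__main__":
    # A point on the search line has d2 = 0; a point off the line is penalized by theta.
    print(pbi((1.0, 1.0), (0.5, 0.5), (0.0, 0.0)))
    print(pbi((1.0, 0.0), (0.0, 1.0), (0.0, 0.0)))
```

With θ = 5.0, a deviation from the search line costs five times as much as the same distance along the line, which is why a large θ emphasizes the search along the weight vector.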
Although MOEA/D is usually used without normalization in the literature, the necessity of normalization is clear. Figure 3a illustrates uniformly distributed weight vectors and the obtained solutions. Although the weight vectors are uniformly distributed, the obtained solutions are biased towards the region with small $f_1$ values. This is because the range of each objective on the Pareto front is different. Figure 3b shows the normalization of Fig. 3a for clear illustration of the obtained solutions. If MOEA/D has a normalization mechanism, the search is performed in the normalized objective space using the uniformly distributed weight vectors, as shown in Fig. 4b. In this case, we can obtain uniformly distributed solutions. The obtained solutions in Fig. 4b are also uniformly distributed in the original objective space, as shown in Fig. 4a.
In this paper, we use a simple normalization mechanism based on non-dominated solutions in the current population. Let $z_i^L$ and $z_i^U$ be the minimum and maximum values of the ith objective among the non-dominated solutions in the current population. Each objective value is normalized as
$$f_i(x) := \frac{f_i(x) - z_i^L}{z_i^U - z_i^L + \varepsilon}, \quad i = 1, 2, \ldots, m,$$
where ε is a positive value for preventing the denominator from becoming zero in the case of $z_i^L = z_i^U$. We examine two specifications of ε: $10^{-6}$ and 1. By this normalization, all non-dominated solutions are normalized to points in $[0, 1]^m$ in the normalized objective space. In the original MOEA/D paper [27], the use of the above-mentioned normalization mechanism with ε = 0 was examined for the two-objective ZDT3 problem and a modified version of the two-objective ZDT1 problem.
As noted in the introduction, the range of the Pareto front on the ith objective is [0, 2i] in WFG4-9. Thus, the normalization of the objective space is needed in WFG4-9, whereas it is not needed in DTLZ1-4. For comparison, we also apply NSGA-III [5], MOEA/DD [16], and θ-DEA [26] to DTLZ1-4 and WFG4-9. For NSGA-III, we use Yuan's implementation [25]. All three of these EMO algorithms and MOEA/D use the same set of uniformly distributed weight vectors. Their basic search strategies are also the same: they search for the best solution for each weight vector. Similar solution sets are obtained when they are applied to a simple test problem with a normalized triangular Pareto front such as the three-objective DTLZ1. Their differences can be summarized as follows:
Our computational experiments are performed in the same manner as in Yuan et al. [11]. The neighborhood size is 20. The polynomial mutation with the distribution index 20 is used with the mutation probability 1/n, where n is the number of decision variables. The simulated binary crossover with the distribution index 20 is used with the crossover probability 1.0. The number of weight vectors is specified as 91 (three objectives), 210 (five objectives), 156 (eight objectives), and 275 (ten objectives).

Experimental results on DTLZ and WFG
The performance of each approach is evaluated by calculating the average hypervolume value over 51 runs. Before calculating the hypervolume, the objective space is normalized using the true Pareto front of each test problem, so that the ideal and nadir points are (0, 0, . . . , 0) and (1, 1, . . . , 1). This information is used only for the hypervolume calculation after the execution of each algorithm. The reference point for the hypervolume calculation is specified as (1.1, 1.1, . . ., 1.1) in the normalized objective space.
Experimental results are shown in Table 1. For ease of reading, the average hypervolume value of each algorithm on each test problem is normalized by the best result in each row. Thus, the best value for each test problem is always 1.0000. The best and worst results are highlighted in bold and by underlining, respectively.
The performance of MOEA/DD in Table 1 clearly shows the necessity of normalization. Whereas the best results are obtained from MOEA/DD for most test problems in DTLZ1-4, NSGA-III and θ-DEA outperform MOEA/DD on WFG4-9. This is because MOEA/DD has no normalization mechanism, whereas the search in NSGA-III and θ-DEA is performed in the normalized objective space. However, when they were applied to the normalized WFG4-9 in [13], the best results were obtained from MOEA/DD. MOEA/D with no normalization mechanism also shows clear performance deterioration for WFG4-9, as shown in Table 1. By combining the simple normalization mechanism with ε = $10^{-6}$ into MOEA/D, its performance is improved for the three-objective and five-objective WFG4-9 test problems. However, the simple normalization mechanism with ε = $10^{-6}$ deteriorates the performance of MOEA/D for almost all of the other test problems. In particular, negative effects of the simple normalization mechanism on the performance of MOEA/D are clearly observed for DTLZ1-4.
By increasing the value of ε from ε = $10^{-6}$ to ε = 1 in Table 1, the severe negative effects of the simple normalization mechanism with ε = $10^{-6}$ are remedied for most test problems except for DTLZ3 and DTLZ4. In the next section, we discuss why the performance of MOEA/D on some test problems is severely degraded by the simple normalization mechanism.

Negative effect of normalization
Let us examine the search behavior of MOEA/D with the simple normalization mechanism (ε = $10^{-6}$) on the ten-objective DTLZ1 and WFG9 test problems, where the negative effects of the normalization are severe, as shown in Table 1. In Fig. 5a, we show the solution set obtained by a single run of MOEA/D without normalization on the ten-objective DTLZ1 in the normalized objective space $[0, 1]^{10}$. The single run with the median average hypervolume value is selected from the 51 runs as a representative run. The corresponding result by MOEA/D with normalization is shown in Fig. 5b, where the diversity of solutions is severely degraded by normalization.
In the same manner as in Fig. 5, we show the obtained solution sets on the ten-objective WFG9 in Fig. 6.
From Figs. 5, 6, and 7, we can see that the reason for the poor results by MOEA/D with normalization (ε = $10^{-6}$) is the deterioration of solution diversity. Now, the question is why the diversity is severely degraded by the normalization. To address this issue, we monitor the width of the range of non-dominated solutions for each objective in each generation. That is, we monitor the value of ($z_i^U - z_i^L$), which is used in the denominator of the simple normalization mechanism in each generation. This monitoring is performed for the selected runs in Figs. 5 and 6. The monitoring results in the first 100 generations are shown in Fig. 8 on DTLZ1 and Fig. 9 on WFG9.
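The monitored quantity can be sketched in Python as follows. This is only a minimal illustration under the assumption of minimization and a small population stored as a list of objective vectors; the function names and the sample population are hypothetical and are not taken from the experiments.

```python
def dominates(a, b):
    """True if a dominates b (minimization): no worse in all objectives, better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated(population):
    """Objective vectors that are not dominated by any member of the population."""
    return [f for f in population
            if not any(dominates(g, f) for g in population)]

def range_width(population):
    """Width (z_i^U - z_i^L) of the non-dominated solutions for each objective."""
    nd = nondominated(population)
    m = len(nd[0])
    return [max(f[i] for f in nd) - min(f[i] for f in nd) for i in range(m)]

if __name__ == "__main__":
    # Hypothetical two-objective population: the third vector is dominated by
    # the second one, so it does not contribute to the width.
    pop = [(0.10, 5.0), (0.11, 1.0), (1.00, 1.0)]
    print(range_width(pop))
```

In this hypothetical population, the first objective has a much smaller width than the second one, which is the kind of situation discussed in this section.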
From Fig. 8a on the ten-objective DTLZ1 problem, we can see that the value of the width ($z_i^U - z_i^L$) is totally different for each objective in the first ten generations. Some objectives have very small values of the width (e.g., about 0.2) and others have large values of the width (e.g., more than 5). This is more clearly observed in Fig. 9a, where the width of some objectives is close to zero, whereas other objectives have much larger width (e.g., about 3). These observations suggest that the normalization based on non-dominated solutions in the current population may severely modify the scale of each objective.
Fig. 9 Width of the range of non-dominated solutions for each objective in each of the first 100 generations of the runs in Fig. 6 on the ten-objective WFG9 test problem. Results for the ten objectives are simultaneously depicted. Panel (b): normalization (ε = $10^{-6}$)
Since the denominator in the simple normalization mechanism is ($z_i^U - z_i^L$) + ε, the objective space is severely rescaled by the normalization when ε is very small. As a result, the diversity of solutions is severely deteriorated in Figs. 8b and 9b within the first ten generations. When ε is not very small (e.g., ε = 1, as shown in Fig. 7), the problem of very small values of ($z_i^U - z_i^L$) is remedied.
Let us further discuss why unnecessary normalization decreases the diversity of solutions. Figure 10 shows the best solution for the PBI function with the weight vector (0.9, 0.1) for three specifications of θ when the Pareto front is linear. When θ is large, the optimal solution is obtained on the search line specified by the weight vector, as shown in Fig. 10b, c. When θ is small, the best solution is not on the search line specified by the weight vector, as shown in Fig. 10a. Thus, a large value of θ such as θ = 5 is frequently used in the literature. However, a large value of θ makes the region that is better than the current solution very small (i.e., the region inside the red contour line in Fig. 10, see also Fig. 2). This leads to slow convergence, especially in the case of many-objective optimization [10]. Thus, an appropriate specification of θ is very important and difficult in MOEA/D with the PBI function.
When the search for the best solution is performed in the original objective space, we can obtain a set of well-distributed solutions for the DTLZ1 test problem by the PBI function with θ = 5, as shown in Fig. 11a. However, when the search is performed in a heavily rescaled objective space, as shown in Fig. 11b, a wide variety of solutions is not likely to be obtained. For example, in Fig. 11b, the best solution for the weight vector (1, 0) is on the $f_2$ axis (not on the $f_1$ axis). As a result, the obtained solutions may be heavily biased toward the region with small values of $f_1$. This may explain the severe deterioration of the diversity of solutions.
Another potential difficulty of using a small value of ε is that objective values of dominated solutions may become very large when the width ($z_i^U - z_i^L$) is small. For example, when $z_i^L = 0.10$, $z_i^U = 0.11$, and $z_i = 1.00$, these values are normalized to 0.00, 1.00, and 89.99 by the formulation $(z_i - 0.10)/(0.01 + \varepsilon)$ with ε = $10^{-6}$. This difficulty is remedied by increasing the value of ε. For example, $z_i^L = 0.10$, $z_i^U = 0.11$, and $z_i = 1.00$ are normalized to 0.00, 0.01, and 0.89, respectively, when ε = 1.
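The arithmetic in this example can be checked with a few lines of Python. The sketch assumes the normalization $(z_i - z_i^L)/(z_i^U - z_i^L + \varepsilon)$ used throughout this paper; the function name `normalize` is hypothetical.

```python
def normalize(z, z_low, z_up, eps):
    """Simple normalization of one objective value with bounds z_low and z_up."""
    return (z - z_low) / (z_up - z_low + eps)

if __name__ == "__main__":
    for eps in (1e-6, 1.0):
        values = [normalize(z, 0.10, 0.11, eps) for z in (0.10, 0.11, 1.00)]
        # Rounded to two decimals for comparison with the values in the text.
        print(eps, [round(v, 2) for v in values])
```

The dominated value 1.00 is mapped far outside [0, 1] when ε is very small, whereas with ε = 1 all three values stay below 1 but the upper bound of the non-dominated range is no longer mapped to 1.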
However, in this case (i.e., when ε is not very small), we have a different type of difficulty: the width of the Pareto front on each objective in the normalized objective space is different [i.e., the nadir point in the normalized objective space is not (1, 1, . . . , 1)]. For example, let us assume that the Pareto front is a line between (0, 3) and (1, 0). When ε = 1, these two end points are normalized to (0, 0.75) and (0.5, 0), so the nadir point in the normalized objective space is (0.5, 0.75). This difficulty is also observed in Fig. 12, which shows experimental results on the three-objective WFG4 test problem by the three variants of MOEA/D: no normalization, normalization with ε = $10^{-6}$, and normalization with ε = 1. When no normalization mechanism is used in Fig. 12a, many solutions are obtained around the bottom-left corner of the Pareto front. When ε is very small in Fig. 12b, well-distributed solutions are obtained. However, when ε is not very small (i.e., ε = 1) in Fig. 12c, we can see that more solutions are obtained around the bottom-left corner of the Pareto front than around the other two corners (e.g., compare the 3 × 3 solutions around each corner).
As we have explained, small and large values of ε have their own negative effects. As a result, better results are obtained from ε = $10^{-6}$ for some test problems and from ε = 1 for other test problems in Table 1 (and other tables in this paper).

New many-objective test problems
We have already explained that the simple normalization mechanism has both positive and negative effects on the performance of MOEA/D. In this section, we propose new many-objective test problems to further examine the effect of the normalization. Our new test problems are generated by slightly modifying the DTLZ1-4 test problems as follows. Each objective in the DTLZ1-4 test problems is written as
$$f_i(x) = (1 + g(x_M))\, h_i(x), \quad i = 1, 2, \ldots, m,$$
where $h_i(x)$ denotes the factor of $f_i(x)$ that does not depend on $g(x_M)$. This function can be easily modified to create new rescaled test problems as follows:
$$f_i(x) = \alpha^{i-1} (1 + g(x_M))\, h_i(x), \quad i = 1, 2, \ldots, m,$$
where α is a positive value for rescaling. The ith objective has an $\alpha^{i-1}$ times larger range than the first objective. The newly created test problems from DTLZ1-4 using this objective function are referred to as the rescaled DTLZ1-4 test problems. In our computational experiments, α is specified as α = 10. The objective function in DTLZ1-4 can also be easily modified to create a kind of deceptive rescaled test problems as
$$f_i(x) = (1 + \alpha^{i-1} g(x_M))\, h_i(x), \quad i = 1, 2, \ldots, m.$$
Since $g(x_M) = 0$ on the Pareto front, this modification does not change the Pareto fronts of DTLZ1-4. However, the feasible region in the objective space is heavily rescaled. In this paper, the newly created test problems from DTLZ1-4 using this objective function are referred to as the deceptive rescaled DTLZ1-4 test problems. In our computational experiments, α is specified as α = 10 as in the rescaled DTLZ1-4. The Pareto fronts and the feasible regions of DTLZ2 and its two variants are illustrated in Fig. 13. As shown in Fig. 13b, the whole objective space is rescaled objective by objective in the rescaled DTLZ1-4. However, in the deceptive rescaled DTLZ1-4, the feasible region is rescaled without changing the shape of the Pareto front, as shown in Fig. 13c.
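The two modifications can be sketched in Python for DTLZ2. The sketch assumes the standard DTLZ2 definition, written as $f_i(x) = (1 + g(x_M)) h_i(x)$ with $g(x_M) = \sum (x_j - 0.5)^2$ over the distance variables, and it assumes the rescaled and deceptive rescaled forms $\alpha^{i-1}(1 + g) h_i$ and $(1 + \alpha^{i-1} g) h_i$. The function names and the symbol $h_i$ are our notation for illustration.

```python
import math

def dtlz2_g_h(x, m):
    """Split the standard DTLZ2 objectives into g(x_M) and the factors h_i(x)."""
    g = sum((xi - 0.5) ** 2 for xi in x[m - 1:])     # distance function on x_M
    ang = [xi * math.pi / 2 for xi in x[:m - 1]]     # position variables as angles
    h = []
    for i in range(m):                               # i = 0 is the first objective
        value = math.prod(math.cos(a) for a in ang[:m - 1 - i])
        if i > 0:
            value *= math.sin(ang[m - 1 - i])
        h.append(value)
    return g, h

def evaluate(x, m, alpha=10.0, variant="original"):
    """Evaluate DTLZ2 or one of its two assumed variants."""
    g, h = dtlz2_g_h(x, m)
    result = []
    for i, hi in enumerate(h):
        scale = alpha ** i                           # alpha^(i-1) for 1-based i
        if variant == "original":
            result.append((1 + g) * hi)
        elif variant == "rescaled":
            result.append(scale * (1 + g) * hi)
        elif variant == "deceptive":
            result.append((1 + scale * g) * hi)
        else:
            raise ValueError(f"unknown variant: {variant}")
    return result

if __name__ == "__main__":
    on_front = [0.3, 0.7, 0.5, 0.5]    # g = 0: a Pareto optimal solution
    off_front = [0.3, 0.7, 1.0, 1.0]   # g = 0.5: far from the Pareto front
    for x in (on_front, off_front):
        for variant in ("original", "rescaled", "deceptive"):
            print(variant, evaluate(x, 3, variant=variant))
```

For a solution with $g(x_M) = 0$, the deceptive rescaled objectives coincide with the original ones, whereas for a solution with $g(x_M) > 0$ the later objectives are inflated by powers of α.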
We perform computational experiments on the rescaled and deceptive rescaled DTLZ1-4 test problems in the same manner as in the above-mentioned computational experiments on DTLZ1-4 and WFG4-9. Experimental results are summarized in Table 2. The average hypervolume value by each algorithm on each test problem is normalized by the best result. The best result for each test problem is highlighted in bold in Table 2, while the worst result is underlined. The experimental results on the rescaled DTLZ1-4 in Table 2 clearly show the necessity of normalization. Good results are not obtained by MOEA/D without normalization and MOEA/DD. On the contrary, good results are obtained from MOEA/D without normalization for the deceptive rescaled DTLZ1-4 test problems. In Table 2, we can also observe severe performance deterioration of NSGA-III and θ-DEA on the deceptive rescaled DTLZ1 and DTLZ3 with eight and ten objectives. For visual examination, the solutions obtained by a single run of MOEA/D and θ-DEA on the ten-objective deceptive rescaled DTLZ1 are shown in Fig. 14. In Fig. 14b, most solutions have not converged to the Pareto front in $[0, 1]^{10}$. This observation suggests a future research topic: improvement of the normalization mechanisms in state-of-the-art many-objective algorithms such as NSGA-III and θ-DEA.

Experimental results on minus test problems
The minus versions of DTLZ and WFG were proposed as many-objective test problems with inverted triangular Pareto fronts in [13]. The minus version can be easily formulated by changing each objective in DTLZ and WFG from $f_i(x)$ to $-f_i(x)$. This modification is the same as changing from "minimization of $f_i(x)$" to "maximization of $f_i(x)$" in DTLZ and WFG. The main feature of the minus DTLZ and WFG test problems is that their Pareto fronts are inverted triangular.
Experimental results on the minus DTLZ1-4 and WFG4-9 are summarized in Table 3, where the best value is normalized to 1.0000 in each row. The results on the minus WFG4-9 show the necessity of normalization. Good results are not obtained for those test problems by MOEA/D without normalization and MOEA/DD. One unexpected observation is that the best results are obtained from MOEA/D with normalization (ε = $10^{-6}$) for most test problems in Table 3 except for the minus DTLZ1. That is, the original MOEA/D with normalization clearly outperforms all the other algorithms, as shown in Table 3. This observation raises an interesting research question: why can the simple normalization mechanism improve the performance of MOEA/D even on the minus DTLZ1-4 test problems, which do not need any normalization? The obtained solution sets (see Fig. 18) suggest that the simple normalization mechanism helps MOEA/D to find well-distributed solutions around the center of the Pareto front. Further examinations are needed to explain why the simple normalization mechanism can improve the performance of MOEA/D for not only the minus WFG4-9 but also the minus DTLZ1-4. From careful examination of Table 3, we can observe that MOEA/D without normalization outperforms MOEA/DD on most minus DTLZ1-4 test problems, while MOEA/DD was the best on DTLZ1-4 in Table 1. We can also see that MOEA/D without normalization outperforms NSGA-III and θ-DEA on some minus DTLZ1-4 test problems. Similar observations were reported in [13].
The reason for the inferior performance of these algorithms in Table 3 has been discussed in the literature (for details, see [4,13]). In [4], the simultaneous use of triangular and inverted triangular lattice structures of weight vectors in MOEA/D was proposed to remedy the high dependency of the performance of MOEA/D on the shape of the Pareto front. The inferior performance of NSGA-III, MOEA/DD, and θ-DEA in Table 3 can be explained in this manner using the shape of the Pareto front of each minus test problem. However, it is still totally unclear why the simple normalization mechanism improves the performance of MOEA/D on the minus DTLZ1-4 test problems, where the normalization of the objective space is not needed.

Further computational experiments
Our experimental results in this paper suggest that a very small range of objective values of non-dominated solutions in the current population is the main reason for the severe negative effects of the normalization on the performance of MOEA/D. One may think that the use of all solutions in the current population for the normalization may remedy its severe negative effects. That is, it may be a good idea to calculate the maximum value $z_i^U$ and the minimum value $z_i^L$ from all solutions in the current population (i.e., not only from non-dominated solutions but also from all the other solutions) in the following formulation of the normalization:
$$f_i(x) := \frac{f_i(x) - z_i^L}{z_i^U - z_i^L + \varepsilon}, \quad i = 1, 2, \ldots, m.$$
Let us examine this setting for the DTLZ1-4 and WFG4-9 test problems for three specifications of ε: $10^{-12}$, $10^{-6}$, and 1. Experimental results are summarized in Table 4. For comparison, we also show the results of the normalization based on non-dominated solutions in the current population. In Table 4, the best results are obtained for many test problems from the normalization based on all solutions with ε = 1. The same computational experiments are performed on the rescaled and deceptive rescaled DTLZ1-4 test problems. Experimental results are shown in Table 5. In this table, it is difficult to say which of the two normalization strategies is better. We also perform the same computational experiments on the minus DTLZ1-4 and WFG4-9 test problems. Experimental results are shown in Table 6. The high performance of the normalized MOEA/D with ε = $10^{-12}$ and ε = $10^{-6}$ in Table 6 is clearly deteriorated by the use of all solutions for the normalization. Our experimental results in Tables 4, 5, and 6 show that the choice between the two normalization strategies is problem dependent.

Conclusions
In this paper, we clearly demonstrated the necessity of the objective space normalization in MOEA/D. Good results were not obtained from MOEA/D without normalization for the WFG4-9, rescaled DTLZ1-4, and minus WFG4-9 test problems. We examined the effect of the normalization by combining a simple normalization mechanism into MOEA/D. It was demonstrated through computational experiments that the normalization has both positive and negative effects on the performance of MOEA/D. That is, the performance of MOEA/D was improved by the incorporation of the simple normalization mechanism for some test problems but degraded for other test problems. Our experimental results on the newly created deceptive rescaled DTLZ1-4 test problems suggest the existence of negative effects of the normalization on the performance of not only MOEA/D but also NSGA-III and θ-DEA. Further examinations are needed for evaluating positive and negative effects of the normalization on the performance of those recently proposed many-objective algorithms. In our computational experiments on the minus DTLZ1-4 and WFG4-9 test problems, the best results were obtained for most test problems by MOEA/D with normalization. This observation suggests the existence of positive effects of the normalization even for test problems that do not need any normalization. Further examination of positive effects of the normalization for such a test problem is also an interesting future research topic.
Our computational experiments in this paper suggest the following issues as interesting future research topics:
1. Development of an effective and robust normalization mechanism for MOEA/D: the negative effects of the simple normalization mechanism on the performance of MOEA/D observed throughout this paper clearly suggest the necessity of such a normalization mechanism in MOEA/D.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.