1 Introduction

One of the steps of a decision-making process is finding optimal solutions of optimization problems. When more than one objective is involved, it is often not possible to find a single optimal solution. Instead, the decision maker is presented with an entire set of possible solutions that represent various trade-offs between the objectives. A multiobjective optimization problem (MOP) can be formalized as follows:

$$\begin{aligned}&\hbox {minimize}\,F(x) = \left( f_1(x), \ldots , f_m(x)\right) \nonumber \\&\hbox {subject to}\,x \in \varOmega , \end{aligned}$$
(1)

where, \(\varOmega \)—the decision space, \(m\)—the number of objectives.

Because solutions are evaluated using multiple criteria it is usually not possible to order them linearly from the best one to the worst. Instead, solutions are compared using the Pareto domination relation defined as follows [4, 19]. Given two points \(x_1\) and \(x_2\) in the decision space \(\varOmega \) for which:

$$\begin{aligned} F(x_1) = \left( f_1(x_1), \ldots , f_m(x_1)\right) \nonumber \\ F(x_2) = \left( f_1(x_2), \ldots , f_m(x_2)\right) \end{aligned}$$
(2)

we say that \(x_1\) dominates \(x_2\,(x_1 \succ x_2)\) iff:

$$\begin{aligned} \forall i \in \{1, \ldots , m\} : f_i(x_1) \le f_i(x_2)\nonumber \\ \exists i \in \{1, \ldots , m\} : f_i(x_1) < f_i(x_2) \end{aligned}$$
(3)

A solution \(x\) is said to be nondominated (Pareto optimal) iff:

$$\begin{aligned} \lnot \exists x' \in \varOmega : x' \succ x . \end{aligned}$$
(4)

Evolutionary algorithms are often used for solving multiobjective problems [2, 30]. The advantage of evolutionary and other population-based approaches is that they maintain an entire population of candidate solutions which represents an approximation of the true Pareto front. Among multiobjective evolutionary algorithms, two well-known groups of approaches are methods based on the Pareto domination relation and decomposition-based methods.

Algorithms based on the Pareto domination relation use it either for numerical evaluation of specimens or for selection. The former approach is employed in algorithms such as SPEA [32] and SPEA-2 [33]; the latter, for example, in NSGA [23] and NSGA-II [7].

The multiobjective evolutionary algorithm based on decomposition (MOEA/D) was proposed by Zhang and Li [27]. Contrary to the algorithms that base their selection process on the Pareto domination relation, MOEA/D is a decomposition-based algorithm: the multiobjective optimization problem is decomposed into a number of single-objective subproblems. To date the MOEA/D algorithm has been applied numerous times and has been found to outperform other algorithms not only on benchmark problems [17, 27], but also on combinatorial problems [21] and in various real-life applications, including multi-band antenna design [9], deployment of wireless sensor networks [15], air traffic optimization [3], and route planning [24].

Since the MOEA/D algorithm was introduced many modified versions have been proposed. Some approaches aim at achieving better performance by exploiting the advantages of the various scalarization functions used in the MOEA/D framework. Ishibuchi et al. [12] proposed adaptive selection of either the weighted sum scalarization or the Tchebycheff scalarization depending on the region of the Pareto front. An algorithm in which both scalarizations are used simultaneously was also proposed [13]. Other authors employ self-adaptive mechanisms within the MOEA/D framework. For example, in [28] the MOEA/D-DRA variant is proposed, in which a utility value is calculated for each subproblem and computational resources are allocated to subproblems according to these values.

There are also hybrid algorithms in which the MOEA/D is combined with other optimization algorithms. Martinez and Coello Coello proposed hybridization with the nonlinear simplex search algorithm [26]. Li and Landa-Silva combined the MOEA/D algorithm with simulated annealing [16] and applied their approach to combinatorial problems. Another approach based on PSO and decomposition was proposed in [1]. In the case of combinatorial optimization a hybridization with the ant colony optimization (ACO) was attempted [14] in order to implement the “learning while optimizing” principle.

The main motivation for this paper is to research the possibility of increasing the diversity of the search, namely extending the Pareto front, by directing the selective pressure towards the outer regions of the Pareto front. While this is a different approach than those discussed above, it can easily be combined with many of them. The presented algorithm can be hybridized with other optimization algorithms and local search procedures in the same way as the original MOEA/D algorithm. For example, the approach proposed in [26] employs a simplex algorithm as a local search procedure to improve the solutions found by the MOEA/D algorithm; it works at a different level than the approach proposed in this paper, which modifies the working of the MOEA/D algorithm itself. Therefore, hybrid approaches such as those described in [1, 14, 16, 26] could use the weight generation scheme presented in this paper instead of the original one. Adaptive allocation of resources as discussed in [28] could also be attempted with the algorithm presented in this paper.

The MOEA/D algorithm employs the idea of decomposing the original multiobjective problem into several single-objective problems. To obtain a single scalar objective from the several objectives present in the original problem, various aggregation approaches are used; many authors use the weighted sum approach [19] and the Tchebycheff approach [27]. In both approaches a weight vector \(\lambda = (\lambda _1, \ldots , \lambda _m)\), \(\lambda _i \ge 0\) for all \(i \in \{ 1, \ldots , m \}\), \(\sum _{i=1}^m \lambda _i = 1\) is assigned to each specimen in the population. The formulation of a scalar problem in the Tchebycheff aggregation approach is as follows:

$$\begin{aligned}&\hbox {minimize}\,g^{te}\left( x|\lambda ,z^*\right) = \max \limits _{1 \le i \le m} \left\{ \lambda _i | f_i(x) - z_i^* | \right\} \nonumber \\&\hbox {subject to}\,x \in \varOmega , \end{aligned}$$
(5)

where, \(\lambda \)—the weight vector assigned to the specimen, \(z^* \)—the reference point.

The reference point \(z^* = (z_1^*, \ldots , z_m^*)\) used in the Tchebycheff aggregation approach is composed of the best attainable values along all the objectives:

$$\begin{aligned} z_i^* = \min \{ f_i(x) : x \in \varOmega \}, \quad i = 1, \ldots , m . \end{aligned}$$
(6)

If the best possible values for all the objectives are not known in advance the reference point \(z^*\) can be calculated based on the values of objectives that were attained so far by the population.
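To make the aggregation concrete, the following minimal sketch (in Python; the function names are illustrative, not part of any particular MOEA/D implementation) computes the Tchebycheff value of Eq. (5) and maintains the reference point \(z^*\) from the objective values attained so far:

```python
import numpy as np

def tchebycheff(fx, lam, z_star):
    # g^te(x | lambda, z*) of Eq. (5) for an objective vector fx = F(x)
    return np.max(lam * np.abs(fx - z_star))

def update_reference_point(z_star, fx):
    # keep the best (lowest) value attained so far along each objective
    return np.minimum(z_star, fx)

# Usage with two objectives and two evaluated solutions:
z = np.array([np.inf, np.inf])
for fx in [np.array([0.3, 0.8]), np.array([0.6, 0.2])]:
    z = update_reference_point(z, fx)           # z becomes [0.3, 0.2]
print(tchebycheff(np.array([0.6, 0.2]), np.array([0.5, 0.5]), z))  # approx. 0.15
```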

The authors of the MOEA/D algorithm proposed the following way of generating weight vectors [27]. First, a value of a parameter \(H\) is chosen which determines the size of a step used for generating weight vectors. A set of weight vectors is generated by selecting all the possible \(m\)-element combinations of numbers from the set:

$$\begin{aligned} \left\{ \frac{0}{H}, \frac{1}{H}, \ldots , \frac{H}{H} \right\} \end{aligned}$$
(7)

such that:

$$\begin{aligned} \sum _{j=1}^m \lambda _j = 1. \end{aligned}$$
(8)

This weight vector generation approach produces:

$$\begin{aligned} N = C_{H+m-1}^{m-1} \end{aligned}$$
(9)

weight vectors. Because there is exactly one weight vector assigned to each specimen in the population, the size of the population equals \(N\). For a bi-objective optimization problem \((m = 2)\) this weight vector generation procedure builds a set of vectors \(\left\{ \lambda ^{(i)}\right\} \), \(i \in \{0, \ldots , H\}\), where:

$$\begin{aligned} \lambda ^{(i)} = \left[ \frac{i}{H}, \frac{H - i}{H} \right] \!. \end{aligned}$$
(10)
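The generation scheme of Eqs. (7)–(10) can be sketched as follows (a brute-force enumeration for illustration only; it examines \((H+1)^m\) tuples, so practical implementations for large \(m\) and \(H\) would enumerate compositions of \(H\) recursively instead):

```python
from itertools import product

def weight_vectors(m, H):
    # all m-element vectors with coordinates in {0/H, ..., H/H} summing to 1
    return [tuple(c / H for c in combo)
            for combo in product(range(H + 1), repeat=m)
            if sum(combo) == H]

# Bi-objective case, Eq. (10): H + 1 vectors of the form [i/H, (H - i)/H]
print(weight_vectors(2, 4))
# [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
```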

The MOEA/D algorithm uses a concept of subproblem neighborhood for transferring information between specimens that represent solutions of scalar subproblems. The neighborhood size is controlled by the algorithm parameter \(T\). For each weight vector \(\lambda ^{(i)}\) its neighborhood \(B(i)\) is defined as a set of \(T\) weight vectors that are closest to \(\lambda ^{(i)}\) with respect to the Euclidean distance calculated in the \(m\)-dimensional space of vector weights. In the bi-objective case for \(T = 2k + 1\), \(k \in {\mathbb {N}}\) the neighborhood can be located symmetrically around \(\lambda ^{(i)}\). For \(T = 2k\), \(k \in {\mathbb {N}}\) this is not possible and one “excess” weight vector must remain on one side of \(\lambda ^{(i)}\) (for a study of the effects that this asymmetry has on the working of the MOEA/D algorithm please refer to [18]). In further considerations we assume that this excess vector is included on the side with indices larger than \(i\). Therefore, for weight vectors located near the edges of the Pareto front the neighborhood is shifted and is equal to:

$$\begin{aligned} B(i) = {\left\{ \begin{array}{ll} \left\{ 0, \ldots , T - 1\right\} &{} \text{ near } \lambda ^{(0)} \\ \left\{ N - T, \ldots , N - 1\right\} &{} \text{ near } \lambda ^{(N - 1)} . \\ \end{array}\right. } \end{aligned}$$
(11)

For \(\lambda ^{(i)}\) located farther from the edges the neighborhood is:

$$\begin{aligned} B(i) = {\left\{ \begin{array}{ll} \left\{ i - k, \ldots , i, \ldots , i + k\right\} &{} \text{ for } T = 2k + 1, k \in {\mathbb {N}} \\ \left\{ i - k + 1, \ldots , i, \ldots , i + k\right\} &{} \text{ for } T = 2k, k \in {\mathbb {N}} . \\ \end{array}\right. } \end{aligned}$$
(12)

When a new specimen for the \(i\)th subproblem is to be generated, parent selection is performed among the specimens from the neighborhood \(B(i)\). The newly generated offspring may then replace not only the current specimen for the \(i\)th subproblem, but also specimens assigned to the subproblems from the neighborhood \(B(i)\). By definition, each vector \(\lambda ^{(i)}\) belongs to its own neighborhood \(B(i)\). Concepts involved in the working of the MOEA/D algorithm are presented in Fig. 1 which shows the population (black dots) and optimization directions determined by the weight vectors (arrows) for a bi-objective case.

Fig. 1

Overview of concepts involved in the working of the MOEA/D algorithm in the case of minimization in a bi-objective problem
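The neighborhood structure described above can be precomputed once, before the evolution starts. A minimal sketch follows, assuming weight vectors are stored as rows of an array; ties between equidistant vectors are broken here by the sort order, which may differ between implementations:

```python
import numpy as np

def neighborhoods(weights, T):
    # B(i): indices of the T weight vectors closest to lambda^(i)
    # (Euclidean distance in weight space; each vector is in its own B(i))
    W = np.asarray(weights)
    dist = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)
    return [np.argsort(row)[:T].tolist() for row in dist]

# Bi-objective example: H = 99 gives N = 100 subproblems.
W = [(i / 99, 1 - i / 99) for i in range(100)]
B = neighborhoods(W, T=20)
print(B[0][:5])   # the neighborhood of lambda^(0) lies entirely on one side
```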

2 Increasing the outward selective pressure

The MOEA/D algorithm uses weight vectors to aggregate objectives of a multiobjective optimization problem into a single, scalar objective. Clearly, the coordinates of a weight vector \(\lambda ^{(i)}\) determine the importance of each objective \(f_j\), \(j \in \{1, \ldots , m\}\) in the \(i\)-th scalar subproblem.

Solutions in the neighborhood \(B(i)\) participate in information exchange with the solution of a subproblem parameterized by weight vector \(\lambda ^{(i)}\). Thus, one can expect that the distribution of weight vectors in the neighborhood \(B(i)\) may influence the selective pressure exerted on specimens that represent solutions of the \(i\)-th subproblem.

Weight vectors \(\lambda ^{(i)}\) located near the center of the weight vector set have all the coordinates close to \(1/m\). In such a case the neighborhood \(B(i)\) is located more or less symmetrically around the weight vector \(\lambda ^{(i)}\). For weight vectors located near or on the edge of the weight vector set the neighborhood \(B(i)\) is placed asymmetrically. This effect is easy to illustrate in a bi-objective case. If \(\lambda ^{(i)} = [1/2, 1/2]\) the neighborhood \(B(i)\) is symmetric as shown in Fig. 1. For weight vectors \(\lambda ^{(0)}\) and \(\lambda ^{(N - 1)}\) the neighborhoods \(B(0)\) and \(B(N - 1)\) are placed entirely on one side of the weight vector as shown in Fig. 2.

Fig. 2

The neighborhoods \(B(0)\) and \(B(N - 1)\) in a bi-objective minimization problem

Such placement of weight vectors in the neighborhood of the \(\lambda ^{(0)}\) vector causes a situation in which specimens assigned to the 0th subproblem can only exchange information with specimens that are evaluated using weight vectors pointing slightly downwards. Correspondingly, specimens that solve the subproblem \(N-1\) can only exchange information with specimens that are evaluated using weight vectors pointing slightly to the left. In both situations it can be expected that an inward selective pressure will be exerted on specimens on the edges of the Pareto front, preventing them from extending towards higher values of objectives \(f_1\) and \(f_2\). Other specimens located at positions closer than \(T / 2\) to the edge can also be influenced by such pressure, which can be expected to weaken the farther a specimen is located from the edge.

Obviously, the selective pressure influences the behavior of the population and therefore it can be expected to influence also the shape of the Pareto front attained by the algorithm. The methods presented in this paper were developed based on an assumption that directing the selective pressure outwards will make the Pareto front spread wider. This assumption was positively verified by a preliminary test performed on a simple benchmark problem. The construction of this benchmark function and the results of the preliminary test are presented in Sect. 3.

In this paper three methods of directing the selective pressure outwards are proposed: “fold”, “reduce” and “diverge”. The first two of these methods differ from the standard MOEA/D algorithm in the way in which the neighborhoods are defined. In the “diverge” method the neighborhoods are defined in the same way as in the standard MOEA/D algorithm but weight vectors are calculated in a different way.

2.1 The “fold” method

In the “fold” method weight vectors are generated using the same procedure as proposed by the authors of the MOEA/D algorithm in [27]. This weight vector generation procedure is described in Sect. 1. It generates all the possible \(m\)-element combinations of numbers from the set (7) that sum to 1 as described by Eq. (8).

In the “fold” method the neighborhoods of weight vectors are defined differently than in the standard MOEA/D implementation. Each neighborhood \(B(i)\) is placed symmetrically around the weight vector \(\lambda ^{(i)}\). In the case of weight vectors placed near the edges of the Pareto front the unavailable weight vectors are represented by special values \(-1\) and \(N\) which correspond to “virtual” weight vectors as shown in Fig. 3.

Fig. 3

Symmetric neighborhoods \(B(0)\) and \(B(N - 1)\) including “virtual” weight vectors used in the “fold” method

In the MOEA/D algorithm the neighborhood \(B(i)\) is used for selecting parents for genetic operations (crossover and mutation). Parent selection is done by randomly selecting a weight vector from the neighborhood \(B(i)\). The specimen that corresponds to the selected weight vector is used in genetic operations. This selection process is slightly modified in the “fold” method in the case of neighborhoods which include “virtual” weight vectors. The probability of selecting each of the weight vectors (a “real” or a “virtual” one) is the same and equals \(\frac{1}{|B(i)|}\). Therefore, with some probability, the selection procedure can choose one of the “virtual” weight vectors represented by a special value \(-1\) or \(N\). “Virtual” weight vectors do not have specimens assigned to them, so if a “virtual” weight vector is selected a “real” weight vector has to be selected instead. In a bi-objective case the specimen corresponding to the outermost weight vector \(\lambda ^{(0)}\) or \(\lambda ^{(N - 1)}\) is returned instead of any “virtual” weight vector. The probability \(P_O(i)\) that the specimen corresponding to the outermost vector will be used is then equal to:

$$\begin{aligned} P_O(i) = \frac{v_i + 1}{T} , \end{aligned}$$
(13)

where: \(v_i\)—the number of “virtual” weight vectors in \(B(i)\).

This probability satisfies inequalities:

$$\begin{aligned} \frac{1}{T} < P_O(i) \le \frac{\left\lfloor T/2 \right\rfloor + 1}{T} . \end{aligned}$$
(14)

Clearly, when the “fold” method is used for a bi-objective problem the probability of selecting the outermost weight vector \(\lambda ^{(0)}\) or \(\lambda ^{(N - 1)}\) increases in the neighborhoods surrounding weight vectors placed near the edge of the Pareto front.

The approach described above can easily be generalized to problems with more than two objectives. First, the “virtual” vectors are generated. These are vectors that satisfy the condition (8), but contain elements that are negative (e.g. \(-1/H\), \(-2/H\)) or larger than 1 (e.g. \((H + 1)/H\), \((H + 2)/H\)). For each “real” weight vector \(\lambda _i\) a preliminary neighborhood \(B'(i)\) is formed by selecting \(T\) vectors closest to \(\lambda _i\) from both the “real” and “virtual” weight vectors. If a “virtual” weight vector \(\lambda _v\) is selected for the neighborhood \(B'(i)\) it is replaced by a “real” weight vector \(\lambda _r\) closest to \(\lambda _v\) (cf. Figs. 4 and 5). If there are several “real” weight vectors at the same distance from \(\lambda _v\) one of them is selected at random. Obviously, such a weight vector \(\lambda _r\) is placed on the edge of the weight vector set.

Fig. 4

Selection of “real” weight vectors for the neighborhood \(B(i)\) of a weight vector \(\lambda _i\) used in the “fold” method

Fig. 5

Neighborhood \(B(i)\) of a weight vector \(\lambda _i\) used in the “fold” method

After the neighborhood construction procedure described above, the neighborhood \(B(i)\) contains only the “real” weight vectors and the vectors placed on the edge of the set of “real” weight vectors have multiple copies in \(B(i)\). Parent selection is performed by selecting elements from \(B(i)\) with uniform probability. In neighborhoods \(B(i)\) corresponding to weight vectors placed on the edge of the weight vector set some “virtual” weight vectors are replaced by “real” weight vectors. The selective pressure towards the center of the Pareto front is thus reduced.
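A sketch of this generalized “fold” construction is given below; it assumes the “real” and “virtual” weight vectors have already been generated, and it breaks distance ties deterministically rather than at random as described above:

```python
import numpy as np

def fold_neighborhoods(real, virtual, T):
    # "fold": build each B(i) from the T vectors (real or virtual) nearest
    # to lambda^(i), then replace every virtual vector by its nearest real
    # one, so edge vectors may appear several times in B(i)
    R = np.asarray(real)
    A = np.vstack([R, np.asarray(virtual)])   # indices >= len(R) are virtual
    B = []
    for i in range(len(R)):
        prelim = np.argsort(np.linalg.norm(A - R[i], axis=1))[:T]
        fixed = []
        for j in prelim:
            if j < len(R):
                fixed.append(int(j))
            else:  # fold the virtual vector back onto the nearest real one
                fixed.append(int(np.argmin(np.linalg.norm(R - A[j], axis=1))))
        B.append(fixed)
    return B
```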

2.2 The “reduce” method

In the “reduce” method the weight vectors are generated using a similar procedure as in the “fold” method. From the set of “real” and “virtual” weight vectors a preliminary neighborhood \(B'(i)\) of a weight vector \(\lambda _i\) is generated by considering the \(T\) closest neighbors of \(\lambda _i\) (both “real” and “virtual” weight vectors). From these \(T\) vectors only the “real” vectors are retained in \(B(i)\). Contrary to the “fold” method no multiple copies of the weight vectors are placed in \(B(i)\). The neighborhood \(B(i)\) of a vector \(\lambda _i\) placed on the edge of the set of the “real” weight vectors looks as shown in Fig. 5. Therefore, the size of the neighborhood is decreased for such \(\lambda _i\) compared to the original MOEA/D algorithm.

In a bi-objective case the size of the neighborhood \(B(i)\) equals:

$$\begin{aligned} |B(i)| = {\left\{ \begin{array}{ll} \min (i + k, N - 1) - \max (i - k, 0) + 1 &{} \text{ for } T = 2k + 1 \\ \min (i + k, N - 1) - \max (i - k + 1, 0) + 1 &{} \text{ for } T = 2k . \\ \end{array}\right. } \end{aligned}$$
(15)

When parents are selected, each neighborhood member can be selected with the probability \(\frac{1}{|B(i)|}\), so there is no preference for the outermost weight vectors in a given neighborhood \(B(i)\). However, the overall probability of selecting the outermost weight vector is increased compared to the standard MOEA/D algorithm because the neighborhoods near the edge of the Pareto front are smaller.
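A corresponding sketch of the “reduce” construction, under the same assumptions as the “fold” sketch above, differs only in the final step, where the “virtual” neighbors are dropped instead of folded back:

```python
import numpy as np

def reduce_neighborhoods(real, virtual, T):
    # "reduce": keep only the real vectors among the T nearest real-or-virtual
    # neighbors, so |B(i)| shrinks near the edges of the weight vector set
    R = np.asarray(real)
    A = np.vstack([R, np.asarray(virtual)])
    B = []
    for i in range(len(R)):
        prelim = np.argsort(np.linalg.norm(A - R[i], axis=1))[:T]
        B.append([int(j) for j in prelim if j < len(R)])
    return B
```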

2.3 The “diverge” method

In the “diverge” method the weight vector generation procedure is modified. Instead of choosing values from the set (7) in this method the coordinates are taken from the set:

$$\begin{aligned} \left\{ \frac{0}{H}-\alpha , \ldots , \frac{H}{H}+\alpha \right\} , \end{aligned}$$
(16)

where: \(\alpha \)—a parameter that controls the divergence of the weight vectors.

The condition (8) must be satisfied by all vectors \(\lambda ^{(0)}, \ldots , \lambda ^{(N - 1)}\), i.e. all coordinates in each weight vector must sum to 1. Therefore, in a bi-objective case all the weight vectors have the form:

$$\begin{aligned} \lambda ^{(i)} = \left[ \frac{i}{H} + \frac{(2i - H)\alpha }{H}, \frac{H - i}{H} - \frac{(2i - H)\alpha }{H} \right] , \end{aligned}$$
(17)

where: \(i = 0, \ldots , H\).

Obviously,

$$\begin{aligned} \lambda ^{(0)} = \left[ -\alpha , 1 + \alpha \right] , \quad \lambda ^{(H)} = \left[ 1 + \alpha , -\alpha \right] . \end{aligned}$$
(18)
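In the bi-objective case Eq. (17) translates directly into code; a minimal sketch:

```python
def diverge_weights(H, alpha):
    # bi-objective "diverge" weight vectors, Eq. (17)
    return [(i / H + (2 * i - H) * alpha / H,
             (H - i) / H - (2 * i - H) * alpha / H)
            for i in range(H + 1)]

print(diverge_weights(2, 0.1))
# approx. [(-0.1, 1.1), (0.5, 0.5), (1.1, -0.1)], cf. Eq. (18)
```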

The layout of the optimization directions determined by the weight vectors in the “diverge” method in a bi-objective minimization problem is presented in Fig. 6.

Fig. 6

The layout of the optimization directions determined by the weight vectors in the “diverge” method in a bi-objective minimization problem

Note that in this method some of the weight vectors have negative coordinates. The aggregation function used for decomposing the multiobjective problem into scalar ones must remain correctly defined for negative weights. For example, negative weights can easily be used with the weighted sum decomposition. Other approaches have to be modified: in the Tchebycheff decomposition, which minimizes \(max\{ \lambda _i|f_i(x) - z_i^*| \}\) (cf. Eq. (5)), only the objectives \(f_i(x)\) corresponding to weight vector coordinates \(\lambda _i > 0\) have any influence on the value of the \(max\) function, because the terms corresponding to negative weight vector coordinates are never larger than 0. The solution employed in this paper is to calculate the distance from the nadir point \(z^\#\) (a point with all the worst, instead of the best, coordinates) for those objectives that correspond to negative coordinates \(\lambda _i\) in the weight vector. Therefore, in the “diverge” method the Tchebycheff decomposition is performed using Eq. (19).

$$\begin{aligned}&\hbox {minimize}\,g^{te}\left( x|\lambda ,z^*\right) = \max \limits _{1 \le i \le m} \left\{ \phi ( \lambda _i , f_i(x) ) \right\} \nonumber \\&\hbox {subject to}\,x \in \varOmega , \end{aligned}$$
(19)

where: \(\lambda \)—the weight vector assigned to the specimen, \(\phi \)—a function calculated as follows:

$$\begin{aligned} \phi (\lambda , y ) = {\left\{ \begin{array}{ll} \lambda | y - z_i^* | &{} \quad \text{ for } \quad \lambda \ge 0 \\ \lambda | y - z_i^\# | &{} \quad \text{ for } \quad \lambda < 0 \\ \end{array}\right. } \end{aligned}$$
(20)

where: \(z^* \)—the reference point, \(z^\# \)—the nadir point.

Like the reference point \(z^*\), the nadir point \(z^\#\) is updated during the runtime of the algorithm. It contains the worst objective values found so far, which are usually attained during the first generations of the evolution.
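A sketch of the modified decomposition of Eqs. (19)–(20), together with the running updates of both reference points (the function names are illustrative):

```python
import numpy as np

def g_te_diverge(fx, lam, z_star, z_nadir):
    # Eqs. (19)-(20): objectives with negative weights are measured against
    # the nadir point z# instead of the reference point z*
    terms = np.where(lam >= 0,
                     lam * np.abs(fx - z_star),
                     lam * np.abs(fx - z_nadir))
    return np.max(terms)

def update_points(z_star, z_nadir, fx):
    # z* keeps the best, z# the worst objective values found so far
    return np.minimum(z_star, fx), np.maximum(z_nadir, fx)
```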

In the “diverge” method neighborhoods of weight vectors are defined in the same way as in the standard MOEA/D algorithm. The parent selection procedure is also the same as in the standard MOEA/D algorithm.

3 Preliminary tests

In the previous section three methods of directing the selective pressure outwards were proposed: “fold”, “reduce” and “diverge”. The design of these methods was based on the hypothesis that directing the selective pressure outwards will make the Pareto front spread wider. To verify this assumption a preliminary test was performed on a simple benchmark problem with a symmetric Pareto front, designed in such a way that differences in the Pareto front extent should be easily visible. Also, the results produced by the MOEA/D algorithm for this benchmark problem are rather poor considering the simplicity of the problem: the solutions found by the MOEA/D algorithm tend to concentrate in the middle of the actual Pareto front, and the solutions at the edges of the actual Pareto front are not found at all by the unmodified algorithm.

The geometric formulation of this problem is as follows. First, a parameter \(d\) is set which determines the dimensionality of the search space, a \(d\)-dimensional cube \(\varOmega = [0, 1]^d\). The two conflicting objectives are to approach two different points in this search space as closely as possible. One of these points, denoted \(A\), has all coordinates equal to \(\frac{1}{4}\); the other, denoted \(B\), has all coordinates equal to \(\frac{3}{4}\):

$$\begin{aligned} A&= \left[ \frac{1}{4}, \ldots , \frac{1}{4}\right] \in {\mathbb {R}}^d \end{aligned}$$
(21)
$$\begin{aligned} B&= \left[ \frac{3}{4}, \ldots , \frac{3}{4}\right] \in {\mathbb {R}}^d \end{aligned}$$
(22)

The values of the two objective functions \(f_1, f_2 : [0, 1]^d \rightarrow {\mathbb {R}}\) are defined as the distances to points \(A\) and \(B\), respectively. Formally, the optimization problem can be defined as:

$$\begin{aligned}&\hbox {minimize}\,F(x) = \left( \Vert x - A\Vert , \Vert x - B\Vert \right) \nonumber \\&\hbox {subject to}\,x \in [0, 1]^{d} \!\!, \end{aligned}$$
(23)

which is equivalent to:

$$\begin{aligned}&\hbox {minimize}\,F(x) = \left( f_1(x), f_2(x)\right) \nonumber \\&\hbox {subject to}\,x \in [0, 1]^d \!\!, \end{aligned}$$
(24)

where:

$$\begin{aligned} f_1(x)&= \sqrt{\sum _{i=1}^{d} (x_i - A_i)^2 } \end{aligned}$$
(25)
$$\begin{aligned} f_2(x)&= \sqrt{\sum _{i=1}^{d} (x_i - B_i)^2 } \end{aligned}$$
(26)

Clearly, the objectives are conflicting, and the Pareto set of this optimization problem is the line segment connecting the points \(A\) and \(B\) in the decision space \([0, 1]^d\). When the solution is located at one of the ends of this line segment, one of the objectives is 0 while the other has the value \(\frac{\sqrt{d}}{2}\), which is the length of the line segment \(AB\). The Pareto front is therefore a line segment in the objective space \({\mathbb {R}}^2\) with the equation:

$$\begin{aligned} y = -x + \frac{\sqrt{d}}{2}, x \in \left[ 0, \frac{\sqrt{d}}{2}\right] \!\!. \end{aligned}$$
(27)

Positioning the solution anywhere on the \(AB\) line segment is allowed, so all the points on this Pareto front can be attained.
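The benchmark of Eqs. (21)–(26) can be implemented in a few lines; a sketch:

```python
import numpy as np

def make_benchmark(d):
    # bi-objective problem of Eq. (23): minimize the distances to A and B
    A, B = np.full(d, 0.25), np.full(d, 0.75)
    return lambda x: (np.linalg.norm(np.asarray(x) - A),
                      np.linalg.norm(np.asarray(x) - B))

F = make_benchmark(d=10)
print(F(np.full(10, 0.25)))   # one end of the front: (0.0, sqrt(10)/2 ~ 1.581)
```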

Sets of all nondominated solutions obtained in 30 runs of the preliminary tests are presented in Fig. 7. Table 1 presents the minimum and the maximum values attained along each of the objectives by each method. Obviously, if the minimum values are lower and the maximum values are higher, the Pareto front found by the algorithm spreads wider which indicates a greater diversity of the search.

Fig. 7

Sets of all nondominated solutions obtained in 30 runs of the preliminary tests

Table 1 The minimum and the maximum values attained along each of the objectives by each method in preliminary tests on a problem with a symmetric Pareto front

The Pareto fronts presented in Fig. 7 and the values presented in Table 1 clearly show that the unmodified MOEA/D algorithm performed worst in the preliminary test. The “fold” method seems to be only slightly better. On the other hand the “reduce” method improved the results significantly and the “diverge” method produced even more widely spread Pareto fronts. Thus, the results of the preliminary tests support the hypothesis that the diversity of the search can be improved by directing the selective pressure outwards to the edges of the Pareto front.

4 Experimental study

The experiments were performed in order to verify if increasing the outward selective pressure improves the results obtained by the algorithm. The experiments were performed on test problems with complicated Pareto sets F1–F9 described in [17]. Some of these test problems were also used during the CEC 2009 MOEA competition [29], namely F2 (as UF1), F5 (as UF2), F6 (as UF8), and F8 (as UF3).

The performance of the standard MOEA/D algorithm was compared to the performance of the three methods of directing the selective pressure outwards: “fold”, “reduce” and “diverge” described in Sect. 2. The “diverge” method requires setting the value of the parameter \(\alpha \) which determines by how much the weight vectors diverge to the outside. To test the influence of this parameter on the algorithm performance the value of this parameter was set to \(\alpha = 0.1\), \(0.2\), \(0.3\), \(0.4\) and \(0.5\).

For each test problem and each method of increasing the outward selective pressure 30 runs of the test were performed. For comparison, tests using the standard version of the MOEA/D algorithm were also performed.

4.1 Performance assessment

In the case of multiobjective optimization problems evaluating the results obtained by an algorithm is more complicated than in the case of single-objective problems. Solutions produced by each run of an algorithm are characterized by two or more conflicting objectives and thus they may not be directly comparable. A common practice is to evaluate the entire set of solutions using a certain indicator which represents the quality of the whole solution set.

In this paper two indicators were used: the hypervolume (HV) [34] also known as the size of the objective space covered [31] and the Inverted Generational Distance (IGD) [17] also known as the distance from the representatives (or \(D\)-metric) [27]. For a given set of solutions \(P\) the hypervolume is the Lebesgue measure of the portion of objective space that is dominated by solutions in \(P\) collectively:

$$\begin{aligned} HV(P) = L\left( \bigcup _{x \in P} \left[ f_1(x), r_1 \right] \times \cdots \times \left[ f_m(x), r_m \right] \right) , \end{aligned}$$
(28)

where: \(m\)—the dimensionality of the objective space, \(f_i(\cdot )\), \(i = 1, \ldots m\)—the objective functions, \(R = (r_1, \ldots , r_m)\)—a reference point, \(L(\cdot )\)—the Lebesgue measure on \({\mathbb {R}}^m\).

In two and three dimensions the hypervolume corresponds to the area and the volume, respectively. Better Pareto fronts are those that have higher values of the HV indicator. The hypervolume indicator is commonly used in the literature to evaluate Pareto fronts and it has good theoretical properties. It has been proven [10] that maximizing the hypervolume is equivalent to achieving Pareto optimality. To the best of the author's knowledge, this is the only currently known measure with this property.
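For two objectives the hypervolume of Eq. (28) reduces to a sum of rectangle areas and can be computed exactly; a sketch for minimization, assuming the input points are mutually nondominated and all dominate the reference point:

```python
def hypervolume_2d(points, ref):
    # exact 2-D hypervolume of a nondominated front (minimization):
    # sum of the rectangular slabs between consecutive f2 levels
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(points):          # ascending f1, hence descending f2
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d([(1.0, 2.0), (2.0, 1.0)], ref=(3.0, 3.0)))  # 3.0
```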

The second indicator, the Inverted Generational Distance (IGD) measures how close the solutions in an approximation of a Pareto front \(P\) approach points from a predefined set \(P^*\). The \(P^*\) set contains points that are uniformly distributed in the objective space along the real Pareto front of a given multiobjective optimization problem. Formally, for a given reference set \(P^*\) and an approximation of the Pareto front \(P\) the IGD indicator is calculated as:

$$\begin{aligned} IGD(P, P^*) = \frac{\sum _{v \in P^*}d(v, P)}{|P^*|} , \end{aligned}$$
(29)

where: \(d(v, P)\)—the minimum Euclidean distance between \(v\) and the points in \(P\).

Better Pareto fronts are those that have lower values of the IGD indicator. The IGD indicator measures both the diversity and the convergence of \(P\). Calculating the IGD indicator requires generating a set of points that are uniformly distributed along the true Pareto front of a given multiobjective optimization problem. This can be a significant obstacle in the case of real-life optimization problems for which the true Pareto front is not known. For benchmark problems the true Pareto front is usually known so the points in the reference set \(P^*\) can be set exactly on this front.
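A sketch of the IGD computation of Eq. (29):

```python
import numpy as np

def igd(P, P_star):
    # Eq. (29): mean distance from each reference point in P* to its
    # nearest neighbor in the obtained front P
    P, P_star = np.asarray(P), np.asarray(P_star)
    d = np.linalg.norm(P_star[:, None, :] - P[None, :, :], axis=2)
    return d.min(axis=1).mean()
```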

4.2 Parameter settings

The algorithm used in the experiments is based on the version of the MOEA/D algorithm described in [17] which is the same paper in which the test problems were proposed. For the “fold”, “reduce” and “diverge” methods neighborhood construction and specimen selection procedures were changed as described in Sect. 2. Other aspects of how the algorithm worked were not changed with respect to the standard MOEA/D algorithm.

For each test problem and for each tested method proposed in this paper the algorithm was run for \(N_{gen} = 500\) generations. Since each of the algorithm versions performs the same number of objective function evaluations per generation, the total number of objective function evaluations was the same in each test. In the case of problems F1–F9 the size of the population was set to \(N = 300\) specimens for 2-objective problems and to \(N = 595\) for 3-objective problems. This corresponds to the weight vector step sizes \(H = 299\) and \(H = 33\), respectively. Both the number of generations and the population size were the same as used in the original paper on the problems with complicated Pareto sets [17].

The neighborhood size was set to \(T = 20\). The version of the MOEA/D algorithm proposed in [17] uses two parameters that are intended to prevent premature convergence of the population. These parameters are \(\delta \) and \(\eta _T\). During parent selection, solutions are selected from the neighborhood \(B(i)\) with probability \(\delta \) and from the entire population with probability \(1 - \delta \). The \(\delta \) parameter was set to \(0.9\). The \(\eta _T\) parameter determines the maximum number of solutions that can be replaced by a new child solution. This parameter is used to prevent a situation in which too many solutions in a certain neighborhood are replaced by a single child solution. The \(\eta _T\) parameter was set to 2.

The genetic operators were the same as used by the authors of the MOEA/D algorithm in [17] who suggested using the Differential Evolution (DE) operator [22] instead of the Simulated Binary Crossover (SBX) operator [5] used in their previous paper on MOEA/D [27]. The Differential Evolution operator is parameterized by two parameters \(CR\) and \(F\). The values of these parameters were set to \(CR = 1.0\) and \(F = 0.5\). Mutation was performed using the polynomial mutation operator [6, 11]. Following the parameterization proposed by the authors of the MOEA/D algorithm the distribution index for the polynomial mutation was set to 20. The probability of mutation was set to \(P_{mut} = 1/n\) where \(n\)—the number of decision variables in a given problem. Tchebycheff decomposition was used for aggregating the objectives to a scalar objective function. All the parameters used in the experiments are summarized in Table 2.
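For reference, the variation operators parameterized above can be sketched as follows (a simplified form of the DE/rand/1 operator and of polynomial mutation; actual MOEA/D-DE implementations include additional repair and boundary-handling details):

```python
import numpy as np
rng = np.random.default_rng()

def de_variation(x_r1, x_r2, x_r3, F=0.5, CR=1.0):
    # DE/rand/1: v = x_r1 + F (x_r2 - x_r3); with CR = 1.0 every gene is taken
    v = x_r1 + F * (x_r2 - x_r3)
    mask = rng.random(len(v)) < CR
    return np.where(mask, v, x_r1)

def polynomial_mutation(x, low, high, p_mut, eta=20):
    # simplified polynomial mutation with distribution index eta
    y = np.array(x, dtype=float)
    for j in range(len(y)):
        if rng.random() < p_mut:
            u = rng.random()
            delta = ((2 * u) ** (1 / (eta + 1)) - 1 if u < 0.5
                     else 1 - (2 * (1 - u)) ** (1 / (eta + 1)))
            y[j] = np.clip(y[j] + delta * (high[j] - low[j]), low[j], high[j])
    return y
```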

Table 2 Algorithm parameters (note, that population size increases with the increasing number of objectives)

5 Experiments

In the experiments solutions of the test problems F1–F9 were generated using the standard MOEA/D algorithm and using the three methods of directing the selective pressure outwards: “fold”, “reduce” and “diverge” described in Sect. 2. The “diverge” method is parameterized by the parameter \(\alpha \) which controls by how much the weight vectors diverge to the outside. In the experiments the value of this parameter was set to \(\alpha = 0.1\), \(0.2\), \(0.3\), \(0.4\) and \(0.5\). Each algorithm was run 30 times for each test problem. The number of generations in each run was \(N_{gen} = 500\). Solutions generated by the algorithms were compared using the hypervolume (HV) and Inverted Generational Distance (IGD) indicators described in Sect. 4.1.

Figures 8, 9, 10, 11, 12, 13, 14, 15, and 16 contain plots which present, for each of the test problems, the dynamic behavior of the hypervolume indicator. Presented values are medians from 30 runs performed in the experiments.

Fig. 8

Hypervolume values obtained for the F1 test problem. Median values from 30 runs

Fig. 9

Hypervolume values obtained for the F2 test problem. Median values from 30 runs

Fig. 10

Hypervolume values obtained for the F3 test problem. Median values from 30 runs

Fig. 11

Hypervolume values obtained for the F4 test problem. Median values from 30 runs

Fig. 12

Hypervolume values obtained for the F5 test problem. Median values from 30 runs

Fig. 13

Hypervolume values obtained for the F6 test problem. Median values from 30 runs

Fig. 14

Hypervolume values obtained for the F7 test problem. Median values from 30 runs

Fig. 15

Hypervolume values obtained for the F8 test problem. Median values from 30 runs

Fig. 16

Hypervolume values obtained for the F9 test problem. Median values from 30 runs

Figures 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, and 32 present the Pareto fronts obtained in the experiments for the MOEA/D algorithm and the best performing (in terms of the hypervolume indicator) of the modified versions of the algorithm proposed in this paper. The Pareto fronts in the figures were obtained by taking all the nondominated solutions from 30 runs of each of the algorithms.

Fig. 17

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F1 problem in 30 runs

Fig. 18

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.1\) for the F1 problem in 30 runs

Fig. 19

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F2 problem in 30 runs

Fig. 20

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.2\) for the F2 problem in 30 runs

Fig. 21

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F3 problem in 30 runs

Fig. 22

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.1\) for the F3 problem in 30 runs

Fig. 23

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F4 problem in 30 runs

Fig. 24

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.2\) for the F4 problem in 30 runs

Fig. 25

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F5 problem in 30 runs

Fig. 26

Pareto front of all nondominated solutions obtained by the “reduce” method for the F5 problem in 30 runs

Fig. 27

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F7 problem in 30 runs

Fig. 28

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.1\) for the F7 problem in 30 runs

Fig. 29

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F8 problem in 30 runs

Fig. 30

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.2\) for the F8 problem in 30 runs

Fig. 31

Pareto front of all nondominated solutions obtained by the MOEA/D algorithm for the F9 problem in 30 runs

Fig. 32

Pareto front of all nondominated solutions obtained by the “diverge” method with \(\alpha = 0.1\) for the F9 problem in 30 runs

Figures 33, 34, and 35 contain boxplots which present, for each of the test problems, the distributions of the hypervolume indicator obtained at the last (500th) generation by each of the tested algorithms.

Fig. 33

Distribution of the hypervolume obtained at the last (500th) generation by each of the algorithms for the F1–F3 test problems

Fig. 34

Distribution of the hypervolume obtained at the last (500th) generation by each of the algorithms for the F4–F6 test problems

Fig. 35

Distribution of the hypervolume obtained at the last (500th) generation by each of the algorithms for the F7–F9 test problems

From the figures that present the dynamic behavior of the hypervolume indicator and from the boxplots that present the distribution of the hypervolume values it can be observed that the standard MOEA/D algorithm was outperformed by the other algorithms, most often the ones using the “diverge” method. To validate this observation statistical testing was performed in which the results produced by each of the algorithms were compared to those produced by the standard MOEA/D algorithm. In most cases the normality of the distribution of hypervolume values cannot be guaranteed; therefore a test that does not require this assumption had to be chosen. In this paper the Wilcoxon signed rank test [20], introduced in [25], was used. This test was recommended in a recent survey [8] as one of the methods suitable for statistical comparison of results produced by metaheuristic optimization algorithms. The Wilcoxon test does not assume normality of the data distributions and has a null hypothesis stating the equality of medians.

Tables 3, 4, 5, 6, 7, 8, 9, 10, and 11 present the results obtained by all the algorithms after 500 generations. The median hypervolume value from all the 30 runs is given and the results of the statistical test are presented. The p-value produced by the Wilcoxon test is the probability of obtaining results at least as extreme as those presented if the null hypothesis (equality of medians) holds. Therefore, low p-values support the conclusion that the medians obtained for a given algorithm and for the standard MOEA/D algorithm are not equal. A value in the “interpretation” column has the following meaning. If the median for a given algorithm is lower than that for the standard MOEA/D algorithm the interpretation is “worse”. If the median for a given algorithm is higher than that for the standard MOEA/D algorithm the interpretation is “significant” if the corresponding p-value is not larger than \(0.05\) and “insignificant” otherwise.

Table 3 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F1 test problem
Table 4 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F2 test problem
Table 5 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F3 test problem
Table 6 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F4 test problem
Table 7 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F5 test problem
Table 8 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F6 test problem
Table 9 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F7 test problem
Table 10 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F8 test problem
Table 11 Median hypervolume values obtained at the last (500th) generation by each of the algorithms for the F9 test problem

Table 12 contains a summary of statistical tests performed on the results obtained by each of the algorithms on each of the test problems (presented in detail in Tables 3, 4, 5, 6, 7, 8, 9, 10, and 11). In this table the sign “+” is used to denote that a given algorithm performed significantly better than the standard MOEA/D algorithm. The sign “#” is used to denote that a given algorithm performed better than the standard MOEA/D algorithm but no statistical significance was achieved. The sign “\(-\)” is used to denote that a given algorithm performed worse than the standard MOEA/D algorithm.

Table 12 The summary of the results of statistical tests performed for each of the algorithms working on each of the test problems

The algorithm based on the “diverge” method is parameterized by the parameter \(\alpha \) which controls the divergence of the weight vectors. In the experiments the value of this parameter was set to \(\alpha = 0.1\), \(0.2\), \(0.3\), \(0.4\) and \(0.5\). Among these values the value of \(\alpha = 0.2\) yielded the best results in the case of test problems F2, F4 and F8. For all the other problems the “diverge” method produced the best results for \(\alpha = 0.1\).

The evaluation using the Inverted Generational Distance (IGD) indicator is less conclusive than that performed using the hypervolume indicator. The values of the IGD obtained in the experiments are presented in Tables 13, 14 and 15. The best (lowest) value for each test problem is marked in bold. The original MOEA/D algorithm obtained the best IGD value for the F2 and F6 test problems. In the remaining cases the versions of the algorithm proposed in this paper obtained better results. The versions that produced better results than the original MOEA/D algorithm were “fold”, “reduce” and “diverge” with \(\alpha = 0.1\); however, each of them was best on different test problems. Therefore it is hard to single out the best performing algorithm with respect to the IGD indicator.

Table 13 Median values of the Inverted Generational Distance (IGD) indicator obtained at the last (500th) generation by each of the tested algorithms for the F1–F3 test problems
Table 14 Median values of the Inverted Generational Distance (IGD) indicator obtained at the last (500th) generation by each of the tested algorithms for the F4–F6 test problems
Table 15 Median values of the Inverted Generational Distance (IGD) indicator obtained at the last (500th) generation by each of the tested algorithms for the F7–F9 test problems

6 Conclusion

In this paper three approaches to increasing the selective pressure towards the outside of the Pareto front were proposed: the “fold”, “reduce” and “diverge” methods. The investigated methods change the way in which weight vectors affect the working of the multiobjective evolutionary algorithm based on decomposition (MOEA/D). The motivation for increasing the outward selective pressure is that in the standard version of the algorithm the specimens placed near the edges of the Pareto front can only exchange information with specimens that direct their search to the inside of the Pareto front. Such a mechanism can be expected to decrease the capability of the algorithm to extend the area of the search.

In order to validate the proposed methods experiments on nine test problems F1–F9 were performed. These test problems were specifically proposed in [17] as benchmarks for testing the MOEA/D algorithm on problems with complicated Pareto sets. The quality of Pareto fronts found by the tested algorithms was evaluated using the hypervolume indicator. The results of the experiments show that the algorithm based on the “diverge” method, in which weight vectors are pointed outwards, was able to significantly improve the optimization results for all the test problems except F5. In the case of the 3-objective problem F6 the improvement was obtained for values of the parameter \(\alpha < 0.5\). For all the problems the best results for the “diverge” method were obtained when the parameter \(\alpha \) was set either to \(0.1\) or to \(0.2\). A possible explanation of these results is that high values of the \(\alpha \) parameter cause the algorithm to put too much pressure on extending the search instead of improving the objectives. The other algorithms, namely “fold” and “reduce”, were able to improve the results for some of the test problems but without statistical significance. The only exception is the F1 problem, for which all the proposed methods performed significantly better than the standard MOEA/D algorithm. By contrast, on the F5 problem no significant improvement was observed, even though some of the tested methods produced better results than the standard MOEA/D algorithm. Distributions of hypervolume values presented in Fig. 34 show that for the F5 problem the quality of results produced by all the algorithms varies greatly between runs. This makes it hard to determine whether the observed differences in median values are caused by genuinely better performance of a particular algorithm or by random variation of the search. It is worth noting that the authors of the MOEA/D algorithm also observed in [17] that the algorithm performs poorly on the test problem F5.

In general, the proposed approach based on extending the selective pressure outwards seems to improve the quality of the results. On most of the test problems the improvement obtained by the “diverge” method is statistically significant at the confidence level \(0.05\). The “fold” and “reduce” methods performed worse than the “diverge” method with the only exception of the F5 test problem for which the “reduce” method performed best. This suggests that it is not enough to only increase the probability of choosing the outermost weight vectors in the selection process. There seems to be a significant benefit in explicitly directing the search towards the outside of the Pareto front.

Further work may include hybridization of the proposed approach with other optimization algorithms and local search methods. For example, the hybrid approaches presented in [1, 14, 16, 26] could be combined with the weight generation scheme proposed in this paper. Local search procedures work on solutions already generated by the main evolutionary algorithm and are usually executed as a separate subroutine. Therefore, it is possible to choose independently which weight generation scheme to use and what kind of solution improvement procedure to employ for a given problem.