1 Introduction

Multi-objective optimization problems are highly relevant in many application fields, such as engineering [5, 47, 50], management [28, 55], statistics [7], space exploration [45, 51], etc. It is thus not surprising that many research streams have flourished around this topic over the last 25 years.

Multi-objective optimization (MOO) exhibits two major sources of complexity that, taken together, make problems particularly difficult to handle. The first is the general absence of a solution minimizing all the objective functions simultaneously; as a consequence, the definitions of optimality (global, local and stationarity), based on Pareto’s theory, are not trivial, and optimization processes are not obvious in terms of either aims or tools. The second mirrors a classical issue of scalar optimization: in the absence of convexity assumptions, there is no equivalence between local and global (Pareto) optimality.

The combination of the two aforementioned features of MOO problems makes evolutionary-type algorithms (EAs) particularly well suited and, indeed, they have been the most widely studied class of algorithms in this context for decades [35, 44]. The NSGA-II algorithm [13] is arguably the most popular among these methods: it is a population-based procedure exploiting a cheaply computable score to efficiently rank solutions w.r.t. the objectives, and performing the classical genetic crossover, mutation and selection operations to create each new generation of solutions. NSGA-II represents the de facto standard, at least popularity-wise, for unconstrained and bound-constrained MOO.

Alongside the EA stream, two other classes of approaches have been studied for MOO. The first one concerns scalarization approaches [17, 18, 46]. Solving MOO problems by scalarization, however, has some drawbacks: firstly, unfortunate choices of weights may lead to unbounded scalar problems, even under strong regularity assumptions [20, Section 7]; moreover, scalarization is designed to produce a single solution and, in order to generate an approximation of the whole Pareto front, the problem has to be solved repeatedly with different choices of weights; unfortunately, it is not known a priori how the weights should be selected to obtain a wide and uniform Pareto front.

The other family of methods is that of MO descent methods (first-order, second-order or derivative-free) [6, 8, 16, 20, 21, 26, 27]. These methods mimic classical iterative scalar optimization algorithms.

Originally, these methods were designed to generate a single Pareto-stationary solution, but in recent years specific strategies have been proposed to handle lists of points and to generate approximations of the entire Pareto front [9,10,11,12, 22, 38]. Numerical results show that, on problems satisfying reasonable regularity assumptions, these methods are effective and much more efficient than evolutionary ones, especially as the problem size grows [9, 10, 12].

On the other hand, descent algorithms have convergence properties that are theoretically relevant but, in practice, only guarantee Pareto stationarity of the retrieved solutions. This is a significant limitation on highly non-convex problems, similarly to gradient-based algorithms in scalar optimization, which may converge to stationary points that are not even local optima.

In scalar global optimization, memetic strategies have proved particularly successful. Memetic algorithms combine population-based techniques (heuristic and/or genetic) with local search steps [4, 29, 30, 39, 40, 43]. In the case of MOO, this idea has only been considered superficially, even though each class of MOO algorithms has practical drawbacks: EAs have no theoretical convergence guarantees [20] and are usually expensive [31, 37, 49], while descent algorithms often produce suboptimal solutions when started from carelessly chosen points and are thus not suitable for highly non-convex problems. For these reasons, some MO memetic approaches have been proposed in the literature. However, the existing approaches are mostly application-specific [54] or employ heuristic [33, 34, 36, 41], meta-heuristic [1, 53], stochastic [15, 19] or scalarization-based [31, 49, 52] local search steps. Even the few proposed strategies employing gradient information for the local search steps do not exploit the concept of common descent directions; rather, convex combinations of gradients are generated and exploited in various ways [2, 3, 34, 37, 48].

In this paper, we show by computational experiments the benefits and limitations of evolutionary and MO descent algorithms in different settings (high/low dimensionality, convex/non-convex objectives). Then, we propose a memetic algorithm for bound-constrained MOO problems, combining an evolutionary approach (namely, the popular NSGA-II) with MO descent methods, similarly to what is done in the scalar case in [30]. We finally show that the proposed method, which inherits the good features of both the EA and MO descent families, outperforms state-of-the-art MOO solvers in all the considered settings.

The rest of the manuscript is organized as follows. In Sect. 2, we recall the main concepts of both the descent methods and the NSGA-II algorithm. In Sect. 3, we describe our memetic algorithm along with a theoretical analysis. In Sect. 4, we first compare the two families of algorithms on some specific problems, in order to show the benefits and the shortcomings of both; we then report computational experiments highlighting the good performance of our approach w.r.t. the main state-of-the-art methods. Finally, in Sect. 5 we provide some concluding remarks.

2 Preliminaries

In this work, we consider multi-objective optimization problems of the form

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^n}\;&F(x)=(f_1(x),\ldots ,f_m(x))^T\\\text {s.t. }&x \in [l, u], \end{aligned} \end{aligned}$$
(1)

where \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is a continuously differentiable function and \(l,u\in \mathbb {R}^n\) with \(l_i\le u_i\) for all \(i \in \{1,\ldots , n\}\); some components of l and u may be infinite. Given the bound constraints, we denote the feasible set, which is closed, convex and nonempty, by \(\varOmega = \{x \in \mathbb {R}^n \mid x \in [l, u]\}\). We denote by \(J_F\) the Jacobian matrix associated with F.

In order to introduce some preliminary concepts of multi-objective optimization, we define a partial ordering of the points in \(\mathbb {R}^m\). Considering two points \(u, v \in \mathbb {R}^m\), we define

$$\begin{aligned} \begin{array}{lcr} u< v \iff u_i < v_i \forall i = 1,\ldots , m, \\ u \le v \iff u_i \le v_i \forall i= 1,\ldots , m. \end{array} \end{aligned}$$

If \(u \le v\) and \(u \not = v\), we say that u dominates v, and we write \(u \lneqq v\). Finally, we say that a point \(x \in \mathbb {R}^n\) dominates \(y \in \mathbb {R}^n\) w.r.t. F if \(F(x) \lneqq F(y)\).
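In computational terms, the dominance test is straightforward; the following minimal Python sketch (our own illustration, not code from any cited implementation) checks whether one objective vector dominates another:

```python
import numpy as np

def dominates(Fx, Fy):
    """True iff Fx dominates Fy, i.e. Fx <= Fy componentwise and Fx != Fy."""
    Fx, Fy = np.asarray(Fx), np.asarray(Fy)
    return bool(np.all(Fx <= Fy) and np.any(Fx < Fy))
```

For instance, `dominates([1, 2], [2, 2])` is `True`, while `dominates([1, 3], [2, 2])` and `dominates([2, 2], [1, 3])` are both `False`, since the two vectors are mutually non-dominated.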

In multi-objective optimization problems, we ideally would like to obtain a point which simultaneously minimizes all the objectives \(f_1,\ldots , f_m\). However, such a solution is unlikely to exist. For this reason, the Pareto optimality concepts have been introduced.

Definition 1

A point \(\bar{x}\in \varOmega \) is Pareto optimal for Problem (1) if there exists no point \(y\in \varOmega \) such that \(F(y)\lneqq F(\bar{x})\). If there exists a neighborhood \(\mathcal {N}(\bar{x})\) such that the previous property is satisfied in \(\varOmega \cap \mathcal {N}(\bar{x})\), then \(\bar{x}\) is locally Pareto optimal.

In practice, it is difficult to attain solutions characterized by the Pareto optimality property. A slightly weaker property is weak Pareto optimality.

Definition 2

A point \(\bar{x}\in \varOmega \) is weakly Pareto optimal for Problem (1) if there exists no point \(y\in \varOmega \) such that \(F(y)< F(\bar{x})\). If there exists a neighborhood \(\mathcal {N}(\bar{x})\) such that the previous property is satisfied in \(\varOmega \cap \mathcal {N}(\bar{x})\), then \(\bar{x}\) is locally weakly Pareto optimal.

We refer to the set of all Pareto optimal solutions of the problem as the Pareto set, while by Pareto front we refer to the image of the Pareto set through F.

We can now introduce the concept of Pareto stationarity.

Definition 3

A point \(\bar{x}\in \varOmega \) is Pareto-stationary for Problem (1) if

$$\begin{aligned} \max _{j=1,\ldots ,m}\nabla f_j(\bar{x})^Td\ge 0, \end{aligned}$$

for all feasible directions \(d\in \mathcal {D}(\bar{x})=\{v\in \mathbb {R}^n\mid \exists \bar{t}>0:\bar{x}+tv\in \varOmega \; \forall \, t\in [0,\bar{t}\,]\}\).

Under differentiability assumptions, Pareto stationarity is a necessary condition for all types of Pareto optimality. Further assuming the convexity of the objectives in Problem (1), the condition is also sufficient for Pareto optimality. The property can be compactly re-written as

$$\begin{aligned} \min _{d \in \mathcal {D}(\bar{x})} \max _{j=1,\ldots ,m} \nabla f_j(\bar{x})^Td = 0. \end{aligned}$$
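As a simple illustration of the definition (an example of ours), consider \(n = 1\), \(m = 2\), \(\varOmega = \mathbb {R}\), \(f_1(x) = x^2\) and \(f_2(x) = (x - 1)^2\). For any \(\bar{x} \in [0, 1]\), the two derivatives \(2\bar{x}\) and \(2(\bar{x} - 1)\) have opposite signs (or one of them vanishes), so

$$\begin{aligned} \max \{2\bar{x}\,d,\; 2(\bar{x} - 1)\,d\} \ge 0 \quad \forall \, d \in \mathbb {R}, \end{aligned}$$

and every point of \([0, 1]\) is Pareto-stationary (and, by convexity of the objectives, Pareto optimal); any \(\bar{x} \notin [0, 1]\) instead admits a direction d along which both objectives strictly decrease.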

Finally, we introduce a relaxation of Pareto stationarity, recalling the \(\varepsilon \)-Pareto-stationarity concept introduced in [8].

Definition 4

Let \(\varepsilon \ge 0\). A point \(\bar{x} \in \varOmega \) is \(\varepsilon \)-Pareto-stationary for Problem (1) if

$$\begin{aligned} \min _{\begin{array}{c} d \in \mathcal {D}(\bar{x})\\ \Vert d\Vert \le 1 \end{array}} \max _{j=1,\ldots ,m} \nabla f_j(\bar{x})^Td \ge -\varepsilon . \end{aligned}$$

In the following, we briefly review evolutionary and descent algorithms for MOO, with particular emphasis on the NSGA-II algorithm and on steepest descent and projected gradient methods, respectively.

2.1 Multi-objective descent methods

First of all, let us consider the following unconstrained optimization problem:

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^n}\;&F(x)=(f_1(x),\ldots ,f_m(x))^T. \end{aligned} \end{aligned}$$
(2)

If a point \(\bar{x} \in \mathbb {R}^n\) is not Pareto-stationary, then there exists a descent direction common to all the objectives. Therefore, following [21, Section 3.1], we can introduce the steepest common descent direction as the solution of the problem

$$\begin{aligned} \min _{\begin{array}{c} d\in \mathbb {R}^n \\ \left\| d \right\| \le 1 \end{array}}\max _{j=1,\ldots ,m} \nabla f_j(\bar{x})^Td, \end{aligned}$$
(3)

which, if the \(\ell _\infty \) norm is employed, can be rewritten as a linear programming (LP) problem:

$$\begin{aligned} \begin{aligned} \min _{\beta \in \mathbb {R},\,d \in \mathbb {R}^n}\;&\beta \\ \text {s.t. }&-1\le d_i\le 1\quad \forall \,i=1,\ldots ,n,\\&\nabla f_j(\bar{x})^Td\le \beta \quad \forall \,j=1,\ldots ,m. \end{aligned} \end{aligned}$$
(4)

A slightly different formulation, also proposed in [21], is the \(\ell _2\)-regularized one. We prefer formulation (3), however, because of the simplicity of the resulting LP problem. We define the continuous function \(\theta : \mathbb {R}^n \rightarrow \mathbb {R}\) such that \(\theta (\bar{x})\) is the optimal value of Problem (3) at \(\bar{x}\). If \(\bar{x}\) is Pareto-stationary, then \(\theta (\bar{x}) = 0\); otherwise \(\theta (\bar{x}) < 0\). We also denote by \(\mathcal {V}(\bar{x}) \subseteq \mathbb {R}^n\) the set of optimal solutions to Problem (3). Indeed, the solution may not be unique, although this fact poses no real technical issue.
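For illustration, Problem (4) can be solved with any off-the-shelf LP solver. The following minimal sketch (ours, not part of any reference implementation) uses `scipy.optimize.linprog` on the variables \((\beta , d)\), with the objective gradients stacked as rows of the Jacobian:

```python
import numpy as np
from scipy.optimize import linprog

def steepest_common_direction(JF_x):
    """Solve LP (4): variables (beta, d); minimize beta subject to
    grad f_j(x)^T d <= beta for all j and -1 <= d_i <= 1 (l_inf ball)."""
    m, n = JF_x.shape
    c = np.zeros(n + 1)
    c[0] = 1.0                                   # objective: minimize beta
    A_ub = np.hstack([-np.ones((m, 1)), JF_x])   # grad f_j^T d - beta <= 0
    b_ub = np.zeros(m)
    bounds = [(None, None)] + [(-1.0, 1.0)] * n  # beta free, d in the box
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    theta, d = res.x[0], res.x[1:]
    return theta, d
```

For \(J_F(\bar{x}) = I_2\) this returns \(\theta (\bar{x}) = -1\) with \(d = (-1, -1)^T\); for the two gradients \(1\) and \(-1\) (with \(n = 1\)) it returns \(\theta (\bar{x}) = 0\), correctly detecting Pareto stationarity.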

Based on the concept of steepest common descent direction, the standard Multi-Objective Steepest Descent (MOSD) algorithm was proposed in [21]. In MOSD, a back-tracking Armijo-type Line Search (ALS) is used: the step size is reduced until a sufficient decrease is obtained for all the objective functions. We report ALS in Algorithm 1.

We now recall the main theoretical results of the two algorithms, starting from the finite termination property of the line search.

Lemma 1

[21, Lemma 4] If F is continuously differentiable and \(J_F(x)d<0\) (i.e.  \(\theta (x)<0\)), then there exists some \(\varepsilon > 0\), which may depend on x, d and \(\beta \), such that

$$\begin{aligned} F(x+td) < F(x) + \beta t J_F(x)d \end{aligned}$$

for all \(t\in (0,\varepsilon ]\).

[Algorithm 1: Armijo-type Line Search (ALS)]
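The backtracking loop of ALS can be sketched in Python as follows (a minimal sketch of ours; the default parameter values and the iteration cap are assumptions, and `jac_d` denotes the vector of directional derivatives \(J_F(x)d\), assumed componentwise negative):

```python
import numpy as np

def als(F, x, d, jac_d, beta=1e-4, delta=0.5, alpha0=1.0, max_backtracks=50):
    """Armijo-type Line Search: shrink the step size until a sufficient
    decrease F(x + alpha d) < F(x) + beta * alpha * J_F(x) d holds
    simultaneously for *all* objectives."""
    Fx = F(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        if np.all(F(x + alpha * d) < Fx + beta * alpha * jac_d):
            return alpha
        alpha *= delta
    return alpha
```

For instance, for \(F(x) = (x^2, (x-1)^2)\) at \(x = 2\) with \(d = -1\), the full step \(\alpha = 1\) already satisfies the sufficient-decrease test and is accepted.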

Regarding the MOSD procedure, the following convergence property holds.

Lemma 2

[21, Theorem 1, Section 9.1] Every accumulation point of the sequence \(\{x_k\}\) produced by the MOSD algorithm is a Pareto-stationary point. If the function F has bounded level sets, in the sense that \(\{x\in \mathbb {R}^n\mid F(x)\le F(x_0)\}\) is bounded, then the sequence \(\{x_k\}\) stays bounded and has at least one accumulation point.

Through the years, the MOSD procedure was extended to handle a sequence of sets \(\{X_k\}\) of non-dominated points, rather than a sequence of single points, with the aim of approximating the Pareto front of the problem. Indeed, in the multi-objective context, an approximation of the Pareto front can be much more useful than a single solution: the user is free to choose, a posteriori, the solution providing the most appropriate trade-off among many. One such extension of MOSD is the Front Steepest Descent Algorithm (FSDA), introduced in [10].

In the next lemma, we report the convergence property of the algorithm, where the authors use the concept of linked sequence introduced in [38].

Definition 5

A sequence \(\{x_k\}\) is a linked sequence if, for all k, \(x_k \in X_k\) and \(x_k\) is generated at iteration \(k - 1\) starting the search procedure from \(x_{k - 1}\).

Lemma 3

[10, Proposition 5] Let \(\{X_k\}\) be the sequence of sets of non-dominated points produced by FSDA. Let us assume that there exists a point \(x_0 \in X_0\) such that:

  • \(x_0\) is not Pareto-stationary;

  • the set \(\mathcal {L}(x_0) = \bigcup _{j=1}^{m}\{x \in \mathbb {R}^n: f_j(x) \le f_j(x_0)\}\) is compact.

Let \(\{x_k\}\) be a linked sequence, then it admits limit points and every limit point is Pareto-stationary for Problem (2).

In [10], two concepts are introduced. The first one is the steepest partial descent direction at \(\bar{x}\) w.r.t. a subset of objective indices \(I \subseteq \{1,\ldots , m\}\). This direction can be found by solving the optimization problem

$$\begin{aligned} \min _{\begin{array}{c} d\in \mathbb {R}^n \\ \left\| d \right\| \le 1 \end{array}} \max _{j \in I} \nabla f_j(\bar{x})^Td. \end{aligned}$$
(5)

As for the steepest common descent direction, we define the continuous function \(\theta ^I: \mathbb {R}^n \rightarrow \mathbb {R}\), where \(\theta ^I(\bar{x})\) is the optimal value of Problem (5) at \(\bar{x}\). Accordingly, we denote by \(\mathcal {V}^I(\bar{x}) \subseteq \mathbb {R}^n\) the set of optimal solutions to Problem (5). If appropriately used, steepest partial descent directions may help spread the search in the objectives space and reach the extreme regions of the Pareto front.

For our purposes, given a subset I, we also define \(F_I(\bar{x})\) as the |I|-dimensional vector with components \(f_j(\bar{x})\), with \(j \in I\). In addition, given a set of points \(\bar{X}\), we introduce the set \(\bar{X}^I \subseteq \bar{X}\) as the set of points that are mutually non-dominated w.r.t. \(F_I\), i.e.

$$\begin{aligned} \bar{X}^I = \{x \in \bar{X} \mid \not \exists y \in \bar{X} \text { s.t. } F_I(y) \lneqq F_I(x)\}. \end{aligned}$$

The second concept introduced in [10] is a weaker, front-based variant of ALS, which we call the Front Armijo-Type Line Search (FALS) and report in Algorithm 2. In FALS, the step size is reduced until a sufficient decrease is reached, w.r.t. all the points in \(\bar{X}^I\), for at least one of the objective functions \(f_j\), \(j \in I\). FALS can thus be considered a weak extension of ALS to the multi-objective case. Since a sufficient decrease is not required for all the objective functions, employing FALS has two consequences: less computational time and larger step sizes. These features may be very useful to obtain good and well-spread Pareto front approximations in a short time.

[Algorithm 2: Front Armijo-Type Line Search (FALS)]

FALS has a finite termination property, which we recall in the following lemma.

Lemma 4

[10, Proposition 4] Let \(I \subseteq \{1,\ldots , m\}\), \(x_c \in X_k^I\) be such that \(\theta ^I(x_c) < 0\), i.e. there exists a direction \(d_c^I \in \mathcal {V}^I(x_c)\) such that

$$\begin{aligned} \nabla f_j(x_c)^Td^I_c < 0 \end{aligned}$$

\(\forall j \in I\). Then \(\exists \bar{\alpha } > 0\), sufficiently small, such that

$$\begin{aligned} F_I(y) + \textbf{1}\beta \bar{\alpha }\theta ^I(x_c) \not < F_I(x_c + \bar{\alpha } d_c^I), \forall y \in X_k^I, \end{aligned}$$

i.e., the while loop of FALS terminates in a finite number \(\bar{h}\) of iterations, returning a value \(\bar{\alpha } = \delta ^{\bar{h}}\alpha _0\). Furthermore, the produced point \(x_c + \bar{\alpha }d_c^I\) is not dominated with respect to the set \(X_k\).
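In code, the acceptance test of Lemma 4 can be sketched as follows (a minimal illustration of ours, where `front_vals` collects the vectors \(F_I(y)\) for \(y \in X_k^I\); parameter defaults are assumptions):

```python
import numpy as np

def fals(F_I, x_c, d, theta_I, front_vals, beta=1e-4, delta=0.5,
         alpha0=1.0, max_backtracks=50):
    """Front Armijo-type Line Search: shrink alpha while some front point y
    satisfies F_I(y) + 1*beta*alpha*theta_I < F_I(x_c + alpha*d)
    componentwise (theta_I < 0, so the left-hand side is shifted down)."""
    alpha = alpha0
    for _ in range(max_backtracks):
        trial = F_I(x_c + alpha * d)
        if not any(np.all(Fy + beta * alpha * theta_I < trial)
                   for Fy in front_vals):
            return alpha
        alpha *= delta
    return alpha
```

Note that the test only requires the trial point not to be (sufficiently) dominated by any front point, which is what allows larger step sizes than ALS.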

Since we consider bound constraints in Problem (1), we need to recall the (single-point) Multi-Objective Projected Gradient (MOPG) method. This algorithm was first introduced in [16] and then developed and analyzed in [24, 25]; its main results are summarized in [23]. The method extends the MOSD procedure to constrained problems whose feasible set \(\varOmega \) is closed and convex. We report the MOPG procedure in Algorithm 3.

[Algorithm 3: Multi-Objective Projected Gradient (MOPG)]

The first difference w.r.t. the MOSD procedure is the way the direction is retrieved, since the problem constraints now have to be taken into account. The steepest descent direction is defined as the solution of

$$\begin{aligned} \min _{\begin{array}{c} z \in \varOmega \\ \Vert z - \bar{x} \Vert \le 1 \end{array}} \max _{j=1,\ldots ,m} \nabla f_j(\bar{x})^T(z - \bar{x}). \end{aligned}$$
(6)

Note that, again, in practice we employ the \(\ell _\infty \) norm, so that the problem can be reformulated as an LP problem similar to (4). As in the unconstrained case, we define the continuous function \(\theta _\varOmega : \mathbb {R}^n \rightarrow \mathbb {R}\) such that \(\theta _\varOmega (\bar{x})\) is the optimal value of Problem (6) at \(\bar{x}\). We also denote by \(\mathcal {Z}_\varOmega (\bar{x}) \subseteq \varOmega \) the set of optimal solutions to Problem (6), and by \(\mathcal {V}_\varOmega (\bar{x}) = \{z - \bar{x} \mid z \in \mathcal {Z}_\varOmega (\bar{x})\} \subseteq \mathbb {R}^n\) the set of optimal directions. We refer to the latter as constrained steepest common descent directions; if a subset \(I \subseteq \{1,\ldots , m\}\) of the objectives is considered, they are referred to as constrained steepest partial descent directions. If \(\theta _\varOmega (\bar{x}) < 0\), the point is not Pareto-stationary and we proceed to find a step size through ALS. As opposed to the unconstrained case, where \(\alpha _0\) can be any positive real number, in the MOPG procedure \(\alpha _0 = 1\). Since the set \(\varOmega \) is convex, d is a feasible direction by construction and \(\alpha _0 = 1\), every produced point is feasible.
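As an illustration of the \(\ell _\infty \) reformulation of Problem (6) when \(\varOmega = [l, u]\), the following sketch (ours, analogous to the unconstrained case) solves the LP in the variables \((\beta , z)\) and returns \(\theta _\varOmega (\bar{x})\) together with the direction \(d = z - \bar{x}\):

```python
import numpy as np
from scipy.optimize import linprog

def projected_steepest_direction(JF_x, x, l, u):
    """LP reformulation of (6) with the l_inf norm and Omega = [l, u]:
    minimize beta s.t. grad f_j(x)^T (z - x) <= beta, z in [l, u] and
    ||z - x||_inf <= 1."""
    m, n = JF_x.shape
    c = np.zeros(n + 1)
    c[0] = 1.0
    A_ub = np.hstack([-np.ones((m, 1)), JF_x])  # J z - beta <= J x
    b_ub = JF_x @ x
    lo = np.maximum(l, x - 1.0)                 # z in [l,u] ∩ [x-1, x+1]
    hi = np.minimum(u, x + 1.0)
    bounds = [(None, None)] + list(zip(lo, hi))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    theta, d = res.x[0], res.x[1:] - x
    return theta, d
```

For \(\bar{x}\) at the lower-bound corner of \([0, 1]^2\) with positive gradients, the optimal value is 0 (the point is Pareto-stationary); moving the same corner to the interior of \([0, 2]^2\) yields \(\theta _\varOmega (\bar{x}) = -1\) with \(d = (-1, -1)^T\).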

We report here two theoretical results of the MOPG method.

Lemma 5

[23, Lemma 4.3] Let \(\{x_k\} \subset \mathbb {R}^n\) be a sequence generated by MOPG. Then, we have \(\{x_k\} \subset \varOmega \).

Lemma 6

[23, Theorem 4.4] Every accumulation point, if any, of a sequence \(\{x_k\}\) generated by MOPG is a feasible Pareto-stationary point.

In order to deal with box-constrained optimization problems, we adapted the FSDA algorithm [10]. We call the adaptation Front Projected Gradient Algorithm (FPGA); the differences w.r.t. the original algorithm are the following:

  • the initial set \(X_0\) is composed of feasible points that are non-dominated w.r.t. F;

  • the direction is found by solving Problem (6) instead of Problem (3);

  • we employ the Bound-constrained Front Armijo-Type Line Search (B-FALS), which is a modified version of FALS that we introduced to take into account bound constraints.

B-FALS, which we report in Algorithm 4, is similar to FALS: the only additional requirement is that the step size must lead to a feasible point.

[Algorithm 4: Bound-constrained Front Armijo-Type Line Search (B-FALS)]

Through the following proposition, we show that B-FALS terminates in a finite number of iterations.

Proposition 1

Let \(I \subseteq \{1,\ldots , m\}\), \(x_c \in X_k^I\) be such that \(\theta _\varOmega ^I(x_c) < 0\), i.e. there exists a direction \(d_{\varOmega c}^I \in \mathcal {V}_\varOmega ^I(x_c)\) such that

$$\begin{aligned} \nabla f_j(x_c)^Td_{\varOmega c}^I < 0 \end{aligned}$$

\(\forall j \in I\). Then \(\exists \bar{\alpha } > 0\), sufficiently small, such that

$$\begin{aligned} x_c + \bar{\alpha } d_{\varOmega c}^I \in \varOmega \end{aligned}$$

and

$$\begin{aligned} F_I(y) + \textbf{1}\beta \bar{\alpha }\theta _\varOmega ^I(x_c) \not < F_I(x_c + \bar{\alpha } d_{\varOmega c}^I) \quad \forall y \in X_k^I, \end{aligned}$$

i.e., the while loop of B-FALS terminates in a finite number \(\bar{h}\) of iterations, returning a value \(\bar{\alpha } = \delta ^{\bar{h}}\alpha _0\). Furthermore, the produced point \(x_c + \bar{\alpha }d_{\varOmega c}^I\) is not dominated with respect to the set \(X_k\).

Proof

Assume by contradiction that the thesis is false. Then the algorithm produces an infinite sequence \(\{\delta ^h\alpha _0\}\) such that, for all h, either

$$\begin{aligned} x_c + \delta ^h\alpha _0 d_{\varOmega c}^I \not \in \varOmega \end{aligned}$$

or a point \(y_h \in X_k^I\) exists such that

$$\begin{aligned} F_I(y_h) + \textbf{1}\beta \delta ^h\alpha _0\theta _\varOmega ^I(x_c) < F_I(x_c + \delta ^h\alpha _0 d_{\varOmega c}^I). \end{aligned}$$
(7)

Since \(\varOmega \) is convex, \(d_{\varOmega c}^I\) is a feasible direction by construction and \(\delta ^h\alpha _0\rightarrow 0\) as \(h\rightarrow \infty \); hence, for h sufficiently large, the point \(x_c+\delta ^h\alpha _0d_{\varOmega c}^I\) is feasible and thus Condition (7) must hold. Then, following the proof of Proposition 4 in [10], we reach a contradiction, which proves the thesis.\(\square \)

We provide a full description of FPGA in Appendix A, along with feasibility and convergence properties. In this work, we consider the FPGA algorithm as a representative of gradient-based methods designed to produce Pareto front approximations for bound-constrained optimization problems.

In [10], a variant of the FSDA algorithm exploiting an Armijo-type extrapolation technique is also introduced. The authors claim that this variant outperforms the original algorithm. For this reason, in the initial stage of our work, we decided to also adapt this variant to box-constrained optimization problems. However, some preliminary computational experiments led us to the somewhat surprising conclusion that vanilla FPGA performs better than the extrapolation-based variant. A possible explanation may lie in the presence of constraints. For the sake of brevity, we thus only consider FPGA in the remainder of the paper.

2.2 NSGA-II

NSGA-II is a non-dominated-sorting-based multi-objective evolutionary algorithm proposed in [13]. In particular, NSGA-II is a genetic algorithm that creates a mating pool by combining the parent and offspring populations and selecting the best N solutions. In this section, we review the main characteristics of NSGA-II; for a deeper understanding of the algorithm's mechanisms, the reader is referred to the original work [13].

We report the main steps of NSGA-II in Algorithm 5.

[Algorithm 5: NSGA-II]

NSGA-II deals with a fixed-size population (N solutions) and takes as input an initial population \(X_0\). For the sake of clarity, from now on we consider \(X_0\) to be a set composed of N feasible solutions. However, two facts are worth remarking.

  • Starting with a population \(X_0\) composed of N points is not necessary: a smaller/larger population is increased/reduced after the first iteration so as to contain exactly N solutions.

  • NSGA-II can also manage infeasible points. However, since in this work we address bound-constrained problems, and the genetic operators ensure that after the first iteration no point in the population violates the bound constraints, we assume that \(X_0\) is composed of feasible points only.

The core idea of the algorithm is that during an iteration:

  • the parents are chosen among the current solutions (Line 7);

  • N offspring are created from the parents through the crossover operator (Line 8);

  • the offspring are mutated using the mutation function (Line 9);

  • a new population of 2N solutions is created by merging the current population with the offspring (Line 10);

  • the getMetrics function associates scores with the population members (Line 11);

  • only the best N points (survivors) are selected and maintained (Line 12).

The crossover and mutation operators play a crucial role in the NSGA-II mechanisms. The aim of the crossover operator is to create offspring that inherit (hopefully the best) features of the parents. The mutation operator introduces some random changes in the offspring; this is useful to spread the search in the objectives space as much as possible. For a more detailed and technical explanation of these two operators, we again refer the reader to [13]. We remark here that the NSGA-II mechanisms ensure that no offspring duplicates another offspring or any point in the current population. At the end of the execution, the current population \(X_k\) is returned.

In the next subsections, we provide other details of the algorithm that are useful for our purposes.

2.2.1 Metrics

In this section, we explain the metrics used in the NSGA-II mechanisms (computed in the getMetrics function). In particular, these scores are used to select the parents and the survivors.

The first one is the rank, which splits the population into domination levels. Briefly, a point has rank 0 if it is not dominated by any point in \(X_k\) w.r.t. F; it has rank 1 if it is dominated by some of the points with rank 0, but by no point with rank greater than or equal to 1; and so on. The ranking values are obtained through a fast non-dominated sorting approach, which is one of the strengths of the NSGA-II algorithm.
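For illustration, domination ranks can be computed with the following simple \(O(N^2 m)\) sketch (ours; the actual fast non-dominated sorting of [13] produces the same ranks more efficiently):

```python
import numpy as np

def ranks(objs):
    """Domination ranks: rank 0 = non-dominated points; rank r+1 =
    points that become non-dominated once ranks <= r are removed."""
    objs = np.asarray(objs)
    N = objs.shape[0]
    rank = np.full(N, -1)
    remaining = set(range(N))
    r = 0
    while remaining:
        front = [i for i in remaining
                 if not any(np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i])
                            for j in remaining if j != i)]
        for i in front:
            rank[i] = r
        remaining -= set(front)
        r += 1
    return rank
```

For example, the objective vectors (1, 2), (2, 1), (2, 2), (3, 3) get ranks 0, 0, 1, 2 respectively: the first two are mutually non-dominated, the third is dominated only by rank-0 points, and the last is dominated by all the others.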

The second metric is the crowding distance, which estimates the density of solutions surrounding a particular point in the population. A high crowding distance indicates that the point lies in a poorly populated area of the objectives space; maintaining it in the population is likely to lead to a well-spread Pareto front. Note that for each point this metric is computed with respect to the solutions with the same rank.
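Within a single front, the crowding distance computation can be sketched as follows (our illustration of the scheme in [13]: boundary points along each objective receive an infinite distance, inner points accumulate normalized gaps between their neighbors):

```python
import numpy as np

def crowding_distance(objs):
    """Crowding distance within one front: for each objective, sort the
    points; extremes get +inf, inner points sum normalized neighbor gaps."""
    objs = np.asarray(objs, dtype=float)
    N, m = objs.shape
    dist = np.zeros(N)
    for j in range(m):
        order = np.argsort(objs[:, j])
        f = objs[order, j]
        span = f[-1] - f[0]
        dist[order[0]] = dist[order[-1]] = np.inf
        if span > 0:
            for k in range(1, N - 1):
                dist[order[k]] += (f[k + 1] - f[k - 1]) / span
    return dist
```

For the front (0, 2), (1, 1), (2, 0), the two extreme points get an infinite distance, while the middle point gets a finite one, making the extremes the preferred survivors.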

We again refer the reader to the original paper [13] for the rigorous definition of the metrics.

2.2.2 Parents selection

In the getParents function, pairs of solutions are randomly chosen from \(X_k\). Then, within each pair, one of the two points is selected by binary tournament, in which the solutions are compared in the following way.

  • The point with the lowest rank is preferred.

  • If the ranking value is the same for both points, the one with the highest crowding distance is chosen.

  • In the unlikely case where the crowding distance values are also equal, a random choice is made.

The selected point will be used, together with a parent chosen from another pair, in the crossover function to create offspring.

This approach of comparing the solutions is also used in the getSurvivors function.

2.2.3 Selection operation

After generating the offspring through the crossover and mutation operators, the new population is composed of 2N solutions. The aim of the getSurvivors function is to select and maintain the best N of them. As in the getParents function, the selection is based on rank and crowding distance.

  • The set of 2N points is first sorted by rank.

  • Solutions with the same rank are sorted by crowding distance.

  • The first N points are selected as survivors.
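The three steps above amount to a single lexicographic sort; a minimal sketch (ours):

```python
def get_survivors(population, rank, crowd, N):
    """Keep the best N solutions: sort by (rank ascending,
    crowding distance descending) and truncate."""
    order = sorted(range(len(population)),
                   key=lambda i: (rank[i], -crowd[i]))
    return [population[i] for i in order[:N]]
```

Note that infinite crowding distances (front extremes) naturally sort first within their rank, so they are always retained when their rank survives.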

3 Non-dominated sorting memetic algorithm

In this section, we introduce a novel memetic algorithm for bound-constrained MOO problems, which we call Non-dominated Sorting Memetic Algorithm (NSMA). We first show and describe the algorithmic scheme. Then, we formally introduce the Front Multi-Objective Projected Gradient (FMOPG) algorithm, which is the descent method used within the NSMA and for which we also provide a rigorous theoretical analysis.

3.1 Algorithmic scheme

The scheme of NSMA is reported in Algorithm 6. The structure of the proposed algorithm is similar to that of NSGA-II, from which we also inherit all the genetic operators. The main differences between the two methods consist of three new operations:

  • getSurrogateBounds (Line 9);

  • getCrowdingDistanceThreshold (Line 14);

  • optimizePopulation (Line 17).

[Algorithm 6: Non-dominated Sorting Memetic Algorithm (NSMA)]

In the next subsections, we give a detailed description of these three new functions.

3.1.1 Estimating surrogate bounds

When the addressed problem is characterized by a particularly large feasible region (\(l \ll u\)), the NSGA-II algorithm turns out to be slow at obtaining a good approximation of the Pareto front. This issue is caused by the crossover and, above all, the mutation operator: random mutations over a large search area lead, from the very first iterations, to a population which is overly dispersed and far from optimality. In such a scenario, even the effectiveness of the crossover operator may be compromised, since some parents may have extremely bad features. As a consequence, NSGA-II may exhibit a performance slowdown.

In NSMA, this issue is addressed by using surrogate bounds for the crossover and mutation operators instead of the original ones. These bounds are obtained through the getSurrogateBounds function, which we report in Algorithm 7. The surrogate bounds are computed from the current population and a shift value \(s_h\); the latter parameter progressively enlarges the region where the population can be distributed, with a larger value leading to a larger enlargement.

[Algorithm 7: getSurrogateBounds]

Ideally, the exploration starts considering only a small portion of the feasible area, which is defined by the initial population and the shift value \(s_h\). In this way, the points cannot be moved by the crossover and the mutation operators too far away in the feasible set. At each following iteration, new surrogate bounds are computed to enlarge the search space.

After a number of iterations, the surrogate bounds may come to cover a region larger than the one defined by the original bounds. In that case, the search proceeds over the entire feasible set.
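A plausible reading of the mechanism described above can be sketched as follows (an assumption on our part; the exact update formula is given in Algorithm 7): take the bounding box of the current population, enlarge it by the shift \(s_h\), and clip to the original bounds.

```python
import numpy as np

def get_surrogate_bounds(X, l, u, s_h):
    """Sketch of getSurrogateBounds (assumed formula, consistent with the
    text): bounding box of the population X, enlarged by the shift s_h
    and clipped to the original bounds [l, u]."""
    X = np.asarray(X)
    l_s = np.maximum(l, X.min(axis=0) - s_h)
    u_s = np.minimum(u, X.max(axis=0) + s_h)
    return l_s, u_s
```

With this formula, once the enlarged bounding box covers \([l, u]\), the surrogate bounds coincide with the original ones and the search indeed proceeds over the entire feasible set.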

3.1.2 Identifying exploration candidates

As in memetic approaches for scalar optimization, performing local searches starting from every point of the population usually turns out to be inefficient: a great computational effort is spent optimizing many points that, in the end, do not lead to good solutions.

In the case of NSMA, one may think of performing local searches only from the rank-0 points. However, this idea is inefficient too: during the last iterations, most, if not all, of the points are likely to have rank 0. Furthermore, many of these points may lie in a high-density area of the Pareto front, and optimizing all of them could be a waste of computational time.

We solve this issue by optimizing only the rank-0 points associated with a high crowding distance. As remarked in Sect. 2.2.1, such points lie in poorly populated areas of the objectives space. Therefore, by optimizing them, we still contribute to a better approximation of the Pareto front, since they are rank-0 points, and, at the same time, we may populate low-density areas, leading to a better-spread Pareto front.


Through the getCrowdingDistanceThreshold function, reported in Algorithm 8, we retrieve the q-quantile of the crowding distances of the rank-0 points in \(\hat{X}_{k + 1}\), denoted by \(\bar{c}_{k + 1}\): only the rank-0 points whose crowding distance is greater than or equal to \(\bar{c}_{k + 1}\) are optimized through the FMOPG algorithm. Smaller values of the parameter q lead to the optimization of a greater number of points.

As stated in [13], some points are associated with a crowding distance equal to \(+\infty \): these are the extreme solutions of the Pareto front w.r.t. a specific objective function. For this reason, they are always used as starting solutions for local searches, since they could lead to a wider Pareto front approximation.
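The threshold selection can be sketched as follows; taking the q-quantile over the finite crowding distances only, so that the \(+\infty \) extremes do not distort it, is our assumption rather than a detail stated in Algorithm 8.

```python
import numpy as np

def get_crowding_distance_threshold(crowd_dists, q):
    """Sketch of the q-quantile threshold on rank-0 crowding distances.

    crowd_dists: crowding distances of the rank-0 points (may contain +inf
                 for the extreme points of the front)
    q:           quantile level; smaller q -> more points selected
    """
    cds = np.asarray(crowd_dists, dtype=float)
    # Our choice: quantile over the finite values only.
    return np.quantile(cds[np.isfinite(cds)], q)

def select_for_local_search(crowd_dists, q):
    c_bar = get_crowding_distance_threshold(crowd_dists, q)
    # Points with infinite crowding distance (front extremes) always pass
    # the comparison, so they are always selected, as stated in the text.
    return [i for i, c in enumerate(crowd_dists) if c >= c_bar]
```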

3.1.3 Local searches by multi-objective descent

In the optimizePopulation function, reported in Algorithm 9, the FMOPG method is employed to refine the population through local searches. This function is the core of our memetic approach: it combines the typical features of descent methods with the genetic operators of NSGA-II.


In order to be optimized through FMOPG w.r.t. a subset of indices of objectives \(I \subseteq \{1,\ldots , m\}\), a point \(x_p\) must satisfy the following conditions:

  • Its rank must be 0 and its crowding distance must be greater than or equal to \(\bar{c}_{k + 1}\) (Line 5a). These requirements were discussed in Sect. 3.1.2.

  • It must belong to \(\hat{X}_{k + 1}^I\), the set of mutually non-dominated points w.r.t. \(F_I\) contained in \(\hat{X}_{k + 1}\) (Line 5b). A formal definition of this set can be found in Sect. 2.1. Optimizing points not contained in \(\hat{X}_{k + 1}^I\) could be useless, since we would have no guarantee of reaching a point that is non-dominated w.r.t. \(F_I\).

  • It must not be Pareto-stationary w.r.t. \(F_I\) (Line 5c).

If a point satisfies all these requirements, it is used as the starting solution of the FMOPG algorithm, together with the set \(\hat{X}_{k + 1}^I\) as input. FMOPG returns the produced solutions, which are collected in the set \(\tilde{X}_p\) and inserted into \(\hat{X}_{k + 1}\).
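The three admission checks can be sketched as follows; the names and data layout are hypothetical, and the Pareto-stationarity test is abstracted as a precomputed value of \(\theta _\varOmega ^I\), which is zero exactly at Pareto-stationary points.

```python
import numpy as np

def dominates(fy, fx):
    """fy dominates fx: componentwise <=, with strict inequality somewhere."""
    fy, fx = np.asarray(fy), np.asarray(fx)
    return bool(np.all(fy <= fx) and np.any(fy < fx))

def should_optimize(p, F_I, ranks, crowd, c_bar, theta_I):
    """Sketch of the Line-5 admission checks of Algorithm 9 (names are ours).

    p:       index of the candidate point
    F_I:     (N, |I|) objective values restricted to the index set I
    ranks:   non-domination ranks of the population
    crowd:   crowding distances
    c_bar:   crowding-distance threshold
    theta_I: precomputed values of theta_Omega^I (stationary iff 0)
    """
    if ranks[p] != 0 or crowd[p] < c_bar:            # condition (a)
        return False
    if any(dominates(F_I[q], F_I[p])                 # condition (b): p in X^I
           for q in range(len(F_I)) if q != p):
        return False
    return theta_I[p] < 0                            # condition (c): not stationary
```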

Lastly, the new population \(\hat{X}_{k + 1}\) is reduced so as to have exactly N survivors. This operation is performed through the getMetrics (Sect. 2.2.1) and getSurvivors (Sect. 2.2.3) functions of the NSGA-II algorithm.

3.2 The front multi-objective projected gradient algorithm

The Front Multi-Objective Projected Gradient (FMOPG) algorithm is the descent method used in our memetic approach. In particular, it is a variant of the MOPG method (Algorithm 3).

3.2.1 Algorithmic scheme

We report the scheme of FMOPG in Algorithm 10.


The main difference between FMOPG and MOPG is the following: while in the original algorithm the current point \(x_k\) is only optimized w.r.t. itself, in FMOPG it is also optimized w.r.t. the set of points to which it belongs. At each iteration, the constrained steepest descent direction \(d_{\varOmega k}^I\) at the current point \(x_k\), w.r.t. the subset of objective indices I, is computed (Line 5). Then, in Line 6, a step size \(\alpha _k\) is calculated by the B-FALS procedure (Algorithm 4). Given the direction and the step size, a new point \(x_{k + 1}\) is obtained (Line 7) and inserted into the set \(X_k\), leading to a new set \(X_{k + 1}\) (Line 8).
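For concreteness, the direction subproblem at Line 5 can be sketched as a linear program, under the assumption (ours, not the paper's) that the direction is bounded in the \(\ell _\infty \) norm; the paper solves its Problem (6) with Gurobi and may use a different norm. The variables are \((d, t)\): minimize \(t\) subject to \(\nabla f_j(x)^T d \le t\) for \(j \in I\), \(x + d \in \varOmega \) and \(\Vert d\Vert _\infty \le 1\); the optimal value plays the role of \(\theta _\varOmega ^I(x) \le 0\), which is zero exactly at Pareto-stationary points.

```python
import numpy as np
from scipy.optimize import linprog

def steepest_descent_direction(grads, x, lb, ub):
    """LP sketch of the direction subproblem (l_inf-norm variant, our choice).

    grads:  (|I|, n) gradient rows for the objectives in I at x
    lb, ub: bound constraints defining Omega
    Returns (d, theta) with theta = max_j grads[j] @ d at the optimum.
    """
    m, n = grads.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # minimize t
    A_ub = np.hstack([grads, -np.ones((m, 1))])   # grads[j] @ d - t <= 0
    b_ub = np.zeros(m)
    bounds = [(max(-1.0, lb[i] - x[i]), min(1.0, ub[i] - x[i]))
              for i in range(n)] + [(None, None)]  # d box, x+d in Omega; t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[-1]
```

Since \(d = 0, t = 0\) is always feasible, the returned \(\theta \) is non-positive; B-FALS is then invoked along d whenever \(\theta < 0\).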

The FMOPG algorithm iterates until the current solution \(x_k\) is Pareto-stationary w.r.t. \(F_I\). At the end, the method returns the sequence of points \(\{x_k\}\) generated during the iterations. Indeed, given the stopping conditions of B-FALS, there is no guarantee that, for all k, the point \(x_{k + 1}\) dominates \(x_k\) w.r.t. \(F_I\); hence, every point produced by FMOPG could be useful to obtain good and well-spread Pareto front approximations.

Finally, note that the FMOPG algorithm is called by the optimizePopulation function with an additional parameter \(\varepsilon _t\) (Line 7 of Algorithm 5). In fact, FMOPG is executed using \(\varepsilon \)-Pareto-stationarity as stopping condition. In NSMA (Algorithm 6), we consider a decreasing sequence \(\{\varepsilon _t\} \subset \mathbb {R}_0^+\); so, along the iterations, we get closer and closer to Pareto-stationarity.

3.2.2 Algorithm analysis

In this section, we analyze the FMOPG algorithm from a theoretical perspective. The analysis establishes the convergence properties of FMOPG, which in turn guarantee that the local searches within NSMA stop in finite time and, thus, that the overall algorithm is well defined.

Before proceeding, we need to state an assumption.

Assumption 1

Let \(I \subseteq \{1,\ldots , m\}\), \(X_0\subset \varOmega \) be a set of feasible points and \(x_0 \in X_0\). There does not exist a point \(y_0 \in X_0\) that dominates \(x_0\) w.r.t. \(F_I\), i.e. \(x_0 \in X_0^I\).

This assumption is reasonable since a point \(x_p\) to be optimized through FMOPG must be non-dominated w.r.t. \(F_I\) (Sect. 3.1.3).

We begin characterizing the points produced by the FMOPG algorithm.

Proposition 2

Consider a generic iteration k of FMOPG. Let \(I \subseteq \{1,\ldots , m\}\), \(X_k\) be a set of feasible points and \(x_k \in X_k\). Assume that \(x_k\) is not dominated by any point in \(X_k\) w.r.t. \(F_I\). Then, B-FALS returns a step size \(\alpha _k > 0\) such that the point \(x_{k+1} = x_k+\alpha _kd_{\varOmega k}^I\) is feasible and not dominated by any point in \(X_{k+1}\) w.r.t. \(F_I\).

Proof

The B-FALS algorithm is performed from \(x_k \in X_k^I\), with \(\theta _\varOmega ^I(x_k) < 0\), along a constrained steepest descent direction \(d_{\varOmega k}^I\). Then, from Proposition 1, B-FALS terminates in a finite number of steps and returns a step size \(\alpha _k > 0\) such that the point \(x_{k + 1} = x_k + \alpha _kd_{\varOmega k}^I\) has the following properties:

  • \(x_{k + 1} \in \varOmega \);

  • \(x_{k + 1}\) is not dominated by any other point in \(X_k\) w.r.t. \(F_I\).

Since \(X_{k+1} = X_{k} \cup \{x_{k+1}\}\), the assertion is finally proved.\(\square \)

Remark 1

Since the point \(x_{k + 1}\) induced by the step size produced by B-FALS is not dominated by any point in \(X_{k + 1}\) w.r.t. \(F_I\), we can easily conclude that the new point is also not dominated w.r.t. all the objectives.

Given Proposition 2, we can state the following corollary.

Corollary 1

Let Assumption 1 hold with \(I \subseteq \{1,\ldots , m\}\), the set \(X_0\) and the point \(x_0\). Then, the sequence of sets \(\{X_k\}\) and the sequence of points \(\{x_k\}\) generated by FMOPG are such that for all \(k=0,1,\ldots \), \(x_{k}\) is feasible and not dominated by any point in \(X_k\) w.r.t. \(F_I\).

Proof

The assertion follows if the assumptions of Proposition 2 are satisfied at every iteration k of the algorithm. When \(k=0\), this is guaranteed by Assumption 1; for a generic iteration k, it follows by induction from Proposition 2 itself.\(\square \)

Before proceeding with the convergence analysis, we make a further reasonable assumption. This hypothesis is similar to Assumption 1 in [10]. However, in this context, Assumption 1 and the bound constraints must also be taken into account.

Assumption 2

Assumption 1 holds and \(x_0\) is also such that:

  • \(x_0\) is not Pareto-stationary w.r.t. \(F_I\);

  • the set \(\mathcal {L}(x_0) = \bigcup _{j = 1}^{m} \{x \in \varOmega : f_j(x) \le f_j(x_0)\}\) is compact.

This assumption is stronger than the one required to prove convergence of the MOSD method (Lemma 2). However, as also observed in [10] for FALS (Algorithm 2), this is reasonable since the second stopping criterion of B-FALS is weaker than the one used in ALS (Algorithm 1).

Proposition 3

Let Assumption 2 hold with \(I \subseteq \{1,\ldots , m\}\), the set \(X_0\) and the point \(x_0\). Let \(\{x_k\}\) be the sequence of points generated by FMOPG. Then \(\{x_k\}\) admits limit points and every limit point is Pareto-stationary considering the objectives \(f_j\), with \(j \in I\).

Proof

Firstly, we prove that the sequence \(\{x_k\}\) admits limit points. Since \(x_0\in X_k\) for all k, Corollary 1 guarantees that, for each k, \(x_k \in \varOmega \) and there exists an index \(j(x_k) \in I\) such that

$$\begin{aligned} f_{j(x_k)}(x_k) \le f_{j(x_k)}(x_0). \end{aligned}$$

So,

$$\begin{aligned} x_k \in \{x \in \varOmega : f_{j(x_k)}(x) \le f_{j(x_k)}(x_0)\} \end{aligned}$$

and, therefore,

$$\begin{aligned} x_k \in \mathcal {L}(x_0), \forall k. \end{aligned}$$

Assumption 2 assures that the sequence \(\{x_k\}\) is bounded. Hence, it admits limit points: we can consider a subsequence \(K \subseteq \{1, 2,\ldots \}\) such that

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} x_k = \bar{x}. \end{aligned}$$

We recall that \(\bar{x}\) is Pareto-stationary w.r.t. \(F_I\) if and only if \(\theta _\varOmega ^I(\bar{x}) = 0\). By contradiction, we assume that \(\bar{x}\) is not Pareto-stationary w.r.t. \(F_I\): there exists \(\bar{\varepsilon } > 0\) such that

$$\begin{aligned} \theta _\varOmega ^I(x_k) \le -\bar{\varepsilon } < 0, \forall k \in K. \end{aligned}$$
(8)

Next, we want to prove the following statement:

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} \alpha _k\theta _\varOmega ^I(x_k) = 0. \end{aligned}$$
(9)

Again, by contradiction, we assume that the assertion is not true: there exists a subsequence \(\bar{K} \subseteq K\) and \(\bar{\eta } > 0\) such that

$$\begin{aligned} \alpha _k\theta _\varOmega ^I(x_k) \le -\bar{\eta } < 0, \forall k \in \bar{K}. \end{aligned}$$
(10)

Recalling Proposition 1 and Corollary 1, for all \(k \in \bar{K}\), B-FALS returns in a finite number of iterations a step size \(\alpha _k\) such that

$$\begin{aligned} x_{k + 1} = x_k+\alpha _kd_{\varOmega k}^I \in \varOmega \end{aligned}$$
(11)

and

$$\begin{aligned} F_I(y_k) + \textbf{1}\beta \alpha _k\theta _\varOmega ^I(x_k) \not < F_I(x_{k+1}), \end{aligned}$$

for all \(y_k \in X_k\). By using Eq. (10), we obtain that, for all \(k \in \bar{K}\) and for all \(y_k \in X_k\),

$$\begin{aligned} F_I(y_k) - \textbf{1}\beta \bar{\eta } \not < F_I(x_{k+1}). \end{aligned}$$
(12)

Since \(\beta > 0\) and \(\bar{\eta } > 0\), we have that \(- \beta \bar{\eta } < 0\).

Since \(X_k = X_0 \cup \{x_1\} \cup \cdots \cup \{x_k\}\) and \(x_0 \in X_0\), it simply follows that, for all \(k \in \bar{K}\),

$$F_I(x_0) - \textbf{1}\beta \bar{\eta } \not < F_I(x_{k+1}).$$

Therefore, for all \(k \in \bar{K}\), there exists \(j_k \in I\) such that

$$\begin{aligned} f_{j_k}(x_0) > f_{j_k}(x_0) - \beta \bar{\eta } \ge f_{j_k}(x_{k+1}), \end{aligned}$$

and, then, considering also Eq. (11),

$$\begin{aligned} x_{k + 1} \in \mathcal {L}(x_0). \end{aligned}$$
(13)

Moreover, let us consider \(k_1, k_2 \in \bar{K}\), with \(k_1 < k_2\). By the instructions of the algorithm, we know that \(x_{k_1+1}\in X_{k_2}\). Thus, from Eq. (12), we know that

$$F_I(x_{k_1+1}) - \textbf{1}\beta \bar{\eta } \not < F_I(x_{k_2+1})$$

Therefore, for any pair \(k_1, k_2 \in \bar{K}\), with \(k_1 < k_2\), there exists \(j_{k_2} \in I\) such that

$$\begin{aligned} f_{j_{k_2}}(x_{k_1 + 1}) - \beta \bar{\eta } \ge f_{j_{k_2}}(x_{k_2+1}). \end{aligned}$$
(14)

Equations (13) and (14) imply that we have an infinite sequence \(\{F_I(x_{k + 1})\}_{k \in \bar{K}}\), with \(x_{k + 1} \in \mathcal {L}(x_0)\), whose elements are pairwise at a distance not smaller than \(\beta \bar{\eta }\) from each other. Thus, the set

$$Z = \{z \in \mathbb {R}^m \mid z = F(x_{k + 1}), x_{k + 1} \in \mathcal {L}(x_0), k \in \bar{K}\}$$

is not compact. This statement, together with the continuity of F, contradicts Assumption 2, since the image of a compact set under a continuous map must be compact. Thus, Equation (9) holds.

Recalling Eq. (8), from Equation (9) we obtain the following statement:

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} \alpha _k = 0. \end{aligned}$$

Given this limit, we can consider sufficiently large values for \(k \in K\) such that

$$\begin{aligned} \alpha _k < \frac{\alpha _k}{\delta } \le 1. \end{aligned}$$
(15)

Since \(\varOmega \) is convex and \(d_{\varOmega k}^I\) is a feasible direction by construction, Eq. (15) implies that the point \(x_k + (\alpha _k / \delta ) d_{\varOmega k}^I \in \varOmega \). Therefore, for sufficiently large values for \(k \in K\), the stopping conditions of B-FALS imply that there exists a point \(y_k \in X_k\) such that

$$\begin{aligned} F_I(y_k) + \textbf{1}\beta \frac{\alpha _k}{\delta }\theta _\varOmega ^I(x_k) < F_I\left( x_k + \frac{\alpha _k}{\delta }d_{\varOmega k}^I\right) . \end{aligned}$$
(16)

Considering Corollary 1 and Eq. (16) respectively, we have that an index \(j(x_k) \in I\) exists such that

$$\begin{aligned} f_{j(x_k)}(x_k) + \beta \frac{\alpha _k}{\delta }\theta _\varOmega ^I(x_k) \le f_{j(x_k)}(y_k) + \beta \frac{\alpha _k}{\delta }\theta _\varOmega ^I(x_k) \end{aligned}$$

and

$$\begin{aligned} f_{j(x_k)}(y_k) + \beta \frac{\alpha _k}{\delta }\theta _\varOmega ^I(x_k) < f_{j(x_k)}\left( x_k + \frac{\alpha _k}{\delta }d_{\varOmega k}^I\right) . \end{aligned}$$

Since the set I is finite, we can consider a subsequence \(\bar{K} \subseteq K\) such that, for sufficiently large values for \(k \in \bar{K}\), \(j(x_k) = \hat{j}\) and, combining the two above inequalities,

$$\begin{aligned} f_{\hat{j}}\left( x_k + \frac{\alpha _k}{\delta }d_{\varOmega k}^I\right) - f_{\hat{j}}(x_k) > \beta \frac{\alpha _k}{\delta }\theta _\varOmega ^I(x_k). \end{aligned}$$

Using the Mean-value Theorem, we have that

$$\begin{aligned} f_{\hat{j}}\left( x_k + \frac{\alpha _k}{\delta }d_{\varOmega k}^I\right) - f_{\hat{j}}(x_k) = \frac{\alpha _k}{\delta }\nabla f_{\hat{j}}(\xi _k)^Td_{\varOmega k}^I, \end{aligned}$$

with

$$\begin{aligned} \xi _k = x_k + t_k \frac{\alpha _k}{\delta }d_{\varOmega k}^I, t_k \in (0, 1). \end{aligned}$$

Then, we can write

$$\begin{aligned} \nabla f_{\hat{j}}(\xi _k)^Td_{\varOmega k}^I > \beta \theta _\varOmega ^I(x_k), \end{aligned}$$

from which we can state that

$$\begin{aligned} \nabla f_{\hat{j}}(x_k)^Td_{\varOmega k}^I + \left[ \nabla f_{\hat{j}}(\xi _k) - \nabla f_{\hat{j}}(x_k)\right] ^Td_{\varOmega k}^I > \beta \theta _\varOmega ^I(x_k). \end{aligned}$$

Since \(\hat{j} \in I\), we have that

$$\begin{aligned} \theta _\varOmega ^I(x_k) = \max _{j \in I}\nabla f_j(x_k)^Td_{\varOmega k}^I \ge \nabla f_{\hat{j}}(x_k)^Td_{\varOmega k}^I \end{aligned}$$

and

$$\begin{aligned} (1 - \beta )\theta _\varOmega ^I(x_k) + \left[ \nabla f_{\hat{j}}(\xi _k) - \nabla f_{\hat{j}}(x_k)\right] ^Td_{\varOmega k}^I > 0. \end{aligned}$$

Using Eq. (8), we obtain

$$\begin{aligned} -(1 - \beta )\bar{\varepsilon } + \left[ \nabla f_{\hat{j}}(\xi _k) - \nabla f_{\hat{j}}(x_k)\right] ^Td_{\varOmega k}^I > 0. \end{aligned}$$

By taking the limit for \(k \rightarrow \infty , k \in \bar{K}\), recalling the continuity of \(J_F\), the boundedness of \(d_{\varOmega k}^I\) and that \(\alpha _k \rightarrow 0\), we get that

$$\begin{aligned} -(1 - \beta )\bar{\varepsilon } > 0. \end{aligned}$$

Since \(1 - \beta > 0\) and \(\bar{\varepsilon } > 0\), we obtain a contradiction. Hence, the limit point \(\bar{x}\) of the sequence \(\{x_k\}\) is Pareto-stationary w.r.t. \(F_I\).\(\square \)

Finally, we prove that, when a stopping criterion based on the \(\varepsilon \)-Pareto-stationarity is considered, FMOPG is well defined, i.e., it terminates in a finite number of iterations.

Proposition 4

Let Assumption 2 hold with \(I \subseteq \{1,\ldots , m\}\), the set \(X_0\) and the point \(x_0\). Let \(\varepsilon > 0\). Then, the FMOPG algorithm finds in a finite number of steps a point \(x_k\) which is \(\varepsilon \)-Pareto-stationary w.r.t. \(F_I\).

Proof

We assume, by contradiction, that FMOPG produces an infinite sequence of points \(\{x_k\}\) such that, for all k, \(x_k\) is not \(\varepsilon \)-Pareto-stationary w.r.t. \(F_I\). Since Assumption 2 holds, Proposition 3 ensures that there exists a subsequence \(K \subseteq \{1, 2,\ldots \}\) such that

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} x_k = \bar{x} \end{aligned}$$

and \(\bar{x}\) is Pareto-stationary w.r.t. \(F_I\), i.e., recalling Definition 3, for all \(z \in \varOmega \) such that \(\left\| z - \bar{x} \right\| \le 1\) we have that

$$\begin{aligned} \max _{j \in I}\nabla f_j(\bar{x})^T(z - \bar{x})\ge 0. \end{aligned}$$

Given the continuity of the max operator and \(J_F\), we can state that

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} \max _{j \in I} \nabla f_j(x_k)^T(z - x_k) \ge 0 > -\varepsilon . \end{aligned}$$

This last statement implies that, for sufficiently large values for \(k \in K\), for all \(z \in \varOmega \) such that \(\left\| z - x_k \right\| \le 1\), the following equation has to hold:

$$\begin{aligned} \max _{j \in I} \nabla f_j(x_k)^T(z - x_k) > -\varepsilon , \end{aligned}$$

i.e., \(x_k\) is \(\varepsilon \)-Pareto-stationary w.r.t. \(F_I\). Therefore, we obtain a contradiction and the assertion is proved.\(\square \)

4 Computational experiments

In this section, we provide the results of thorough computational experiments, focusing on the comparison of NSMA with the main state-of-the-art methods in diverse settings. The code of all the algorithms was written in Python 3. All the tests were run on a computer with the following characteristics: Ubuntu 20.04, Intel Xeon Processor E5-2430 v2, 6 cores, 2.50 GHz, 16 GB RAM. We used the Gurobi Optimizer (version 9) to solve instances of Problem (6).

4.1 Settings

In this section, we report detailed information on the settings used for all the considered algorithms in our experiments, the metrics and the problems used to carry out the comparison.

4.1.1 Metrics

In this section, we provide a brief description of the metrics and tools used to compare the algorithms.

The first three metrics are the ones introduced in [12]: purity, \(\varGamma \)-spread and \(\varDelta \)-spread. These metrics are widely used to evaluate the performance of multi-objective optimization algorithms.

We recall that the purity metric measures the quality of a generated front, i.e., how effective a solver is at obtaining non-dominated points w.r.t. its competitors. In detail, the purity value is the ratio of the number of points produced by a solver that are non-dominated w.r.t. the reference front over the total number of points produced by that solver; clearly, higher values indicate better performance. Computing the purity metric requires a reference front to establish whether a point is dominated: in our experiments, we took as reference front the one obtained by combining the fronts retrieved by all the considered algorithms and discarding the dominated points.
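Under the definition above, purity can be computed as sketched below; identifying a solver's non-dominated points by membership in the combined reference front is our implementation choice.

```python
import numpy as np

def is_dominated(point, front):
    """point is dominated by some member of front (componentwise)."""
    front = np.asarray(front)
    return bool(np.any(np.all(front <= point, axis=1)
                       & np.any(front < point, axis=1)))

def purity(solver_front, all_fronts):
    """Purity of a solver's front w.r.t. the combined reference front.

    all_fronts: fronts of all compared solvers (including this one); the
    reference front is their union with dominated points discarded.
    """
    combined = np.vstack(all_fronts)
    reference = np.array([p for p in combined if not is_dominated(p, combined)])
    in_ref = sum(1 for p in solver_front
                 if np.any(np.all(np.isclose(reference, p), axis=1)))
    return in_ref / len(solver_front)
```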

The spread metrics are equally essential, since they measure the uniformity of the generated fronts in the objectives space. The \(\varGamma \)-spread is defined as the maximum \(\ell _\infty \) distance in the objectives space between adjacent points of the Pareto front, while the \(\varDelta \)-spread essentially measures the standard deviation of the \(\ell _\infty \) distances between adjacent Pareto front points. As opposed to purity, low values of the spread metrics are associated with good performance.
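A sketch following the simplified description above (the full definitions in [12] also account for the extreme points of the front), assuming a bi-objective front where adjacency is given by sorting along the first objective:

```python
import numpy as np

def spread_metrics(front):
    """Gamma- and Delta-spread, per the simplified description in the text.

    front: (N, 2) array of objective vectors of a bi-objective front.
    """
    f = np.asarray(front)
    f = f[np.argsort(f[:, 0])]                          # order along f_1
    dists = np.max(np.abs(np.diff(f, axis=0)), axis=1)  # l_inf adjacent gaps
    gamma = dists.max()   # largest hole in the front
    delta = dists.std()   # non-uniformity of the gaps
    return gamma, delta
```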

In addition to the previous metrics, we used the ND-points metric, introduced in [9]. This score essentially indicates the number of non-dominated points obtained by a solver w.r.t. the reference front. We consider this metric as important as purity: in particular, we think the two should be considered complementary.

Lastly, we employed the performance profiles introduced in [14] to carry out the comparison. Performance profiles are a useful tool to appreciate the relative performance and robustness of the considered algorithms. The performance profile of a solver w.r.t. a certain metric is the (cumulative) distribution function of the ratio between the score obtained by that solver and the best score among all the considered solvers; in other words, it is the probability that the score achieved by a solver on a problem is within a factor \(\tau \in \mathbb {R}\) of the best value obtained by any solver on that problem. For a more technical explanation, we refer the reader to [14]. Note that the performance profiles w.r.t. purity and ND-points were produced based on the inverse of the obtained values, since these metrics take higher values for better solutions.
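A compact sketch of this construction (for purity and ND-points one passes the inverse of the metric, as explained above, so that lower is better):

```python
import numpy as np

def performance_profile(scores, taus):
    """scores[p][s]: cost of solver s on problem p (lower is better).

    Returns, for each solver s, rho_s(tau) = fraction of problems on which
    the solver's score is within a factor tau of the best solver's score.
    """
    scores = np.asarray(scores, dtype=float)
    ratios = scores / scores.min(axis=1, keepdims=True)  # per-problem ratios
    return np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                     for s in range(scores.shape[1])])
```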

4.1.2 Algorithms and hyper-parameters

The first two algorithms we chose for the comparisons are, naturally, the NSGA-II [13] and the FPGA [10] procedures, described in Sect. 2.2 and Appendix A, respectively. We consider these methods representatives of EAs and descent methods, respectively, and thus NSMA's most direct competitors. The parameter values for both algorithms were chosen according to the reference papers. As in NSMA, the number N of solutions in the NSGA-II population was fixed at 100.

Then, the values for the parameters of NSMA were chosen based on some preliminary experiments on a subset of the tested problems, which we do not report here for the sake of brevity. The values are:

  • \(N = 100\);

  • \(s_h = 10\);

  • \(q = 0.9\);

  • \(n_{opt} = 5\);

  • in B-FALS \(\alpha _0 = 1\), \(\beta = 10^{-4}\), \(\delta = 0.5\).

We also consider in the experiments the DMS algorithm [12], a multi-objective derivative-free method, inspired by the search/poll paradigm of direct-search methodologies of directional type. DMS maintains a list of non-dominated points, from which the new iterates or poll centers are chosen. The parameters for this method were set according to the reference paper and the code available online (http://www.mat.uc.pt/dms).

In most of the computational experiments, each algorithm was run on each problem for up to 2 minutes. A stopping criterion based on a time limit is the fairest way to compare such structurally different algorithms. Obviously, we also took into account algorithm-specific stopping criteria indicating that a method can no longer improve its solutions.

NSMA and NSGA-II are non-deterministic algorithms. Therefore, we ran each of them 5 times on every problem, with different seeds for the pseudo-random number generator and the same time limit (2 minutes) per execution. The five generated fronts were compared based on the purity metric, and only the best one was chosen as the output of NSMA/NSGA-II; in this context, the reference front was the combination of the fronts of the 5 executions. Executing 5 runs reduces the sensitivity of NSMA/NSGA-II to the seed used for their random operations. On the other hand, FPGA and DMS are deterministic and were thus executed once.
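The best-of-five selection just described can be sketched as follows (the function name is ours); each run's front is scored by purity against the combined front of all the runs:

```python
import numpy as np

def best_of_runs(fronts):
    """Index of the best run by purity against the combined front.

    fronts: list of (N_i, m) arrays of objective vectors, one per seeded run.
    """
    combined = np.vstack(fronts)

    def nondominated(p):
        # p is not dominated by any point of the combined front
        return not np.any(np.all(combined <= p, axis=1)
                          & np.any(combined < p, axis=1))

    purities = [np.mean([nondominated(p) for p in f]) for f in fronts]
    return int(np.argmax(purities))
```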

4.1.3 Problems

The problems constituting the benchmark of the computational experiments are listed in Table 1. We considered problems whose objective functions are at least continuously differentiable almost everywhere; if a problem has singularities, we counted them as Pareto-stationary points. All the constraints are defined by finite lower and upper bounds.

Table 1 Problems used in the computational experiments

The set is mainly composed of the CEC09 problems [56], the ZDT problems [57] and the MOP problems [32]. In particular, some of the CEC09 and ZDT problems have particularly difficult objective functions, which makes them especially interesting for analyzing the behavior of the algorithms on hard tasks.

We also defined a new test problem with convex objective functions: we refer to it as the MAN problem. Its formal definition is the following:

$$\begin{aligned} \min _{x\in \mathbb {R}^n}\quad&f_1(x) = \frac{1}{n^2}\sum _{i = 1}^{n} (x_i - i)^2, \qquad f_2(x) = \sum _{i=1}^{n} e^{-x_i} + x_i\\ \text {s.t.}\quad&x \in [-10^4, 10^4]^n. \end{aligned}$$

Inspired by Custódio et al. [12], for each problem the initial points were uniformly selected along the hyper-diagonal defined by the bound constraints, and their number was set equal to the dimension n of the problem. Since the MOP_1 problem has \(n=1\), only in this case did we start the tests from a single feasible point, namely \(x=0\).
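As a sketch, the MAN objectives and the hyper-diagonal initialization can be written as follows; whether the endpoints of the diagonal are included among the initial points is our assumption.

```python
import numpy as np

def man_f1(x):
    # f1(x) = sum_i (x_i - i)^2 / n^2
    n = x.size
    return np.sum((x - np.arange(1, n + 1)) ** 2) / n ** 2

def man_f2(x):
    # f2(x) = sum_i exp(-x_i) + x_i
    return np.sum(np.exp(-x) + x)

def hyperdiagonal_points(lb, ub, k):
    """k points uniformly spaced on the hyper-diagonal of the box [lb, ub]."""
    return np.array([lb + t * (ub - lb) for t in np.linspace(0.0, 1.0, k)])

# As in the text, the number of initial points equals the dimension n.
n = 3
lb, ub = -1e4 * np.ones(n), 1e4 * np.ones(n)
X0 = hyperdiagonal_points(lb, ub, n)
```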

4.2 Experimental comparisons between NSGA-II and FPGA

Before turning to the evaluation of NSMA, we carry out a preliminary study.

Evolutionary algorithms and descent methods each have their own drawbacks. EAs lack theoretical convergence properties and can be very expensive in particular settings. On the other hand, descent algorithms suffer on highly non-convex problems: in these cases, they often produce sub-optimal solutions, especially when the starting points are not chosen carefully.

In this section, we want to address two topics:

  • the impact of convexity of the objective functions on the performance of these algorithms;

  • the behavior of the methods as the problem dimension n increases.

For the comparisons of this section, we only considered the NSGA-II and FPGA algorithms, which we respectively pick as representatives for the two classes of methods.

As benchmark, we picked four problems that are scalable w.r.t. the problem dimension n and have the following features:

  • the MAN problem and the ZDT_1 problem have convex objective functions;

  • the CEC09_4 problem and the ZDT_3 problem have nonconvex objective functions.

For these comparisons, each problem was tested for values of \(n\in \{5, 10,20, 30, 40, 50, 100, 200\}\).

Fig. 1

Performance profiles for FPGA and NSGA-II on the convex MAN and ZDT_1 problems (for interpretation of the references to color in text, the reader is referred to the web version of the article). a Purity. b \(\varGamma \)-spread. c \(\varDelta \)-spread

Fig. 2

Performance profiles for FPGA and NSGA-II on the nonconvex CEC09_4 and ZDT_3 problems (for interpretation of the references to color in text, the reader is referred to the web version of the article). a Purity. b \(\varGamma \)-spread. c \(\varDelta \)-spread

We show the performance profiles for the two algorithms on the convex problems in Fig. 1 and on the nonconvex problems in Fig. 2.

We can observe that in the former case FPGA turned out to be better than NSGA-II in terms of purity. This result reasonably follows from the fact that, on problems with convex objective functions, the use of first-order information and common descent directions lets FPGA find better solutions than NSGA-II for an equal computational budget. On the contrary, FPGA was outperformed by NSGA-II in terms of \(\varGamma \)-spread and \(\varDelta \)-spread: the crossover and mutation operations of NSGA-II allow it to consistently obtain well-spread Pareto front approximations, while the constrained steepest partial descent directions and the B-FALS employed by FPGA are apparently not as effective in this respect.

As for the nonconvex case, the purity profile shows that FPGA now obtained many points that are dominated by those produced by NSGA-II. The results on the spread metrics are analogous to the convex case, with NSGA-II outperforming FPGA; however, the performance gap in terms of \(\varGamma \)-spread is even larger, while it is less marked for the \(\varDelta \)-spread.

Table 2 Metrics values obtained by FPGA and NSGA-II in the MAN problem with \(n = 5, 10, 20, 30, 40, 50, 100, 200\)
Table 3 Metrics values obtained by FPGA and NSGA-II in the CEC09_4 problem with \(n = 5, 10, 20, 30, 40, 50, 100, 200\)

In order to assess the performance of the algorithms as the problem dimension n increases, Tables 2 and 3 report in detail the metrics values achieved by the two methods on a convex (MAN) and a nonconvex (CEC09_4) problem. Again, the tables show the overall strength of NSGA-II w.r.t. the \(\varDelta \)-spread metric.

As for the \(\varGamma \)-spread, on the MAN problem we can observe the great results achieved by FPGA: it outperformed NSGA-II for values of n equal to or greater than 20. In these cases, the constrained steepest partial descent directions and the B-FALS algorithm turned out to be helpful in exploring the extreme regions of the objectives space and, thus, in finding a spread approximation of the Pareto front. On the CEC09_4 problem, the opposite holds: the genetic algorithm obtained the best \(\varGamma \)-spread values.

The purity values indicate another relevant feature of the two algorithms. In the nonconvex case, FPGA was not capable of obtaining better points than NSGA-II for low values of n. However, as n increased, the situation gradually changed, and FPGA finally obtained better purity values than its competitor on the largest problems. These results highlight one of the drawbacks of EAs, i.e., their limited scalability; in this regime, common descent directions can be very helpful for cheaply improving the quality of the solutions.

In conclusion, both algorithms have features that make them very effective in specific situations: FPGA was better on convex and/or high-dimensional problems, while NSGA-II was more effective on non-convex, low-dimensional ones. Furthermore, the genetic features of NSGA-II let it perform better in finding spread and uniform Pareto fronts most of the time, as also reflected in its spread metrics values. All these facts underline once again how appealing it is to combine these benefits in a single algorithm.

4.3 Preliminary comparisons between NSMA and the state-of-the-art algorithms

In this section, we provide the results on two problems, along with some first comments on the behavior of the four algorithms. We analyzed the CEC09_3 problem with \(n = 10\) and the ZDT_3 problem with \(n = 20\). The former has particularly difficult objective functions, while the latter is characterized by a composite function and a disconnected front which is not convex everywhere. We consider these problems suitable to start an analysis of the performance of the considered algorithms.

Fig. 3

Approximation of the Pareto front of the CEC09_3 problem with \(n = 10\) (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a NSMA. b FPGA. c NSGA-II. d DMS

From the results on the CEC09_3 problem, shown in Fig. 3, we immediately observe the effectiveness of our approach. Indeed, NSMA outperformed the other algorithms in terms of ND-points, purity and \(\varGamma \)-spread.

NSGA-II and FPGA turned out to be the second and the third best algorithms, respectively, with FPGA outperforming the genetic method only in terms of \(\varDelta \)-spread. Note that FPGA achieved a high value of the \(\varGamma \)-spread metric since it produced a suboptimal point that is dominated and far from the reference front; this point is not shown in the figure for graphical reasons.

NSGA-II and FPGA seem not to be capable of spreading the search in the objectives space: they retrieved many points, but most of them are concentrated in a small portion of it. In this regard, NSMA was better: this result arguably comes from the use of constrained steepest partial descent directions at points characterized by a high crowding distance. Indeed, performing descent steps at such points lets NSMA obtain a more spread and uniform Pareto front approximation than its competitors.

Fig. 4 Approximation of the Pareto front of the ZDT_3 problem with \(n = 20\) (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a NSMA. b FPGA. c NSGA-II. d DMS

NSMA and NSGA-II turned out to be the best algorithms on the ZDT_3 problem, as we can observe in Fig. 4, and they exhibited very similar performance. NSGA-II is known to be one of the most effective algorithms on the ZDT problem class: its genetic features allow it to escape from non-optimal Pareto-stationary solutions and to obtain good results on the most complex functions. NSMA seems to exploit these features as efficiently as NSGA-II, and we also observe a slight performance improvement in terms of ND-points and purity.

The lack of these characteristics did not allow FPGA to reach the same performance: although this algorithm obtained a good value for the purity metric, it produced few points and was not capable of obtaining a spread and uniform Pareto front. DMS does not seem to share these issues, having been able to properly identify two blocks of the disconnected front; however, it performed worse than FPGA in terms of purity.

Finally, we note that NSMA was better than all its competitors in terms of ND-points. Such a result was not obvious a priori since, as opposed to FPGA and DMS, NSMA considers a fixed number of solutions in the population.

4.4 Performance analysis in variable settings

In this section, we assess the robustness of the proposed algorithm in the specific settings where, as highlighted in Sect. 4.2, genetic and descent methods particularly struggle. In detail, we compare the performance of the four algorithms (NSMA, FPGA, NSGA-II and DMS) on two representative problems already addressed in Sect. 4.2: MAN (F convex) and CEC09_4 (F nonconvex). Moreover, we consider the following problem dimensionalities: \(n = 5, 20, 50, 100\).

Fig. 5 Approximation of the Pareto front of the convex MAN problem at different dimensionalities, retrieved by NSMA, FPGA, NSGA-II and DMS (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a \(n = 5\). b \(n = 20\). c \(n = 50\). d \(n = 100\)

Table 4 Metrics values achieved by the four algorithms (NSMA, FPGA, NSGA-II and DMS) on the convex MAN problem for \(n = 5, 20, 50, 100\)

The results for the MAN problem are shown in Fig. 5 and Table 4. For \(n = 5\), DMS turned out to be the best algorithm w.r.t. all the metrics except the \(\varDelta \)-spread, and only FPGA obtained a similar purity. However, observing the plot, the points produced by NSMA appear to be close to those obtained by DMS and FPGA. We hence deduce that the latter algorithms produced only slightly better points.

Furthermore, on this problem NSMA outperformed the competitors in terms of \(\varDelta \)-spread. Indeed, our method managed to achieve a uniform Pareto front, as opposed to FPGA, which produced most of the points in restricted areas of the objectives space.

As the value of n increases, the DMS performance gets worse and NSMA outperforms it w.r.t. all the metrics; in particular, in these cases our method turned out to be the best in terms of the spread metrics. For large values of n, DMS produced only a single point, which is moreover dominated (it is not observable in the figure since it is too far from the reference front). In fact, the performance drop of DMS as the problem size grows is not unexpected: derivative-free algorithms based on searches along coordinate directions are well known to scale poorly in general. The \(\varDelta \)-spread metric is not available for DMS in these cases, since it requires at least two points to be returned.
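The spread metrics used throughout admit a compact sketch in the biobjective case, along the lines of the formulation popularized with derivative-free multi-objective benchmarks; note that the exact conventions (e.g. whether extreme points of the reference front are appended) vary across papers and are an assumption here:

```python
import numpy as np

def gamma_spread(front):
    """Largest gap between consecutive points, per objective: a common
    biobjective Gamma-spread sketch (reference extremes omitted)."""
    gaps = np.diff(np.sort(front, axis=0), axis=0)
    return gaps.max()

def delta_spread(front):
    """Deb-style uniformity indicator: 0 means perfectly even spacing
    of consecutive points along each objective."""
    deltas = []
    for j in range(front.shape[1]):
        d = np.diff(np.sort(front[:, j]))
        dbar = d.mean()
        if dbar == 0:           # degenerate front: all values coincide
            deltas.append(0.0)
            continue
        deltas.append(np.abs(d - dbar).sum() / (len(d) * dbar))
    return max(deltas)
```

This also makes explicit why the \(\varDelta \)-spread is undefined for a single returned point: with fewer than two points there are no consecutive gaps to measure.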

The performance of NSGA-II is rather poor, regardless of the value of n. Arguably, this result can be attributed to the aforementioned NSGA-II slowdown occurring on problems characterized by particularly large feasible sets (Sect. 3.1.1). Furthermore, as also commented in Sect. 4.2, on the MAN problem NSGA-II struggles to explore the extreme regions of the objectives space. In this context, NSMA particularly exploited the surrogate bounds, the constrained steepest descent directions and the optimization of the points with high crowding distance. The constrained steepest descent directions also allowed FPGA to be the best algorithm overall in terms of purity; however, this method performed poorly w.r.t. the spread metrics. Finally, for large values of n, our approach and FPGA are the only algorithms whose purity values are not equal to 0, and only NSMA managed to obtain points near to the ones of FPGA.
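The descent machinery referenced here can be illustrated, in the two-objective case, by the closed-form minimum-norm convex combination of the gradients (a Fliege-Svaiter-style common descent direction), followed by a projection onto the box as in projected-gradient schemes such as MOPG. This is an illustrative sketch under those assumptions, not the paper's exact FMOPG subproblem:

```python
import numpy as np

def steepest_common_descent_2obj(g1, g2):
    """Negation of the minimum-norm element of conv{g1, g2}: the
    steepest common descent direction for two objectives, for which
    the simplex minimization has a closed form."""
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0 else np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return -(lam * g1 + (1.0 - lam) * g2)

def projected_gradient_step(x, d, alpha, lower, upper):
    """One step along d, projected back onto the box [lower, upper]."""
    return np.clip(x + alpha * d, lower, upper)
```

When the returned direction d satisfies \(\nabla f_j(x)^\top d < 0\) for every objective, a sufficiently small step decreases all objectives simultaneously; otherwise x is Pareto-stationary for the two-objective case.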

Fig. 6 Approximation of the Pareto front of the nonconvex CEC09_4 problem at different dimensionalities, retrieved by NSMA, FPGA, NSGA-II and DMS (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a \(n = 5\). b \(n = 20\). c \(n = 50\). d \(n = 100\)

Table 5 Metrics values achieved by the four algorithms (NSMA, FPGA, NSGA-II and DMS) on the nonconvex CEC09_4 problem for \(n = 5, 20, 50, 100\)

Regarding the CEC09_4 problem, whose results are reported in Fig. 6 and Table 5, NSMA was the algorithm with the best overall performance: it generally obtained better metrics values than its most important competitors (FPGA and NSGA-II). Here, the combination of genetic operations and constrained steepest descent directions was greatly helpful to obtain remarkable results: indeed, the independent use of only one of these two approaches did not lead to the same performance. On this problem, the DMS algorithm performed poorly regardless of the value of n.

In conclusion, NSMA can be considered a viable option on convex problems, both in the low and the high dimensional cases. At the same time, our approach did not suffer on non-convex problems, as opposed to FPGA; on the contrary, it even outperformed NSGA-II, which is known to be a particularly suitable algorithm for these cases but struggles as the dimensionality of the problem grows.

4.5 Overall comparison

In this last section of computational experiments, we provide the performance profiles of the four considered algorithms on the entire benchmark of problems listed in Table 1. The profiles are shown in Fig. 7.

Fig. 7 Performance profiles for the NSMA, FPGA, NSGA-II and DMS algorithms on the CEC, ZDT, MOP and MAN problems, run with a time limit of 2 minutes (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a ND-points. b purity. c \(\varGamma \)-spread. d \(\varDelta \)-spread

The performance profiles highlight once again the benefits of our proposed approach. Regarding the ND-points, NSMA proved to be the most robust algorithm. This result was not obvious: we recall that our method, as opposed to FPGA and DMS, considers a fixed number of solutions in the population.

Another interesting result is related to the purity metric, for which NSMA is again the clear winner. On problems with complicated objective functions, the local optimization of points within the NSMA mechanisms could be expected to waste computational time; from the results, however, we deduce that the opposite is true: the combined use of constrained steepest descent directions and genetic operations allowed NSMA to achieve the best performance.

The proposed method also outperformed the other ones in terms of \(\varGamma \)-spread, while its performance is very similar to that of NSGA-II in terms of \(\varDelta \)-spread. We can conclude that our approach is able to effectively obtain spread and uniform Pareto front approximations. The same cannot be said for FPGA, which turned out to be the worst method w.r.t. the spread metrics; however, the descent-based algorithm was the second best in terms of purity, outperforming NSGA-II. In general, DMS was not effective on the considered benchmark.
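The profile curves themselves follow the standard Dolan-Moré construction: each solver's curve gives, for a factor \(\tau \), the fraction of problems on which that solver's cost is within \(\tau \) times the best cost achieved by any solver. A minimal sketch (for maximized metrics such as purity, a transformation like the reciprocal is typically used as the cost; the paper's exact convention is an assumption here):

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profiles. costs[p, s] is the cost of
    solver s on problem p (lower is better). Returns rho[s, k], the
    fraction of problems where solver s is within a factor taus[k]
    of the best solver on that problem."""
    costs = np.asarray(costs, dtype=float)
    ratios = costs / costs.min(axis=1, keepdims=True)   # r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(costs.shape[1])])
```

The value at \(\tau = 1\) reads as the fraction of problems on which a solver is the best, while the curve's limit for large \(\tau \) measures robustness, i.e. on how many problems the solver returns any usable result at all.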

Fig. 8 Performance profiles for the NSMA, FPGA, NSGA-II and DMS algorithms on the CEC, ZDT, MOP and MAN problems, run with a time limit of 30 s (for interpretation of the references to color in text, the reader is referred to the electronic version of the article). a ND-points. b purity. c \(\varGamma \)-spread. d \(\varDelta \)-spread

Lastly, we tested the four algorithms with a time limit of 30 s: the results can be seen in Fig. 8. Our aim is to observe the effectiveness of the methods in the early iterations.

Considering the ND-points and purity metrics, we observe that the differences between our approach and the other algorithms are now even clearer, while the situation is unchanged in terms of \(\varGamma \)-spread. Regarding the \(\varDelta \)-spread metric, NSGA-II was more effective than the other algorithms; however, our method was still competitive, being the second most robust algorithm w.r.t. this metric and outperforming FPGA and DMS. We can conclude that NSMA turned out to be effective also with a smaller time limit: from the very first iterations, our approach was capable of obtaining good, wide and uniform Pareto front approximations.

5 Conclusions

In this paper, we considered smooth multi-objective optimization problems subject to bound constraints. After a review of the existing literature, we listed and commented on the main state-of-the-art approaches designed to approximate the Pareto front of such problems, along with their benefits and drawbacks. In particular, we focused on NSGA-II [13], which is the most popular genetic algorithm, and on FPGA, a variant of the gradient-based descent method introduced in [10] capable of handling bound constraints. A detailed definition of FPGA, along with its convergence properties, is provided in Appendix A. In a preliminary study, we compared these two algorithms, trying to emphasize their strengths and weaknesses.

We then focused on the design of a memetic algorithm aiming to combine the good features of both the aforementioned algorithms. We call this new method the Non-dominated Sorting Memetic Algorithm (NSMA). In this procedure, we exploit the genetic operations of NSGA-II and the tools typical of gradient-based descent methods, such as steepest descent directions and line searches. In particular, we employ a new descent method, called Front Multi-Objective Projected Gradient (FMOPG), which is a front-based variant of the original MOPG first introduced in [16]. For FMOPG, we proved convergence properties to Pareto stationarity for the sequence of produced points.

Moreover, we provided the results of thorough computational experiments in which we compared our method with the main state-of-the-art algorithms, including FPGA and NSGA-II. These results show that NSMA can consistently outperform its competitors in terms of popular metrics for multi-objective optimization. Our approach turned out to be highly effective in every considered setting.