1 Introduction

Lexicographic multi-/many-objective (LMO) optimization is a well-grounded and very active research field. The literature, especially the most recent, is full of concrete applications to everyday problems such as pricing optimization (Zhong et al. 2021) and allocation tasks (Meng et al. 2020). Although the strict preference ordering may at first sight seem too restrictive an assumption, the evidence shows that this is not the case. In some applications, the problem exhibits a natural lexicographic ordering, e.g., the two-phase Simplex-based approach to solving linear programs (Isermann 1982): the highest-priority objective is to find a feasible basis (even at the expense of its quality), which is then refined step by step to improve its optimality. In others, the priorities are imposed for reasons outside the control of the decision maker, e.g., by law (Weber et al. 2002): the goal in this case is to optimize the management of the water supply over some planning horizon by, in order, maximizing flood protection, minimizing supply shortage for irrigation, and maximizing electricity generation.

In recent years, promising mathematical frameworks (Sergeyev 2017; Benci and Cococcioni 2021) able to numerically manipulate infinite and infinitesimal numbers have given new life to lexicographic optimization. The use of such numbers allows one to impose a preference ordering by means of weighting schemes which overcome several difficulties faced by previous approaches (Cococcioni et al. 2018, 2020; Cococcioni and Fiaschi 2021). Furthermore, these frameworks allow one to rethink standard many-objective lexicographic programs from a new multi-objective perspective, giving birth to the wide class of mixed Pareto-lexicographic multi-objective problems (MPL-MOPs) (Lai et al. 2020a, b, 2021). The latter is a broad family of problems that assume the existence of arbitrarily complex priority structures among objectives. Its special cases include the (hereinafter pure) LMO optimization, where all the objectives are arranged in one shallow prioritized sequence, and the (hereinafter pure) Pareto many-objective optimization, where no priority exists at all.

Two other relevant subcategories are the Priority-Chains MPL-MOPs (PC-MPL-MOPs) (Lai et al. 2020a) and the Priority-Levels MPL-MOPs (PL-MPL-MOPs) (Lai et al. 2020b, 2021). The former assumes that the priority organizes the objectives in multiple lexicographic sequences (the chains; each objective belongs to at least one chain), which must be optimized in a Paretian manner. The latter arranges the objectives in levels of priority which are ordered in a single lexicographic sequence; each level is a multi-objective problem on its own. This latter category of problems will receive particular attention in this work because of its novelty, the high impact it may have on the community as well as on industrial applications, and the peculiarity of its optimal front, which cannot be approximated by any definition of dominance existing so far.

The work is organized as follows: Sect. 2 provides a brief summary of lexicographic optimization with particular attention to the state of the art; Sects. 3–5 introduce in detail PL-MPL-MOPs along with the minimum mathematical theory needed to cope with them, namely the Grossone Methodology; Sect. 6 proposes an evolutionary algorithm built upon NSGA-II to solve PL-MPL-MOPs, namely PL-NSGA-II; Sect. 7 assesses the efficacy of PL-NSGA-II by comparing it against several popular evolutionary algorithms on four benchmarks (one of which comes from a real application); Sect. 8 briefly discusses current challenges in identifying a reasonable unary performance indicator for PL-MPL-MOPs. Finally, Sect. 9 summarizes the content of the work and concludes the discussion.

2 A brief review of pure lexicographic optimization

As opposed to Paretian multi-objective optimization, LMO optimization is not well covered by books and surveys. This section aims at partially filling this gap by providing a brief review of the literature on LMO optimization. Moreover, it helps prepare the ground for the main topic of this work: an introduction to recent novelties in MPL-MOPs and an in-depth analysis of their special case concerning priority levels, namely the PL-MPL-MOPs.

LMO optimization dates its first mature works back to the early ’70s with the pioneering papers by Isermann (1974), Fishburn (1974) and Behringer (1977). The generic formulation of an LMO program consists of \(m\in \mathbb {N}\) objective functions \(f_1,\,\ldots ,\,f_m\) to be optimized over a common domain \(\Omega \subseteq \mathbb {R}^n\), \(n\in \mathbb {N}\), and upon which a strict preference ordering exists. For instance, assuming the natural ordering as preference ordering and minimization as the optimization task (which will hold throughout the remainder of the work as well), the LMO optimization problem is equivalent to the following program:

$$\begin{aligned} {} \begin{array}{l} \text {min }\qquad {f_m(x)}\\ \text {s.t. }\qquad {x\in \Omega ^{(m)}} \end{array} \end{aligned}$$
(1)

where \(\Omega ^{(m)}\) is recursively defined as a sequence of domain sub-sampling

$$\begin{aligned} \Omega ^{(0)}{:}{=}\Omega , \;\;\; \Omega ^{(i)}{:}{=}\{x\in \Omega ^{(i-1)}\,\vert \,f_i(x) \text { is minimum}\}. \end{aligned}$$

This definition induces one of the most adopted techniques to solve an LMO optimization problem: the preemptive approach (Ignizio 1983; Sherali 1982; Sherali and Soyster 1983). It consists in (at most) m scalar programs to be solved in cascade as follows:

$$\begin{aligned} \begin{array}{ll} {\min }\qquad {f_1(x)}&{}\qquad {\min }\qquad {f_i(x)}\\ \text {s.t. }\qquad {x\in \Omega } &{} \qquad \text {s.t. }\qquad {x}{\in \Omega } \\ &{}\qquad \qquad {f_j(x)}{=f_j^*\;\forall j=1,\,\ldots ,\,i-1} \end{array} \end{aligned}$$

where \(f_j^*\) is the optimal value of \(f_j\) in the j-th program, \(j=1,\,\ldots ,\,m-1\).
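The cascade above can be illustrated on a finite candidate set, where each scalar program reduces to a filtering step. The following is only a minimal sketch under that assumption: the function name `preemptive_lexmin` and the tolerance `tol` are hypothetical, and continuous problems would require a proper solver at each stage.

```python
def preemptive_lexmin(candidates, objectives, tol=1e-9):
    """Return the lexicographic minimizers among a finite list of candidates.

    `objectives` is the sequence f_1, ..., f_m in decreasing priority.
    `tol` is an illustrative tolerance for comparing objective values.
    """
    feasible = list(candidates)              # Omega^(0) = Omega
    for f in objectives:
        best = min(f(x) for x in feasible)   # optimal value f_i^*
        # Omega^(i): keep only the points where f_i attains its minimum
        feasible = [x for x in feasible if f(x) <= best + tol]
        if len(feasible) == 1:               # "at most m" programs: stop early
            break
    return feasible


# Toy usage: two objectives, the first has priority over the second.
points = [(0, 3), (0, 1), (1, 0)]
result = preemptive_lexmin(points, [lambda x: x[0], lambda x: x[1]])
```

Here the first stage keeps the two points with a zero first coordinate, and the second stage selects the one with the smaller second coordinate.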

An alternative way to scalarize an LMO optimization problem is by a linear combination of the objectives according to a careful choice of the weights \(w_1,\,\ldots ,\,w_m\) (Sherali 1982; Sherali and Soyster 1983), i.e., to solve the following program:

$$\begin{aligned} \begin{array}{l} {\min }\qquad {\sum \limits _{{i = 1}}^{m} {w_{i} f_{i} (x)}}\\ \text {s.t. }\qquad {x\in \Omega } \end{array} \end{aligned}$$

In general, there is no guarantee that the set of optima of the new problem coincides with that of the original LMO program, except in corner cases. This is mainly because the weights need to be set a priori, i.e., without enough information on the objective functions, which is a challenging task in its own right. Furthermore, weights that differ too much in magnitude may lead to numerical instabilities at run-time. Other relevant scalarization techniques are those by Zarepisheh and Khorram (2011) and Zykina (2004).
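The sensitivity to the choice of weights can be seen on a small example. The sketch below uses three made-up objective vectors \((f_1, f_2)\) and two arbitrary weight pairs; all numbers and function names are illustrative assumptions, not taken from the cited works.

```python
def lexmin(points):
    # Python tuples compare lexicographically, matching the priority f_1 over f_2.
    return min(points)


def weighted_min(points, weights):
    # Scalarization: minimize the weighted sum of the objectives.
    return min(points, key=lambda f: sum(w * v for w, v in zip(weights, f)))


# Hypothetical objective vectors (f_1, f_2) of three candidate solutions.
points = [(1.00, 500.0), (1.01, 0.0), (2.00, 0.0)]

lex_best = lexmin(points)
small_w = weighted_min(points, (100.0, 1.0))   # w_1 too small for the range of f_2
large_w = weighted_min(points, (1e6, 1.0))     # w_1 large enough for this data set
```

With weights (100, 1) the scalarized optimum differs from the lexicographic one, while with weights (1e6, 1) the two coincide; knowing which magnitude suffices requires information on the ranges of the objectives.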

The majority of the studies on, and applications of, LMO optimization consider linear programming problems and deterministic algorithms. This is mainly because resorting to vector-valued functions allows one to represent lexicographic ordering in a clear and efficient way, which requires minimal changes in already existing procedures for scalar optimization. For instance, Isermann (1982) proposes and investigates the lexicographic simplex method, an extension of the standard simplex method by Dantzig (1948) able to solve lexicographic multi-objective linear programs too. The same result has been recently re-proposed by Cococcioni et al. (2016, 2018) within the framework of Grossone Methodology and non-Archimedean approaches (Sergeyev 2017).

Later, Jones et al. (2007) discuss an application to control problems of a technique originally proposed by Dantzig et al. (1955): the lexicographic perturbation of the constraints of a linear program. The resulting lexicographic (in the constraints) linear program has the advantage of being guaranteed non-degenerate and cycle-free. These properties are quite useful in the design of the optimal control for a constrained linear system with a linear cost function, since it can be posed as a multi-parametric linear program which is by construction degenerate. The main consequence of such degeneracy is the risk of chattering of the control input, which can degrade the quality of the control itself. Furthermore, the lexicographic perturbation speeds up the computation of other parametrizations under alternative orderings by reusing a previous optimal solution.

Control theory benefits greatly from LMO optimization, which has found many fruitful applications in this field over the years. Miksch and Gambier (2011) convert a multi-objective quadratic program for fault-tolerant control into a linear lexicographic one leveraging the \(l_1\) norm and a preemptive scheme. Other works on lexicographic model predictive control are Vada et al. (2001a, b). Dueri et al. (2015) propose a real-time control allocation algorithm to optimally utilize multiple actuators on a momentum control system while explicitly considering actuator constraints. The problem investigated is a two-objective convex lexicographic optimization problem which aims at improving performance by minimizing the torque error. Since in general multiple optimal solutions exist, the secondary objective forces the choice of those with minimum power usage. The solving algorithm in this case is a customized lexicographic version of the interior point method for convex optimization proposed in Gondzio (2012). A similar study has been conducted by Khosravani et al. (2018) considering, in prioritized order, vehicle lateral stability and minimum torque adjustment.

Two other recent and very promising applications of LMO optimization leveraging deterministic solvers are Camargo et al. (2019) and Datta et al. (2019). The former exploits an integer optimizer to preemptively solve the LMO selection of the best renewable energy technologies of hybrid power generation systems for communities located in non-interconnected zones. This task, crucial for the purpose of reducing poverty, takes into consideration the generation cost, the emissions, the energy consumption, as well as environmental sustainability and other related factors. The latter, on the other hand, is one of the very first attempts to apply lexicographic programming to artificial intelligence problems. In fact, the authors exploit a two-objective linear lexicographic program to cope with the issue of class imbalance in boosting methods, a frequent phenomenon in real applications (medical diagnosis, fraud detection, etc.). After having represented the problem as a zero-sum game in the margin space, the preemptive optimization first finds the optimal class-wise boosting weights; then, it computes the overall optimal boosting weights by minimizing their maximum deviation from the former. So far, this is the only approach able to solve such a long-standing problem in machine learning in a parameter-less way, which guarantees optimality and avoids the high computational effort of parameter tuning.

Lexicographic stochastic optimizers are far less common in the literature and their investigation has been proposed only recently (Branke 2008), albeit with remarkable success. Among others, Castro-Gutiérrez et al. (2009) suggest weakly lexicographically biasing the search of a particle swarm optimizer so as to produce a Pareto front closer to the priorities expressed by the decision maker. Meng et al. (2020) implement a lexicographic whale optimization algorithm to design balanced assembly lines integrating knowledge about preventive maintenance scenarios. This choice, which allows one to more easily bypass unavailable workstations, is realized by two objectives prioritized as follows: first, the productivity under the regular operation scenario is optimized; then, the production continuity under the preventive maintenance scenario is guaranteed as much as possible. Finally, Zhong et al. (2021) investigate road pricing optimization considering both land use and transportation effects. The study revisits a classic genetic algorithm (Haupt and Haupt 2003) in a relaxed lexicographic fashion to solve a two-objective prioritized problem which first tries to model as well as possible the interaction between land use and transportation effects; then, it seeks an optimal cordon-based road pricing scheme given the inferred interaction. The case study of Jiangyn (a city in China) assesses the efficacy of the approach against several other frequently adopted techniques.

Recently, the advent of new mathematical frameworks able to numerically manipulate infinite and infinitesimal quantities (Sergeyev 2017) allowed researchers to tackle multi- and many-objective lexicographic optimization from a new perspective. Among deterministic approaches, Cococcioni et al. (2016, 2018) investigate lexicographic linear programming problems both theoretically and numerically, Cococcioni et al. (2020) introduce integer constraints, Cococcioni and Fiaschi (2021) overcome the inherent issues of Big-M approaches by a lexicographic reformulation of the problem and the use of an infinite penalty constant, and Fiaschi and Cococcioni (2018, 2020a, b) and Cococcioni et al. (2021) adopt lexicographic tools and solvers in the field of game theory. On the other hand, studies on stochastic evolutionary optimization based on novel priority structures are Lai et al. (2020a, b, 2021). The effectiveness of the new methodology stems from the fact that it is able to combine the positive aspects of preemptive approaches (guarantee of convergence to the optimum) and of scalarization approaches (optimization of just one scalar problem).

To conclude this section, it is worth quoting Ojha and Biswal (2009) who, investigating lexicographic multi-objective geometric programming problems, assert that in real-world applications it often happens that “more objectives share the same priority”. This aspect is effectively tackled by MPL-MOPs and, in particular, by PL-MPL-MOPs. The next section is devoted to their introduction.

3 Mixed pareto-lexicographic problems (MPL-MOPs)

The previous section introduced both pure Pareto and pure lexicographic many-objective optimization problems (MOPs), which are programming tasks commonly found in the literature. The former are analytically formulated like in Eq. (2), where m is the number of objective functions, \(\Omega\) is the feasible decision space (or search space), and the optimization is performed in the standard Pareto-sense. On the other hand, lexicographic optimization complies with several models, e.g., Eqs. (1) and (3). In both, the priority is introduced by the natural ordering.

$$\begin{aligned}&\begin{aligned} \min&\,\, \{f_1(\mathbf {x}),\,\ldots ,\,f_m(\mathbf {x})\},\\ \text {s.t.}&\,\,\mathbf {x}\in \Omega . \end{aligned} \end{aligned}$$
(2)
$$\begin{aligned}&\begin{aligned} \text {lexmin}&\,\, \left\{ f_1(\mathbf {x}),\,\ldots ,\, f_m(\mathbf {x})\right\} ,\\ \text {s.t.}&\,\, \mathbf {x} \in \Omega . \end{aligned} \end{aligned}$$
(3)

Interestingly, both models represent edge cases of a more general family of problems, which has received little attention in the literature: Mixed-Pareto-lexicographic MOPs (MPL-MOPs). The mathematical description is:

$$\begin{aligned} \begin{aligned} \min&\,\, \{f_1(\mathbf {x}),\,\ldots ,\,f_m(\mathbf {x})\}, \\ \text {s.t.}&\,\, \mathbf {x}\in \Omega , \\&\,\, \mathfrak {p}(f_1(\mathbf {x}),\,\ldots ,\,f_m(\mathbf {x}))\ \text{ supplied, } \end{aligned} \end{aligned}$$
(4)

where \(\mathfrak {p}(\cdot )\) is a generic distribution of priorities among the objectives. The problem of dealing with arbitrary priority policies is crucial for further improvements of EMO algorithms, since routines able to exploit priority information would definitely help in finding better solutions, as suggested in Gaur et al. (2020). In the literature, some studies proposing ad-hoc approaches to cope with priority relations exist, such as \(\varepsilon\)-constrained methods (Deb 2001) or scalarization (Marler and Arora 2004). In spite of their superior performance, they are not specifically designed to deal with MPL-MOPs, resulting in relevant drawbacks such as the inability to find multiple solutions per run, or the need for an appropriate choice of scalarization weights beforehand. However, the class of MPL-MOPs is too heterogeneous to design a one-size-fits-all approach; it is much more reasonable to narrow the focus to specific instances where the shape of the \(\mathfrak {p}(\cdot )\) function has peculiar properties or is particularly significant for modeling real-world scenarios.

Fig. 1 PL-MPL-MOP structure

A very appealing and effective sub-class of MPL-MOPs is the one of Priority-Levels MPL-MOPs (PL-MPL-MOPs), where the priority structure, \(\mathfrak {p}(\cdot )\), groups the objectives in priority levels (PLs) over which an ordering relation exists, as illustrated in Fig. 1. The key idea is: i) each group clusters objectives by importance; ii) a group contains objectives having the same priority. The mathematical formulation of a PL-MPL-MOP is:

$$\begin{aligned} \text {lex}\min \left[ \min \left( \begin{array}{l} f_1^{(1)}(x)\\ \vdots \\ f_{m_1}^{(1)}(x) \end{array} \right) , \min \left( \begin{array}{l} f_1^{(2)}(x)\\ \vdots \\ f_{m_2}^{(2)}(x) \end{array} \right) ,\ldots , \min \left( \begin{array}{l} f_1^{(p)}(x)\\ \vdots \\ f_{m_p}^{(p)}(x) \end{array} \right) \right] , \end{aligned}$$
(5)

where p is the number of PLs and \(f_i^{(j)}\) is the i-th objective in the j-th PL. Note that an objective can be repeated in multiple PLs. In such problems, the Pareto-optimal solutions of the objectives in the first PL form the decision space for the Pareto-optimization of those in the second PL, and so on. It should be clear that pure Pareto optimization problems are PL-MPL-MOPs with only one PL, while lexicographic problems are PL-MPL-MOPs with multiple PLs having one objective each. The next section introduces the mathematical framework which makes the numerical solution of this kind of problem possible, namely the Grossone Methodology. Its advent was crucial for our discussion, since no tools able to concurrently perform Pareto and lexicographic optimization exist yet.

4 The grossone methodology

Grossone Methodology (GM) (Sergeyev 2017) is a numerical framework that makes working with infinite and infinitesimal numbers on a computer possible. It has already been successfully applied to many different optimization problems (Cococcioni et al. 2018, 2020; De Cosmis and De Leone 2012; De Leone 2018; De Leone et al. 2020; Lai et al. 2020a; Cococcioni and Fiaschi 2021). The fundamental element of GM is the infinite unit Grossone, denoted by \(\textcircled {1}\), which allows one to build numerical values composed of finite, infinite and infinitesimal components, known as gross-scalars (G-scalars in brief). The latter are written as:

$$\begin{aligned} c=c_{p_m}\textcircled {1}^{p_m} + ... + c_{p_0}\textcircled {1}^{p_0} + ... + c_{p_{-k}}\textcircled {1}^{p_{-k}}, \end{aligned}$$

where \(m, k \in \mathbb {N}\), the exponents \(p_i\) and the digits \(c_{p_i}\in \mathbb {R}\) are called gross-powers (G-powers) and gross-digits (G-digits), respectively.

The four basic operations between two G-scalars are reasonably intuitive, easy to implement and inherit all the standard properties like associativity, commutativity and existence of the inverse. For instance, consider the following few lines to get a basic understanding of their arithmetical behavior, which is similar to the one of polynomials:

$$\begin{aligned} \begin{array}{c} \textcircled {1}\cdot (\textcircled {1}+ 2) = \textcircled {1}^{2} + 2\textcircled {1}, \quad \quad 0< \dfrac{1}{\textcircled {1}} = \textcircled {1}^{-1}< \textcircled {1}^0 = 1 < \textcircled {1}^1 = \textcircled {1}, \\ \dfrac{-10.0\textcircled {1}^3 +16.0 +42.0\textcircled {1}^{-3}}{5.0\textcircled {1}^3+7.0} = -2.0 +6.0\textcircled {1}^{-3}. \end{array} \end{aligned}$$

In addition, the operator < induces a total ordering on the set of G-scalars. Two G-scalars are compared term by term, considering the G-powers in descending order: at the first position where the G-powers differ, or where they are equal but the corresponding G-digits differ, the larger value identifies the larger G-scalar; otherwise, the next pair of G-powers and G-digits is checked, and so on.

For the rest of the paper, we only consider G-scalars having a finite number of components, each with real-valued G-power. In such a scenario, finite numbers are represented by G-scalars with the highest G-power equal to zero, infinitesimal ones with negative highest G-power, and infinite ones with a positive highest G-power. For instance, \(-3.4=-3.4\textcircled {1}^0\) and \(2+0.5\textcircled {1}^{-2}\) are both finite numbers, \(\pi \textcircled {1}^{-1}-\textcircled {1}^{-\sqrt{2}}\) is infinitesimal, while \(3\textcircled {1}^{\pi }-70\textcircled {1}^{-\mathrm {e}}\) is infinite.
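To make the arithmetic and the ordering concrete, the following minimal sketch stores a G-scalar with finitely many components as a dictionary mapping each G-power to its G-digit. The class name `GScalar` and this representation are illustrative assumptions and do not refer to any existing Grossone library; division is deliberately left out, and the quotient shown earlier is instead checked through multiplication.

```python
from functools import total_ordering


@total_ordering
class GScalar:
    """Finite sum of terms digit * grossone**power, stored as {power: digit}."""

    def __init__(self, terms=None):
        # Drop zero digits so that equal values share the same representation.
        self.terms = {p: d for p, d in (terms or {}).items() if d != 0}

    def __add__(self, other):
        out = dict(self.terms)
        for p, d in other.terms.items():
            out[p] = out.get(p, 0) + d
        return GScalar(out)

    def __neg__(self):
        return GScalar({p: -d for p, d in self.terms.items()})

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        # Polynomial-like product: powers add up, digits multiply.
        out = {}
        for p1, d1 in self.terms.items():
            for p2, d2 in other.terms.items():
                out[p1 + p2] = out.get(p1 + p2, 0) + d1 * d2
        return GScalar(out)

    def __eq__(self, other):
        return self.terms == other.terms

    def __lt__(self, other):
        # The sign of the leading (highest-power) term of other - self decides.
        diff = (other - self).terms
        return bool(diff) and diff[max(diff)] > 0


zero = GScalar()
one = GScalar({0: 1})
g1 = GScalar({1: 1})              # grossone itself
g1_inv = GScalar({-1: 1})         # the infinitesimal grossone**(-1)

product = g1 * (g1 + GScalar({0: 2}))
# Check of the quotient above: divisor times quotient gives back the dividend.
dividend = GScalar({3: 5.0, 0: 7.0}) * GScalar({0: -2.0, -3: 6.0})
```

Comparing through the leading term of the difference is one way to realize the total ordering; it also handles negative G-digits without special cases.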

5 Use of grossone in PL-MPL-MOPs

This section shows how \(\textcircled {1}\) helps in the design of more priority-aware routines that deal with PLs better. First, PL-MPL-MOPs are reformulated according to a Grossone-based representation where the value of the overall objective function, comprising the Paretian and lexicographic components altogether, can be numerically evaluated. The second and richer part, instead, investigates a novel definition of dominance that possibly fits the nature of PL-MPL-MOPs better. Needless to say, the latter leverages \(\textcircled {1}\) as well.

5.1 Problem reformulation

Similarly to the case of MPL-MOPs with priority chains (Lai et al. 2020a), the lexicographic relation between PLs can be represented both mathematically and in a computer by means of \(\textcircled {1}\). Here, a lexicographic priority between two levels indicates that one group of objectives is infinitely more important than another one. The key idea is to use decreasing powers of \(\textcircled {1}\) to represent such ordering and split up the objectives in well-defined PLs. In mathematical terms, the problem in (5) is reformulated with Grossone as:

$$\min \left[ \left( \begin{array}{l} f_1^{(1)}(x)\\ \vdots \\ f_{m_1}^{(1)}(x) \end{array} \right) + \textcircled {1}^{-1}\left( \begin{array}{l} f_1^{(2)}(x)\\ \vdots \\ f_{m_2}^{(2)}(x) \end{array} \right) + \ldots + \textcircled {1}^{1-p}\left( \begin{array}{l} f_1^{(p)}(x)\\ \vdots \\ f_{m_p}^{(p)}(x) \end{array} \right) \right] .$$
(6)

In practice, the lexicographic optimization of the priority levels has been transformed into a minimization problem over the weighted sum of the PLs' contributions. The weights are assigned to the levels by importance: the one with the highest priority is weighted by \(\textcircled {1}^0\) (the digit, 1, is omitted for brevity), the second one by \(\textcircled {1}^{-1}\), and so on, until no more PLs remain.

5.2 A novel definition of dominance for PL-MPL-MOPs

When dealing with a PL-MPL-MOP, Pareto dominance is not a reasonable definition of dominance, since it is not clear how it could represent the level structure of the priority information. Defining an appropriate PL-dominance is not trivial, as demonstrated by the following example of a straightforward but actually broken definition of PL-dominance:

Definition 1

(PL-Dominance (INCORRECT)) Let x and y be two solutions of a PL-MPL-MOP with n PLs (\(\mathfrak {p}_1,\,\ldots ,\,\mathfrak {p}_n\)) arranged in order of decreasing priority. Let \(\mathfrak {p}_i^z\) indicate the i-th PL of solution z, and let \(x \prec y\) indicate that x Pareto-dominates y. Then, x is said to “PL-dominate” y (\(x\prec _{\text {PL}}y\)) \(\Longleftrightarrow\) \(\exists \, i\) such that \(\mathfrak {p}_i^x \prec \mathfrak {p}_i^y\) and \(\not \exists \,j<i\) such that \(\mathfrak {p}_j^y \prec \mathfrak {p}_j^x\).

This definition surprisingly lacks the transitivity property, i.e., there may exist three elements x, y, z such that x PL-dominates y and y PL-dominates z, but z PL-dominates x. The following counterexample is provided for clarity. Consider the following three solutions x, y, z of a PL-MPL-MOP with two PLs, each with two objectives to be minimized:

$$\begin{aligned} x= & {} \left( \begin{array}{l} 1\\ 6 \end{array}\right) +\textcircled {1}^{-1} \left( \begin{array}{l} 1\\ 1 \end{array}\right) , \quad \quad y =\left( \begin{array}{l} 2\\ 1 \end{array}\right) +\textcircled {1}^{-1} \left( \begin{array}{l} 2\\ 4 \end{array}\right) , \quad \quad \\ z= & {} \left( \begin{array}{l} 0\\ 5 \end{array}\right) +\textcircled {1}^{-1} \left( \begin{array}{l} 5\\ 7 \end{array}\right) \end{aligned}$$

With reference to Definition 1, there is a dominance-loop since: i) \(x \prec _{\text {PL}}y\) because, although they are non-dominated for the first PL, x performs better than y on the second one; ii) \(y \prec _{\text {PL}}z\) for the same reason; iii) \(z \prec _{\text {PL}}x\) because it does better on the first PL.
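The dominance loop can be verified mechanically. The sketch below encodes each solution as a tuple of per-level objective vectors and implements Definition 1 literally; the function names are hypothetical and chosen only for this illustration.

```python
def pareto_dominates(a, b):
    """True if vector a Pareto-dominates vector b (minimization)."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


def naive_pl_dominates(x, y):
    """Definition 1: x wins at some level i and y wins at no level j < i."""
    for px, py in zip(x, y):              # levels in decreasing priority
        if pareto_dominates(py, px):      # y wins first: no valid i can exist
            return False
        if pareto_dominates(px, py):      # x wins and y did not win earlier
            return True
    return False


# The three solutions of the counterexample: (first PL, second PL).
x = ((1, 6), (1, 1))
y = ((2, 1), (2, 4))
z = ((0, 5), (5, 7))

cycle = (naive_pl_dominates(x, y),
         naive_pl_dominates(y, z),
         naive_pl_dominates(z, x))
```

All three relations hold at once, while x does not PL-dominate z, which confirms that the relation of Definition 1 is not transitive.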

A much better definition of dominance leverages the concept of non-dominated sub-fronts, a generalization of the notion of fronts which consists of nested fronts generated by the priority. The procedure to partition the population into sub-fronts starts by considering only the objectives with the highest priority and generating a first set of non-dominated fronts as usual, according to Pareto dominance. Then, each front is considered individually and further split on the basis of the objectives within the second PL, determining new nested fronts (namely, sub-fronts). The same procedure is repeated recursively on the newly obtained sub-fronts until no more PLs remain; the result is a population partitioned in a tree-like hierarchy of sub-fronts. Exploiting the natural ordering of the sub-fronts, one can adopt the GM to uniquely identify each leaf-front, i.e., assign an index to each sub-front containing solutions. Such an index, which is a G-scalar, is computed as the sum of n components of the form \(d_i\textcircled {1}^{p_i}\), where n is the number of PLs, \(p_i=1-i\) is the G-power associated with the i-th PL (see Eq. (6)), and \(d_i\) is the position of the enclosing front at the i-th level of the hierarchy. For instance, \(F_i\) where \(i=3+2\textcircled {1}^{-1}+5\textcircled {1}^{-2}\) denotes all the solutions in front 3 with respect to the first PL, front 2 with respect to the second PL (within the individuals in front 3 for the first one) and front 5 with respect to the third PL (among those with primary front 3 and secondary front 2). Given such a calculation of sub-fronts, the correct definition of PL-dominance is:

Definition 2

(PL-Dominance (CORRECT)) Let x and y be two solutions such that \(x\in F_i\) and \(y\in F_j\). Then, \(x\prec _{\text {PL}} y\) \(\iff\) \(i<j\).

Notice that \(\prec _{\text {PL}}\) is not just a function of two arguments (solutions), but has a global dependency on all the other individuals, since it previously requires the assignment of the fitness rank to the whole population.

Going back to the previous counterexample, assuming for simplicity that there are no more elements in the population, it turns out that \(x,\,y\) and z belong to sub-fronts indexed by \(2+1\textcircled {1}^{-1}\), \(1+1\textcircled {1}^{-1}\) and \(1+2\textcircled {1}^{-1}\), respectively. Therefore, it holds true that \(y \prec _{\text {PL}} z\) and \(z \prec _{\text {PL}} x\), but it becomes false that x dominates y (\(x\nprec _{\text {PL}}y\)). The proof of transitivity of the PL-dominance comes straightforwardly from the definition, and it is omitted here for brevity.
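The sub-front assignment for this small population can be reproduced with a short recursive routine. In the sketch below, which is only an illustration with hypothetical function names, the index \(d_1+d_2\textcircled {1}^{-1}+\ldots\) is represented by the tuple \((d_1, d_2, \ldots )\); this is an assumption made for simplicity, justified by the fact that comparing such tuples lexicographically gives the same order as comparing the corresponding G-scalars.

```python
def pareto_dominates(a, b):
    """True if vector a Pareto-dominates vector b (minimization)."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


def nondominated_fronts(names, key):
    """Split names into successive Pareto fronts w.r.t. the vectors key(name)."""
    remaining, fronts = list(names), []
    while remaining:
        front = [s for s in remaining
                 if not any(pareto_dominates(key(t), key(s)) for t in remaining)]
        fronts.append(front)
        remaining = [s for s in remaining if s not in front]
    return fronts


def subfront_indices(population, n_levels):
    """Map each solution name to the tuple (d_1, ..., d_n) of nested front positions."""
    index = {}

    def recurse(names, level, prefix):
        if level == n_levels:                       # leaf sub-front reached
            for name in names:
                index[name] = prefix
            return
        fronts = nondominated_fronts(names, key=lambda s: population[s][level])
        for d, front in enumerate(fronts, start=1):
            recurse(front, level + 1, prefix + (d,))

    recurse(list(population), 0, ())
    return index


population = {"x": ((1, 6), (1, 1)), "y": ((2, 1), (2, 4)), "z": ((0, 5), (5, 7))}
idx = subfront_indices(population, n_levels=2)
# Definition 2: a PL-dominates b if and only if idx[a] < idx[b].
```

The computed indices are (2, 1) for x, (1, 1) for y and (1, 2) for z, matching the G-scalars reported in the text, so that y precedes z, z precedes x, and x no longer PL-dominates y.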

6 PL-NSGA-II procedure

This section presents an enhanced version of the well-known NSGA-II algorithm (Deb et al. 2002), equipped with PL-dominance and GM so that it is able to deal effectively with PL-MPL-MOPs too. To implement it, one needs to modify the four core operations of NSGA-II, which are: i) assignment of the non-dominance rank to the solutions; ii) population sorting by rank; iii) crowding distance assignment to each individual; iv) selection of the new population from the fronts in best-to-worst order, using the crowding distance to break ties. The improved version is called PL-NSGA-II. It is only fair to say that the choice to improve NSGA-II rather than other evolutionary algorithms such as MOEA/D or NSGA-III is arbitrary, and driven by the simplicity of doing so.

The improvement of the first step is implemented by the routine described in Sect. 5.2, which assigns a sub-front to each solution. The sub-front index plays exactly the role of a non-dominance rank for the solution. The second step follows directly from the use of GM, which induces a total ordering among G-scalars. Since the non-dominance rank (i.e., the sub-front index) is a G-scalar, one can resort to the order relation mentioned in Sect. 4 to sort the population. Both steps are expected to generate a larger number of smaller-sized fronts, a beneficial effect since one of the main weaknesses of many-objective algorithms is indeed represented by large sets of non-dominated solutions.

The novel way to compute the crowding distance proposed here follows an approach somewhat similar to the one leading to Eq. (6), and it is designed to prioritize solutions that are distant in more important PLs rather than in the less important ones. Specifically, the crowding distance of a solution belonging to a front \(F_i\) is the weighted sum of the crowding distances computed at every PL, where the weights are exactly the ones assigned to each of them. Of course, only the solutions in \(F_i\) must be considered for this procedure. Its pseudocode is reported in Algorithm 1.
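A possible reading of this procedure is sketched below. It is not a transcription of Algorithm 1: it assumes that the per-level distance is the standard NSGA-II crowding distance (boundary solutions receive an infinite value), and it represents the resulting G-scalar \(c_1+c_2\textcircled {1}^{-1}+\ldots\) by the tuple \((c_1, c_2, \ldots )\), whose lexicographic comparison gives the same order. All function names are hypothetical.

```python
import math


def crowding_distance(vectors):
    """Standard NSGA-II crowding distance for a list of objective vectors."""
    n = len(vectors)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(vectors[0])):
        order = sorted(range(n), key=lambda i: vectors[i][k])
        lo, hi = vectors[order[0]][k], vectors[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf    # boundary solutions
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = vectors[order[pos + 1]][k] - vectors[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)        # normalized neighbor gap
    return dist


def pl_crowding_distance(front):
    """front: list of solutions, each a tuple of per-level objective vectors.

    Returns one tuple per solution, holding its crowding distance at each PL
    in decreasing priority (a stand-in for the G-scalar weighted sum).
    """
    n_levels = len(front[0])
    per_level = [crowding_distance([s[lvl] for s in front]) for lvl in range(n_levels)]
    return [tuple(per_level[lvl][i] for lvl in range(n_levels))
            for i in range(len(front))]


# Three mutually non-dominated solutions with two PLs of two objectives each.
front = [((0.0, 4.0), (1.0, 5.0)),
         ((1.0, 2.0), (5.0, 1.0)),
         ((4.0, 0.0), (2.0, 2.0))]
cd = pl_crowding_distance(front)
ranking = sorted(range(len(front)), key=lambda i: cd[i], reverse=True)
```

In this toy front, the third solution is preferred to the second one because it is a boundary point in the first PL, even though the second one is a boundary point in the less important second PL.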

Since the crowding distance is now a G-scalar, a total ordering exists and the fourth step is straightforward. The next population is filled with the solutions from the best sub-front, then with those from the second best sub-front, and so on until enough elements have been picked. If a sub-front is too large to be included in its entirety, then the solutions with the best crowding distance value are preferred to build the next population.

Altogether, these four improved pieces give birth to PL-NSGA-II, an algorithm which is specifically designed to deal with PL-MPL-MOPs, as well as with standard Pareto and lexicographic problems, the latter being nothing but corner cases. The next section is devoted to quantitatively assessing the performance achieved by PL-NSGA-II.

Algorithm 1 Pseudocode of the crowding distance assignment in PL-NSGA-II

7 Results of PL-MPL-MOP

Since there do not exist benchmark problems featuring priority levels yet, the experiments of this section are mainly performed on modified versions of existing problems: two well-known challenging test-bed and one engineering design problems. In addition, an handcrafted benchmark with high interpretability is used. The performance of PL-NSGA-II is compared with four EMO algorithms: i) NSGA-II, ii) NSGA-III (Seada and Deb 2015), iii) MOEA/D (Zhang and Li 2007), and iv) GRAPH (Schmiedle et al. 2001); the latter can handle supplied preference policies. In order to make the comparison fair, solutions from the non-prioritized algorithms (NSGA-II, NSGA-III, and MOEA/D) are filtered after the optimization on the basis of the supplied PL-dominance information. All algorithms have used the SBX crossover operator and polynomial mutation with a population of 200 individuals and run for 500 generations, but for the third benchmark where they run for 1000. The so small number of solutions is meant to stress how the use of additional priority information since the beginning of the optimization can reduce the computational effort of the task, and then the computational resources needed by the decision maker. For each algorithm, the experiment is repeated 50 times to ensure a statistically stable performance comparison through the metric \(\Delta (\cdot )=\max \{IGD(\cdot ),\,GD(\cdot )\}\), where IGD is the inverted generational distance and GD indicates the generational distance (Schütze et al. 2012) (their computation comes after the normalization of the objective space to the unitary hyper-cube). The use of GD and IGD metrics rather than a unary indicator such as the hypervolume is due to the fact that a clear way to fairly compute the latter in the case of PL-MPL-MOPs is still missing and under investigation, as discussed more in detail in Sect. 8. Concerning MOEA/D, NSGA-II and NSGA-III the implementations from pymoo library (Blank and Deb 2020) have been used in this work. 
The implementation of PL-NSGA-II used in this work is the most advanced one, i.e., it is the current state of the art.
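To make the metric concrete, the following minimal sketch computes \(\Delta (\cdot )=\max \{IGD(\cdot ),\,GD(\cdot )\}\) between an approximated front and a reference front. It is not the code used for the experiments: the function names (`normalize`, `delta_metric`) are hypothetical, GD and IGD are assumed to be plain averages of nearest-neighbor Euclidean distances (the averaged Hausdorff distance of Schütze et al. (2012) also admits other power means), and the normalization bounds are assumed to be supplied by the user.

```python
import math


def normalize(points, lower, upper):
    """Rescale every objective to [0, 1] using per-objective bounds (assumed known)."""
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(pt, lower, upper))
        for pt in points
    ]


def _mean_nearest_distance(source, target):
    """Average, over `source`, of the Euclidean distance to the closest point of `target`."""
    return sum(min(math.dist(s, t) for t in target) for s in source) / len(source)


def delta_metric(approx, reference):
    """Delta = max{IGD, GD}, with both terms taken as plain averages (an assumption)."""
    gd = _mean_nearest_distance(approx, reference)   # distance of found points from the reference front
    igd = _mean_nearest_distance(reference, approx)  # coverage of the reference front by found points
    return max(igd, gd)
```

Taking the maximum penalizes both fronts that are far from the reference (large GD) and fronts that are accurate but cover only part of it (large IGD).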

7.1 PL-C problem

The PL-C problem is an artificial benchmark which serves as a gentle introduction to PL-MPL-MOPs. In spite of a rather simple input space, PL-C is still a quite challenging test-bed; it consists of three PLs of two or three objectives each. Its analytical formulation is:

$$\begin{aligned} \begin{aligned} \min&\,\, \left[ \begin{pmatrix} x_1\\ x_2\\ g(x) \end{pmatrix} + \begin{pmatrix} \cos (2\Vert x\Vert )\\ -\sin (2\Vert x\Vert ) \end{pmatrix}\textcircled {1}^{-1} + \begin{pmatrix} \Vert x-(5,5)\Vert \\ \Vert x-(10,6)\Vert \\ \Vert x-(6,10)\Vert \end{pmatrix}\textcircled {1}^{-2} \right] ,\\ \text {s.t.}&\,\,x\in [0,\,10]^2 \end{aligned} \end{aligned}$$

where

$$\begin{aligned} g(x) = \frac{e^{-\Vert x\Vert }}{1+e^{-\Vert x\Vert }}(3+10\cos ^2(2\Vert x\Vert )). \end{aligned}$$
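As an illustration, the formulation above can be evaluated level by level with the short sketch below. The function names (`g`, `pl_c`) and the choice of returning one tuple per priority level are assumptions made here for readability; the coefficients of \(\textcircled {1}^{0}\), \(\textcircled {1}^{-1}\) and \(\textcircled {1}^{-2}\) are simply kept as separate tuples rather than as Grossone numbers.

```python
import math

# Vertices used by the third priority level of PL-C.
VERTICES = ((5.0, 5.0), (10.0, 6.0), (6.0, 10.0))


def g(x):
    """Third objective of the first priority level; it depends on x only through its norm."""
    r = math.hypot(x[0], x[1])
    return math.exp(-r) / (1.0 + math.exp(-r)) * (3.0 + 10.0 * math.cos(2.0 * r) ** 2)


def pl_c(x):
    """Return the objective values of PL-C for x in [0, 10]^2, one tuple per priority level."""
    r = math.hypot(x[0], x[1])
    level1 = (x[0], x[1], g(x))
    level2 = (math.cos(2.0 * r), -math.sin(2.0 * r))
    level3 = tuple(math.dist(x, v) for v in VERTICES)
    return level1, level2, level3
```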

The optimal front obtained when considering the first PL only is mainly driven by the function g(x). Plotted as a function of the norm of x, g has the shape of the red line in Fig. 2, where the green region highlights the points of its graph which are Pareto optimal. Since g is a bivariate function which depends on x only through \(\Vert x\Vert\), the true Pareto front coincides with the surface of revolution obtained by rotating the green points in Fig. 2 around the vertical axis within the positive orthant of the domain. This surface is the one reported in Fig. 3.

Fig. 2 The function g of PL-C, drawn in red as a function of \(\Vert x\Vert\). The points of its graph which are Pareto optimal are highlighted in green. (Color figure online)

The optimal front consists of seven disjoint regions, which makes even the optimization of the first PL a quite challenging problem. The second PL is designed to select every other one of these regions; see Fig. 4 for its effect on the Pareto front. Finally, the third PL preserves only those solutions which remain Pareto efficient when considering their distance from the vertices of a triangle in the search space. The vertices are located at points (5, 5), (10, 6) and (6, 10); Fig. 5 shows how their presence in the third PL refines the set of optimal solutions, leaving only those which are non-PL-dominated with respect to the whole PL-MPL-MOP.

Fig. 3 PL-C optimal front with first priority level only

Fig. 4 PL-C optimal front with first and second priority levels

Fig. 5 PL-C optimal front with all the three priority levels

Table 1 shows the performance of the algorithms on the PL-C benchmark. It reports both the mean and the standard deviation of the \(\Delta (\cdot )\) metric, along with the average number of solutions each algorithm outputs at the end of the optimization. The best values are boldfaced. It is worth noticing that, as soon as the total number of objectives grows, PL-NSGA-II is able to find a notably higher number of optimal solutions than the non-prioritized algorithms. This phenomenon is also observed in Sect. 7.3.

Table 1 Performance (\(\Delta (\cdot ) = \max \{IGD(\cdot ), GD(\cdot )\}\)) on PL-C

7.2 PL-MaF7 problem

MaF7 (Cheng et al. 2017) is a widely recognized benchmark problem for testing multi- and many-objective evolutionary algorithms. We modified it by introducing a second PL with three objective functions, as follows:

$$\begin{aligned} \begin{aligned} \min&\,\, \left[ \begin{pmatrix} x_1\\ x_2\\ h(x) \end{pmatrix} + \textcircled {1}^{-1} \begin{pmatrix} f^{(2)}_1(x)\\ f^{(2)}_2(x)\\ h(x) \end{pmatrix} \right] ,\\ \text {s.t.}&\,\, x\in [0,\,1]^{23}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} f^{(2)}_1(x)&= -e^{-0.5\,p(x,\,0.6)},\\ f^{(2)}_2(x)&= \mid p(x,\,0.8)-0.04\mid ,\\ p(x,\,c)&= (x_1-c)^2+(x_2-c)^2,\\ h(x)&= 6+\frac{27}{20}\sum _{i=3}^{23}x_i-\sum _{i=1}^2x_i\left( 1+\sin (3\pi x_i)\right) . \end{aligned} \end{aligned}$$

The first PL has a Pareto front made up of four disjoint regions due to the periodic term in \(h(x)\), as illustrated in Fig. 6. The second PL has the effect of isolating just one of them, the one closest to the line \(x_1 = x_2 = 1\) (highlighted in orange in Fig. 6).

Fig. 6 NSGA-II and true PL-MPL (orange) points for MaF7. (Color figure online)

Fig. 7 Obtained PL-NSGA-II points for PL-MaF7

Fig. 8 Obtained MOEA/D (2nd-best) points for PL-MaF7

The function \(f^{(2)}_1\) has the shape of a negated bivariate normal density centered at \((0.6,\,0.6)\), so that minimizing it assigns higher importance to solutions with \((x_1,\,x_2)\) closer to the mean point; \(f^{(2)}_2\) favors the candidate solutions whose pair \((x_1,\,x_2)\) is closer to the circumference centered at \((0.8,\,0.8)\) with radius 0.2. Finally, the last function selects a single disjoint region: \(f^{(2)}_1\) helps keep the optimal part of the front near the origin of the \(x_1\)-\(x_2\) plane, whereas h pushes in the opposite direction, i.e., it privileges those points farther from the origin.
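The behavior of the two priority levels can be checked numerically with a sketch such as the following, which transcribes the formulas of this section directly. The function names (`p`, `h`, `pl_maf7`) and the per-level tuple layout are assumptions of this example, not part of the original benchmark code.

```python
import math

N_VARS = 23  # PL-MaF7 is defined on x in [0, 1]^23


def p(x, c):
    """Squared distance of (x_1, x_2) from the point (c, c)."""
    return (x[0] - c) ** 2 + (x[1] - c) ** 2


def h(x):
    """Objective shared by both priority levels."""
    tail = sum(x[2:N_VARS])  # x_3, ..., x_23
    head = sum(xi * (1.0 + math.sin(3.0 * math.pi * xi)) for xi in x[:2])
    return 6.0 + 27.0 / 20.0 * tail - head


def pl_maf7(x):
    """Return the objective values of PL-MaF7, one tuple per priority level."""
    if len(x) != N_VARS:
        raise ValueError("PL-MaF7 expects 23 decision variables")
    level1 = (x[0], x[1], h(x))
    level2 = (-math.exp(-0.5 * p(x, 0.6)), abs(p(x, 0.8) - 0.04), h(x))
    return level1, level2
```

For instance, a point with \((x_1,\,x_2)=(0.6,\,0.6)\) attains the minimum value \(-1\) of \(f^{(2)}_1\), while a point with \((x_1,\,x_2)=(1,\,0.8)\) lies on the circumference and makes \(f^{(2)}_2\) vanish (up to rounding).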

For each algorithm, the mean and the standard deviation of \(\Delta (\cdot )\) are reported in Table 2. Figure 7 reports the primary Pareto front obtained by a single run of PL-NSGA-II, demonstrating the ability of the priority-aware algorithm to accurately generate the right part of the Pareto front, whereas the second-best method (MOEA/D) cannot find a well-diversified set (Fig. 8).

Table 2 Performance (\(\Delta (\cdot ) = \max \{IGD(\cdot ), GD(\cdot )\}\)) on MaF7

7.3 PL-MaF11 problem

MaF11 (Cheng et al. 2017) is a benchmark problem from the Walking Fish Group (WFG) test suite, named there WFG2. We consider a three-objective version of the problem. The original objective functions, now coming together in the first PL, are

$$\begin{aligned} \begin{aligned} f_1^{(1)}(x)&= y_3+2\left( 1-\cos \left( y_1\frac{\pi }{2}\right) \right) \left( 1-\cos \left( y_2\frac{\pi }{2}\right) \right) ,\\ f_2^{(1)}(x)&= y_3+4\left( 1-\cos \left( y_1\frac{\pi }{2}\right) \right) \left( 1-\sin \left( y_2\frac{\pi }{2}\right) \right) ,\\ f_3^{(1)}(x)&= y_3+6\left( 1-y_1\cos ^2\left( 5y_1\pi \right) \right) , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} y_i&= {\left\{ \begin{array}{ll} (t_i^{(3)}-0.5)\max (1,\,t_3^{(3)})+0.5, &{} i=1,2,\\ t_3^{(3)}, &{} i=3, \end{array}\right. }\\ t_i^{(3)}&= {\left\{ \begin{array}{ll} t_i^{(2)}, &{} i=1,2,\\ \frac{1}{5}\sum _{j=3}^{7} t_j^{(2)}, &{} i=3, \end{array}\right. }\\ t_i^{(2)}&= {\left\{ \begin{array}{ll} t_i^{(1)}, &{} i=1,2,\\ t_{2i-3}^{(1)}+t_{2i-2}^{(1)}+2\mid t_{2i-3}^{(1)}-t_{2i-2}^{(1)}\mid , &{} i=3:7, \end{array}\right. }\\ t_i^{(1)}&= {\left\{ \begin{array}{ll} z_i, &{} i=1,2,\\ \frac{\mid z_i-0.35\mid }{\mid \lfloor 0.35-z_i \rfloor \mid +0.35}, &{} i=3:12, \end{array}\right. } \end{aligned} \end{aligned}$$

with \(z_i = {x_i}/{2}\) and \(x_i\in [0,2i]\) for \(i=1:12\). The second PL, instead, consists of the following two objective functions:

$$\begin{aligned} f_i^{(2)} = -\left( f_1^{(1)}-2f_2^{(1)}+(-1)^i\frac{11}{4}\right) ^2, \quad i=1,2. \end{aligned}$$

They are two parabolic cylinders which select only three pieces of the original Pareto front: the middle region (regardless of the height) and the two farthest points from the origin on the \(f_1^{(1)}\)-\(f_2^{(1)}\) plane. Figure 9 shows the final non-dominated solutions of the original problem found by NSGA-III. Figures 10 and 11 report the optimal front considering both PLs when approximated by GRAPH and PL-NSGA-II, respectively. Table 3 presents the detailed performance comparison, indicating the superior performance of PL-NSGA-II. As in Sect. 7.1, its superiority lies not only in the metric but also in the number of optimal solutions it outputs.
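The second PL depends on the first one only through \(f_1^{(1)}\) and \(f_2^{(1)}\), so its effect can be explored with a few lines of code. The sketch below is only an illustration; the function name `second_level` is hypothetical and the first-level values are assumed to be already available.

```python
def second_level(f1, f2):
    """Second-PL objectives of PL-MaF11, given the first-PL values f1 and f2.

    Each objective is the negated square of f1 - 2*f2 -/+ 11/4, so under
    minimization it rewards points far from the line where that term vanishes.
    """
    return tuple(-((f1 - 2.0 * f2 + (-1) ** i * 11.0 / 4.0) ** 2) for i in (1, 2))
```

For example, `second_level(0.0, 0.0)` returns the same value for both objectives, whereas a point on the line \(f_1^{(1)}-2f_2^{(1)}=11/4\) gets the worst possible value (zero) for the first of them.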

Fig. 9 NSGA-III front of MaF11

Fig. 10 GRAPH front of MaF11

Fig. 11 PL-NSGA-II solutions on PL-MaF11

Table 3 Performance (\(\Delta (\cdot ) = \max \{IGD(\cdot ), GD(\cdot )\}\)) on MaF11

7.4 Crashworthiness problem

Finally, we consider the crashworthiness problem (Liao et al. 2008), which has been solved using NSGA-like optimizers (Deb and Jain 2014). Two PLs are formed: the first PL involves minimizing the mass (\(f_1\)) and acceleration (\(f_2\)), while the second PL minimizes acceleration (\(f_2\)) and the toe-board intrusion compliance due to a crash (\(f_3\)). This choice reflects the hierarchical interests of a company which seeks to design high-performing cars while also caring about the driver’s safety. Figure 12 shows the points obtained by PL-NSGA-II. It clearly highlights how these points are non-dominated for \(f_1\)-\(f_2\) and then non-dominated for \(f_2\)-\(f_3\). Table 4 reports the performance of the algorithms according to the \(\Delta (\cdot )\) metric. PL-NSGA-II significantly outperforms the other algorithms, showing a very small standard deviation.
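The "non-dominated for \(f_1\)-\(f_2\) and then non-dominated for \(f_2\)-\(f_3\)" reading can be illustrated with a simplified two-stage filter. This is a sketch under strong assumptions: it is not the PL-dominance of Definition 2, it only applies standard Pareto dominance level by level to a finite set of points, and the function names and sample values are invented for the example.

```python
def dominates(a, b):
    """Standard Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


def non_dominated(points, key):
    """Keep the points whose projection `key(point)` is not dominated by any other projection."""
    return [pt for pt in points if not any(dominates(key(q), key(pt)) for q in points)]


def two_stage_filter(points):
    """Filter (f1, f2, f3) triples first on (f1, f2), then on (f2, f3) among the survivors."""
    first = non_dominated(points, key=lambda f: (f[0], f[1]))
    return non_dominated(first, key=lambda f: (f[1], f[2]))


# Hypothetical (mass, acceleration, intrusion) values, for illustration only.
SAMPLE = [(1, 5, 3), (2, 4, 5), (3, 3, 6), (2, 6, 1), (3, 3, 7)]
```

On `SAMPLE`, the point `(2, 6, 1)` is discarded at the first stage despite having the best \(f_3\), since it is dominated on \(f_1\)-\(f_2\); the point `(3, 3, 7)` survives the first stage but is discarded at the second.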

Fig. 12 PL-NSGA-II points lie on the edge of the \(f_1\)-\(f_2\) trade-off points

Table 4 Performance (\(\Delta (\cdot ) = \max \{IGD(\cdot ), GD(\cdot )\}\)) on Crashworthiness

8 Challenges in unary metrics definition for PL-MPL-MOPs

The \(\Delta (\cdot )\) metric is prone to misleading behaviors and is not trustworthy when performances are close (Ishibuchi et al. 2015), which is not the case in this work. A version of the more robust \(\Delta ^+(\cdot )\) metric (again, see Ishibuchi et al. 2015) for PL-MPL-MOPs is currently under development, so as to provide a weakly PL-compliant performance indicator. In addition, some effort has already been spent reasoning about a unary performance indicator, i.e., the transposition of the hypervolume metric (Shang et al. 2020) to the PL-MPL-MOP domain.

So far, investigations have shown that a proper definition of the PL-hypervolume is far from easy. The main reason is the difficulty of guaranteeing the PL-compliance of the indicator, which is crucial since Pareto compliance is what makes the hypervolume so appealing in standard MOPs. The remainder of the section discusses three apparently reasonable approaches to the PL-hypervolume definition, which turn out to be of little help since there exist cases where PL-compliance is not respected. Finally, a first feasible approach (which deserves further investigation) is proposed.

The first proposal is the most naive one and suggests flattening the PLs into one single standard MOP and then resorting to the standard hypervolume as performance indicator. With reference to the PL-C benchmark in Sect. 7.1, the objective values of each candidate optimal solution \(x\in \Omega\) found by an algorithm would be rewritten as the single vector:

$$\begin{aligned} x\mapsto \begin{pmatrix} x_1,&x_2,&g(x),&f_1^{(2)}(x),&f_2^{(2)}(x),&f_1^{(3)}(x),&f_2^{(3)}(x),&f_3^{(3)}(x) \end{pmatrix}^T, \end{aligned}$$

where

$$\begin{aligned} f_1^{(2)}(x)= & {} \cos (2\Vert x\Vert ),\, f_2^{(2)}(x) = -\sin (2\Vert x\Vert ),\, \\f_i^{(3)}(x) = & {} {\left\{ \begin{array}{ll} \Vert x-(5,5)\Vert &{} i=1\\ \Vert x-(10,6)\Vert &{} i=2\\ \Vert x-(6,10)\Vert &{} i=3. \end{array}\right. } \end{aligned}$$

This idea has the obvious drawback of losing the priority information, but it guarantees Pareto compliance in the new objective space. However, the latter does not enforce PL-compliance in the original one, as shown in the example proposed later.

The second approach transposes the problem from a PL-MPL-MOP to a PC-MPL-MOP by summing up the levels. The reason is that for PC-MPL-MOPs a definition of a PC-hypervolume comes quite naturally and can be proved to be PC-compliant (the proof is out of the scope of this brief discussion). Its implementation is exactly the same as that of the standard hypervolume, with the difference that the objective functions take G-numbers as values rather than real ones. Again resorting to the PL-C benchmark, the mapping would be:

$$\begin{aligned} x\mapsto \begin{pmatrix} x_1 &{}+&{} f_1^{(2)}(x)\textcircled {1}^{-1} &{}+&{} f_1^{(3)}(x)\textcircled {1}^{-2}\\ x_2 &{}+&{} f_2^{(2)}(x)\textcircled {1}^{-1} &{}+&{} f_2^{(3)}(x)\textcircled {1}^{-2}\\ g(x) &{}+&{} 0\textcircled {1}^{-1} &{}+&{} f_3^{(3)}(x)\textcircled {1}^{-2} \end{pmatrix}. \end{aligned}$$

However, even in this case the PC-compliance is not enough to guarantee the PL-compliance in the original objective space.

The third and final idea considers each PL separately, computes the standard hypervolume within each, and then sums up the contribution of each PL scaling it down according to its priority. Indicating with \(h^{(i)}\) the standard hypervolume along the i-th PL, in the PL-C benchmark the overall PL-hypervolume would be

$$\begin{aligned} h = h^{(1)}+h^{(2)}\textcircled {1}^{-1}+h^{(3)}\textcircled {1}^{-2}. \end{aligned}$$

Although this approach seems very promising, since it is in line with what one would naturally expect from a PL-MPL-MOP, PL-compliance is not guaranteed, as in the previous cases. This behavior closely resembles what happens with the PL-dominance: Definition 1 seems very appealing and descends straightforwardly from the structure of a PL-MPL-MOP; however, it is not transitive, and a more technical proposal (Definition 2) is needed. This suggests that further investigation is required to reach a definition of PL-hypervolume, possibly even more technical and complex than those just proposed, which is also PL-compliant. However, the study is just at the beginning, and therefore there is still a lot of optimism about this research direction.

To support the assertions of non-PL-compliance made so far, we use the following counterexample involving a generic minimization PL-MPL-MOP. The idea is to verify that there exist cases where worse fronts have higher hypervolumes. The previously proposed indicators are applied to three candidate fronts, namely \(\mathcal {F}_1\), \(\mathcal {F}_2\), \(\mathcal {F}_3\). For the sake of simplicity, each front consists of one single solution; this assumption does not affect the generality of the study. They are:

$$\begin{aligned} \mathcal {F}_1= & {} \left\{ \begin{pmatrix}2\\ 5\end{pmatrix}+\begin{pmatrix}2\\ 2\end{pmatrix}\textcircled {1}^{-1}\right\} ,\quad \mathcal {F}_2 = \left\{ \begin{pmatrix}4\\ 3\end{pmatrix}+\begin{pmatrix}7\\ 6\end{pmatrix}\textcircled {1}^{-1}\right\} , \\ \mathcal {F}_3= & {} \left\{ \begin{pmatrix}1\\ 4\end{pmatrix}+\begin{pmatrix}9\\ 8\end{pmatrix}\textcircled {1}^{-1}\right\} . \end{aligned}$$

Table 5 reports the PL-hypervolume according to the three definitions above: \(h_1\) refers to the first one, \(h_2\) to the second and \(h_3\) to the third. As reference point for the indicator calculation, we used (20, 20, 20, 20) for the first one and (20, 20) for both the second and the third. According to Definition 2, and denoting by \(s_i\) the solution in the front \(\mathcal {F}_i\), it holds that \(s_2\prec _{\text {PL}}s_3\prec _{\text {PL}}s_1\). Therefore, the PL-compliance of the first proposal is disproved by solutions \(s_1\) and \(s_3\) since \(h_1(\mathcal {F}_1)>h_1(\mathcal {F}_3)\); on the other hand, the PL-compliance of the second and third proposals is disproved by solutions \(s_2\) and \(s_3\) since \(h_2(\mathcal {F}_3)>h_2(\mathcal {F}_2)\) and \(h_3(\mathcal {F}_3)>h_3(\mathcal {F}_2)\).
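The three inequalities can be reproduced with the following sketch, which works because each front contains a single solution, so every hypervolume reduces to the volume of the box between that solution and the reference point. Several assumptions are made here and are not taken from the paper's implementation: G-numbers are stored as tuples of coefficients of \(\textcircled {1}^{0},\textcircled {1}^{-1},\textcircled {1}^{-2}\) and compared lexicographically, the reference point of \(h_2\) is treated as a G-number with zero infinitesimal part, and all function names are hypothetical.

```python
REF = 20  # every coordinate of the reference points used in the counterexample

# Each front holds one solution, stored as (first-level values, second-level values).
FRONTS = {1: ((2, 5), (2, 2)), 2: ((4, 3), (7, 6)), 3: ((1, 4), (9, 8))}


def box_volume(point):
    """Hypervolume of a one-point front (minimization) with reference (REF, ..., REF)."""
    volume = 1
    for value in point:
        volume *= REF - value
    return volume


def g_mul(a, b):
    """Multiply two G-numbers given as coefficient lists of 1, grossone^-1, grossone^-2, ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def h1(levels):
    """First proposal: flatten the two levels into one four-objective vector."""
    return box_volume(levels[0] + levels[1])


def h2(levels):
    """Second proposal: product of G-number side lengths (REF - a) - b * grossone^-1."""
    product = [1]
    for a, b in zip(*levels):
        product = g_mul(product, [REF - a, -b])
    return tuple(product)


def h3(levels):
    """Third proposal: one standard hypervolume per level, h^(1) + h^(2) * grossone^-1."""
    return tuple(box_volume(level) for level in levels)
```

Since Python compares tuples lexicographically, `h2(FRONTS[3]) > h2(FRONTS[2])` and `h3(FRONTS[3]) > h3(FRONTS[2])` can be checked directly, and both are decided by the finite parts alone.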

Table 5 The value of the hypervolume for the three reference populations

Since none of the three PL-hypervolume proposals is PL-compliant, there is no clear way to decide which of them should be preferred, at least at first sight. The first one is certainly the most resource-consuming, since the hypervolume computation is known not to scale well with the number of dimensions of the objective space. The other two definitions seem roughly comparable from this perspective: the second one requires the computation of just one hypervolume, but in a more complex (though low-dimensional) vector space, i.e., the one induced by \(\textcircled {1}\), while the third one needs to calculate p (low-dimensional) standard hypervolumes. However, some additional preliminary results seem to suggest that the third definition should be preferred. Ideally, if the front output by the evolutionary algorithm is made up of solutions all belonging to the true optimal front, then one should be able to prove that the third definition of PL-hypervolume is also PL-compliant. Therefore, if the true front were available, it would be enough to filter each approximated front according to the PL-domination and then apply the unary metric. The main drawback of this approach is that it does not preserve any information about the discarded solutions, which means that two fronts are evaluated on the truly optimal solutions they find and not on the overall quality of their output. Furthermore, when the true front is available, binary metrics are a far more reliable tool, especially because they consider the whole front found by an algorithm.

9 Conclusions

The goal of this paper has been to review the state of the art in Pareto and lexicographic many-objective optimization, focusing on PL-MPL-MOPs, a sub-category of MPL-MOPs where the objectives are grouped in levels of priority. Furthermore, an enhanced version of NSGA-II (named PL-NSGA-II), able to deal with level-like structured priorities, has been presented, implemented and analyzed. It exploits a novel mathematical framework called Grossone methodology, a numerical approach to handling infinite and infinitesimal quantities which is oriented to scientific calculations. The Grossone and a custom non-dominance definition, the PL-dominance, have enabled a significantly improved performance. This has been demonstrated by four illustrative experiments carried out on variations of standard benchmark problems and of an engineering design one. The numerical results have demonstrated how the PL-extension of NSGA-II is consistently able to outperform other EMO algorithms of various kinds, some aware of the priority and some not, some designed for many-objective problems and some for multiple objectives only. A brief discussion of recent results on the possibility of implementing a unary performance indicator for PL-MPL-MOPs has also been given. As future work, the authors are considering the study of PL-based crossover and mutation operators, as well as the design of more challenging benchmarks for PL-MPL-MOPs. Finally, a deeper analysis of the effectiveness of this technique on real problems is also ongoing.