1 Introduction

Multiobjective optimization is an important field of research, strongly connected with applications. Examples of multiobjective optimization problems appear in different scientific domains such as engineering, environment, finance, or medicine [1, 2, 25, 29]. It is common to have more than one objective to optimize, with these objectives conflicting with each other. In this situation, the problem solution is no longer a single point but a set of points, for which it is not possible to improve one objective without deteriorating the value of another, known as the Pareto front of the problem.

Typical approaches to this class of problems include the use of metaheuristics, such as evolutionary algorithms, when we aim at an approximation to the complete Pareto front of the problem [12, 21], or aggregation techniques, which combine the different objectives into a single function that is then optimized, generating a single point in the Pareto front of the problem [11, 20]. General theoretical properties have not yet been established for the class of methods based on metaheuristics. Regarding aggregation techniques, when addressing general nonconvex problems, if the goal is to compute an approximation to the complete Pareto front, care must be taken in the selection of an adequate scalarization technique, since the Pareto front cannot simply be retrieved with linear combinations of the corresponding objectives [7, 8]. Recently, some algorithms based on SQP techniques were successfully proposed for this task [3, 15].

When the goal is just to compute a single Pareto point, many approaches can be found in the literature, corresponding to multiobjective variants of single objective algorithms [17]. These include steepest descent [14], Newton's method [13], and quasi-Newton approaches [23], among many others. Often, in the numerical experiments, algorithms are run from different initializations in an attempt to generate different points in the Pareto front. However, there is no guarantee of success and the algorithms do not incorporate any explicit mechanism for that purpose.

In the present work, we will follow a trust-region approach, based on quadratic Taylor models, to develop an algorithm that computes an approximation to the complete Pareto front of a multiobjective problem, by approximating the set of Pareto critical points. Trust-region methods have been extensively used in single objective optimization [5], leading to competitive implementations and commercial software [22]. The general framework of a trust-region based algorithm is relatively simple. At each iteration, a model of the objective function is computed and minimized in a trust-region, typically a ball around the current iterate, with a given radius. Depending on how well the model predicts the actual decrease in the objective function value, the model minimizer is accepted or rejected and the trust-region radius is updated.

The update rule for the trust-region radius is crucial for the success of the algorithm, both in single objective optimization and when simultaneously minimizing different models, corresponding to different components of the objective function. Recently [28], encouraging numerical results have been provided regarding an updating strategy based on the hypervolume indicator [16]. However, no theoretical results associated with the behavior of the algorithm were provided.

Trust-region algorithms with a well-established convergence analysis were already proposed for multiobjective optimization [4, 26, 27]. The original Newton’s method [13], developed for convex unconstrained optimization, was extended in [4] to nonconvex problems in a trust-region framework, by considering an additional set of linear inequality constraints that enforce descent. This set of additional constraints is also considered in [27], but now under a nonmonotone globalization technique. In [26], the additional constraints are not used but a strong local convexity assumption is needed. Quasi-Newton quadratic models are considered, where the Hessian matrix is positive definite. All algorithms presented in [4, 26, 27] are designed to obtain a single Pareto critical point for the multiobjective problem.

In our approach, we consider unconstrained general multiobjective optimization problems. The proposed algorithm will use quadratic Taylor models, not necessarily convex. No additional constraints are imposed. The algorithm incorporates an explicit strategy to compute an approximation to the complete Pareto front of the multiobjective problem, that relies on an extreme point step and a scalarization step. The extreme point step attempts to extend the approximation to the Pareto front by moving towards its extreme points, corresponding to the individual minimization of each function component. The scalarization step attempts to close the gaps in the Pareto front by selecting an adequate point and performing an aggregated minimization of the models corresponding to the different components of the objective function.

Section 2 formalizes the problem to be addressed and details the proposed algorithmic structure. Convergence is analyzed in Sect. 3. Section 4 provides numerical results that illustrate the relevance of each one of the proposed algorithmic features and compares the numerical performance of the method against the state-of-the-art Multiobjective Sequential Quadratic Programming (MOSQP) algorithm, proposed in [15]. Finally, some conclusions are presented in Sect. 5.

2 Algorithm description

In this work, we are interested in solving the multiobjective optimization problem defined as:

$$\begin{aligned} \begin{array}{c} \min F(x)\\ \text {s.t.} \, x \in \mathbb {R}^n, \end{array} \end{aligned}$$
(1)

where \( F: \mathbb {R}^{n} \rightarrow \mathbb {R}^{q}\), \( F(x)=(f_{1}(x),\ldots ,f_{q}(x)) \), \(n,q \in \mathbb {N}\), and \(q \ge 2 \). The objective functions \( f_{i}: \mathbb {R}^{n} \rightarrow \mathbb {R}, \, i=1,\ldots ,q\) are assumed to be twice continuously differentiable, with available gradients and Hessians, and conflicting among each other, meaning that it is not possible to find a single point that simultaneously minimizes all function components. The problem solution will be the so-called Pareto front of the problem, namely a set of nondominated points. In multiobjective optimization, we say that point x dominates point y, and represent it by \(x\prec _F y\), if and only if \(F(y)-F(x)\in \mathbb {R}^q_+{\setminus }\{0\}\).
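
In a MATLAB implementation, this relation translates directly into a nondominance filter over a matrix of objective vectors. The following minimal sketch is only illustrative: the function name and the data layout are ours, not part of MOTR.

```matlab
function keep = nondominated(FV)
% NONDOMINATED  Logical mask selecting the nondominated rows of FV.
%   FV is a p-by-q matrix whose rows are objective vectors F(x).
%   Row i is kept if no other row j satisfies FV(j,:) <= FV(i,:)
%   componentwise with at least one strict inequality (x^j would
%   then dominate x^i in the sense defined above).
p = size(FV,1);
keep = true(p,1);
for i = 1:p
    for j = 1:p
        if j ~= i && all(FV(j,:) <= FV(i,:)) && any(FV(j,:) < FV(i,:))
            keep(i) = false;
            break;
        end
    end
end
end
```

A set of stored objective vectors can then be pruned with FV(nondominated(FV),:) each time a new point is appended.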

When developing an algorithm that approximates the Pareto front of a multiobjective optimization problem, two goals should be taken into consideration: the extent of the Pareto front, meaning the capability of computing its extreme points, corresponding to the individual minimization of each component of the objective function; and the density of the Pareto front, associated with the ability of the algorithm to fill the gaps between points lying in the computed approximation to the Pareto front. The proposed Multiobjective Trust-Region (MOTR) algorithm addresses each one of these goals with two different steps, namely the Extreme Point Step and the Scalarization Step, which are performed at alternate iterations. The latter makes use of an additional step, the Middle Point Step, to select the points where scalarization problems will be solved. Algorithm 1 formalizes the main procedure.

Algorithm 1: MOTR

MOTR keeps a list of nondominated points and associated quantities, defined as

$$\begin{aligned} L=\left\{ (x^{j},F(x^{j}),\Delta _{ep}^{j},\Delta _{sc}^{j})\,|\,j \in J \right\} , \end{aligned}$$
(2)

where \( J \subset \mathbb {N} \) is the set of indexes of the points in the list, \(\Delta _{ep}^{j}\) is a \(q \times 1\) vector storing, at each component, the trust-region radius associated with the point and the corresponding component of the objective function, to be used at the extreme point step, and \(\Delta _{sc}^{j}\in \mathbb {R}^+\) represents the trust-region radius to be used at the scalarization step. Throughout this work, in a clear abuse of notation that facilitates reading, it will often be stated that \(x^j\in L\), meaning that \((x^{j},F(x^{j}),\Delta _{ep}^{j},\Delta _{sc}^{j})\in L\).
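
A possible MATLAB representation of an entry of this list is a struct with one field per stored quantity; the field names and the toy bi-objective function below are ours and merely illustrative.

```matlab
% Illustrative representation of one entry of the list (2); field names
% and the toy bi-objective function are ours.
Fobj = @(x) [sum((x-1).^2); sum((x+1).^2)];   % toy objective with q = 2 components
q = 2;  x0 = zeros(3,1);                      % n = 3 variables
entry.x        = x0;                          % point x^j in R^n
entry.F        = Fobj(x0);                    % objective vector F(x^j)
entry.delta_ep = ones(q,1);                   % extreme point step radii, one per component
entry.delta_sc = 1;                           % scalarization step radius
L = entry;                                    % list with a single initial entry
```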

In either of the two steps, points are selected from this list and quadratic Taylor models, centered at the selected point, are built to replace the different components of the objective function. Thus, for \(i=1,\ldots ,q\), the quadratic model \(m_{i}\) approximating \(f_{i}\) around a given point \( x_{step} \) is defined as

$$\begin{aligned} m_{i}(x) = f_{i}(x_{step})+\nabla f_{i}(x_{step})^{\top }(x-x_{step}) +\frac{1}{2}(x-x_{step})^{\top } \nabla ^{2}f_{i}(x_{step})(x-x_{step}), \end{aligned}$$
(3)

where \(\nabla f_{i}( x_{step})\) and \(\nabla ^{2}f_{i}( x_{step})\) represent the gradient vector and the Hessian matrix of \(f_i\) computed at \( x_{step}\).

Depending on the step, models are used in different ways, to compute new nondominated points that are added to the list. Each time that a new point is added to the list, all dominated points are removed from it. Sections 2.1 and 2.2 detail the extreme point and the scalarization steps, respectively.

2.1 Extreme point step

The main goal of the extreme point step, described in Algorithm 2, is to expand the approximation to the Pareto front by moving towards the extreme points of it, corresponding to the individual minimization of each of the objective function components.

Algorithm 2: Extreme point step

At each iteration k, for each component of the objective function \( f_{i},\, i=1,\ldots ,q\), the point \(x_{ep}^{i,k}\), corresponding to the minimum value of \(f_{i}\) among the points in the list, is selected. Ties are broken by the largest extreme point trust-region radius corresponding to \(f_{i}\), promoting successful iterations. Larger trust-region radii indicate points not yet selected or successful in their exploration. Once \(x_{ep}^{i,k}\) is selected, the extreme point trust-region radius corresponding to \(f_{i}\) is set equal to zero for all the other points in the list, since they are no longer options as best candidates for the extreme point corresponding to the selected objective function component.

The quadratic Taylor model \(m^k_{i}\) (3), centered at \(x_{ep}^{i,k}\), is computed and an iteration of a single objective trust-region algorithm is performed. The model is minimized in \(B(x_{ep}^{i,k},\Delta _{ep}^{i,k}(i))\), the closed ball centered at \(x_{ep}^{i,k}\) with radius \(\Delta _{ep}^{i,k}(i)\).

The ratio of agreement between the decrease obtained in the model and the variation obtained in the corresponding objective function component, \(\rho _{ep}^{i,k}\), is computed, dictating the acceptance or rejection of the model minimizer, \(x_{ep}^{i,k *}\), and the update strategy for the trust-region radius. Rules identical to the single objective case are adopted.

A high value of \(\rho _{ep}^{i,k}\) indicates that the model is adequately predicting the reduction in the function value. The model minimizer will be accepted and the trust-region radius will be increased. In fact, when \(\rho _{ep}^{i,k}>0\), the value \(f_{i}(x_{ep}^{i,k *})\) is smaller than \(f_{i}(x_{ep}^{i,k})\). Thus, it is clear that \(x_{ep}^{i,k *}\) is not dominated by any point in the list \(L_{k}\). The point is added to the list, and all dominated points are removed from it. If \(\rho _{ep}^{i,k}\) is low, then \(x_{ep}^{i,k *}\) will not be accepted and the trust-region radius will be decreased, in an attempt to improve the quality of the Taylor model as an approximation to the corresponding objective function component.

When \(m^k_{i}(x_{ep}^{i,k})-m^k_{i}(x_{ep}^{i,k *})=0\), preventing the computation of the agreement ratio, there was no model improvement, meaning that the current model center \(x_{ep}^{i,k}\) is the model minimizer. In this situation, \( \rho _{ep}^{i,k} \) is set equal to zero, forcing the decrease of the trust-region radius and the agreement between the Taylor model and the objective function component.

The described procedure for point acceptance/rejection and for the update of the trust-region radius is identical to the single objective case. However, it is possible that \(x_{ep}^{i,k}\) continues to be a nondominated point and remains in the list. In this situation, it is no longer a candidate as the extreme point of the objective function component \(f_i\). Thus, \(\Delta _{ep}^{i,k}(i)\) is set equal to zero.
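
To make the mechanics of Algorithm 2 concrete, the following MATLAB sketch performs one extreme point iteration for a single component, under standard trust-region acceptance and update rules; the toy function, the parameter values, and all names are ours, not the actual MOTR implementation.

```matlab
% Minimal sketch of one extreme point iteration for a single component f_i.
f   = @(x) sum(x.^4) + x(1)*x(2);                        % toy component f_i
g   = @(x) [4*x(1)^3 + x(2); 4*x(2)^3 + x(1)];           % gradient of f_i
H   = @(x) [12*x(1)^2, 1; 1, 12*x(2)^2];                 % Hessian of f_i
xep = [0.5; -0.3];  Dep = 1;                             % selected point and its radius
eta1 = 1e-3;  eta2 = 0.9;  mu1 = 0.5;  mu2 = 2;          % parameter values used in Sect. 4
m    = @(x) f(xep) + g(xep)'*(x-xep) + 0.5*(x-xep)'*H(xep)*(x-xep);   % model (3)
ball = @(x) deal(norm(x - xep) - Dep, []);               % trust region ||x - xep|| <= Dep
opts  = optimoptions('fmincon','Display','none');
xstar = fmincon(m, xep, [], [], [], [], [], [], ball, opts);
pred  = m(xep) - m(xstar);                               % model decrease
if pred > 0
    rho = (f(xep) - f(xstar)) / pred;                    % agreement ratio rho_ep
else
    rho = 0;                                             % no model decrease: force shrinkage
end
if rho >= eta2
    Dep = mu2 * Dep;     % very successful: accept xstar (nondominated), enlarge radius
elseif rho < eta1
    Dep = mu1 * Dep;     % unsuccessful: reject xstar, shrink radius
end                      % otherwise: accept xstar and keep the radius
```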

2.2 Scalarization step

In the scalarization step, there is an attempt to fill the gaps in the current approximation to the Pareto front by selecting points associated with the largest gap for the objective function component under analysis. An adequate scalarization problem is solved and the corresponding minimizer is added to the list, if nondominated.

For each component of the objective function, \(f_i, i=1,\ldots , q\), a point \(x_{sc}^{i,k}\) is selected, following the procedure described in Algorithm 3.

Algorithm 3: Middle point step

In the middle point step, there is an attempt to identify the largest gap in the Pareto front, according to the objective function component under analysis, and to fill it by considering, as the new iterate \(x_{sc}^{i,k}\), the middle point of the line segment associated with the points defining the gap. In case of ties between gaps, priority is given to the ones associated with the larger scalarization step trust-region radius, again promoting the progress of the algorithm. The procedure is repeated until a nondominated point has been found for the function component under analysis or all the gaps have been exhausted. In the latter case, the algorithm moves to the next objective function component. As illustrated in the numerical results, reported in Sect. 4, this strategy is one of the key features of the new algorithm.
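
A minimal MATLAB sketch of this selection, for one objective function component, could be as follows (toy data and names are ours).

```matlab
% Minimal sketch of the middle point selection for one component f_i:
% sort the list points by f_i, locate the largest consecutive gap, and
% take the midpoint of the segment joining the two points that define it.
X  = [0 0; 1 0.2; 0.1 1; 0.6 0.7];      % rows: points currently in the list
Fi = [0.05; 0.90; 0.30; 0.55];          % their values for the component f_i
[Fs, order] = sort(Fi);                 % sort by the component under analysis
gaps = diff(Fs);                        % consecutive gaps in objective space
[~, gmax] = max(gaps);                  % largest gap (ties: largest radius in MOTR)
xa = X(order(gmax), :);                 % the two points defining the gap
xb = X(order(gmax+1), :);
x_sc = 0.5 * (xa + xb);                 % middle point: candidate scalarization center
```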

Once \(x_{sc}^{i,k}\) is selected, models are built for each objective function component, centered at the new point. A joint minimization of the models is performed by solving the following scalarization problem, computing the new point \(x_{sc}^{i,k*}\):

$$\begin{aligned} \begin{array}{cl} \min & t \\ \text {s.t. } & m^k_{l}(x) - m^k_{l}(x_{sc}^{i,k}) \le t,\,\, l=1,\ldots ,q,\\ & x \in B(x_{sc}^{i,k},\Delta _{sc}^{i,k}),\\ & t \in \mathbb {R}. \end{array} \end{aligned}$$
(4)

A similar scalarization approach was considered in [26] and [4], in the latter case enriched with an extra set of linear inequality constraints. Positive definite Hessians are required in [26] for the theoretical analysis of the algorithm. In both cases, the proposed algorithms generate a single point, with no explicit attempt to compute any approximation to the complete Pareto front of the problem.
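
For illustration, subproblem (4) can be reformulated over the augmented variable \((x,t)\) and handed to a nonlinear programming solver. The following MATLAB sketch does so with fmincon for \(q=2\); the toy models, parameter values, and names are ours.

```matlab
% Minimal sketch of subproblem (4) for q = 2, reformulated over z = [x; t].
n = 2;  x_sc = [0.2; 0.4];  Dsc = 1;                  % center and scalarization radius
m1 = @(x) (x(1)-1)^2 + x(2)^2;                        % toy quadratic models m_1, m_2
m2 = @(x) x(1)^2 + (x(2)+1)^2;
obj  = @(z) z(end);                                   % minimize t
cons = @(z) deal([m1(z(1:n)) - m1(x_sc) - z(end); ... % m_l(x) - m_l(x_sc) <= t, l = 1,2
                  m2(z(1:n)) - m2(x_sc) - z(end); ...
                  norm(z(1:n) - x_sc) - Dsc], []);    % x in B(x_sc, Dsc)
z0   = [x_sc; 0];                                     % (x_sc, 0) is always feasible
opts = optimoptions('fmincon','Display','none');
z      = fmincon(obj, z0, [], [], [], [], [], [], cons, opts);
x_star = z(1:n);  t_star = z(end);   % t_star < 0 unless x_sc is Pareto critical for the models
```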

The criterion for accepting \(x_{sc}^{i,k*}\) and the update rules for the scalarization step trust-region radius are the ones of [30, 31]. The auxiliary functions \(\phi (x)=\max _{j=1,\ldots ,q}f_{j}(x)\) and \(\phi ^k_{m}(x)=\max _{j=1,\ldots ,q} m^k_{j}(x)\) are defined and used to compute the ratio

$$\begin{aligned} \rho _{sc}^{i,k}=\frac{\phi (x_{sc}^{i,k})-\phi (x_{sc}^{i,k *})}{\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k *})}. \end{aligned}$$
Algorithm 4: Scalarization step

In [30], it is established that when \(\rho _{sc}^{i,k}>0\), descent is guaranteed for at least one component of the objective function. When the denominator \( \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k *}) \) is equal to zero, \( \rho _{sc}^{i,k} \) is set equal to zero, forcing the decrease of the scalarization step trust-region radius and consequently the agreement between the models and the objective function components.

Having defined the ratio \( \rho _{sc}^{i,k}\), the strategy to accept/reject the new point and to update the trust-region radius is similar to the one of any trust-region method, with the exception that successful steps are necessarily associated with nondominated points. If \( \rho _{sc}^{i,k} \) is large enough but the new point is dominated, the trust-region radius is reduced to increase the model quality. Algorithm 4 details the procedure.

3 Convergence analysis

For analyzing the theoretical behavior of the algorithm, we will consider that no stopping criteria are defined. In particular, \(\Delta _{ep}^{min}=\Delta _{sc}^{min}=0\). Convergence will be established for linked sequences of points \(\{x^{k}\}_{k\in K}\) generated by MOTR, a concept introduced in [19] in a derivative-free optimization setting.

Definition 3.1

Consider \(\{L_k\}_{k\in \mathbb {N}}\) the sequence of sets of nondominated points generated by Algorithm 1. A linked sequence is a sequence \(\{x^k\}_{k\in K}\), where \( K \subseteq \mathbb {N} \) denotes the indexes of the points belonging to the linked sequence, such that for any \(k\in K\), the element \((x^{k},F(x^{k}),\Delta _{ep}^{k},\Delta _{sc}^{k})\in L_k\) is generated from the element \((x^{k-1},F(x^{k-1}),\Delta _{ep}^{k-1},\Delta _{sc}^{k-1})\in L_{k-1}\).

An initial point of a linked sequence can be:

  • A point in the initial list, provided by the user;

  • A middle point generated by MOTR in the middle point step, not currently in the list, which is then added to the list by the algorithm.

We are going to establish that every linked sequence generated by MOTR converges to a Pareto critical point, a necessary condition for being a solution of Problem (1), formalized in the following definition.

Definition 3.2

Let \(f_{i}:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be continuously differentiable functions, for \(i\in \{1,\ldots ,q\}\). A point \(x^* \in \mathbb {R}^{n}\) is said to be Pareto critical for Problem (1) if

$$\begin{aligned} \forall d \in \mathbb {R}^{n}, \, \exists i \in \{ 1,\ldots ,q \}:\nabla f_{i}(x^*)^{\top }d\ge 0. \end{aligned}$$

The next lemma, originally stated in [14], provides a criticality measure for multiobjective optimization.

Lemma 3.3

Let \( f_{i}:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) be continuously differentiable functions, for \( i\in \{1,\ldots ,q \}\). Define

$$\begin{aligned} \omega (x)=-\min _{\Vert d \Vert \le 1}\max _{i=1,\ldots ,q}\nabla f_{i}(x)^{\top }d. \end{aligned}$$
(5)

The following statements hold:

  • The mapping \( x\mapsto \omega (x)\) is continuous;

  • \(\omega (x) \ge 0\) for all \( x \in {\mathbb {R}}^{n} \);

  • A point \( x^* \in {\mathbb {R}}^{n} \) is Pareto critical if and only if \(\omega (x^*) = 0\).
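
In practice, \(\omega (x)\) can be evaluated by solving a small auxiliary problem in the variables \((d,t)\), namely minimizing t subject to \(\nabla f_{i}(x)^{\top }d\le t\), \(i=1,\ldots ,q\), and \(\Vert d\Vert \le 1\). The MATLAB sketch below follows this reformulation; the toy gradient data and all names are ours.

```matlab
% Minimal sketch of the criticality measure (5), computed over z = [d; t].
G = [1 0; 0 1];                          % columns: gradients of f_1, f_2 at some x (toy data)
n = size(G,1);  q = size(G,2);
obj  = @(z) z(end);                                  % minimize t
cons = @(z) deal([G' * z(1:n) - z(end); ...          % grad f_i(x)' d <= t, i = 1,...,q
                  norm(z(1:n)) - 1], []);            % ||d|| <= 1
opts = optimoptions('fmincon','Display','none');
z     = fmincon(obj, zeros(n+1,1), [], [], [], [], [], [], cons, opts);
omega = -z(end);                         % here omega = 1/sqrt(2) > 0: x is not Pareto critical
```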

Under reasonable assumptions, we are going to establish that for every linked sequence \(\{ x^{k} \}_{k\in K}\) generated by MOTR,

$$\begin{aligned} \lim _{k\rightarrow +\infty ;\,k\in K}\, \omega (x^{k})=0. \end{aligned}$$

The structure of the proof follows the arguments considered by [4, 30, 31].

For \( i\in \{1,\ldots ,q \}\), we assume that the objective function component \(f_{i}\) is lower bounded and twice continuously differentiable, with uniformly bounded Hessian. Thus, function \(\phi (x)=\max _{i=1,\ldots ,q}f_{i}(x)\) is also lower bounded.

Assumption 3.1

For \( i\in \{1,\ldots ,q\} \), the Hessian matrix of the objective function component \(f_i\) is uniformly bounded, meaning that there is a constant \( \kappa _{h} > 0 \) such that

$$\begin{aligned} \Vert \nabla ^{2} f_{i}(x) \Vert \le \kappa _{h} \end{aligned}$$

for all \( x \in {\mathbb {R}}^{n} \) and for all \( i\in \{1,\ldots ,q\} \).

Assumption 3.1 implies that \( \nabla f_{i} \) is Lipschitz continuous, for each \(i\in \{1,\ldots ,q\} \). From it, we can deduce that function \( \omega \), defined by (5), is uniformly continuous [31].

At each iteration k, for \(i\in \{1,\ldots ,q\} \), model \( m^k_{i},\) centered at \(x^k\), is a quadratic Taylor model, again twice continuously differentiable and satisfying:

$$\begin{aligned} m^k_{i}(x^{k}) = f_{i}(x^{k}),\qquad \nabla m^k_{i}(x^{k}) = \nabla f_{i}(x^{k}),\qquad \nabla ^{2} m^k_{i}(x^{k}) = \nabla ^{2} f_{i}(x^{k}). \end{aligned}$$

In this paper, \( B_{k} \) denotes the current iteration ball, defined as

$$\begin{aligned} B_{k}=B(x^{k},\Delta _{k}) = \left\{ x \in {\mathbb {R}}^{n} \;|\; \Vert x-x^{k}\Vert \, \le \Delta _{k} \right\} , \end{aligned}$$

where \( x^{k} \) is the current iterate and \( \Delta _{k} \) represents the current trust-region radius.

Assumption 3.1 allows us to establish the well-known error bounds for Taylor models.

Lemma 3.4

Let Assumption 3.1 hold. At every iteration k, the model \( m^k_{i} \) is valid for \( f_{i} \) in \( B_{k} \), for all \( i \in \{ 1,\ldots ,q \} \), that is, there exists a constant \( \kappa _{fm}>0 \) such that

$$\begin{aligned} | f_{i}(x)-m^k_{i}(x)| \le \kappa _{fm}\Delta _{k}^{2} \end{aligned}$$

holds for all \( x \in B_{k} \).

Function \(\omega \), defined by equation (5), can be generalized to models through:

$$\begin{aligned} \omega _{m^k}(x)=-\min _{\Vert d \Vert \le 1}\max _{i=1,\ldots ,q}\nabla m^k_{i}(x)^{\top }d. \end{aligned}$$

The use of Taylor models guarantees that \(\omega _{m^k}(x^{k})= \omega (x^{k})\), at each iteration k, where \(x^k\) represents the point where the model was built. This equality ensures that when the iteration point \(x^{k}\) is Pareto critical or close to criticality for the model, the same applies to the objective function.

To prove convergence, model minimization should provide a sufficient decrease at each iteration. Following [31], for the scalarization step, we quantify the best model reduction obtained along a direction belonging to \(\mathcal {D}(x)\), the set of directions associated with the solution of (5), within the trust-region \( B_{k}\). For \( i\in \{1,\ldots ,q\} \), let \(d^{*}_{k} \in \mathcal {D}(x_{sc}^{i,k})\) and compute \( \alpha _{k} \) by solving

$$\begin{aligned} \min _{ \alpha \ge 0}\{\phi ^k_{m}(x_{sc}^{i,k}+ \alpha d^{*}_{k}):x_{sc}^{i,k}+\alpha d^{*}_{k} \in B_{k}\}. \end{aligned}$$
(6)

The Pareto-Cauchy point is defined as

$$\begin{aligned} x^{C}_{k}=x_{sc}^{i,k}+ d^{C}_{k}, \end{aligned}$$
(7)

where \( d^{C}_{k}:= \alpha _{k} d^{*}_{k}\) and \( B_{k}=B(x_{sc}^{i,k},\Delta _{sc}^{i,k}) \).
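
Since \(\Vert d^{*}_{k}\Vert \le 1\), the constraint \(x_{sc}^{i,k}+\alpha d^{*}_{k}\in B_{k}\) reduces to \(\alpha \in [0,\Delta _{sc}^{i,k}]\), so (6) is a one-dimensional problem. A minimal MATLAB sketch, with toy models and names of our own, is given below.

```matlab
% Minimal sketch of the Pareto-Cauchy point (6)-(7): a one-dimensional
% minimization of phi_m along a given direction d_star inside the trust region.
x_sc  = [0.2; 0.4];  Dsc = 1;  d_star = [-1; -1]/sqrt(2);    % ||d_star|| <= 1
m1 = @(x) (x(1)-1)^2 + x(2)^2;   m2 = @(x) x(1)^2 + (x(2)+1)^2;
phi_m = @(x) max(m1(x), m2(x));
alpha = fminbnd(@(a) phi_m(x_sc + a*d_star), 0, Dsc);        % problem (6)
xC    = x_sc + alpha*d_star;                                 % Pareto-Cauchy point (7)
```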

Lemma 3.5

Let Assumption 3.1 hold. For \( i \in \{ 1,\ldots ,q \} \) and \( k \in \mathbb {N} \), the Pareto-Cauchy point \( x^{C}_{k}=x_{sc}^{i,k}+ d^{C}_{k} \), defined by (7), satisfies

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x^{C}_{k}) \ge \frac{1}{2} \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k} \right\} . \end{aligned}$$
(8)

Proof

Since \( d^{*}_{k} \in \mathcal {D}(x_{sc}^{i,k}) \), we have \( \Vert d^{*}_{k} \Vert \le 1 \), which implies that, for all \( \alpha \in [ 0, \Delta _{sc}^{i,k} ] \), \( x_{sc}^{i,k}+\alpha d^{*}_{k} \in B_{k} \). Problem (6) has the same solution as

$$\begin{aligned} \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\{\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k}+ \alpha d^{*}_{k})\}. \end{aligned}$$

On the other hand, for all \( \alpha \ge 0 \), we have

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k}+ \alpha d^{*}_{k})&=\max _{j=1,\ldots ,q}m^k_{j}(x_{sc}^{i,k})-\max _{j=1,\ldots ,q}m^k_{j}(x_{sc}^{i,k}+\alpha d^{*}_{k})\\&\ge -\max _{j=1,\ldots ,q} \alpha \nabla m^k_{j}(x_{sc}^{i,k})^{\top }d^{*}_{k}-\max _{j=1,\ldots ,q}\dfrac{1}{2}\alpha ^{2}d^{*\top }_{k} \nabla ^{2} m^k_{j}(x_{sc}^{i,k})d^{*}_{k}. \end{aligned}$$

According to (5), Assumption 3.1, \( \Vert d^{*}_{k} \Vert \le 1 \), and the Cauchy–Schwarz inequality we have

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k}+ \alpha d^{*}_{k}) \ge \alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h}. \end{aligned}$$

Then, it is clear that

$$\begin{aligned} \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\left\{ \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k}+ \alpha d^{*}_{k})\right\} \ge \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\left\{ \alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h}\right\} . \end{aligned}$$

In other words,

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k}+ d^{C}_{k}) \ge \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\left\{ \alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h}\right\} . \end{aligned}$$

Let us consider the concave function g, defined by \(g(\alpha )=\alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h},\) with unconstrained maximizer \(\alpha ^{*}=\dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}} \ge 0\), corresponding to the optimum value \(g(\alpha ^{*})=\dfrac{1}{2}\dfrac{\omega (x_{sc}^{i,k})^{2}}{\kappa _{h}} \ge 0.\) Two cases can occur:

  • If \( 0 \le \alpha ^{*} \le \Delta _{sc}^{i,k}\) then \(\displaystyle \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\left\{ \alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h}\right\} =\dfrac{1}{2}\dfrac{\omega (x_{sc}^{i,k})^{2}}{\kappa _{h}};\)

  • If \( \alpha ^{*} > \Delta _{sc}^{i,k}\) then \(\displaystyle \max _{ 0 \le \alpha \le \Delta _{sc}^{i,k}}\left\{ \alpha \omega (x_{sc}^{i,k}) -\dfrac{1}{2}\alpha ^{2} \kappa _{h}\right\} = \Delta _{sc}^{i,k} \omega (x_{sc}^{i,k}) -\dfrac{1}{2}(\Delta _{sc}^{i,k})^{2} \kappa _{h}.\)

In the latter case, since \( \alpha ^{*} =\dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}}> \Delta _{sc}^{i,k}\), we have \(\Delta _{sc}^{i,k} \omega (x_{sc}^{i,k}) -\dfrac{1}{2}(\Delta _{sc}^{i,k})^{2} \kappa _{h} \ge \dfrac{1}{2}\Delta _{sc}^{i,k} \omega (x_{sc}^{i,k}),\) resulting in

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x^{C}_{k}) \ge \min \left\{ \dfrac{1}{2}\dfrac{\omega (x_{sc}^{i,k})^{2}}{\kappa _{h}}, \dfrac{1}{2}\Delta _{sc}^{i,k} \omega (x_{sc}^{i,k}) \right\} = \frac{1}{2} \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k}\right\} . \end{aligned}$$

\(\square \)

Let \( (x_{sc}^{i,k*},t^{*}) \) be the solution of Problem (4). The following lemma states an important property of \( t^{*} \).

Lemma 3.6

At each iteration k and for each \( i \in \{ 1,\ldots ,q \}\), \(x_{sc}^{i,k}\) is not a Pareto critical point for \( \min _{x \in B_{k}} (m_{1}^k(x),\ldots ,m_{q}^k(x)) \), if and only if \( t^{*} < 0 \).

Proof

Since \((x_{sc}^{i,k},0)\) is feasible for Problem (4), \( t^{*} \le 0 \). It is clear that \(x_{sc}^{i,k}\) is not a Pareto critical point when \( t^{*} < 0 \).

Now, assume that \(x_{sc}^{i,k}\) is not a Pareto critical point. Then, it is not a weakly efficient point, meaning that there exists a point \( x^{'} \in B_{k} \) such that for all \( j \in \{ 1,\ldots ,q \}\), \(m^k_{j}(x^{'}) < m^k_{j}(x_{sc}^{i,k}) \). So,

$$\begin{aligned} m^k_{j}(x^{'}) - m^k_{j}(x_{sc}^{i,k})<0, \;\; \forall j\in \{ 1,\ldots ,q \}. \end{aligned}$$

Considering \( t^{'}=\max _{j=1,\ldots ,q}(m^k_{j}(x^{'}) - m^k_{j}(x_{sc}^{i,k})) \), we have

$$\begin{aligned} m^k_{j}(x^{'}) - m^k_{j}(x_{sc}^{i,k}) \le t^{'} < 0, \;\; \forall j \in \{ 1,\ldots ,q \}. \end{aligned}$$

Hence, \( t^{*} \) should be strictly negative because \( (x^{'}, t^{'} )\) is feasible for Problem (4). \(\square \)

Lemma 3.7

Let Assumption 3.1 hold. At each iteration k and for each \( i \in \{ 1,\ldots ,q \} \), there exists \( j \in \mathbb {N} \) such that

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge \left( \frac{1}{2}\right) ^{j} \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k} \right\} , \end{aligned}$$

where \( (x_{sc}^{i,k*},t^{*}) \) is the solution of Problem (4).

Proof

Two different cases need to be analyzed. Assume that \(x_{sc}^{i,k}\) is not Pareto critical. According to Lemma 3.6, \( t^{*} \) is strictly negative. So, for each \( l \in \{ 1,\ldots ,q \} \), we have

$$\begin{aligned} m^k_{l}(x_{sc}^{i,k})-m^k_{l}(x_{sc}^{i,k*}) \ge -t^{*} > 0. \end{aligned}$$

By considering \(\phi ^k_{m}(x)=\max _{l=1,\ldots ,q}m^k_{l}(x)\), it follows that

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-m^k_{l}(x_{sc}^{i,k*}) \ge m^k_{l}(x_{sc}^{i,k})-m^k_{l}(x_{sc}^{i,k*})\text {, for all }l \in \{ 1,\ldots ,q \}. \end{aligned}$$

Let j be the index such that \(\phi ^k_{m}(x_{sc}^{i,k*})=m^k_{j}(x_{sc}^{i,k*})\). Then

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge m^k_{j}(x_{sc}^{i,k})-m^k_{j}(x_{sc}^{i,k*}). \end{aligned}$$

Hence,

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge -t^{*} > 0. \end{aligned}$$
(9)

Since \(\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x^{C}_{k})\ge 0\), there must exist \( j \in \mathbb {N} \) such that

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge \left( \frac{1}{2}\right) ^{j-1} (\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x^{C}_{k})). \end{aligned}$$

Hence, considering (8), we obtain

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge \left( \frac{1}{2}\right) ^{j} \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k} \right\} . \end{aligned}$$

If \(x_{sc}^{i,k}\) is Pareto critical, then \( \omega (x_{sc}^{i,k})=0 \). So, the right side of this inequality is equal to zero and, since \(\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k*}) \ge 0\), the inequality holds. \(\square \)

The last three lemmas motivate us to consider the following assumption, stating that, at each scalarization step, a sufficient reduction in the model is ensured.

Assumption 3.2

There is a constant \(\kappa _{\phi } \in (0,1)\) such that at each iteration k, where the scalarization step is performed, for all \( i \in \{ 1,\ldots ,q \} \), we have

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k })-\phi ^k_{m}(x_{sc}^{i,k *}) \ge \kappa _{\phi }\, \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k} \right\} . \end{aligned}$$
(10)

As long as \( x_{sc}^{i,k } \) is not Pareto critical, the left side of (10) is strictly positive.

The following lemma provides an error bound for \( \phi ^k_{m}\) as approximation of \(\phi \) and is a keystone to establish convergence.

Lemma 3.8

Let Assumption 3.1 hold. At every iteration k, the model \( \phi ^k_{m} \) is valid for \( \phi \) at \(x_{sc}^{i,k *}\), for all \( i \in \{ 1,\ldots ,q \} \), that is

$$\begin{aligned} \left| \phi (x_{sc}^{i,k *})-\phi ^k_{m}(x_{sc}^{i,k *})\right| \le \kappa _{fm}(\Delta _{sc}^{i,k})^{2}, \end{aligned}$$

where \( \kappa _{fm} \) is defined in Lemma 3.4.

Proof

Two situations should be analyzed. Assume that \( \phi (x_{sc}^{i,k *}) \ge \phi ^k_{m}(x_{sc}^{i,k *})\). Consider \( j \in \{ 1,\ldots ,q \} \) such that \( \phi (x_{sc}^{i,k *})=f_{j}(x_{sc}^{i,k *})\). Using Lemma 3.4 and the fact that \( \phi ^k_{m}(x_{sc}^{i,k *})\ge m^k_{j}(x_{sc}^{i,k *}) \), we have

$$\begin{aligned} \left| \phi (x_{sc}^{i,k *})-\phi ^k_{m}(x_{sc}^{i,k *})\right| = f_{j}(x_{sc}^{i,k *})-\phi ^k_{m}(x_{sc}^{i,k *}) \le f_{j}(x_{sc}^{i,k *})-m^k_{j}(x_{sc}^{i,k *}) \le \kappa _{fm}(\Delta _{sc}^{i,k})^{2}. \end{aligned}$$

Assume now that \( \phi (x_{sc}^{i,k *}) < \phi ^k_{m}(x_{sc}^{i,k *}) \). Consider \( j \in \{ 1,\ldots ,q \} \) such that \( \phi ^k_{m}(x_{sc}^{i,k *})=m^k_{j}(x_{sc}^{i,k *}) \). Again, Lemma 3.4 and the fact that \( \phi (x_{sc}^{i,k *})\ge f_{j}(x_{sc}^{i,k *}) \) allow us to conclude that

$$\begin{aligned} \left| \phi (x_{sc}^{i,k *})-\phi ^k_{m}(x_{sc}^{i,k *})\right| = m^k_{j}(x_{sc}^{i,k *})-\phi (x_{sc}^{i,k *}) \le m^k_{j}(x_{sc}^{i,k *})-f_{j}(x_{sc}^{i,k *})\le \kappa _{fm}(\Delta _{sc}^{i,k})^{2}. \end{aligned}$$

\(\square \)

In the remainder of the analysis, we classify the successful scalarization step iterations. The set of indexes of successful scalarization step iterations is denoted by

$$\begin{aligned} S=\left\{ (k,i),\, k\in \mathbb {N}, i \in \{1,\ldots ,q\}: k \text { is a scalarization step iteration and } \rho _{sc}^{i,k} \ge \eta _{sc}^{1} \right\} \end{aligned}$$

and the set of indexes of very successful scalarization step iterations corresponds to

$$\begin{aligned} V=\left\{ (k,i),\, k\in \mathbb {N}, i \in \{1,\ldots ,q\}: k \text { is a scalarization step iteration and } \rho _{sc}^{i,k} \ge \eta _{sc}^{2} \right\} . \end{aligned}$$

Two different scenarios may occur for each linked sequence of points \( \{x^{k}\}_{k\in K}\) generated by MOTR:

  1. There exists \( i \in \{ 1,\ldots ,q \} \), such that for each \( k \in \mathbb {N}\),

    $$\begin{aligned} \Delta _{ep}^{i,k}>0. \end{aligned}$$

  2. For each \( i \in \{ 1,\ldots ,q \} \), there exists \( k_{i} \in \mathbb {N} \), such that for all \( k > k_{i}\),

    $$\begin{aligned} \Delta _{ep}^{i,k}=0. \end{aligned}$$

Remark 3.1

In the first scenario, the linked sequence, updated at the extreme point step, matches the set of iterates generated by a single objective trust-region method, when applied to the objective function component \(f_{i}\). Stationarity is then guaranteed for \(f_{i}\) and the corresponding limit point is a Pareto critical point. The proof is similar to the single objective trust-region case (see [5, 18, 24]).

Remark 3.2

In the second scenario, define \(k_{ep}=\max \{ k_{i}\,|\,i=1,\ldots ,q \}\). For \( k > k_{ep}\), it holds \(\Delta _{ep}^{i,k}=0, \; \forall i=1,\ldots ,q\). Therefore, for \( k > k_{ep}\), all points of the linked sequence have been generated in the scalarization step. The remainder of the analysis focuses on this situation.

The following two lemmas clarify the behavior of MOTR when the current point is not Pareto critical.

Lemma 3.9

Let Assumptions 3.1 and 3.2 hold. Suppose that at the scalarization step iteration k, for \( i\in \{1,\ldots ,q\}\), \( x_{sc}^{i,k} \) is not a Pareto critical point and

$$\begin{aligned} \Delta _{sc}^{i,k} \le \dfrac{\kappa _{\phi } \omega (x_{sc}^{i,k}) (1-\eta _{sc}^{2})}{\kappa _{v}}, \end{aligned}$$
(11)

with \( \kappa _{v}=\max \{\kappa _{fm},\kappa _{h}\} \). Then the pair \((k,i)\) corresponds to a very successful scalarization step iteration, and \(\Delta _{sc}^{i,k*} > \Delta _{sc}^{i,k}\).

Proof

According to Lemma 3.3, \( \omega (x_{sc}^{i,k})>0 \), because \( x_{sc}^{i,k} \) is not a Pareto critical point. On the other hand, \( \kappa _{\phi }, \eta _{sc}^{2} \in (0,1) \). Thus, \( \kappa _{\phi }(1-\eta _{sc}^{2}) \in (0,1) \) and

$$\begin{aligned} \Delta _{sc}^{i,k} \le \dfrac{\kappa _{\phi } \omega (x_{sc}^{i,k}) (1-\eta _{sc}^{2})}{\kappa _{v}}<\dfrac{ \omega (x_{sc}^{i,k}) }{\kappa _{v}}. \end{aligned}$$

From Assumption 3.2, we have

$$\begin{aligned} \phi ^k_{m}(x_{sc}^{i,k })-\phi ^k_{m}(x_{sc}^{i,k *}) \ge \kappa _{\phi } \omega (x_{sc}^{i,k}) \min \left\{ \dfrac{\omega (x_{sc}^{i,k})}{\kappa _{h}},\Delta _{sc}^{i,k} \right\} = \kappa _{\phi }\omega (x_{sc}^{i,k})\Delta _{sc}^{i,k}. \end{aligned}$$

Considering this inequality, the equation \( \rho _{sc}^{i,k}=\frac{\phi (x_{sc}^{i,k})-\phi (x_{sc}^{i,k *})}{\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k *})} \), Lemma 3.8, and (11), we have

$$\begin{aligned} |\rho _{sc}^{i,k}-1|=\left| \dfrac{\phi ^k_{m}(x_{sc}^{i,k *}) - \phi (x_{sc}^{i,k *})}{\phi ^k_{m}(x_{sc}^{i,k})-\phi ^k_{m}(x_{sc}^{i,k *})}\right| \le \dfrac{\kappa _{fm}}{\kappa _{\phi }\omega (x_{sc}^{i,k})}\Delta _{sc}^{i,k}\le \dfrac{\kappa _{v}}{\kappa _{\phi }\omega (x_{sc}^{i,k})}\Delta _{sc}^{i,k}\le (1-\eta _{sc}^{2}). \end{aligned}$$

Consequently, \( \rho _{sc}^{i,k} \ge \eta _{sc}^{2} \). So, the pair \((k,i)\) corresponds to a very successful scalarization step iteration and \(\Delta _{sc}^{i,k*} > \Delta _{sc}^{i,k}\). \(\square \)

The following lemma states that, as long as \(x_{sc}^{i,k}\) is not Pareto critical, the scalarization step trust-region radius cannot be too small. In fact, it is lower bounded by a strictly positive constant.

Lemma 3.10

Let Assumptions 3.1 and 3.2 hold and consider the constant \(\sigma >0 \). If \( \omega (x_{sc}^{i,k}) \ge \sigma \) holds for the pair \((k,i)\), with k a scalarization step iteration and \(i\in \{1,\ldots ,q\}\), then there is a constant \(\Delta >0\), depending on \(\sigma \), such that \( \Delta _{sc}^{i,k} \ge \Delta \).

Proof

Assume, by contradiction, that for each \( \Delta >0 \) there is a pair \((k,i)\), with k a scalarization step iteration and \(i\in \{1,\ldots ,q\}\), satisfying \( \omega (x_{sc}^{i,k}) \ge \sigma >0\), such that

$$\begin{aligned} \Delta _{sc}^{i,k}<\Delta . \end{aligned}$$

In particular, consider

$$\begin{aligned} \Delta =\dfrac{\mu _{1} \sigma \kappa _{\phi } (1-\eta _{sc}^{2})}{\kappa _{v}}, \end{aligned}$$

with \( \kappa _{v}=\max \{\kappa _{fm},\kappa _{h}\} \). Let \((\overline{k},i)\) be the first pair such that \( \omega (x_{sc}^{i,\overline{k}}) \ge \sigma >0\) and

$$\begin{aligned} \Delta _{sc}^{i,\overline{k}^*} < \dfrac{\mu _{1} \sigma \kappa _{\phi } (1-\eta _{sc}^{2})}{\kappa _{v}}. \end{aligned}$$

Then, it holds \(\Delta _{sc}^{i,\overline{k}^*}<\Delta _{sc}^{i,\overline{k}}\). Thus,

$$\begin{aligned} \Delta _{sc}^{i,\overline{k}} =\dfrac{\Delta _{sc}^{i,\overline{k}^*}}{\mu _{1}} <\dfrac{ \sigma \kappa _{\phi } (1-\eta _{sc}^{2})}{\kappa _{v}} \le \dfrac{ \omega (x_{sc}^{i,\overline{k}}) \kappa _{\phi } (1-\eta _{sc}^{2})}{\kappa _{v}}. \end{aligned}$$

Since point \(x_{sc}^{i,\overline{k}}\) is not Pareto critical, according to Lemma 3.9, the pair \((\overline{k},i)\) corresponds to a very successful scalarization step iteration, and \(\Delta _{sc}^{i,\overline{k}^*} > \Delta _{sc}^{i,\overline{k}}\). This contradicts \(\Delta _{sc}^{i,\overline{k}^*}<\Delta _{sc}^{i,\overline{k}}\) and the initial assumption. \(\square \)

Considering Remarks 3.1 and 3.2, the following lemma states the first convergence result for linked sequences of MOTR, generated at the scalarization step.

Lemma 3.11

Suppose that Assumptions 3.1 and 3.2 hold. Let \( \{x^{k}\}_{k\in K} \) be a linked sequence generated by MOTR at the scalarization step, with finitely many successful iterations at the scalarization step. Then this linked sequence converges to a Pareto critical point.

Proof

Assume that \(x^{k_{0}+l}=x^{*}\) for all \( l \in \mathbb {N} \), where \( k_{0}=\max \{ k_{s},k_{ep}\} \), \( k_{s} \) is the index of the last successful scalarization step iteration and \( k_{ep} \) is defined as in Remark 3.2. So, \( \Delta _{sc}^{k_{0}+l} \) converges to zero, because in the scalarization step all iterations are unsuccessful, for sufficiently large l.

Suppose that \( x^{*} \) is not a Pareto critical point. According to Lemma 3.9, there must be a very successful scalarization step iteration, with index larger than \( k_{0} \), which is a contradiction because all iterations after \(k_{0}\) are unsuccessful. Therefore, \( x^{*} \) is a Pareto critical point. \(\square \)

The following lemma clarifies the behavior of MOTR, when a linked sequence has an infinite number of distinct points generated at the scalarization step.

Lemma 3.12

Suppose that Assumptions 3.1 and 3.2 hold. Let \(\{x^{k}\}_{k\in K}\) be a linked sequence of points generated by MOTR, with infinitely many successful iterations at the scalarization step. Then

$$\begin{aligned} \liminf _{k\rightarrow +\infty ;\, k\in K} \omega (x^{k})=0. \end{aligned}$$

Proof

Suppose that \(\displaystyle \liminf _{k\rightarrow +\infty \,;\, k\in K} \omega (x^{k}) \ne 0\). So, there must exist a constant \( \epsilon >0 \) such that for all \( k \in K \),

$$\begin{aligned} \omega (x^{k}) \ge \epsilon . \end{aligned}$$

Lemma 3.10 guarantees the existence of \( \Delta >0\) such that \( \Delta _{sc}^{k} \ge \Delta \), for all \(k \in K\).

Consider S, the set of indexes of successful scalarization step iterations, \( k \in S\cap K \), and \( k > k_{eps} \), where \( k_{eps}\) is the index of the first successful scalarization step iteration after \( k_{ep} \), and \( k_{ep} \) is defined as in Remark 3.2. Thus, \( \rho _{sc}^{k} \ge \eta _{sc}^{1}\). According to Assumption 3.2, we have

$$\begin{aligned} \phi (x^{k})-\phi (x^{k+1})&\ge \eta _{sc}^{1}(\phi ^k_{m}(x^{k})-\phi ^k_{m}(x^{k+1}) )\\&\ge \eta _{sc}^{1} \kappa _{\phi } \omega (x^{k}) \min \left\{ \dfrac{\omega (x^{k})}{\kappa _{h}} ,\Delta _{sc}^{k} \right\} \ge \eta _{sc}^{1} \kappa _{\phi } \epsilon \min \left\{ \dfrac{\epsilon }{\kappa _{h}} ,\Delta \right\} . \end{aligned}$$

Summing over all successful iterations at the scalarization step, from \( k_{eps} \) to k results in

$$\begin{aligned} \phi (x^{k_{eps}})-\phi (x^{k+1})&=\sum _{i=k_{eps}, i\in S\cap K}^{k} \phi (x^{i})-\phi (x^{i+1}) \\&\ge \sigma _{k} \, \eta _{sc}^{1} \, \kappa _{\phi } \, \epsilon \, \min \left\{ \dfrac{\epsilon }{\kappa _{h}} ,\Delta \right\} , \end{aligned}$$

where \( \sigma _{k} \) represents the number of successful scalarization step iterations in the linked sequence from \( k_{eps} \) to k. It is clear that \(\displaystyle \lim _{k\rightarrow +\infty \,;\, k\in K}\sigma _{k}=+\infty \), because there are infinitely many such iterations. Thus, \( \phi (x^{k_{eps}})-\phi (x^{k+1}) \) is unbounded. So, \( \phi (x) \) cannot be bounded from below, which is a contradiction. Therefore, \(\displaystyle \liminf _{k\rightarrow +\infty \,;\, k\in K} \omega (x^{k}) = 0\). \(\square \)

We are now ready to prove the main result, for linked sequences generated by MOTR.

Theorem 3.13

Let Assumptions 3.1 and 3.2 hold. For every linked sequence of points \( \{x^{k}\}_{k\in K} \) generated by MOTR, we have

$$\begin{aligned} \lim _{k\rightarrow +\infty ;\, k\in K} \omega (x^{k})=0. \end{aligned}$$

Proof

Let \( \{x^{k}\}_{k\in K} \) be a linked sequence generated by MOTR. If there exists \( i \in \{ 1,\ldots ,q \} \) such that for each \( k \in \mathbb {N}\), \(\Delta _{ep}^{i,k}>0\), or if for each \( i \in \{ 1,\ldots ,q \} \) there exists \( k_{i} \in \mathbb {N} \) such that for all \( k > k_i\), \(\Delta _{ep}^{i,k}=0\), but there are only finitely many successful iterations at the scalarization step, then Remarks 3.1 and 3.2 and Lemma 3.11 guarantee the convergence of \( \{x^{k}\}_{k\in K} \) to a Pareto critical point, which implies \(\displaystyle \lim _{k\rightarrow +\infty \,;\, k\in K}\omega (x^{k})=0\), by Lemma 3.3.

Now, assume that, for each \( i \in \{ 1,\ldots ,q \} \), \(\Delta _{ep}^{i,k}=0\) for all \( k > k_i\), and that there is an infinite number of successful scalarization step iterations. Suppose that there exists a subsequence of successful scalarization step iterations, indexed by \( \{t_{j}>k_{ep}\} \subset S\cap K\), such that

$$\begin{aligned} \omega (x^{t_{j}}) \ge 2 \epsilon >0, \end{aligned}$$
(12)

for some \( \epsilon >0 \) and for all j, where \( k_{ep} \) is defined as in Remark 3.2.

From Lemma 3.12, for each \( t_{j} \), there exists a first successful scalarization step iteration \( l_{j}>t_{j} \) such that \(\omega (x^{l_{j}}) < \epsilon \). Thus, there exists another subsequence of \( S\cap K \), indexed by \( \{l_{j}\}\), such that

$$\begin{aligned} \omega (x^{k}) \ge \epsilon \text { for } t_{j} \le k<l_{j} \text { and } \omega (x^{l_{j}}) < \epsilon . \end{aligned}$$
(13)

Consider the successful iterates whose indexes are in

$$\begin{aligned} \mathcal {K}=\{ k \in S\cap K \,|\, \exists j \in \mathbb {N}: t_{j} \le k <l_{j} \}, \end{aligned}$$

where \( t_{j} \) and \( l_{j} \) belong to the two subsequences defined above.

Assumption 3.2, the fact that \( \mathcal {K} \subset S\cap K \), and inequalities (13) guarantee that, for \( k \in \mathcal {K} \),

$$\begin{aligned} \begin{aligned} \phi (x^{k})-\phi (x^{k+1})&\ge \eta _{sc}^{1}(\phi ^k_{m}(x^{k})-\phi ^k_{m}(x^{k+1}) )\\&\ge \eta _{sc}^{1} \kappa _{\phi } \omega (x^{k}) \min \left\{ \dfrac{\omega (x^{k})}{\kappa _{h}},\Delta _{sc}^{k} \right\} \\&\ge \eta _{sc}^{1} \kappa _{\phi } \epsilon \min \left\{ \dfrac{\epsilon }{\kappa _{h}},\Delta _{sc}^{k} \right\} . \end{aligned} \end{aligned}$$
(14)

The sequence \( \{\phi (x^{k})\}_{k\in K}\) is convergent, since it is monotonically decreasing and bounded from below. Thereby, \(\displaystyle \lim _{k\rightarrow +\infty \,;\,k\in K} \left( \phi (x^{k})-\phi (x^{k+1})\right) =0\). Consequently, considering the minimum part in the last term of (14), for \( k \in \mathcal {K} \) sufficiently large, it implies that

$$\begin{aligned} \Delta _{sc}^{k} \le \dfrac{1}{\eta _{sc}^{1} \kappa _{\phi } \epsilon }(\phi (x^{k})-\phi (x^{k+1})). \end{aligned}$$

Therefore, for j sufficiently large, it holds that

$$\begin{aligned} \begin{aligned} \Vert x^{t_{j}}- x^{l_{j}}\Vert&\le \sum _{i=t_{j}, i\in \mathcal {K}}^{l_{j}-1} \Vert x^{i}- x^{i+1}\Vert \\&\le \sum _{i=t_{j}, i\in \mathcal {K}}^{l_{j}-1}\Delta _{sc}^{i} \\&\le \dfrac{1}{\eta _{sc}^{1} \kappa _{\phi } \epsilon }(\phi (x^{t_{j}})-\phi (x^{l_{j}})). \end{aligned} \end{aligned}$$

Again, the convergence of \( \{\phi (x^{k})\}_{k\in K} \) implies

$$\begin{aligned} \lim _{j\rightarrow +\infty }\Vert x^{t_{j}}- x^{l_{j}}\Vert =0. \end{aligned}$$

Assumption 3.1 and the uniform continuity of \( \omega \) allow us to conclude that

$$\begin{aligned} \lim _{j\rightarrow +\infty } | \omega (x^{t_{j}})- \omega (x^{l_{j}}) |=0, \end{aligned}$$

contradicting the fact that \( | \omega (x^{t_{j}})- \omega (x^{l_{j}})| \ge \epsilon \), a consequence of the definition of sequences \( \{t_{j}\} \) and \( \{l_{j}\} \), in (12) and (13).

So, no subsequence of successful iterations satisfying (12) can exist and \(\displaystyle \lim _{k\rightarrow +\infty \,;\,k\in K} \omega (x^{k})=0\). \(\square \)

4 Numerical results

The numerical experiments were conducted with two main goals. The first was to illustrate the importance of each of the key algorithmic features of MOTR, namely the extreme point step, the scalarization step, and the middle point strategy, corresponding to Algorithms 2, 4, and 3, respectively. Even if not required for establishing convergence, the middle point strategy is relevant for the numerical performance of the algorithm.

With this purpose, three different versions of MOTR were implemented, each omitting one of the above mentioned strategies:

  • MOTRep: MOTR without the extreme point step;

  • MOTRsc: MOTR without the scalarization step;

  • MOTRmiddle: MOTR using a different strategy than the one described in Algorithm 3 to select the point where to solve the scalarization problem.

In MOTRmiddle, points are sorted according to the value of the objective function component under analysis and an average gap is computed for each point, considering the distances to its two closest points for that component. For the first and last points of the sorted set, the average gap equals the distance to the next or previous point, respectively. The selected point is the one with the largest average gap, among those holding a scalarization step trust-region radius larger than or equal to the minimum value allowed for it.
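
A minimal MATLAB sketch of this alternative gap measure, for one objective function component, could read as follows (toy data and names are ours).

```matlab
% Minimal sketch of the average gap measure used in MOTRmiddle for one component f_i.
Fi = [0.05; 0.90; 0.30; 0.55];                    % values of f_i for the list points
[Fs, order] = sort(Fi);                           % sort by the component under analysis
gaps = diff(Fs);                                  % consecutive gaps
avg  = zeros(size(Fs));
avg(1)   = gaps(1);                               % first point: gap to the next point
avg(end) = gaps(end);                             % last point: gap to the previous point
avg(2:end-1) = (gaps(1:end-1) + gaps(2:end))/2;   % interior points: average of the two gaps
[~, k] = max(avg);                                % point with the largest average gap
idx = order(k);                                   % its index in the (unsorted) list
```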

A second goal of the numerical section was to compare the performance of MOTR against other derivative-based multiobjective optimization solvers that intrinsically attempt to generate approximations to the complete Pareto front of a multiobjective optimization problem. With this purpose, MOSQP [15] was selected.

All codes were implemented in MATLAB (version R2021b). The minimization subproblems of MOTR, defined at the extreme point and scalarization steps, were solved with the MATLAB function fmincon.m.

MOTR was described and analyzed for unconstrained optimization. However, the algorithmic description can easily be adapted to incorporate bound constraints, by adding these constraints to the subproblems to be solved. Notice that for convex feasible regions, as is the case with bound constraints, the middle point, computed using Algorithm 3, will remain feasible. Thus, as test set, we considered 54 twice continuously differentiable bound constrained multiobjective optimization problems, available at:

https://docentes.fct.unl.pt/algb/pages/problems-collections,

with a number of variables between 1 and 30, and involving 2 or 3 objective function components. A complete list of the problem dimensions can be found in Table 1.

Table 1 The set of problems considered in the numerical experiments

MOTR was run with the parameters \(\mu _{1}= 0.5\), \(\mu _{2}=2\), \(\eta _{ep}^{1}=\eta _{sc}^{1}=0.001\), \(\eta _{ep}^{2}=\eta _{sc}^{2}=0.9 \), \(\Delta _{ep}^{init}=(1,\ldots ,1)^{\top } \in \mathbb {R}^{q}\), and \(\Delta _{sc}^{init}=1\). Regarding the update of the trust-region radius at very successful iterations, the radius was only increased if it was limiting the progress of the algorithm, meaning that the boundary of the trust region had been reached. In this case, a maximum value of \(\Delta ^{max}=\Vert u-l \Vert /2 \) is allowed, where u and l represent the upper and lower bounds on the problem variables. The algorithm was always initialized with a singleton, namely the centroid of the feasible region. As stopping criteria, minimum trust-region radii \(\Delta _{ep}^{min}=\Delta _{sc}^{min}=10^{-5}\) were imposed (componentwise in the case of \( \Delta _{ep} \)). Three different budgets were considered in terms of function evaluations, namely 500, 5000, and 20000. For each problem, the approximation to the Pareto front generated by each algorithm corresponds to all the current nondominated points, stored in the list L.

4.1 Performance assessment and metrics

As performance assessment tool, we considered the performance profiles proposed by Dolan and Moré [10], which allow the simultaneous assessment of the numerical performance of different solvers, for different metrics. The performance of solver \( s\in S \) on a given set of problems P is represented by a cumulative function

$$\begin{aligned} \rho _{s}(\tau )=\frac{1}{|P|}\left| \{ p \in P: r_{p,s} \le \tau \}\right| , \end{aligned}$$

where \(\tau \ge 1\) and the performance ratio is defined by

$$\begin{aligned} r_{p,s}=\frac{t_{p,s}}{\min \{ t_{p,s}:\, s \in S \}}. \end{aligned}$$

Here \(t_{p,s}\) represents the value of the selected metric, obtained by solver \(s \in S\) when solving problem \(p \in P\). Larger values of \(\rho _{s}(\tau )\) indicate a better numerical performance of solver s. In particular, the solver with the largest value of \(\rho _{s}(1)\) is the most efficient. On the other hand, the solver with the largest value of \(\rho _{s}(\tau )\) for large values of \( \tau \) is the most robust.
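
For concreteness, the following MATLAB sketch computes and plots \(\rho _{s}(\tau )\) from a matrix of metric values; the toy data and names are ours, and smaller metric values are assumed to be better.

```matlab
% Minimal sketch of the performance profile computation, for a metric
% matrix T with T(p,s) the value obtained by solver s on problem p.
T = [1.0 1.2; 2.0 1.0; 1.5 3.0];                 % toy data: 3 problems, 2 solvers
R = T ./ min(T, [], 2);                          % performance ratios r_{p,s}
tau = linspace(1, max(R(:)), 100);
rho = zeros(numel(tau), size(T,2));
for s = 1:size(T,2)
    for k = 1:numel(tau)
        rho(k,s) = mean(R(:,s) <= tau(k));       % fraction of problems with r_{p,s} <= tau
    end
end
plot(tau, rho);  xlabel('\tau');  ylabel('\rho_s(\tau)');
```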

Selecting a single metric to compare the performance of multiobjective optimization solvers is always reductive, since advantages and disadvantages can be pointed out for each of these indicators. Considering that a good multiobjective optimization solver should be able to generate a large percentage of nondominated points and should also be able to capture the extent of the Pareto front of the multiobjective optimization problem, we decided to consider four metrics that attempt to quantify these features, namely purity, hypervolume, and the spread metrics \(\Gamma \) and \(\Delta \).

Purity measures the percentage of nondominated points generated by a given solver

$$\begin{aligned} \bar{t}_{p,s}={Pur}_{p,s}=\frac{|F_{p,s} \cap F_{p}|}{|F_{p,s}|}, \end{aligned}$$

where \( F_{p,s} \) represents the approximation to the Pareto front of problem p computed by solver s and \( F_{p} \) is a reference Pareto front for problem p, computed by considering the union of the Pareto approximations corresponding to all solvers, \( \cup _{s \in S} F_{p,s} \), and discarding from it all the dominated points [6].
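
With the nondominance filter sketched in Sect. 2, purity can be computed as follows (toy fronts and names are ours).

```matlab
% Minimal sketch of the purity metric, reusing the nondominated filter above.
Fs   = [0.1 0.9; 0.5 0.5; 0.9 0.2];           % front F_{p,s} computed by solver s
Fall = [0.1 0.9; 0.4 0.4; 0.9 0.2; 0.5 0.5];  % union of the fronts of all solvers
Fp   = Fall(nondominated(Fall), :);           % reference front F_p
purity = mean(ismember(Fs, Fp, 'rows'));      % |F_{p,s} inter F_p| / |F_{p,s}| = 2/3 here
```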

Hypervolume [32], in addition to nondominance, attempts to capture spread, by measuring the volume of the region dominated by the current approximation to the Pareto front and a reference point \(U_p \in \mathbb {R}^q\), dominated by all points belonging to the different approximations computed for the Pareto front of problem \(p\in P\) by all solvers tested. Mathematically, it can be formalized as:

$$\begin{aligned} \bar{t}_{p,s}=HV_{p,s} = Vol\{y \in \mathbb {R}^q\,| \, y \le U_p \wedge \exists x \in F_{p,s}: x \le y\} = Vol \left( \bigcup _{x \in F_{p,s}} [x, U_p]\right) , \end{aligned}$$

where Vol(.) denotes the Lebesgue measure of a q-dimensional set of points and \([x, U_p]\) denotes the interval box with lower corner x and upper corner \(U_p\).
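
For \(q=2\), the hypervolume reduces to a sum of rectangular slabs once the nondominated points are sorted by the first objective. The MATLAB sketch below illustrates this particular case; the toy front, reference point, and names are ours.

```matlab
% Minimal sketch of the hypervolume for q = 2: sort the nondominated points
% by the first objective and accumulate the rectangles dominated up to U_p.
Fnd = [0.4 0.4; 0.1 0.9; 0.9 0.2];        % nondominated points of the approximation
U   = [1, 1];                             % reference point dominated by all of them
Fnd = sortrows(Fnd, 1);                   % increasing first objective (decreasing second)
hv  = 0;  prev = U(2);
for i = 1:size(Fnd,1)
    hv   = hv + (U(1) - Fnd(i,1)) * (prev - Fnd(i,2));   % slab between consecutive f_2 levels
    prev = Fnd(i,2);
end                                       % hv = 0.41 for this toy data
```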

For computing the performance profiles, for purity and hypervolume metrics, since larger values indicate a better performance, the inverse value of each one of the metrics was used (\(t_{p,s}=1/ \bar{t}_{p,s}\)).

Finally, to directly assess spread across the Pareto front, two additional metrics were considered: the \( \Gamma \) metric, which measures the size of the largest gap in the computed approximation to the Pareto front, and the \( \Delta \) metric, which assesses how uniformly the nondominated points are distributed along the generated approximation. In a simplified way, consider that solver \(s \in S\) has computed, for problem \(p \in P\), an approximated Pareto front with points \(y_1,y_2,\ldots ,y_N\), to which we add the so-called extreme points, \(y_0\) and \(y_{N+1}\), corresponding to the points with the best and worst values for each objective function component. Then

$$\begin{aligned} \Gamma _{p,s} \; = \; \max _{j \in \{1,\dots ,q\}}\left( \max _{i\in \{0,\dots ,N\}}\{\delta _{j,i}\}\right) , \end{aligned}$$
(15)

where \(\delta _{j,i}=f_{j}(y_{i+1})-f_{j}(y_i)\), assuming that the objective function values have been sorted by increasing order for each objective function component j. Metric \(\Delta \) [9] is computed by:

$$\begin{aligned} \displaystyle \Delta _{p,s} \; = \; \max _{j\in \{1,\dots ,q\}}\left( \frac{\delta _{j,0}+\delta _{j,N}+\sum _{i=1}^{N-1}|\delta _{j,i}- \bar{\delta }_j|}{\delta _{j,0}+\delta _{j,N}+(N-1)\bar{\delta }_j}\right) , \end{aligned}$$
(16)

where \(\bar{\delta }_j\), for \(j=1,\ldots ,q\), represents the average of the distances \(\delta _{j,i}\), \(i=1,\dots ,N-1\).
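
The two spread metrics can be computed directly from the sorted objective values, as in the following MATLAB sketch (toy front and names are ours, and the extreme points are assumed to be already appended).

```matlab
% Minimal sketch of the Gamma (15) and Delta (16) metrics for one computed front.
Y = [0.0 1.0; 0.1 0.9; 0.4 0.4; 0.9 0.2; 1.0 0.0];   % y_0,...,y_{N+1} in R^q
q = size(Y,2);  N = size(Y,1) - 2;
Gamma = 0;  Delta = 0;
for j = 1:q
    fj    = sort(Y(:,j));                            % sort component j increasingly
    d     = diff(fj);                                % delta_{j,0},...,delta_{j,N}
    Gamma = max(Gamma, max(d));
    dbar  = mean(d(2:N));                            % average of the interior gaps
    Delta = max(Delta, (d(1) + d(N+1) + sum(abs(d(2:N) - dbar))) / ...
                       (d(1) + d(N+1) + (N-1)*dbar));
end
```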

4.2 Adequacy of the algorithmic structure of MOTR

Performance profiles comparing MOTR and MOTRep, a version of MOTR without performing the extreme point step, can be found in Figs. 1 and 2.

Fig. 1: Comparing MOTR and MOTRep based on performance profiles of the purity and hypervolume metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Fig. 2: Comparing MOTR and MOTRep based on performance profiles of the \(\Gamma \) and \(\Delta \) metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Figures 3 and 4 report the comparison between MOTR and MOTRsc, the latter corresponding to a version of MOTR omitting the scalarization step.

Fig. 3: Comparing MOTR and MOTRsc based on performance profiles of the purity and hypervolume metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Fig. 4: Comparing MOTR and MOTRsc based on performance profiles of the \(\Gamma \) and \(\Delta \) metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Finally, the numerical performance of MOTR is assessed against MOTRmiddle, where Algorithm 3 is replaced by the strategy described at the beginning of this section to compute the point at which the scalarization problems are solved. Figures 5 and 6 report the obtained results.

Fig. 5: Comparing MOTR and MOTRmiddle based on performance profiles of the purity and hypervolume metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Fig. 6: Comparing MOTR and MOTRmiddle based on performance profiles of the \(\Gamma \) and \(\Delta \) metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

The advantage of MOTR over each one of its variants is clear, both in terms of efficiency and robustness, for each one of the metrics considered, independently of the budget of function evaluations allowed. The exception appears in the results for the \(\Delta \) metric, when comparing MOTR with MOTRsc or MOTRmiddle, where MOTR continues to present a better performance in terms of efficiency, but the results are comparable for robustness.

Nevertheless, the clear differences for the remaining metrics, and for each one of the variants tested, allow us to conclude that indeed all three strategies incorporated in the algorithmic structure of MOTR are essential for the good numerical performance of the solver.

4.3 Comparing MOTR with MOSQP

MOSQP was proposed in [15], incorporating in its algorithmic structure strategies to compute approximations to the complete Pareto front of a given multiobjective optimization problem. The solver also keeps a list of points that is updated at each iteration by solving single-objective constrained optimization problems derived as SQP problems. In [15], the authors compared the solver against a classical scalarization approach for biobjective problems and also genetic algorithms. The numerical results reported establish the superiority of MOSQP over the remaining solvers tested.

A MATLAB implementation of MOSQP is distributed by the authors, providing different algorithmic choices. We selected MOSQP \((H = (I,\nabla ^{2}f), line)\), which corresponds to a line initialization strategy, by computing 200 initial points evenly spaced in the line segment joining the lower and upper bounds of the variables, and where the identity matrix and the true Hessians are used in the second and third algorithmic stages, respectively [15]. This version is reported in [15] as the one that presents the best computational performance. In regard to stopping criteria, we kept all the default values, but tried the three different budgets of function evaluations. Again, for each problem the approximation to the Pareto front generated by each one of the solvers corresponds to all current nondominated points, stored in the corresponding lists.

Figures 7 and 8 report the performance profiles obtained for the two solvers for purity, hypervolume, and the spread metrics.

Fig. 7: Comparing MOTR and MOSQP based on performance profiles of the purity and hypervolume metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

Fig. 8: Comparing MOTR and MOSQP based on performance profiles of the \(\Gamma \) and \(\Delta \) metrics. Budgets of 500, 5000, and 20000 function evaluations were allowed

MOTR is clearly competitive, with remarkably good results in terms of efficiency for purity, hypervolume, and \(\Gamma \). Regarding the uniformity of the distribution of points across the approximation to the Pareto front, MOSQP presents a better performance. These conclusions hold, independently of the budget of function evaluations considered.

Table 2 reports the number of nondominated points obtained by each solver, for a maximum budget of 5000 function evaluations, corroborating the results already reported for the purity metric.

Table 2 Number of nondominated points in the final approximation of the Pareto front, generated for each problem by MOTR and MOSQP, considering a budget of 5000 function evaluations

Figures 9 and 10 illustrate the final approximations to the Pareto fronts obtained by MOTR and MOSQP on two biobjective and two triobjective problems, respectively.

Fig. 9: Approximations to the Pareto fronts of problems ZDT2 and MOP1, obtained by solvers MOTR and MOSQP, for a budget of 5000 function evaluations

Fig. 10: Approximations to the Pareto fronts of problems ZLT1 and IKK1, obtained by solvers MOTR and MOSQP, for a budget of 5000 function evaluations

5 Conclusions

In this work, we proposed a new algorithm, based on a trust-region approach, to compute approximations to the complete Pareto front of multiobjective optimization problems. The algorithmic structure is organized in two main steps: the extreme point and the scalarization steps, that are alternately performed. As previously mentioned, in the extreme point step, the algorithm tries to reach the extreme points of the Pareto front. On the other hand, in the scalarization step, the focus is on the large gaps in the Pareto front, attempting to reduce them. With this purpose, a new strategy, based on the computation of middle points, was used to select the points to be explored in the scalarization step, by solving adequate scalarization problems.

Convergence was analyzed for linked sequences of points generated by MOTR, establishing that any limit point of a linked sequence is a Pareto critical point. As illustrated in the numerical experiments, the existence of each one of these steps is essential for the good numerical performance of the solver, which is quite competitive against MOSQP, a reference solver for multiobjective optimization, when the goal is to compute approximations to complete Pareto fronts.