Inverse multiobjective optimization: Inferring decision criteria from data

It is a very challenging task to identify the objectives on which a certain decision was based, in particular if several, potentially conflicting criteria are equally important and a continuous set of optimal compromise decisions exists. This task can be understood as the inverse problem of multiobjective optimization, where the goal is to find the objective vector of a given Pareto set. To this end, we present a method to construct the objective vector of a multiobjective optimization problem (MOP) such that the Pareto critical set contains a given set of data points or decision vectors. The key idea is to consider the objective vector in the multiobjective KKT conditions as variable and then search for the objectives that minimize the Euclidean norm of the resulting system of equations. By expressing the objectives in a finite-dimensional basis, we transform this problem into a homogeneous, linear system of equations that can be solved efficiently. There are many important potential applications of this approach. Besides the identification of objectives (both from clean and noisy data), the method can be used for the construction of surrogate models for expensive MOPs, which yields significant speed-ups. Both applications are illustrated using several examples.


Introduction
When applying optimization to real-world problems, there are often multiple quantities that have to be optimized at the same time.In production for example, typical goals are the maximization of the quality of a product and the minimization of the production cost.When the objectives are conflicting, there cannot be a single solution that is optimal for all objectives at the same time.This is called a multiobjective optimization problem (MOP).To solve a problem like this, we search for the set of all optimal compromises, the so-called Pareto set, containing all Pareto optimal points.A point x * is called Pareto optimal if there exists no other point that is at least as good as x * in all objectives, but strictly better than x * in at least one objective.
While most of the research in multiobjective optimization is concerned with efficiently computing the Pareto set of a given MOP, we here address the inverse problem of multiobjective optimization: Given a set P ⊆ R n , identify the objectives for which P is the Pareto set.
Although it is possible to state this problem in such a general form, it will have many degenerate solutions that we are are not interested in, since there is no restriction on any type of regularity of the objective functions.Therefore, we will instead consider a more well-behaved version of this problem that arises by using the concept of Pareto criticality.A point x * ∈ R n is called Pareto critical if it satisfies the Karush-Kuhn-Tucker (KKT) conditions [22], i.e., if there is a convex combination of all the gradients of the objective functions f i ∈ C 1 (R n , R), i ∈ {1, ..., k}, in x * which is zero.In that case, if α * ∈ R k contains the coefficients of this convex combination, then α * is called a KKT vector of x * and the pair (x * , α * ) is called an extended Pareto critical point.The set of all such pairs is called the extended Pareto critical set.The above problem can be restated using this concept: Given a finite data set D = (D x , D α ) ⊆ R n × R k , find an objective vector f ∈ C 1 (R n , R k ) whose extended Pareto critical set contains D. (IMOP) Since the search space C 1 (R n , R k ) is infinite-dimensional, we will consider finite-dimensional subspaces of C 1 (R n , R) that are spanned by sets of basis functions B ⊆ C 1 (R n , R).This will transform (IMOP) into a homogeneous linear system in the coefficients of the basis functions which can be solved by singular value decomposition.
For the single objective case, i.e., for k = 1, the problem (IMOP) is addressed within the field of inverse optimization.For combinatorial problems, a survey on inverse optimization can be found in [17].In [19] and [1], inverse linear problems of the form min x c x (with some linear constraints) were considered, where the goal is to find the cost vector c so that a given feasible point is optimal.In [21], convex parameter-dependent problems were considered with the intention of estimating the objective functions from observations of parameter values and associated optimal solutions.Recently, first results in the multiobjective case have appeared.In [6], inverse linear multiobjective problems are treated similarly to [21] by transforming the multiobjective problem into a scalar problem via the Weighting Method.In [7], the ideas of [21] and [6] are combined with the additional focus on preserving the trade-off in the given solution.
In all previous approaches, certain properties have to be assumed for the objective functions such as linearity, convexity or even a parameter dependent formulation.In contrast to this, we will make no assumptions besides differentiability.Additionally, instead of single points, the approach can be applied to an arbitrary amount of data points.This will allow us to consider the inverse problem of multiobjective optimization in a much more general setting.
The remainder of this article is structured as follows.We begin with a brief introduction to multiobjective optimization in Section 2 before presenting our main theoretical results in Section 3. There, we will first investigate the existence of an objective vector in the span of the chosen basis B for which the data points are extended Pareto critical.We then address the task of finding the objective vector whose extended Pareto critical set is as close to a given data set as possible.The application of the resulting algorithm to two important problem classes is presented in Sections 4 and 5.These are the construction of objective functions from both clean and noisy decision data as well as the generation of surrogate models for expensive MOPs.Finally, we draw a conclusion and discuss possible future work in Section 6.
For our numerical results, we use the built-in method svd from MATLAB 2017a for singular value decomposition.For the computation of the extended Pareto critical sets in this article, we use the Continuation Method CONT-Recover from [30].

Multiobjective optimization
In this section, we will briefly introduce the basic concepts of multiobjective optimization.For a more detailed introduction, we refer to [25,12,18].
Let f : R n → R k be a vector-valued function, called the objective vector, with continuously differentiable components f i : R n → R, i ∈ {1, ..., k}, called objective functions.It maps the variable space R n to the image space R k .The goal of multiobjective optimization is to minimize the objective vector f , i.e., to minimize all objective functions f i simultaneously.This is called a multiobjective optimization problem (MOP) and is denoted by . . .
In contrast to scalar optimization (i.e., k = 1), it is not immediately clear what we mean by minimizing f , as there is no natural total order of the objective values in R k for k > 1.As a result, we cannot expect to find a single point that solves (MOP).Instead, we search for the Pareto set which is defined in the following way: for all i ∈ {1, ..., k} and f j (x * ) < f j (x) for some j ∈ {1, ..., k}.
(c) The set of all (locally) Pareto optimal points is called the (local) Pareto set, its image under f the (local) Pareto front.
Similar to scalar optimization, there are necessary conditions for local Pareto optimality using the first order derivatives of f , called the Karush-Kuhn-Tucker (KKT) conditions [18]: Theorem 2.2.Let x * be a locally Pareto optimal point of (MOP) and Then there exists some α * ∈ ∆ k such that For k = 1, these conditions reduce to the well-known optimality condition ∇f (x * ) = 0.The set of points satisfying the KKT conditions is a superset of the (local) Pareto set and we make the following definition: Since the structure of P c and P M will be important for our approach, we will briefly mention three results: In [9,24] it was shown that P c is generically a stratification, which basically means that it is a "manifold with boundaries and corners".In [15] it was shown that the boundary (or edge) of P c is covered by Pareto critical sets of subproblems where only subsets of the set of objective functions are optimized.In [19] it was shown that {(x, α) ∈ P M : α ∈ (R >0 ) k } ⊆ P M is a (k − 1)-dimensional submanifold of R n+k if a certain rank condition holds.

Inferring objective vectors from data
We will now present a way to construct objective vectors for which P M contains a finite set of given data points D x = {x 1 , ..., xN } ⊆ R n with corresponding KKT vectors D α = {ᾱ 1 , ..., ᾱN } ⊆ ∆ k .The general concept of this inverse approach is to consider x * and α * as given in (KKT) instead of the objective vector f .So in contrast to the usual task of searching for an x ∈ R n for which an α ∈ ∆ k exists so that (KKT) holds, we now search for an f ∈ C 1 (R n , R k ) for which (KKT) holds for all xj and ᾱj , j ∈ {1, ..., N }.As it is infinite-dimensional, we obviously cannot search the entire C 1 (R n , R k ).Instead, we consider finite-dimensional linear subspaces of C 1 (R n , R) that are spanned by a set of basis functions B = {b 1 , ..., b d } ⊆ C 1 (R n , R), and then search for f ∈ span(B) k .An example for the choice of basis functions are the monomials in n variables (up to a certain degree) such that span(B) is the space of polynomials.The usage of basis functions reduces the task of finding an f ∈ C 1 (R n , R k ) to the task of finding the coefficients c ∈ R d of the corresponding linear combination of basis functions.This problem can be stated as a homogeneous linear problem in c and can be solved efficiently via singular value decomposition.In particular, the smallest singular value can be used as a measure of how well the given data set can be represented as an extended Pareto critical set of an objective vector consisting of the given basis functions.
We will assume for the remainder of this section that the following are given: • a data set D = {(x 1 , ᾱ1 ), (x 2 , ᾱ2 ), ..., (x N , ᾱN )} ⊆ R n × ∆ k (and in particular the number of objective functions k), • a set of basis functions with linearly independent derivatives.

Existence of exact approximations
In this subsection, our goal is to find an objective vector with components in span(B) for which the set D is exactly extended Pareto critical.In other words, our goal is to find a function f : R n → R k , f = (f i ) i∈{1,...,k} , f = 0 with f i ∈ span(B) ∀i ∈ {1, ..., k} and To this end, for f i ∈ span(B), we can write for some c i ∈ R d .Thus, we obtain Then ( 2) is equivalent to the homogeneous linear system Since the derivatives of the basis functions are linearly independent, a (nontrivial) function satisfying (2) exists if and only if We will now consider two cases for the dimension of system (5): 5) is an underdetermined system.In this case, ( 6) automatically holds such that (5) possesses at least one nontrivial solution.In other words, dim(ker(L)) > 0.
Note that in this case, our approach resembles an interpolation method.In fact, for n = 1, k = 1 and monomial basis functions, L is similar to the Vandermonde matrix from polynomial interpolation (without the constant column).5) is a square or overdetermined system.This means that generically, (5) does not have a solution, and we have to check the condition (6).In practice, we can use singular value decomposition (SVD) to do this, as the rank of L equals the number of singular values of L that are non-zero.In particular, as rk(L) = k • d − dim(ker(L)), it yields the dimension of the solution space of (5).
For ease of notation, we make the following definition: be the map that maps a coefficient vector c onto the corresponding objective vector (f i ) i (cf.(3) and (4)).
It is easy to see that F is linear and by the linear independence of the derivatives of the basis functions, F is also injective.

Finding the best approximation
In most applications, one can expect that ( 5) is overdetermined and that it cannot be solved exactly.Even if there was a solution, we would require exact data to find it, which is numerically impossible.Furthermore, the case where the data is slightly noisy is much more realistic.Therefore, it makes more sense to look for the MOP whose extended Pareto critical set is the best approximation for a given data set, i.e., where Df (x) ᾱ 2 is as small as possible for all (x, ᾱ) ∈ D. To this end, consider the problem min where the vector of coefficients is constrained to the unit sphere i.e., the optimal value of ( 7) is an upper bound for all Df (x) ᾱ 2 with (x, ᾱ) ∈ D. In particular, the optimal value of ( 7) is zero if and only if (6) holds.Problem (7) can easily be solved using SVD (see, e.g., [14]): Assume that n Lc 2 = span({v i : Consequently, s 1 is a measure for how well the data set D can be approximated with the extended Pareto critical set of an MOP where the objective functions are linear combinations of the basis functions in B. Furthermore, the singular values of L can be used to determine the dimension of the space of approximating objective vectors.
Algorithm 1 Generate objective vector from data Given: with c * 2 = 1.5: Assemble the objective vector f = F(c * ) as in (3).Algorithm 1 summarizes the numerical procedure which follows from the above considerations.The resulting approximation then satisfies the following property: Theorem 3.2.Let f be the result of Algorithm 1 and s i * be the largest singular value less or equal to s.Then In particular, if s i * = 0, then all (x, ᾱ) ∈ D are extended Pareto critical for the MOP with objective vector f .
Proof.Let c * be the coefficient vector in step 4 such that f = F(c * ).Then there is some Combining this with (8) completes the proof.Some properties of Algorithm 1 are highlighted in the following remark.b) In general, if i * > 1, there is no obvious choice for c in step 4. A possible approach is to choose c as sparse as possible (using, e.g., L 1 minimization [31]).This can be very advantageous for interpretability, see also [5] for sparse identification in the dynamical systems context.
c) It is important to note that by construction, if s i * = 0, we can only guarantee that D is a subset of the extended Pareto critical set of f .It is possible for the extended Pareto critical set to contain more than just D (cf.Example 4.1).Therefore, there are cases where the smallest singular value is 0, but the corresponding MOP might not be desirable.
d) According to (9), if s i * = 0, we can take any element of span({v i : i ∈ I}) \ {0} in step 3 and do not need to normalize it.
We will conclude this section with a brief discussion on the choice of the set of basis functions B. It should satisfy the following requirements: (i) The derivatives of the basis functions should be linearly independent to avoid trivial solutions.(In particular, this implies that the representation of the derivatives of elements of span(B) via coefficients of the derivatives of elements of B is unique.) (ii) Since we have to evaluate the derivatives of the basis functions in every data point in D x for the assembly of L, the evaluation of these derivatives should be efficient.
(iii) In practice, an initial, a priori choice of B will often be insufficient.Thus, it should be possible to increase the quality of the approximation by increasing the size of B without much effort.
An intuitive choice for B are the monomials in n variables up to degree l ∈ N, i.e., It is easy to see that (i) and (ii) are satisfied for this choice.For (iii), the Stone-Weierstrass theorem (cf.[27]) implies that for any compact D ⊆ R n and any g ∈ C 1 (D, R), there exists a sequence of polynomials on D that converges to g (with respect to • ∞ ).Thus, for g ∈ C 1 (R n , R k ), uniform convergence with polynomial functions can be guaranteed component-wise on compact subsets of R n .Therefore, for the rest of this article, we will always consider the monomials up to a fixed degree as the set of basis functions.

Application 1: Constructing objectives from clean and noisy data
In this section, we will show how the results from Section 3 can be utilized to construct objective functions from clean and noisy data.Our first example will be the construction of test problems for MOP solvers, where the data comes from a discretization of some continuous (i.e., non-discrete) set.In the second example, we will consider a stochastic MOP, where we will reconstruct the expected value of the objective vector using stochastic (i.e., noisy) solution data.

Inferring objectives from exact data
Test problems and generators of test problems are an important tool to investigate the behavior and to benchmark MOP solvers (cf.[10,33,20]).The idea is to interpret our method from Section 3 as a way to generate MOPs where we already know the extended Pareto critical set.Instead of finitely many extended Pareto critical points, we here want to prescribe the complete set.Unfortunately, as discussed in Section 3, this will generically fail, as we can only assure that the prescribed data set is Pareto critical for the MOP resulting from Algorithm Nonetheless, it turns out that if we use monomials as basis functions, the resulting space of polynomials is large enough to contain objective vectors for many non-trivial classes of infinite data sets.
When prescribing an infinite data set we have to ensure that D ∞ has the properties of an extended Pareto critical set from a theoretical point of view.This is obviously the difficult part of this approach and requires some knowledge about the structure of (extended) Pareto critical sets.In the following, we will briefly summarize the generic implications of the results from Section 2: • According to [18], D ∞ should (locally) be a differentiable manifold.In practice, this means that similar x ∈ D ∞ x should have similar ᾱ ∈ D ∞ α .
• Following [15], for points (x, ᾱ) ∈ D ∞ where x lies on the edge of D ∞ x , we have to ensure that ᾱj = 0 for some j ∈ {1, ..., k}.In particular, for multiple Pareto critical points on the same edge, the same component of the corresponding ᾱ has to be zero.
After constructing a data set D ∞ with the properties mentioned above, we consider a pointwise discretization D ⊆ D ∞ for large N = |D| and apply Algorithm 1.If the smallest singular value of L is zero, D ∞ will be in the extended Pareto critical set of the resulting MOP.
The corresponding objective vector f = F(c) is given by One can show that for this objective vector, the KKT conditions are indeed equivalent to i.e., the Pareto critical set is precisely S 1 with the corresponding KKT vectors given in (10).
(In particular, we did not need to normalize c in step 4 of Algorithm 1 in this case.)Figures 1(b) and (c) show the Pareto critical set and the image of the Pareto critical set under f .
Example 4.1 shows how the results from Section 3 can be used to derive an explicit expression for an objective vector for a prescribed data set.The following example shows that we can even derive more general formulas. .
In Examples 4.1 and 4.2, the symbolic expressions could easily be verified.In particular, in step 4 of Algorithm 1 we were able to choose c such that the Pareto critical set did not contain more than what we intended, i.e., P c was precisely the unit circle or an ellipse.This obviously only works if the data set is sufficiently well-structured.The following example shows a more complicated case.
They are shown in Figure 3(a).For D x we choose N c = 500 equidistant points on each C i , the corresponding D α are chosen linearly from (0, 1) to (1, 0) , and we again use monomials as basis functions.When dealing with more complex data sets, we first have to estimate the required degree of monomials for a satisfactory approximation.To this end, we repeat step 2 of Algorithm 1 for different maximal degrees.The smallest singular value depending on the maximal degree of the monomials is shown in Figure 2(a).We see that the monomials up to a degree of 5 are a promising choice, since the smallest singular values do not decrease further after that.Figure 2(b) shows all singular values for this set of basis functions.There is a relatively large gap from s 4 = 2.09 • 10 −14 to s 5 = 8.17 • 10 −9 , suggesting that s = s 4 , i.e., I = {1, 2, 3, 4}, in step 3 of Algorithm 1 is a good choice.In this case, there is no obvious way to obtain an expression like (11) for span({v 1 , v 2 , v 3 , v 4 }), which is why we choose c = v 1 v 1 2 in step 4. The Pareto critical set of f = F(c) and its image are shown in Figure 3.As expected from the small singular values, the given data set is approximated almost perfectly.Unfortunately, we observe an additional connected component that is not contained in the data.Since we are unable to influence properties outside the given data set D, additional Pareto critical points can be expected in the general case.However, these can be identified subsequently via comparison to the data set D in many cases.

Inferring objectives from noisy data
In the previous examples, we assumed that we have precise data D that we want to approximate with an extended Pareto critical set.However, there are many cases where this assumption is unrealistic, for instance real-world applications where the data stems from numerical simulations or measurements.Another example is stochastic multiobjective optimization, which we will consider here.We will only give a brief introduction on this topic and refer to [13] for a more detailed discussion.Let ξ ∈ R m be a random vector and f : (SMOP) Since we cannot evaluate F directly, in practice the sample average is used, where ξ 1 , ..., ξ Ns are identically independent distributed samples of ξ.Using this ap-proximation, we consider the Sample Average Approximation problem min For N s = ∞ the solutions of (SMOP) and (SAA) coincide.Otherwise, for a finite N s ∈ N, we can only expect the solution of (SAA) to be an approximation of the solution of (SMOP).In other words, we can consider the solution of (SAA) as inexact data of the solution of the original problem (SMOP) and use our approach to approximate the original solution and objective vector F .We illustrate this approach on the following Multiobjective Stochastic Location Problem from [13].
Example 4.4.Let a := (−1, −1) and ξ := (ξ 1 , 0) be a random vector, where In this case, we have so the Pareto critical (and in this case Pareto optimal) set of (SMOP) is given by the line connecting a and (1, 0) .The KKT vector corresponding to given by α = (t, 1 − t) .If we solve (SAA) instead of (SMOP) (via the smoothing Chebyshev scalarization described in [13]), we obtain an approximation of the original Pareto set as in Figure 4(a).Since there is no noise in the first component of f , the approximation is relatively accurate close to a and becomes worse when moving towards (1, 0) .We now interpret the points in Figure 4(a) as the data set D x .The KKT vector α ∈ D α corresponding to x ∈ D x is chosen as α = (−x 2 , 1 + x 2 ) .We choose B as the monomials up to degree 2, i.e., B := {x 1 , x 2 1 , x 2 , x 1 x 2 , x 2 2 }.In this basis, the objective vector F of (SMOP) can be represented exactly (up to the constants in both components) by the coefficient vector c = (2, 1, 2, 0, 1, −2, 1, 0, 0, 1) . ( When applying Algorithm 1, we obtain the singular values shown in Figure 4 showing that we were able to construct a very good approximation of the objective vector F from noisy data.

Application 2: Generation of surrogate models of expensive MOPs
In this section, we will use the results from Section 3 for the generation of surrogate models for MOPs with an objective vector f e that is known but very costly to evaluate.This scenario occurs frequently for complex physics simulations, e.g., when the system under consideration is described by a partial differential equation, cf.[23,2,26,3] for examples.Here, while it is often possible to calculate single Pareto critical points, the computation of the full Pareto critical set via a fine pointwise approximation is computationally infeasible.In this situation, we will use Pareto critical points of the expensive model f e and their corresponding KKT vectors as data points.Our goal is to find an MOP whose extended Pareto critical set is as close as possible to the extended Pareto critical set of f e while using as few data points as possible.
Surrogate modeling is a very active area of research and has been used extensively for simulation and optimization, see [29,4] for overviews.In recent years, surrogate models have also attracted interest in the multiobjective optimization community.All methods proposed so far have the common goal of finding a surrogate model for the objective function f e , for instance by polynomial regression (cf., e.g., [8,32]).Consequently, the surrogate model will possess a dominance relation similar to the original function and as a result, dominance-based methods like evolutionary algorithms can be applied.In contrast to this, our approach constructs surrogate models which resemble the original KKT conditions.This means that we (in general) do not obtain a surrogate model for the objective function but for the first-order optimality condition such that KKT-based methods like continuation [30] can be used.
When "fitting" a surrogate model to a data set of limited size as in this case, it is important avoid underfitting and overfitting.These terms are common in statistics and machine learning, but they apply here in a similar fashion.In general, underfitting means that the chosen model is not able to capture all structures that are present in the data set.In our context, this means that we chose an unsuited (e.g., too small) set of basis functions.When using monomials as basis functions, one can try to circumvent this by using a higher maximal degree (as in Example 4.3).On the other hand, overfitting means that the model captures structures in the data set that were caused by noise and are highly dependent on the data used for fitting the model.In our context, this happens when the number of basis functions d in B is too large.A necessary condition to circumvent this is to ensure that n As discussed in Section 3, if this condition does not hold then we always find an objective vector in the chosen basis where the data points are exactly extended Pareto critical.Thus, if ( 15) is violated, overfitting is unavoidable (as long as we are not working with exact data).
To illustrate the behavior of our method, we begin with an example where the objective vector is cheap to evaluate and we already know the solution of the MOP.We consider the problem L&H 2×2 from [16], where the objective vector is non-polynomial and has a complex (extended) Pareto critical set.for The Pareto critical set of this MOP and its image are shown in Figures 5(a The previous example shows that few data points of the original objective vector can suffice to generate a good surrogate model, even if the original objective vector does not lie in the span of the chosen basis functions.In order to highlight the potential for increased efficiency in realworld applications, our next example considers an MOP where the evaluation of the objective vector is very expensive. Example 5.2.In this example, we consider the flow around a cylinder governed by the 2D incompressible Navier-Stokes equations at a Reynolds number of 100, where the goal is to influence the flow field by rotating the cylinder (cf. Figure 7(a)): ẏ(x, t) + y(x, t) • ∇y(x, t) = ∇p(x, t) + 1 Re ∆y(x, t), ∇ • y(x, t) = 0, y(x, 0) = y 0 (x).

(NSE)
Here, y is the flow velocity and p is the pressure.For the non-rotating cylinder, the well-known von Kármán vortex street occurs.This is a periodic solution where vortices detach alternatingly from the upper and lower edge of the cylinder, respectively.This setup is a classical problem from flow control which has been studied extensively in the literature both using direct approaches as well as surrogate models, see [26] and the references therein.The classical goal is to stabilize the flow, i.e., to minimize the vertical velocity.This can be associated with minimizing the vertical force on the cylinder, the lift C L .As a second goal, we want to minimize the control effort, which results in the following multiobjective optimal control problem: By introducing a sinusoidal control u(t) = x 1 sin(2πx 2 t) and assuming that the control-to-state mapping is injective, Problem (16) can be transformed into the MOP Since the Navier-Stokes equations are a system of nonlinear partial differential equations, we have to introduce a spatial discretization (here via the finite volume method) with 22, 000 cells, which results in 66, 000 degrees of freedom at each time instant.Consequently, it is infeasible to accurately solve Problem (17) directly, regardless of the used method.
One way to approach this problem is to introduce a surrogate model for the system dynamics (NSE), for instance via Proper Orthogonal Decomposition [26].In contrast to this, here, we directly construct a surrogate model for the MOP (17) instead of the system dynamics.In order to generate the required data points D, we apply scalarization via the well-known Weighting Method (i.e., min w 1 f e 1 + w 2 f e 2 , cf. [25]) to (17) with varying weights An advantage of this method is that we directly obtain the KKT vectors of the resulting Pareto optimal points as the corresponding weights that were used to calculate them.Since there are convergence issues for i ∈ {10, ..., 16} using the weighted sum (likely due to a large number of local minima, which is a known problem), we will exclude these points from our data set.The remaining 19 points are shown in Figures 7(b) and (c).
Considering D x and D α , it appears that the Pareto set consists of a single one-dimensional connected component whose corresponding KKT vectors are monotonically increasing and decreasing in their first and second component, respectively.Due to this simple structure, we take all monomials up to degree 2 as our set of basis functions.The singular values of the resulting L ∈ R 38×10 are shown in Figure 8 A projection of the corresponding extended Pareto critical set is depicted in Figure 8(b), showing that all data points are close to the solution of the surrogate problem.In order to obtain an approximation of the Pareto front of the original MOP (17), we can evaluate the original objective vector f e in a pointwise discretization of the Pareto critical set of the surrogate model f .In order to evaluate the performance, we compare our results with the well-known NSGA-II algorithm [11] (implementation from MATLAB's Global Optimization Toolbox) directly applied to the MOP (17).The results are depicted in Figure 9. Here, we have used an initial population size of 100 for NSGA-II and a discretization of the Pareto critical set of our surrogate model (19) with 468 equidistant points.Figure 9 shows that although we only used 19 data points for the generation of our surrogate model and there was a gap in our data set, we are able to obtain a good approximation of the Pareto set and front in a very efficient manner.
As already mentioned, it is crucial for our approach to obtain not only Pareto critical points of the given, expensive MOP, but also the corresponding KKT vectors.As mentioned in the previous example, when applying the Weighting Method, the KKT vector of an optimal point is immediately given.This means that no additional effort has to be put into computing the KKT vectors in that case.Similar results can be shown for the -Constraint Method and the Reference Point Method (cf.[25]), where the KKT vectors can be obtained from the first order optimality condition of the scalar subproblems.For other methods, in particular evolutionary algorithms, we cannot always expect to directly obtain the KKT vectors for free.Given only the Pareto critical (or optimal) point, a straight-forward way to obtain the corresponding KKT vector would be to evaluate the gradients of the objective functions and solve (KKT) as a linear system in α.However, this approach can obviously be very time consuming.Furthermore, knowledge of the derivatives is required.A much cheaper alternative is to exploit the fact that KKT vectors are orthogonal to the linearized Pareto front [18].For a pointwise approximation of the Pareto front, e.g., obtained by NSGA-II, we can use linear regression in each point of the front using only the neighboring points on the front to obtain an approximation of the tangent space of the Pareto front.While this requires a relatively even discretization of the Pareto front, it is much cheaper than assembling and solving the above-mentioned linear system.

Conclusion and outlook
In this article, we present a way to construct objective vectors of MOPs so that the extended Pareto critical sets contain a given data set.This is realized by considering the x * and α * in the KKT conditions (KKT) as given by the data and then searching for an objective vector f ∈ C 1 (R n , R k ) that satisfies the resulting system of equations.By using a finite set of basis functions B ⊆ C 1 (R n , R), the optimal objective vector can be obtained via singular value decomposition, which results in Algorithm 1.
The ability to infer objective vectors from (potentially noisy) data has several powerful applications.In examples, we showed how it can be used to generate test problems for solution methods of MOPs and to approximate the Pareto set and objective vector of stochastic MOPs.Alternatively, the approach can be used to significantly reduce the computational effort for expensive MOPs.Using several data points from the expensive problem, a much cheaper surrogate model can be constructed which can be solved significantly faster.
To the best of the authors' knowledge, this article presents the first approach to deal with the inverse problem of multiobjective optimization (i.e., finding the objective vector to a given Pareto (critical) set) in such a general way.Therefore, there are many aspects that are should be investigated further: • While the results in Section 3 hold for any number of variables n and any number of objectives k, all the examples in this article consider the case of k = 2 objective func-tions in n = 2 variables.This allowed us to easily visualize the most important features of our approach.Nevertheless, the behavior for higher-dimensional examples is worth investigating.
• For the reasons mentioned at the end of Section 3, we only used monomials up to different maximal degrees as basis functions B. Although this lead to satisfactory results in the examples considered here, there might be more sophisticated choices, in particular if one has some knowledge of the problem structure.
• As the KKT conditions (KKT) can also be formulated for equality and inequality constrained MOPs, we expect that a generalization of our approach to constrained MOPs is possible.
• Since we can currently only influence the Pareto critical set of the resulting objective vector, an extension to Pareto optimality would be of significant interest, in particular in applications.As sufficient optimality conditions for MOPs are using second order derivatives (cf.[25]), a possible way to control the optimality of the data set might by to incorporate the hessians of the basis functions in our approach.
• For the generation of surrogate models, it is important to ensure that the (extended) Pareto critical set of the surrogate model is indeed a good approximation of the actual (extended) Pareto critical set.The convergence result in Theorem 3.2 states that the smallest singular value of L is an upper bound for the Euclidean norm of the KKT conditions in the data points.However, this can not be used directly to obtain an estimate for the Hausdorff distance between the Pareto critical set and its surrogate approximation, which is why further investigation of the convergence theory is required.
• In order to improve the robustness, it is advisable to develop automated procedures for selecting s as well as c (i.e., the threshold for the singular values and the coefficients for the basis).

Definition 2 . 3 .
Let x ∈ R n .a) If there exists some α ∈ ∆ k (with ∆ k as in (1)) such that (KKT) holds, then x is called Pareto critical and α a KKT vector of x containing the KKT multipliers α i , i ∈ {1, ..., k}.The set of Pareto critical points P c of (MOP) is called the Pareto critical set.(Pareto critical points are sometimes also referred to as substationary points by other authors.)b) In the situation of a), the pair (x, α) ∈ R n ×∆ k is called an extended Pareto critical point.The set of all such pairs P M ⊆ R n × ∆ k is called the extended Pareto critical set.

2 =1Lc 2 = s 1 and arg min c 2
the right-singular vectors of L, i.e., the columns of V .Then min c =1

Figure 1 :
Figure 1: (a) Singular values of L in Example 4.1.(b) Pareto critical set of f .(c) Image of the Pareto critical set of f .

Example 4 . 2 .
Using the same strategy as in Example 4.1, it is possible to numerically verify that arbitrary ellipses can be represented as Pareto critical sets of polynomial MOPs.For a, b ∈ R >0 , we merely have to replace the xj from Example 4.1 by

Figure 2 :
Figure 2: (a) Smallest singular value of L for different degrees of the monomial basis in Example 4.3.(b) Singular values of L for degree 5.

Figure 3 :
Figure 3: (a) Pareto critical set of f in Example 4.3.(b) Image of the Pareto critical set of f .
be the (component-wise) expected value of f (x, ξ).For F (x) := E[f (x, ξ)] we consider the stochastic multiobjective optimization problem min x∈R n F (x).

Figure 4 :
Figure 4: (a) Approximation of the solution of (SMOP) via smoothing Chebyshev scalarization with 1000 points for Example 4.4.(b) Singular values of L. (c) Pareto critical set of the original problem (dotted line) and the approximation (solid line).

Figure 5 :Figure 6 :
Figure 5: (a) Pareto critical set of (L&H 2×2 ).The dots represent the N = 17 data points used for the surrogate model construction.(b) Image of the Pareto critical set.(c) Singular values of L for monomials up to degree 4 for the chosen data points.

Figure 7 :
Figure 7: (a) Flow around a cylinder, controlled via cylinder rotation.(b) Result of the Weighting Method applied to the MOP (17) in the variable space.(c) Image of the resulting points under the objective function (17).

Figure 8 :
Figure 8: (a) Singular values of L in Example 5.2.(b) Pareto critical set of (19) (solid line) and the data in D x (circles), where the first KKT multiplier α 1 is shown in the third dimension.

Figure 9 :
Figure 9: (a) The approximation of the Pareto set via the surrogate model compared to NSGA-II in Example 5.2.(b) Comparison of the corresponding approximations of the Pareto fronts.