GLISp-r: a preference-based optimization algorithm with convergence guarantees

Preference-based optimization algorithms are iterative procedures that seek the optimal calibration of a decision vector based only on comparisons between pairs of different tunings. At each iteration, a human decision-maker expresses a preference between two calibrations (samples), highlighting which one, if any, is better than the other. The optimization procedure must use the observed preferences to find the tuning of the decision vector that is most preferred by the decision-maker, while also minimizing the number of comparisons. In this work, we formulate the preference-based optimization problem from a utility theory perspective. Then, we propose GLISp-r, an extension of a recent preference-based optimization procedure called GLISp. The latter uses a Radial Basis Function surrogate to describe the tastes of the decision-maker. Iteratively, GLISp proposes new samples to compare with the best calibration available by trading off exploitation of the surrogate model and exploration of the decision space. In GLISp-r, we propose a different criterion for selecting new candidate samples, inspired by MSRS, a popular procedure in the black-box optimization framework. Compared to GLISp, GLISp-r is less likely to get stuck on local optima of the preference-based optimization problem. We motivate this claim theoretically, with a proof of global convergence, and empirically, by comparing the performances of GLISp and GLISp-r on several benchmark optimization problems.


Introduction
Preference-Based Optimization (PBO) algorithms seek the global solution of an optimization problem whose objective function is unknown and unmeasurable. Instead, the "goodness" of a decision vector is assessed by a human Decision-Maker (DM), who compares different samples (i.e. tunings of the decision vector) and states which of them he/she prefers. In this work, we consider queries to the DM that involve two options. The output of a query is the preference between the two calibrations of the decision vector expressed by the decision-maker (e.g. "this tuning is better than that one") [3].
Preference-based optimization is closely related to industry practice. In the context of control systems, the performances achieved by a regulator tuning are often evaluated by a decision-maker, who expresses his/her judgment by observing the behavior of the system under control. Multiple experiments must be performed until the DM is satisfied by the closed-loop performances. If a trial-and-error approach is adopted, then there is no guarantee that the final tuning is optimal in some sense. Moreover, the calibration process might be quite time-consuming, since possibly many combinations of the controller parameters need to be tested. PBO algorithms constitute a better alternative to the trial-and-error methodology because they drive the experiments by exploiting the information carried by the preferences expressed by the DM. The goal is to seek the calibration of the decision vector that is most preferred by the decision-maker while also minimizing the number of queries, thus performing fewer experiments. Successful applications of preference-based optimization procedures include [40], where the authors use algorithm GLISp [3] to tune the controller for a continuously stirred tank reactor and for autonomous driving vehicles. The same algorithm has been employed in [31] to calibrate the parameters of a velocity planner for robotic sealing tasks.
Although preference-based optimization is a valuable tool for solving those calibration tasks that involve human decision-makers, its applicability is broader. In multi-objective optimization, PBO can be employed to scalarize the multi-objective optimization problem into a single-objective one by tuning the weights for the weighted sum method, or to choose which, among a set of Pareto optimal solutions, is the most suited for the DM [3,21]. PBO can also be used to perform active preference learning. In general, preference learning methods estimate a predictive model that describes the tastes of a human decision-maker [12]. Preference-based optimization methods also rely on a predictive model (called the surrogate model), which is updated after each query to the DM, but its prediction accuracy is not the main concern. Rather, the sole purpose of the surrogate model is to drive the search towards the most preferred tuning of the decision vector.
The preference-based optimization problem can be formalized by applying notions of utility theory [27]. Suppose that there exists a binary relation, called the preference relation, which describes the tastes of the decision-maker (i.e. the outputs of the queries). If the preference relation of the DM exhibits certain properties, then it is possible to represent it with a continuous (latent) utility function [8], which assigns an abstract degree of "goodness" to all possible calibrations of the decision vector. The tuning that is most preferred by the decision-maker is the one with the highest utility and corresponds to the optimizer of the PBO problem.
As previously mentioned, many preference-based optimization algorithms rely on a surrogate model, which is an approximation of the latent utility function built from the preference information at hand. In general, any procedure that uses a surrogate model when solving an optimization problem is said to be a surrogate-based (or response surface) method. These algorithms are mostly used for black-box optimization problems, where the objective function is unknown but measurable (see [37,19]). Nevertheless, surrogate-based methods can also be employed for PBO problems, provided that the surrogate model is properly defined. In practice, preference-based response surface methods iteratively propose new samples to be compared to the best candidate available by trading off exploitation (or local search), i.e. selecting samples that are likely to offer an improvement based on the data at our disposal, and exploration (or global search), i.e. finding samples in regions of the decision space of which we have little to no knowledge. Typically, new candidate samples are sought by minimizing or maximizing an acquisition function that encapsulates these two aspects.
Most surrogate models rely either on Gaussian Processes (GPs) [39], giving rise to preferential Bayesian optimization [13], or on Radial Basis Functions (RBFs) [15]. For example, in [7] the authors propose a predictive model for the latent utility function that is based on GPs. The latter is used as a surrogate model in [6] to carry out preference-based optimization. Alternative preferential Bayesian optimization algorithms are proposed in [4] and in [13]. Recently, in [3] the authors developed a preference-based optimization method, called GLISp, which is based on an RBF approximation of the latent utility function.
In this work, we propose an extension of the GLISp [3] algorithm (which we will refer to as GLISp-r) that is more robust in finding the global solution of the preference-based optimization problem. To do so, we: 1. Address some limitations of the acquisition function used in [3], which can cause the procedure to miss the global solution and get stuck on local optima; 2. Dynamically vary the exploration-exploitation trade-off weight, allowing GLISp-r to alternate between local and global search. This is commonly done in the black-box optimization framework, see for example [15,30,38,29], but it has not been tried for GLISp [3]; 3. Provide a proof of global convergence for GLISp-r, further motivating its robustness. Currently, no such proof is available for GLISp [3] (or for any of the most popular preference-based response surface methods, cf. also [6,13,4]).
Regarding this last point, as a minor contribution of this work, we formalize the preference-based optimization problem from a utility theory perspective, allowing us to analyze the existence of a solution and, ultimately, to prove the global convergence of GLISp-r.
This paper is organized as follows. In Section 2, we introduce the preference-based optimization problem. Section 3 addresses how to build the surrogate model (as in GLISp [3]) and how to look for the next sample to evaluate, keeping in mind the exploration-exploitation dilemma. We also briefly cover the exploration function used in [3]. The latter is thoroughly analyzed in Section 4, where we propose a solution to the shortcomings that we have encountered in GLISp [3]. Section 5 describes algorithm GLISp-r and addresses its convergence. Then, Section 6 compares the performances of GLISp-r with those of GLISp [3] and C-GLISp [41], a revised version of the latter method by the same authors, on several benchmark optimization problems. Finally, Section 7 gives some concluding remarks.

Problem formulation
Let x ∈ R^n, x = [x^(1) . . . x^(n)]^⊤, be the n-dimensional decision vector and suppose that we are interested in finding its calibration that is most preferred by the decision-maker within a subset of R^n, namely Ω ⊂ R^n. In particular, we define the constraint set Ω as:

Ω = {x ∈ R^n : l ≤ x ≤ u, g_ineq(x) ≤ 0_qineq, g_eq(x) = 0_qeq}. (1)

In (1), l, u ∈ R^n, l ≤ u, are the lower and upper bounds on the decision vector, while g_ineq : R^n → R^qineq and g_eq : R^n → R^qeq are the constraint functions associated to the inequality and equality constraints respectively (which are q_ineq ∈ N ∪ {0} and q_eq ∈ N ∪ {0} in number). Notation-wise, 0_qineq represents the q_ineq-dimensional zero column vector (and similarly for 0_qeq). We suppose that: (i) all of the constraints in (1) are completely known and (ii) Ω includes, at least, the bound constraints l ≤ x ≤ u (the remaining equality and inequality constraints can be omitted, resulting in q_ineq = q_eq = 0).
In this work, we formalize the preference-based optimization problem from a utility theory [27] perspective. We start by introducing preference relations, a particular kind of binary relations that play a key role in PBO.

Definition 1 (Binary relation [27]). Consider the constraint set Ω in (1); we define a generic binary relation R on Ω as a subset R ⊆ Ω × Ω.
Notation-wise, given two samples x_i, x_j ∈ Ω, we denote the ordered pairs for which the binary relation holds as x_i R x_j, i.e. x_i R x_j if and only if (x_i, x_j) ∈ R.

Definition 2 (Preference relation [27]). A preference relation, ≿ ⊆ Ω × Ω, is a binary relation that describes the tastes of a human decision-maker.
In the context of utility theory, x_i ≿ x_j implies that the DM with preference relation ≿ on Ω deems the alternative x_i at least as good as x_j. We say that the decision-maker is rational (in an economics sense) if his/her preference relation exhibits certain properties, as highlighted by the following Definition.

Definition 3 (Rational decision-maker [27]). Consider a decision-maker with preference relation ≿ on Ω. We say that the DM is rational if ≿ is a reflexive, transitive and complete binary relation on Ω.

Now, we briefly review each of the aforementioned properties and give some insights as to why they characterize the rationality of an individual (see [8,27,10] for more details): • Reflexivity of ≿ on Ω implies that, for the DM, any alternative is as good as itself, i.e. x_i ≿ x_i for each x_i ∈ Ω; • A decision-maker whose preference relation ≿ on Ω is transitive is able to express his/her preferences coherently, since if x_i ≿ x_j and x_j ≿ x_k hold, then x_i ≿ x_k, for any x_i, x_j, x_k ∈ Ω; • Completeness of ≿ on Ω implies that the DM is able to express a preference between any two alternatives in Ω, i.e. either x_i ≿ x_j or x_j ≿ x_i holds for each x_i, x_j ∈ Ω.
The preference relation ≿ on Ω of a rational decision-maker is usually "split" into two transitive binary relations [27]: • The strict preference relation ≻ on Ω, i.e. x_i ≻ x_j if and only if x_i ≿ x_j but not x_j ≿ x_i (x_i is "better than" x_j or, equivalently, x_j is "worse than" x_i), and • The indifference relation ∼ on Ω, i.e. x_i ∼ x_j if and only if x_i ≿ x_j and x_j ≿ x_i (x_i is "as good as" x_j).
One last relevant property for preference relations is continuity.

Definition 4 (Continuous preference relation [27]). A preference relation ≿ on Ω is continuous if, for each x_i ∈ Ω, the strict upper and lower ≿-contour sets, {x_j ∈ Ω : x_j ≻ x_i} and {x_j ∈ Ω : x_i ≻ x_j}, are open in Ω.

Intuitively speaking, if ≿ on Ω is continuous and x_i ≻ x_j holds, then an alternative x_k which is "very close" to x_j should also be deemed strictly worse than x_i by the decision-maker, i.e. x_i ≻ x_k.
Having defined the preference relation ≿ on Ω thoroughly, we can finally state the goal of preference-based optimization: find a sample x* ∈ Ω such that

x* ≿ x, ∀x ∈ Ω. (2)

Formally, x* is called the ≿-maximum of Ω [27], i.e. the sample that is most preferred by the decision-maker with preference relation ≿ on Ω. Concerning the existence of x*, we can state the following Proposition, which can be seen as a generalization of the Extreme Value Theorem [1] for preference relations.

Proposition 1 (Existence of a ≿-maximum of Ω [27]). A ≿-maximum of Ω is guaranteed to exist if Ω is a compact subset of a metric space (in our case Ω ⊂ R^n) and ≿ is a continuous preference relation on Ω of a rational decision-maker (see Definition 3 and Definition 4).

Proposition 1 will be relevant when proving the convergence of the proposed algorithm, in Section 5. Lastly, in order to write Problem (2) as a typical global optimization problem, we need to state one of the most important results in utility theory.

Theorem 1 (Debreu's utility representation Theorem for R^n [8]). Let Ω be any nonempty subset of R^n and ≿ be a preference relation on Ω of a rational decision-maker (as in Definition 3). If ≿ on Ω is continuous, then it can be represented by a continuous utility function u_≿ : Ω → R such that, for any x_i, x_j ∈ Ω:

x_i ≿ x_j if and only if u_≿(x_i) ≥ u_≿(x_j).

Using Theorem 1, we can build an optimization problem to find the ≿-maximum of Ω. In particular, we define the scoring function, f : R^n → R, as f(x) = −u_≿(x) and re-write Problem (2) as:

x* = arg min_x f(x) s.t. x ∈ Ω. (3)

Remark 1.
Formally, there could be more than one ≿-maximum of Ω, i.e. Problem (3) could admit multiple global solutions, as described by the set:

X* = {x* ∈ Ω : f(x*) ≤ f(x), ∀x ∈ Ω}.

In this work, without loss of generality, we assume that there exists only one global solution x* as in (3). In practice, as we will see in Section 5, any globally convergent preference-based optimization procedure generates a set of samples that is dense in Ω and thus, at least asymptotically, it actually finds all the global minimizers in X*. We do not make any assumptions on the local solutions of (3), which can be more than one.
On a side note, we could view preference-based optimization as a particular instance of black-box optimization [37,19] where the cost function is not measurable in any way.Instead, information on f (x) in (3) comes in the form of preferences, as described in the next Section.

Data available to preference-based optimization procedures
In GLISp [3], instead of considering the preference relation ≿ on Ω explicitly, the authors describe the outputs of the queries to the DM using the preference function π_≿ : R^n × R^n → {−1, 0, 1}, defined as:

π_≿(x_i, x_j) = −1 if x_i ≻ x_j, 0 if x_i ∼ x_j, 1 if x_j ≻ x_i. (4)

In light of the just reviewed utility theory literature, we can see that π_≿(x_i, x_j) in (4) is obtained from the utility representation of the preference relation ≿ on Ω (see Theorem 1) and from the fact that f(x) = −u_≿(x). In the case of rational decision-makers (Definition 3), reflexivity and transitivity of the preference relation are highlighted by the corresponding properties of the preference function, namely π_≿(x_i, x_i) = 0 for each x_i ∈ Ω and the coherence of the preferences expressed on any triple of samples. In the context of PBO, surrogate-based methods aim to solve Problem (3) starting from a set of N ∈ N distinct samples of the decision vector:

X = {x_1, . . ., x_N}, x_i ∈ Ω, (5)

and a set of M ∈ N preferences expressed by the decision-maker:

B = {b_1, . . ., b_M}, b_h ∈ {−1, 0, 1}. (6)

The term b_h in (6) is the output of the h-th query, where the decision-maker was asked to compare two samples in X, as highlighted by the following mapping set:

S = {(ℓ(h), κ(h)) : h = 1, . . ., M}, such that b_h = π_≿(x_ℓ(h), x_κ(h)). (7)
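As a minimal illustration (not part of GLISp [3] itself), the preference function in (4) can be simulated by querying a synthetic scoring function; here f is a hypothetical stand-in for the DM's latent preferences, which in practice cannot be measured.

```python
import numpy as np

def preference(f, x_i, x_j):
    """Synthetic preference function pi(x_i, x_j) as in (4):
    -1 if x_i is preferred, 1 if x_j is preferred, 0 if indifferent."""
    fi, fj = f(x_i), f(x_j)
    if fi < fj:
        return -1
    if fi > fj:
        return 1
    return 0

# Hypothetical scoring function f(x) = -u(x), standing in for the DM.
f = lambda x: float(np.sum((x - 0.3) ** 2))

print(preference(f, np.array([0.3]), np.array([1.0])))  # -1: the first sample is preferred
```

Note the skew-symmetry `preference(f, a, b) == -preference(f, b, a)`, which mirrors the fact that a single query yields the preference for both orderings of the pair.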

Handling exploration and exploitation
In this Section, we review some key concepts that are common to most preference-based optimization algorithms. We also briefly cover how exploration and exploitation are handled by algorithm GLISp [3].
Preference-based response surface methods iteratively propose new samples to evaluate with the objective of solving Problem (3) while also minimizing the number of queries. Suppose that, at iteration k, we have at our disposal the set of samples X in (5), |X| = N, and the sets B in (6) and S in (7), |B| = |S| = M. We define the best sample found so far as x*_N ∈ X, i.e. the sample that the decision-maker prefers over all the others evaluated up to now. The new candidate sample, x_{N+1}, is obtained by solving an additional global optimization problem:

x_{N+1} = arg min_x a_N(x) s.t. x ∈ Ω, (8)

where a_N : R^n → R is a properly defined acquisition function which trades off exploration and exploitation.
In practice, once x_{N+1} has been computed, we let the decision-maker express a preference between the best sample found so far and the new one, obtaining the new preference b_{M+1}. After that, x_{N+1} is added to the set X and, similarly, the sets B and S are also updated with b_{M+1}. The process is iterated until a certain condition is met. Typically, a budget, or rather a maximum number of samples N_max ∈ N, is set and the optimization procedure is stopped once it is reached.
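Schematically, the iteration just described might look as follows; the DM is simulated by a hypothetical scoring function and the acquisition step is replaced by a crude random-search placeholder, since the point here is only the bookkeeping of the sets X, B and S.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: float(np.sum((x - 0.3) ** 2))     # hypothetical latent scoring function
l, u, N_max = np.zeros(2), np.ones(2), 20       # bounds and sample budget

X = [rng.uniform(l, u) for _ in range(4)]       # initial samples
B, S = [], []                                   # preferences and queried pairs
best = min(range(len(X)), key=lambda i: f(X[i]))

while len(X) < N_max:
    # Placeholder for Problem (8): pick the random candidate farthest from X.
    cand = rng.uniform(l, u, size=(200, 2))
    d = np.min(np.linalg.norm(cand[:, None] - np.array(X)[None], axis=2), axis=1)
    x_new = cand[np.argmax(d)]
    # Query the simulated DM: compare x_new against the current best sample.
    b = -1 if f(x_new) < f(X[best]) else (1 if f(x_new) > f(X[best]) else 0)
    X.append(x_new)
    S.append((len(X) - 1, best))
    B.append(b)
    if b == -1:                                 # x_new becomes the new best sample
        best = len(X) - 1

print(len(X), f(X[best]))
```

At termination, `best` indexes the sample the (simulated) DM prefers over all others evaluated, exactly the bookkeeping a surrogate-based PBO method maintains.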
In our case, a_N(x) in (8) is defined starting from a surrogate model f̂_N : R^n → R, which approximates the scoring function f(x) of Problem (3), and a function z_N : R^n → R that promotes the exploration of those regions of Ω where fewer samples have been evaluated. The acquisition function that we will propose in this work is a weighted sum of these two contributions (see Section 4.3). f̂_N(x) and z_N(x) are defined as in GLISp [3], which we now review.

Surrogate model
Given N samples x_i ∈ X in (5), we define the surrogate model f̂_N : R^n → R as the radial basis function expansion [9]:

f̂_N(x) = Σ_{i=1}^{N} β^(i) · ϕ_i(x) = β^⊤ · ϕ(x), (9)

where φ : R_≥0 → R is a properly chosen radial function, ϵ ∈ R_>0 is the so-called shape parameter (which needs to be tuned) and ϕ_i : R^n → R is the radial basis function originated from φ(•) and center x_i, i.e. ϕ_i(x) = φ(ϵ · ∥x − x_i∥_2). ϕ(x) = [ϕ_1(x) . . . ϕ_N(x)]^⊤ is the radial basis function vector and β = [β^(1) . . . β^(N)]^⊤ ∈ R^N is a vector of weights that has to be estimated from the preferences in B (6) and S (7). Given a distance r = ∥x − x_i∥_2, some commonly used radial functions are [11]: • Linear: φ(ϵ · r) = ϵ · r; • Thin plate spline: φ(ϵ · r) = (ϵ · r)^2 · log(ϵ · r). One advantage of using (9) as the surrogate model is highlighted by the following Proposition.
Proposition 2. f̂_N(x) in (9) is differentiable everywhere if and only if the chosen radial basis functions ϕ_i(x) are differentiable everywhere.

The surrogate model in (9) can be used to define the surrogate preference function π̂_≿N : R^n × R^n → {−1, 0, 1}. Differently from (4), we consider a tolerance σ ∈ R_>0 to avoid using strict inequalities and equalities and define π̂_≿N(x_i, x_j) as [3]:

π̂_≿N(x_i, x_j) = −1 if f̂_N(x_i) − f̂_N(x_j) ≤ −σ, 0 if |f̂_N(x_i) − f̂_N(x_j)| ≤ σ, 1 if f̂_N(x_i) − f̂_N(x_j) ≥ σ. (10)

Suppose now that we have at our disposal the sets X in (5), B in (6) and S in (7). Then, we are interested in a surrogate model f̂_N(x) in (9) that correctly describes the preferences expressed by the decision-maker, i.e. we would like the corresponding surrogate preference function π̂_≿N(x_i, x_j) in (10) to be such that:

π̂_≿N(x_ℓ(h), x_κ(h)) = b_h, h = 1, . . ., M.

This, in turn, translates into some constraints on f̂_N(x), which can be used to estimate β. To do so, the authors of GLISp [3] define the following convex optimization problem:

min_{β,ε} Σ_{h=1}^{M} r^(h) · ε^(h) + (λ/2) · ∥β∥_2^2, (11)

subject to constraints imposing that, for each preference b_h, the surrogate respects it within the tolerance σ, up to a slack ε^(h) ≥ 0. In (11):

• ε = [ε^(1) . . . ε^(M)]^⊤ ∈ R^M_≥0 is a vector of slack variables (one for each preference) which takes into consideration that: (i) there might be some outliers in B and S if the decision-maker expresses the preferences in an inconsistent way, and (ii) the selected radial function and/or shape parameter for f̂_N(x) in (9) might not allow a proper approximation of the scoring function f(x);
• r = [r^(1) . . . r^(M)]^⊤ ∈ R^M_>0 is a vector of weights that can be used to penalize more the slacks related to the most important preferences. In GLISp [3], the authors weigh more the preferences associated to the current best candidate;
• λ ∈ R_≥0 plays the role of a regularization parameter.

It is easy to see that, for λ = 0, Problem (11) is a Linear Program (LP) while, for λ > 0, it is a Quadratic Program (QP).
Problem (11) ensures that, at least approximately, f̂_N(x) in (9) is a suitable representation of the unknown preference relation ≿ on Ω which produced the data described in Section 2.1 (see Theorem 1).
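To make the estimation of β concrete, the sketch below solves an LP in the spirit of Problem (11) with λ = 0 using scipy; the inverse quadratic radial function, the data and the tolerance σ are illustrative assumptions, not the exact setup of [3].

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative one-dimensional data: 3 samples, preferences from a hidden f(x) = x.
X = np.array([[0.0], [0.5], [1.0]])
S = [(0, 1), (1, 2), (0, 2)]          # queried pairs (l(h), k(h))
B = [-1, -1, -1]                      # x_l(h) preferred in every query
sigma, N, M = 0.05, len(X), len(S)

# RBF matrix Phi[i, j] = phi(eps * ||x_i - x_j||), inverse quadratic radial function.
eps = 1.0
D = np.linalg.norm(X[:, None] - X[None], axis=2)
Phi = 1.0 / (1.0 + (eps * D) ** 2)

# LP in the stacked variable [beta; slack]: minimize the sum of slacks subject to
# the surrogate respecting each preference within tolerance sigma (cf. (11)).
A_ub, b_ub = [], []
for h, ((i, j), b) in enumerate(zip(S, B)):
    d = Phi[i] - Phi[j]               # f_hat(x_i) - f_hat(x_j) = d @ beta
    e = np.zeros(M); e[h] = -1.0      # slack enters with -eps_h
    A_ub.append(np.concatenate([d if b == -1 else -d, e]))
    b_ub.append(-sigma)
c = np.concatenate([np.zeros(N), np.ones(M)])
bounds = [(None, None)] * N + [(0, None)] * M
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")
beta = res.x[:N]

f_hat = Phi @ beta                    # surrogate values at the samples
print(f_hat)                          # increasing: the surrogate reproduces the preferences
```

Indifference outputs (b_h = 0) would add a pair of constraints bounding |d @ beta| by σ plus the slack; they are omitted here since this toy dataset contains none.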

Exploration function
Consider a sample x_i ∈ X in (5). Its corresponding Inverse Distance Weighting (IDW) function w_i : R^n \ {x_i} → R_>0 is defined as [32]:

w_i(x) = 1 / ∥x − x_i∥_2^2. (12)

GLISp [3] uses the so-called IDW distance function z_N : R^n → (−1, 0],

z_N(x) = 0 if x ∈ X, −(2/π) · arctan(1 / Σ_{i=1}^{N} w_i(x)) otherwise, (13)

to promote the exploration in those regions of R^n where fewer samples have been evaluated. It is possible to prove the following Proposition and Lemma.

Proposition 3. The IDW distance function z_N(x) in (13) is differentiable everywhere.
The proof of Proposition 3 is given in [2]. Here, we state the gradient of z_N(x) in (13) (see Appendix A for its derivation).

Lemma 1. The gradient of the IDW distance function z_N(x) in (13) is:

∇_x z_N(x) = −(4/π) · Σ_{i=1}^{N} w_i(x)^2 · (x − x_i) / (1 + (Σ_{i=1}^{N} w_i(x))^2) for x ∈ R^n \ X, and ∇_x z_N(x) = 0_n for x ∈ X. (14)

The gradient ∇_x z_N(x) in (14) will be particularly relevant in the following Section.
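The following sketch implements (12)-(14) as stated above and checks the gradient against central finite differences; the sample locations and the query point are arbitrary, and the check is only a sanity test of the stated expression.

```python
import numpy as np

def z_N(x, X):
    """IDW distance function (13): zero at the samples, tending to -1 far from them."""
    d2 = np.sum((np.asarray(X) - x) ** 2, axis=1)
    return 0.0 if np.any(d2 == 0) else -(2 / np.pi) * np.arctan(1 / np.sum(1 / d2))

def grad_z_N(x, X):
    """Analytical gradient (14), for x not in X."""
    X = np.asarray(X)
    diff = x - X                            # rows: x - x_i
    w = 1.0 / np.sum(diff ** 2, axis=1)     # IDW weights (12)
    return -(4 / np.pi) * (w ** 2) @ diff / (1.0 + np.sum(w) ** 2)

X = [[0.0, 0.0], [1.0, 1.0]]
x = np.array([0.2, 0.7])
h = 1e-6                                    # central finite-difference step
fd = np.array([(z_N(x + h * e, X) - z_N(x - h * e, X)) / (2 * h) for e in np.eye(2)])
print(np.allclose(fd, grad_z_N(x, X), atol=1e-6))
```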

Next candidate sample search
As previously mentioned in Section 3, a key aspect of surrogate-based methods is the exploration-exploitation dilemma. Typically, new candidate samples are sought by minimizing an acquisition function a_N(x) that is a weighted sum between the surrogate model and the exploration function. In practice, f̂_N(x) in (9) and z_N(x) in (13) often exhibit different ranges and need to be rescaled. In particular, GLISp [3] adopts the following a_N(x):

a_N(x) = f̂_N(x) / ∆F + δ · z_N(x), (15)

where δ ∈ R_≥0 is the exploration-exploitation trade-off weight and

∆F = max_{x_i∈X} f̂_N(x_i) − min_{x_i∈X} f̂_N(x_i). (16)

The division by ∆F in (15) aims to rescale the surrogate model so that it assumes a range that is comparable to that of the IDW distance function in (13), which is (−1, 0]. In this Section, we address some limitations of a_N(x) in (15), which might prevent GLISp [3] from reaching the global minimizer of Problem (3), and define an alternative acquisition function. Furthermore, we propose a strategy to iteratively vary the exploration-exploitation trade-off.
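A compact sketch of the trade-off in (15), with toy stand-ins for the surrogate and the IDW distance function; the quadratic f̂_N used here is purely illustrative.

```python
import numpy as np

X = np.array([[0.0], [0.5], [1.0]])               # evaluated samples
f_hat = lambda x: float((x[0] - 0.3) ** 2)        # toy surrogate model

def z(x):                                         # IDW distance function (13)
    d2 = np.sum((X - x) ** 2, axis=1)
    return 0.0 if np.any(d2 == 0) else -(2 / np.pi) * np.arctan(1 / np.sum(1 / d2))

F = np.array([f_hat(x) for x in X])
dF = F.max() - F.min()                            # Delta_F in (16)
a = lambda x, delta: f_hat(x) / dF + delta * z(x)

# delta = 0: pure exploitation; larger delta rewards points far from X.
print(a(np.array([0.3]), 0.0), a(np.array([0.3]), 1.0))
```

With δ = 0 the minimizer of a_N coincides with that of the (rescaled) surrogate, while increasing δ pulls the search towards unexplored regions where z_N is most negative.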

Shortcomings of GLISp
There are two shortcomings of a_N(x) in (15) that limit the exploratory capabilities of GLISp [3]. First, the rescaling of f̂_N(x) in (15), which relies on ∆F in (16), only takes into account the previously evaluated samples inside X in (5) and thus it can be ineffective in making f̂_N(x)/∆F and z_N(x) comparable over all Ω (see Problem (8)). Second, the IDW distance function in (13) exhibits two characteristics that can make its contribution negligible in a_N(x) in (15) and complicate the selection of δ:

1. Although the codomain of z_N(x) in (13) is (−1, 0], what we are really interested in when solving Problem (8) and, ultimately, Problem (3), are the values that it assumes for x ∈ Ω and not on its whole domain R^n. In particular, there are some situations for which |z_N(x)| ≪ 1, ∀x ∈ Ω. Consider, for example, the case X = {x_1} (N = 1); then, ∀x ∈ R^n \ X, the IDW distance function simply becomes:

z_1(x) = −(2/π) · arctan(∥x − x_1∥_2^2).

Suppose that Problem (3) is only bound constrained, i.e. Ω = {x : l ≤ x ≤ u}. Then, z_1(x) assumes its lowest value at one (or more) of the vertices of the box defined by the constraints in Ω. Define v_Ω ∈ Ω as one of such vertices; if ∥v_Ω − x_1∥_2 is small, then |z_1(x)| ≪ 1, ∀x ∈ Ω, and hence f̂_N(x)/∆F in (9) and z_N(x) in (13) might not be comparable.
2. The (absolute) values assumed by the IDW distance function decrease as the number of samples increases.
To clarify this, consider two sets of samples: X′ = {x_1, . . ., x_N} and X′′ = X′ ∪ {x_{N+1}}. Given any point x ∈ R^n \ X′′, the IDW distance functions obtained from the previously defined sets are:

z′_N(x) = −(2/π) · arctan(1 / Σ_{i=1}^{N} w_i(x)) and z′′_{N+1}(x) = −(2/π) · arctan(1 / Σ_{i=1}^{N+1} w_i(x)).

Note that w_i(x) > 0, ∀x ∈ R^n \ X′′ and i = 1, . . ., N + 1 (see (12)). Hence:

|z′′_{N+1}(x)| < |z′_N(x)|, ∀x ∈ R^n \ X′′,

proving the above point. In practice, unless δ in (15) is progressively increased as the iterations go on, GLISp [3] will explore the constraint set Ω of Problem (3) less as the number of samples increases, regardless of whether a region that contains the global minimizer x* has been located. A visualization of these two characteristics of z_N(x) in (13) is presented in Fig. 1.
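This shrinking effect is easy to observe numerically, assuming the IDW definitions above; the fixed query point and the uniform samples below are arbitrary.

```python
import numpy as np

def z(x, X):
    """IDW distance function (13), assuming w_i(x) = ||x - x_i||^-2."""
    d2 = np.sum((np.asarray(X) - x) ** 2, axis=1)
    return 0.0 if np.any(d2 == 0) else -(2 / np.pi) * np.arctan(1 / np.sum(1 / d2))

rng = np.random.default_rng(1)
x = np.array([0.5, 0.5])                  # fixed query point
X5 = list(rng.uniform(0, 1, size=(5, 2)))
X50 = X5 + list(rng.uniform(0, 1, size=(45, 2)))
print(abs(z(x, X5)), abs(z(x, X50)))      # the second value is strictly smaller
```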
In this work, we overcome the limitations of a_N(x) in (15) by defining an acquisition function that is similar to the one used by the Metric Stochastic Response Surface (MSRS [30]) algorithm, a popular black-box optimization procedure.
In MSRS [30], the surrogate f̂_N(x) and the exploration function z_N(x) are made comparable by randomly sampling Ω and rescaling the two using min-max normalization. Then, Problem (8) is not solved explicitly but by choosing the generated random sample that achieves the lowest value for the acquisition function. Here, we propose to rescale z_N(x) in (13) and f̂_N(x) in (9) using some insights on the stationary points of the IDW distance function and to solve Problem (8) explicitly, using a proper global optimization solver.

Novel rescaling strategy
In this Section, we derive the approximate locations of the stationary points of the IDW distance function in (13) and use them to define an augmented set of samples X_aug ⊃ X that is suited for the min-max normalization of both z_N(x) and f̂_N(x).

Stationary points of the IDW distance function
The locations of the global maximizers of z_N(x) can be deduced immediately from (13) and Lemma 1, as stated by the following Proposition.

Proposition 4. Consider the IDW distance function z_N(x) in (13) and the set of samples X in (5). Then: (i) ∇_x z_N(x_i) = 0_n for each x_i ∈ X; (ii) z_N(x_i) = 0 for each x_i ∈ X; (iii) z_N(x) < 0 for each x ∈ R^n \ X.

From Item (i) we deduce that each x_i ∈ X is a stationary point of z_N(x). Item (ii), in conjunction with Item (iii), implies that such samples are local maximizers of the IDW distance function in (13), since there exists a neighborhood of each x_i ∈ X in which z_N(x) ≤ z_N(x_i) = 0 (in fact, they are global maximizers). Reaching similar conclusions for the minimizers of z_N(x) is much harder; however, we can consider some simplified situations. Note that we are not necessarily interested in finding the minimizers of the IDW distance function in (13) with high accuracy, but rather in gaining some insights on where they are likely to be located, so that we can rescale both z_N(x) in (13) and f̂_N(x) in (9) sufficiently to make them comparable. Moreover, their approximate locations can be used to solve the following global optimization problem (pure exploration):

min_x z_N(x) s.t. x ∈ Ω (17)

by using a multi-start derivative-based optimization method with warm-start [26,23] (recall that z_N(x) is differentiable everywhere, see Proposition 3). Problem (17) is quite relevant for the global convergence of the algorithm that we will propose in Section 5.
Remark 2. In the following Paragraphs, we analyze where the local minimizers of z_N(x) in (13) and the solution(s) of the simplified problem:

min_x z_N(x) s.t. l ≤ x ≤ u (18)

are located in some specific cases. Note that {x : l ≤ x ≤ u} ⊇ Ω (see (1)) and thus the global minimum of Problem (18) is lower than or at most equal to the global minimum of Problem (17). Therefore, the minimizers of Problem (18) are better suited to perform min-max rescaling of z_N(x) than those of Problem (17).
Case X = {x_1} (N = 1). The IDW distance function and its gradient, ∀x ∈ R^n \ X, are:

z_1(x) = −(2/π) · arctan(∥x − x_1∥_2^2), ∇_x z_1(x) = −(4/π) · w_1(x)^2 · (x − x_1) / (1 + w_1(x)^2).

Clearly, ∀x ∈ R^n \ X, the gradient is never zero since w_1(x) > 0. Therefore, the only stationary point is the global maximizer x_1 (see Proposition 4). However, if we were to consider Problem (18), then its solution would be located at one of the vertices of the box defined by the bound constraints l ≤ x ≤ u.
Case X = {x_1, x_2} (N = 2). The gradient of the IDW distance function, ∀x ∈ R^n \ X, is:

∇_x z_2(x) = −(4/π) · [w_1(x)^2 · (x − x_1) + w_2(x)^2 · (x − x_2)] / (1 + (w_1(x) + w_2(x))^2).

Let us consider the midpoint x_µ = (x_1 + x_2)/2, which is such that ∥x_µ − x_1∥_2 = ∥x_µ − x_2∥_2 and for which w_1(x_µ) = w_2(x_µ). If we substitute it in the previous expression, since x_µ − x_1 = −(x_µ − x_2), we obtain ∇_x z_2(x_µ) = 0_n, which means that x_µ is a stationary point for z_N(x) in (13). It is easy to see by visual inspection that such a point is actually a local minimizer for the IDW distance function (see for example Fig. 2). However, note that x_µ is not necessarily the global solution of Problem (18); it might just be a local one.
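The midpoint argument can be verified numerically, assuming the gradient expression (14): with two samples, the gradient evaluated at x_µ vanishes because w_1(x_µ) = w_2(x_µ) while x_µ − x_1 = −(x_µ − x_2).

```python
import numpy as np

def grad_z(x, X):
    """Gradient (14) of the IDW distance function, for x not in X."""
    X = np.asarray(X)
    diff = x - X                           # rows: x - x_i
    w = 1.0 / np.sum(diff ** 2, axis=1)    # IDW weights (12)
    return -(4 / np.pi) * (w ** 2) @ diff / (1.0 + np.sum(w) ** 2)

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
x_mu = (x1 + x2) / 2
print(grad_z(x_mu, [x1, x2]))              # zero vector: x_mu is a stationary point
```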
Case X = X^(1) ∪ X^(2) (N > 2). Suppose now that the samples contained in X (5) can be partitioned into two clusters, X^(1) and X^(2), such that X^(1) ∩ X^(2) = ∅ and X^(1) ∪ X^(2) = X. Consider the midpoint between the centroids of each cluster:

x_µ = (x_c^(1) + x_c^(2)) / 2, (19)

where x_c^(i) denotes the centroid of cluster X^(i). We make the simplifying assumption that all the points contained inside each cluster are quite close to each other, i.e. x_1 ≈ x_2 ≈ . . . ≈ x_N1 and x_N1+1 ≈ x_N1+2 ≈ . . . ≈ x_N. Then, the midpoint in (19) is approximately a stationary point of z_N(x) in (13), reaching a similar result to the one that we have seen for the case N = 2.

Figure 2: The IDW distance function z_N(x) in (13) and its gradient ∇_x z_N(x) in (14) in the four analyzed cases. The red boxes mark the different clusters while the red dashed lines highlight the values of x for which the first derivative of z_N(x) is zero. Finally, the black vertical lines mark the midpoints (either between points or centroids of the clusters). Only a portion of all possible midpoints between centroids has been reported in the general case. Notice that the midpoints in Fig. 2c and Fig. 2d are quite close to the local minimizers of z_N(x), while the midpoint in Fig. 2b is an exact local solution of Problem (18).
General case (N > 2). Any set of samples X in (5) can be partitioned into an arbitrary number of disjoint clusters, say X^(1), . . ., X^(K). In this case, finding the local solutions of Problem (18) explicitly, or even approximately, is quite complex. Heuristically speaking, if the clusters are "well spread" (i.e. all the points contained inside each cluster X^(i) are sufficiently far away from those in X^(j), j = 1, . . ., K, j ≠ i), then we can approximately deduce where the local minimizers of z_N(x) in (13) are located. For instance, Fig. 3 depicts a set of samples X that has been partitioned into three clusters, X^(1), X^(2) and X^(3), and for which the previous hypothesis is satisfied.

Figure 3: Two-dimensional example of "well spread" clusters, highlighted with different colors. The circles denote the points contained in X while the crosses represent the centroids of X^(1), X^(2) and X^(3). x_µ is the midpoint between the centroids of clusters X^(1) and X^(2). Finally, the blue lines highlight the distances between the samples of cluster X^(3) and x_µ.

In the general case, given the clusters X^(i) and X^(j), we compute their centroids and the corresponding midpoint between them as:

x_c^(i) = (1/|X^(i)|) · Σ_{x∈X^(i)} x, (20a)
x_µ = (x_c^(i) + x_c^(j)) / 2. (20b)

Going back to the example depicted in Fig. 3, if we consider the midpoint x_µ between the centroids of X^(1) and X^(2), due to the "well spread" hypothesis we can say that ∥x_µ − x_i∥_2 ≫ 0, ∀x_i ∈ X^(3), making the contributions of the points inside the third cluster negligible when evaluating z_N(x) in (13) at x_µ. Therefore, the IDW distance function at x_µ is approximately equal to the one obtained by considering only the samples in X^(1) ∪ X^(2). In general, given K clusters, if these are "well spread", then we can consider each possible couple of clusters separately and neglect the contributions of the remaining ones. Approximately speaking, we could split the general case into (K choose 2) distinct problems that read as follows: find the stationary points of the IDW distance function z_{N_i∪j}(x) in (13) defined from the set of samples X^(i) ∪ X^(j), i ≠ j, where N_i∪j = |X^(i) ∪ X^(j)|. Hence, rough locations of the stationary points of z_{N_i∪j}(x) can be found by following the same rationale proposed for the previously analyzed cases.
Some one-dimensional examples of all the previously analyzed situations are reported in Fig. 2.

Min-max rescaling and augmented sample set
Given a generic set of samples X = {x_1, . . ., x_N} and a multivariable function h : R^n → R, min-max rescaling (or normalization) [16] rescales h(x) as:

h̄(x) = (h(x) − h_min) / (h_max − h_min), (21)

where h_min = min_{x_i∈X} h(x_i) and h_max = max_{x_i∈X} h(x_i). The objective of min-max rescaling is to obtain a function with range [0, 1], i.e. we would like to have h̄ : R^n → [0, 1].
Clearly, the quality of the normalization depends on the information brought by the samples contained inside X, as pointed out in the following Remark.

Remark 3. We can observe that: 1. If X contains the global minimizer(s) and maximizer(s) of h(x), then h̄(x) defined as in (21) effectively has codomain [0, 1]; 2. Otherwise, we can only ensure that 0 ≤ h̄(x_i) ≤ 1, ∀x_i ∈ X; 3. In general, if we increase the amount of distinct samples in X, then the rescaling of h(x) gets better (or, in the worst case, stays the same).
Going back to the problem of rescaling the IDW distance function z_N(x) in (13), if we were to apply (21) using the set of previously evaluated samples X in (5), then it would not be effective, since z_N(x_i) = 0, ∀x_i ∈ X (see Proposition 4). Instead, we have opted to generate a sufficiently expressive augmented sample set X_aug ⊃ X and perform min-max normalization using X_aug instead of X.
Consider the general case described in Section 4.2.1. Then, the augmented sample set X_aug can be built in the following fashion:
1. Partition the points in X (5) into different clusters. Here, for simplicity, we fix a-priori the number K_aug ∈ N of clusters and apply K-means clustering [22,17,5] to obtain the sets X^(1), . . ., X^(K_aug);
2. Compute the centroids of each cluster, using (20a), and group them inside the set X_c = {x_c^(1), . . ., x_c^(K_aug)};
3. Calculate all the midpoints x_µ between each possible couple of centroids in X_c, using (20b);
4. Build the augmented sample set as X_aug = X ∪ X_µ, where X_µ is the set which groups all the previously computed midpoints.
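The steps above can be sketched as follows (a plain Lloyd's iteration stands in for the K-means of [22,17,5]; helper names are ours):

```python
import numpy as np
from itertools import combinations

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm (an illustrative stand-in for the K-means of [22,17,5])."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]   # initial centroids
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)     # centroid update, cf. (20a)
    return C

def augmented_sample_set(X, k_aug, l, u):
    """Sketch of Algorithm 1: X_aug = X plus the midpoints (20b) of all couples of
    centroids, with the bounds l and u included among the centroids."""
    C = kmeans(X, k_aug) if len(X) > k_aug else X
    C = np.vstack([C, l, u])                           # add the bounds to X_c
    mids = np.array([(a + b) / 2.0 for a, b in combinations(C, 2)])
    return np.vstack([X, mids])
```

With K_aug clusters plus the two bounds, the sketch adds one midpoint per couple of points in the centroid set, so the augmented set grows quadratically with K_aug.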
Clearly, as highlighted by (21) and Remark 3, if X_aug contains points that are close (or equal) to the global minimizer(s) and maximizer(s) of z_N(x) in (13), then the quality of the min-max rescaling of the IDW distance function improves.
Algorithm 1 formalizes these steps while also taking into consideration the case |X| ≤ K_aug (for which no clustering is performed). Note that we also include the bounds l and u inside X_c and X_aug for two reasons: (i) l or u might actually be the solutions of Problem (18), and (ii) given that we also want to rescale f̂_N(x) in (9), adding additional points to the augmented sample set improves the quality of min-max normalization (see Remark 3). Notice that the number of points contained inside X_aug obtained from Algorithm 1 grows quadratically with K_aug (one midpoint for each couple of points in X_c). Therefore, to avoid excessively large augmented sample sets, K_aug needs to be chosen appropriately.
As a final remark, we point out that we could perform min-max normalization in (21) by using the real minima and maxima of z_N(x) in (13) and f̂_N(x) in (9), which can be obtained by solving four additional global optimization problems. However, we have preferred to stick with the proposed heuristic, since we are not interested in an extremely accurate rescaling and, also, to avoid potentially large overhead times due to solving additional global optimization problems.

Definition of the acquisition function
In this Section, we take advantage of the results on the stationary points of z_N(x) presented in Section 4.2.1 to rescale the surrogate model and the exploration function. In particular, we define the following acquisition function:

a_N(x) = δ · f̄_N(x; X_aug) + (1 − δ) · z̄_N(x; X_aug),   (23)

where f̄_N and z̄_N denote the surrogate f̂_N(x) in (9) and the IDW distance function z_N(x) in (13) rescaled using min-max normalization as in (21), and X_aug is generated by Algorithm 1. δ ∈ [0, 1] is the exploration-exploitation trade-off weight; note that δ = 0 corresponds to pure exploration, while δ = 1 results in pure exploitation. a_N(x) in (23) is similar to the acquisition function of MSRS [30] (for black-box optimization), but here we use an ad-hoc augmented sample set instead of a randomly generated one, and a different exploration function. We will refer to the algorithm that we propose in Section 5, which uses a_N(x) in (23), as GLISp-r, where the "r" highlights the min-max rescaling performed for the acquisition function.
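A sketch of the acquisition function in (23); the rescaling helper mirrors (21), the surrogate and the IDW distance function are passed in as callables, and all names are ours:

```python
import numpy as np

def acquisition(x, f_surr, z_idw, X_aug, delta):
    """Sketch of a_N(x) in (23): a convex combination of the min-max rescaled
    surrogate (exploitation) and the rescaled IDW distance (exploration).
    f_surr and z_idw stand in for the surrogate in (9) and z_N(x) in (13)."""
    def rescale(h, x):
        vals = np.array([h(xi) for xi in X_aug])   # min-max rescaling over X_aug, cf. (21)
        return (h(x) - vals.min()) / (vals.max() - vals.min())
    return delta * rescale(f_surr, x) + (1.0 - delta) * rescale(z_idw, x)
```

With delta = 1 only the rescaled surrogate matters (pure exploitation), while delta = 0 leaves only the rescaled exploration term, matching the two extremes described above.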
A comparison between the terms of the acquisition functions in (23) (GLISp-r) and in (15) (GLISp [3]) is depicted in Fig. 4. As the number of samples N increases, the absolute values of z_N(x) in (13) get progressively smaller (see Section 4.1), and simply dividing f̂_N(x) by ∆F as in (15) is not enough to make the exploration and exploitation contributions comparable. Thus, unless δ in (15) is dynamically varied between iterations of GLISp [3], solving Problem (8) with a_N(x) in (15) becomes similar to performing pure exploitation. This, in turn, can make GLISp [3] more prone to getting stuck on local minima of Problem (3) with no way of escaping (especially if the surrogate model is not expressive enough to capture the location of the global minimizer). Vice-versa, by performing min-max rescaling as proposed in (23), the exploration and exploitation contributions stay comparable throughout the whole optimization process and approximately assume the same range. For this reason, it is also more straightforward to define δ in (23) compared to the weight in (15).
From Propositions 2 and 3, we can immediately deduce the following result on the differentiability of a_N(x) in (23).
Proposition 5. The acquisition function a_N(x) in (23) is differentiable everywhere, provided that the surrogate model f̂_N(x) in (9) is differentiable everywhere.
At each iteration of GLISp-r, we find the next candidate for evaluation, i.e. x_{N+1}, by solving Problem (8) with the acquisition function in (23). It is possible to use derivative-based optimization solvers, since a_N(x) in (23) is differentiable everywhere. In general, the acquisition function is multimodal, and thus it is better to employ a global optimization procedure. Moreover, a_N(x) is cheap to evaluate; therefore, we are not particularly concerned about its number of function evaluations. In Fig. 4, the number of clusters used by GLISp-r to build X_aug through Algorithm 1 is K_aug = 5. Notice how, for GLISp [3], the exploration function z_N(x) is not comparable with the rescaled surrogate f̂_N(x)/∆F, since it assumes values that are one to two orders of magnitude lower.
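Since a_N(x) is cheap to evaluate, even a naive random-search sketch of the candidate-selection step is serviceable; the sketch below is an illustrative stand-in for the global optimization solver actually used for Problem (8), with names of ours:

```python
import numpy as np

def next_candidate(a, l, u, n_eval=20000, seed=0):
    """Approximate x_{N+1} = arg min_{x in [l, u]} a(x) by dense uniform sampling.
    `a` is any cheap acquisition function (e.g. a_N in (23))."""
    rng = np.random.default_rng(seed)
    cand = rng.uniform(l, u, size=(n_eval, len(l)))    # candidates in the box [l, u]
    vals = np.array([a(x) for x in cand])              # a_N is cheap: many evaluations are fine
    return cand[np.argmin(vals)]
```

In practice a proper global solver (e.g. a multi-start derivative-based method, exploiting the differentiability of a_N) would be preferred; the random search only illustrates that the bottleneck is never the acquisition step.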

Greedy δ-cycling
Many black-box optimization algorithms explicitly vary their emphasis on exploration and exploitation between iterations. To cite a few:
• Gutmann-RBF [15] uses an acquisition function that measures the "bumpiness" of the RBF surrogate, which depends upon a target value τ to aim for. The author suggests cycling the values of τ between τ = min_{x∈Ω} f̂_N(x) (local search) and τ = −∞ (global search).
• The authors of MSRS [30], which uses an acquisition function similar to (23), propose to cycle between different values of δ so as to prioritize exploration or exploitation more.
• In the SO-SA algorithm [38], which is a revisitation of MSRS [30], the weight δ is chosen randomly at each iteration. Moreover, the authors adopt a greedy strategy, i.e. δ is kept unaltered until it fails to produce a significantly better solution.
In GLISp [3], the weight δ for a_N(x) in (15) is kept constant throughout the whole optimization process. Moreover, defining some form of cycling for such a hyper-parameter can be quite complex, since the additive terms that compose the acquisition function are not always comparable. In this work, we propose a strategy that lies in between those of MSRS [30] and SO-SA [38], which we refer to as greedy δ-cycling. We define a sequence of N_cycle ∈ N weights to cycle:

∆_cycle = ⟨δ_0, δ_1, . . ., δ_{N_cycle−1}⟩, δ_j ∈ [0, 1].   (24)

∆_cycle should contain values that are well spread within the [0, 1] range, so as to properly alternate between local and global search. Greedy δ-cycling operates as follows. Suppose that, at iteration k of GLISp-r, we have at our disposal |X| = N samples, and denote the trade-off weight δ in (23) as δ(k) to highlight the iteration number. Furthermore, assume δ(k) = δ_j ∈ ∆_cycle, which is used to find the new candidate sample x_{N+1} at iteration k by solving Problem (8). Then, if x_{N+1} ≻ x_best(N) (i.e. there has been some improvement), the trade-off weight is kept unchanged, δ(k + 1) = δ(k) = δ_j. Otherwise, we cycle the values in ∆_cycle, obtaining δ(k + 1) = δ_{(j+1) mod N_cycle}. Thus:

δ(k + 1) = δ_j if x_{N+1} ≻ x_best(N), δ_{(j+1) mod N_cycle} otherwise.   (25)

In Section 5, we will discuss the choice of the cycling sequence in (24) in more detail and also cover its relationship with the global convergence of GLISp-r.
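The update rule in (25) can be sketched as a small stateful helper (class and attribute names are ours):

```python
class GreedyDeltaCycling:
    """Sketch of greedy δ-cycling (25): keep δ while the new sample improves upon
    the incumbent best, otherwise advance cyclically through Delta_cycle."""
    def __init__(self, delta_cycle):
        self.delta_cycle = delta_cycle
        self.j = 0                      # index of the current weight in Delta_cycle

    @property
    def delta(self):
        return self.delta_cycle[self.j]

    def update(self, improved):
        """improved: whether x_{N+1} is strictly preferred to x_best(N)."""
        if not improved:                # no improvement: cycle to the next weight
            self.j = (self.j + 1) % len(self.delta_cycle)
```

With ∆_cycle = ⟨0.95, 0.7, 0.35, 0⟩, repeated failures to improve eventually reach δ = 0 (pure exploration), which is the ingredient the convergence analysis of Section 5 relies upon.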

Algorithm GLISp-r and convergence
Algorithm 2 describes each step of the GLISp-r procedure. As with any surrogate-based method, GLISp-r starts from an initial set of samples X, |X| = N_init ∈ N, N_init ≥ 2, generated by a space-filling experimental design [37], such as a Latin Hypercube Design (LHD) [24]. The sets B in (6) and S in (7), as well as the initial best candidate x_best(N_init), are obtained by asking the decision-maker to compare the samples in X (5) as proposed in Algorithm 3, which prompts M = N_init − 1 queries. Once the initial sampling phase is concluded, the new candidate samples are obtained by solving Problem (8). The procedure is stopped once |X| = N_max, where N_max ∈ N is a budget specified by the user. Overall, the decision-maker is queried N_max − 1 times.
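A sketch of the initial comparison loop of Algorithm 3; the exact bookkeeping of B and S follows (6)-(7) in the paper, while the encoding used here and the `prefer` callable (emulating the decision-maker's answer) are purely illustrative:

```python
def initial_queries(X, prefer):
    """Compare each sample against the incumbent best: M = len(X) - 1 queries.
    `prefer(a, b)` returns 0 if a is preferred, 1 if b is (a stand-in for the
    human decision-maker's answer)."""
    best = 0
    B, S = [], []                            # queried couples and their outcomes
    for i in range(1, len(X)):
        outcome = prefer(X[i], X[best])
        B.append((i, best))                  # couple of sample indexes, cf. (6)
        S.append(-1 if outcome == 0 else 1)  # illustrative encoding of the preference
        if outcome == 0:                     # X[i] preferred: update the incumbent
            best = i
    return B, S, best
```

Each new sample is only ever compared with the current best candidate, which is why the initial phase costs exactly N_init − 1 queries.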
GLISp-r follows the same scheme as GLISp [3] but, additionally, at each iteration, builds the augmented sample set X_aug using Algorithm 1. New candidate samples are found by minimizing the acquisition function in (23) instead of a_N(x) in (15). Moreover, δ is cycled as proposed in Section 4.3.1. Similarly to GLISp [3], the shape parameter ϵ of the surrogate model in (9) is recalibrated at certain iterations of the algorithm, as specified by the set K_R ⊆ {1, . . ., N_max − N_init}, using grid-search Leave-One-Out Cross-Validation (LOOCV). In particular, at each iteration k ∈ K_R, ϵ for f̂_N(x) in (9) is selected among a set E_LOOCV of possible shape parameters as the one whose corresponding surrogate preference function π̂_≿,N(x_i, x_j) in (10) classifies (out-of-sample) most of the preferences in B and S correctly [3]. Lastly, consistently with GLISp [3], Problem (3) is rescaled so that each decision variable assumes the range [−1, 1] (at least inside Ω).

Global convergence of GLISp-r
Algorithm 2 GLISp-r (excerpt)
1: Rescale Problem (3) as in GLISp [3]
2: Generate a set X in (5) of N_init starting points using a LHD [24]
3: Evaluate the samples in X by querying the decision-maker as in Algorithm 3, obtaining the sets B in (6) and S in (7)
. . .
16: Set δ = δ_{(j+1) mod N_cycle} ∈ ∆_cycle (greedy δ-cycling) and j = j + 1
17: Update the set of samples X and the preference information in the sets B and S
18: Set N = N + 1 and M = M + 1

Algorithm 3 Initial queries for preference-based optimization (excerpt)
. . . Keep the best candidate unaltered . . .

Whenever we are dealing with any global optimization algorithm, it is possible to guarantee its convergence to the global minimizer of Problem (3) by checking if the conditions of the following Theorem hold.

Theorem 2 (Convergence of a global optimization algorithm [34]). Let Ω ⊂ R^n be a compact set. An algorithm converges to the global minimum of every continuous function f : R^n → R over Ω if and only if its sequence of iterates ⟨x_i⟩_{i≥1} is dense in Ω.

In what follows, for the sake of clarity, we denote:
• X_∞ as the set containing all the elements of ⟨x_i⟩_{i≥1} (infinite sequence),
• X_k ⊆ X_∞ as the set containing all the elements of ⟨x_i⟩_{i=1}^k, which is a subsequence of ⟨x_i⟩_{i≥1} composed of its first k ∈ N entries.
To prove the convergence of GLISp-r, we also make use of the following Theorem, which gives a sufficient condition that ensures the denseness of the sequence of iterates produced by any global optimization algorithm.

Theorem 3 (A sufficient condition for the denseness of X_∞ [29]). Let Ω be a compact subset of R^n and let ⟨x_i⟩_{i≥1} be the sequence of iterates generated by an algorithm A (when run indefinitely). Suppose that there exists a strictly increasing sequence of positive integers ⟨i_t⟩_{t≥1}, i_t ∈ N, such that ⟨x_i⟩_{i≥1} satisfies the following condition for some α ∈ (0, 1]:

min_{x_j ∈ X_{i_t−1}} ∥x_{i_t} − x_j∥_2 ≥ α · d_Ω(X_{i_t−1}),   (26)

where:

d_Ω(X_k) = max_{x∈Ω} min_{x_j ∈ X_k} ∥x − x_j∥_2.   (27)

Then, X_∞ generated by A is dense in Ω.

The aforementioned Theorem has been used to prove the global convergence of CORS [29], a black-box optimization procedure. Clearly, if Ω is compact and (26) holds for some α ∈ (0, 1] (making X_∞ dense in Ω) then, due to Theorem 2, algorithm A converges to the global minimum of every continuous function f(x) over Ω. For what concerns the preference-based framework, we need to ensure that the scoring function f(x), which represents the preference relation ≿ on Ω, is continuous. Theorem 1 gives us necessary conditions on ≿ to achieve such a property. Furthermore, Proposition 1 can be used to check the existence of a ≿-maximum of Ω. The next Theorem addresses the global convergence of GLISp-r (Algorithm 2).

Theorem 4 (Convergence of GLISp-r). Let Ω ⊂ R^n be a compact set and ≿ be a continuous preference relation on Ω of a rational (as in Definition 3) human decision-maker. Then, provided that ∃δ_j ∈ ∆_cycle in (24) such that δ_j = 0 and N_max → ∞, GLISp-r converges to the global minimum of Problem (3) for any choice of its remaining hyper-parameters.
Proof. Compactness of Ω, continuity of ≿ on Ω and rationality of the decision-maker are conditions that ensure the existence of a solution for Problem (3) (cf. Theorem 1 and Proposition 1).
Consider the sequence of iterates ⟨x_i⟩_{i≥1} produced by Algorithm 2. The first N_init ∈ N, N_init ≥ 2, elements of ⟨x_i⟩_{i≥1} are obtained by the LHD [24]. Instead, each x_i ∈ X_∞, i > N_init, is selected as the solution of Problem (8) with a_N(x) in (23). Now, suppose that δ in (23) is cycled regardless of the improvement that the new candidate samples might bring (non-greedy cycling). Denote the exploration-exploitation trade-off weight at iteration k of Algorithm 2 as δ(k) and assume that δ(k) = δ_j ∈ ∆_cycle. Then, the cycling is performed as:

δ(k + 1) = δ_{(j+1) mod N_cycle},   (28)

instead of (25). Without loss of generality, suppose that ∆_cycle in (24) is defined so that δ_j ≠ 0, ∀j = 0, . . ., N_cycle − 2, and δ_{N_cycle−1} = 0.
Then, every N_cycle iterations, Algorithm 2 looks for a new candidate sample by minimizing the (min-max rescaled) IDW distance function in (13), regardless of the surrogate model in (9); see a_N(x) in (23) and Problem (8). In practice, minimizing z̄_N(x; X_aug) over Ω is equivalent to solving arg min_{x∈Ω} z_N(x), since scaling and shifting the IDW distance function does not change its minimizers [26]. We define the following strictly increasing sequence of positive integers:

⟨i_t′⟩_{t′≥1}, i_t′ = N_init + t′ · N_cycle,   (29)

which is such that each x_{i_t′} solves the pure exploration problem:

x_{i_t′} ∈ arg min_{x∈Ω} z_{i_t′−1}(x).   (30)

Now, recall from Proposition 4 that each x_i ∈ X_{i_t′−1} is a global maximizer of Problem (30). Furthermore, z_{i_t′−1}(x) in (13) is differentiable everywhere and thus continuous (see Proposition 3). Then, by the Extreme Value Theorem [1], Problem (30) admits at least a solution. Hence, we can conclude that:

α̂′ := min_{x_j ∈ X_{i_t′−1}} ∥x_{i_t′} − x_j∥_2 > 0.   (31)

Clearly, due to how new candidate samples are sought (i.e. by minimizing some acquisition function over Ω), we have that (recall (27)):

min_{x_j ∈ X_{i_t′−1}} ∥x_{i_t′} − x_j∥_2 = α′ · d_Ω(X_{i_t′−1})   (32)

for some α′ ∈ (0, 1]. Then, by combining (31) and (32), we get:

∥x_{i_t′} − x_j∥_2 ≥ α′ · d_Ω(X_{i_t′−1}), ∀x_j ∈ X_{i_t′−1}.   (33)

Therefore, ∃α′ ∈ (0, 1] such that α̂′ = α′ · d_Ω(X_{i_t′−1}), which satisfies condition (26) of Theorem 3, ∀t′ ∈ N. Thus, Algorithm 2, with δ cycled as in (28), produces a sequence of iterates that is dense in Ω.
Next, consider the greedy δ-cycling strategy proposed in Section 4.3.1, for an arbitrary choice of ∆_cycle in (24). Let us examine the case in (25) when δ is kept unchanged from one iteration of Algorithm 2 to the next. Denote as ⟨i_t′′⟩_{t′′=1}^{t′′_max}, t′′_max ∈ N, the sequence of indexes of those samples that improve upon the current best candidate, resulting in no change in the exploration-exploitation trade-off weight. We have that:

x_{i_t′′} ∉ X_{i_t′′−1},   (34)

since it is strictly preferred to all the other samples in X_{i_t′′−1}. Thus, we could define a positive constant α̂′′ ∈ R_{>0} analogously to (31). Finally, let us consider the greedy δ-cycling strategy in (25) as a whole and assume, as in Theorem 4, that ∃δ_j ∈ ∆_cycle in (24) such that δ_j = 0. We can build a strictly increasing sequence of positive integers ⟨i_t⟩_{t≥1} by merging:
• The elements of the sequence ⟨i_t′′⟩_{t′′=1}^{t′′_max}, which are the indexes of those samples that improve upon the best candidates x_best(i_t′′ − 1), ∀t′′ : 1 ≤ t′′ ≤ t′′_max;
• The elements of the sequence ⟨i_t′⟩_{t′≥1}, which constitute the indexes of those samples found by solving the pure exploration problem in (30).
Note that, unless Algorithm 2 always improves upon its current best candidate (in which case ⟨i_t′′⟩_{t′′=1}^{t′′_max} is actually infinite and hence X_∞ is dense in Ω), Problem (8) with a_N(x) in (23) is solved using δ = 0 infinitely often, although not necessarily every N_cycle iterations as in (29).
Most preference-based response surface methods, such as the ones in [3,6,13,4], do not address their convergence to the global minimum of Problem (3). In this work, we have shown that, by leveraging some results from the utility theory literature (see Section 2), we can find sufficient conditions on the preference relation of the human decision-maker (≿ on Ω) that guarantee the existence of a solution for Problem (3) and allow us to analyze the convergence of any preference-based procedure as we would in the global optimization framework.
We conclude this Section with two Remarks on Theorem 4. The first deals with the importance of adding a zero entry inside ∆_cycle in (24), while the second addresses the selection of the cycling set.

Remark 4. The omission of a zero entry inside ∆_cycle in (24) does not necessarily preclude the global convergence of Algorithm 2 on all possible preference-based optimization problems. For example, if f(x) is a constant function then, after we evaluate the first sample x_1, any other point brings no improvement (i.e. we would have x_best(N) = x_1, ∀N ≥ 1); nevertheless, any sample of Ω is a global minimizer of Problem (3), so convergence is trivially attained. The caveat is that, if ∄δ_j ∈ ∆_cycle such that δ_j = 0, then the sequence ⟨i_t′′⟩_{t′′=1}^{t′′_max} for which (34) holds is likely to be finite (i.e. GLISp-r does not improve upon its current best candidate infinitely often). Moreover, differently from Problem (30), we have no guarantee that the solutions of Problem (8), with a_N(x) defined as in (23) and for δ ≠ 0, are not already present in X_N. Therefore, we cannot ensure the existence of a strictly increasing sequence of positive integers that is infinite and for which (26) holds. Likewise, the result in Theorem 3 does not apply to ⟨i_t′′⟩_{t′′=1}^{t′′_max}, since the sequence is finite. Consequently, Algorithm 2 does not necessarily produce a sequence of iterates ⟨x_i⟩_{i≥1} that is dense in Ω, preventing its convergence on some (but not all) preference-based optimization problems.
authors in [3]. Lastly, for GLISp-r, we select K_aug = 5 for all optimization problems since, empirically, after several preliminary experiments, it has proven good enough to rescale z_N(x) in (13) and f̂_N(x) in (9) in most cases (see for example Fig. 4).
We run the procedures using a fixed budget of N_max = 200 samples and solve each benchmark optimization problem N_trial = 100 times, starting from N_init = 4·n points generated by LHDs [24] with different random seeds. To ensure fairness of comparison, all the algorithms are started from the same samples.
For the sake of clarity, we point out that the preferences between the couples of samples are expressed using the preference function in (4), where f(x) is the corresponding cost function of the considered benchmark optimization problem. Therefore, the preferences are always expressed consistently, allowing us to compare GLISp-r and GLISp/C-GLISp [3,41] properly. That would not necessarily be the case if a human decision-maker were involved.

Results
We compare the performances of GLISp-r and GLISp/C-GLISp [3,41] on each benchmark optimization problem by means of convergence plots and data profiles [1]. Convergence plots depict the median, best-case and worst-case performances over the N_trial instances with respect to the cost function values achieved by x_best(N), as N increases. Data profiles are one of the most popular tools for assessing the efficiency and robustness of global optimization methods: efficient surrogate-based methods exhibit steep slopes (i.e. fast convergence speeds), while robust algorithms are able to solve more (instances of) problems within the budget N_max. In general, no method is both efficient and robust: a trade-off must be made [33]. Typically, data profiles are used to visualize the performances of several algorithms on multiple benchmark optimization problems simultaneously. However, due to the stochastic nature of LHDs [24], here we also depict the data profiles for GLISp-r, GLISp [3] and C-GLISp [41] on each benchmark optimization problem on its own, highlighting how the algorithms behave when started from different samples. In practice, data profiles show, for 1 ≤ N ≤ N_max, how many among the N_trial instances of one (or several) benchmark optimization problem(s) have been solved to a prescribed accuracy, defined as [1]:

acc(N) = (f(x_best(N_init)) − f(x_best(N))) / (f(x_best(N_init)) − f*),   (37)

where f* = min_{x∈Ω} f(x). In particular, here we consider a benchmark optimization problem to be solved by some algorithm when acc(N) > t, t = 0.95. In what follows, the results achieved by GLISp-r equipped with ∆_cycle = ⟨0.95⟩ and ∆_cycle = ⟨0⟩ are reported only in the data profiles (and not in the convergence plots), to make the graphs more readable.
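A sketch of how acc(N) and a data profile can be computed from the histories of best cost values; the accuracy below is our reconstruction of (37), in the style of [1], and all names are ours:

```python
import numpy as np

def accuracy(f_best_hist, f_star, n_init):
    """acc(N) for N = 1..N_max: fraction of the achievable cost reduction obtained
    after N samples, measured from the best value found during initial sampling."""
    f_best = np.minimum.accumulate(np.asarray(f_best_hist, dtype=float))
    f0 = f_best[n_init - 1]              # best cost after the N_init initial samples
    return (f0 - f_best) / (f0 - f_star)

def data_profile(acc_matrix, t=0.95):
    """Fraction of instances solved (acc(N) > t) at each budget N, one row per instance."""
    return (np.asarray(acc_matrix) > t).mean(axis=0)
```

An instance counts as solved once its accuracy exceeds the threshold t = 0.95; averaging the indicator over the N_trial rows yields the data profile curve.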
Fig. 5 shows the cumulative data profiles of GLISp-r, GLISp [3] and C-GLISp [41], which result from considering all the benchmark optimization problems simultaneously. Instead, Fig. 6 and Fig. 7 depict the convergence plots and the data profiles achieved by the considered algorithms on each benchmark. In Table 1 we report the number of samples required to reach the accuracy t = 0.95, i.e.:

N_acc>t = min {N ∈ N : acc(N) > t}.   (38)

In practice, given that each benchmark optimization problem is solved multiple times, N_acc>t in Table 1 is assessed median-wise (over the N_trial instances), giving us an indication of the efficiency of each method. In the same Table, we also report the percentages of instances of problems solved by GLISp-r, GLISp [3] and C-GLISp [41] (an indicator of robustness) and the average execution times of each algorithm.
In practice, GLISp [3] shines when exploitation is better suited for the benchmark optimization problem at hand. That is particularly relevant for the bukin 6 [18] problem, on which both GLISp [3] and GLISp-r with ∆_cycle = ⟨0.95⟩ ("pure" exploitation) perform quite well. Lastly, C-GLISp [41] is as robust as GLISp-r (∆_cycle = ⟨0.95, 0.7, 0.35, 0⟩) but is the least efficient among the analyzed procedures.
• The pure exploration strategy (i.e. GLISp-r with ∆_cycle = ⟨0⟩) performs poorly, even for n = 2. When n = 5, it is unable to solve any problem (in fact, the data profiles stay flat after the initial sampling phase). Only for n = 1 is the pure exploration strategy quite robust and relatively efficient.
• Vice-versa, a "pure" exploitatory approach (i.e. GLISp-r with ∆_cycle = ⟨0.95⟩), although not necessarily globally convergent (see Theorem 4), can actually be successful on some benchmark optimization problems. Often, such a strategy exhibits a slightly lower N_acc>t (median-wise) than the other procedures, see Table 1. Notably, the data profiles of GLISp-r (∆_cycle = ⟨0.95⟩) can be quite similar to those of GLISp [3]. Therefore, we could say that GLISp [3] has limited exploratory capabilities.
• The main disadvantage of GLISp-r, compared to GLISp [3] and C-GLISp [41], is the increased computational time, as reported in Table 1. That is due to the computational overhead of Algorithm 1, which generates the augmented sample set X_aug for the proposed procedure by performing K-means clustering. On average, GLISp-r (∆_cycle = ⟨0.95, 0.7, 0.35, 0⟩) is 31% slower than GLISp [3] and 72% slower than C-GLISp [41]. However, we point out that these overheads are practically negligible when the queries to the decision-maker involve running computer simulations or performing real-world experiments which, contrary to the considered benchmark optimization problems, can take from a few minutes up to several hours. This is a common assumption made by surrogate-based methods: at each iteration, the most time-consuming operation is the query to the decision-maker (or, in the context of black-box optimization, the measurement of the cost function [37,19]).

Conclusions
In this work, we introduced the preference-based optimization problem from a utility theory perspective. Then, we extended algorithm GLISp [3], giving rise to GLISp-r. In particular, we have addressed the shortcomings of a_N(x) in (15), defined a new acquisition function and proposed the greedy δ-cycling strategy, which dynamically varies the exploration-exploitation weight δ in (23). Furthermore, we have proven the global convergence of GLISp-r, which is strictly related to the choice of the cycling sequence ∆_cycle. To the best of our knowledge, GLISp-r is the first preference-based surrogate method with a formal proof of convergence.
Compared to the original method, the proposed extension is less likely to get stuck on local minima of the scoring function and proves to be more robust on several benchmark optimization problems, without particularly compromising its convergence speed. Moreover, we have also considered algorithm C-GLISp [41], which improves the exploratory capabilities of GLISp [3] by employing a different exploration function. In our experiments, we have observed that, even though C-GLISp [41] is as robust as GLISp-r, it often exhibits slower convergence rates compared to GLISp [3] and the proposed extension.
Further research is devoted to extending GLISp-r to handle black-box constraints, which involve functions whose analytical formulations are not available. One possibility is to follow the same reasoning behind C-GLISp [41], which does so by adding a term to the acquisition function that penalizes exploration in those regions of Ω where the black-box constraints are likely to be violated. Additionally, a compelling line of research concerns preference-based optimization problems in the case where the human decision-maker is "irrational" (in the sense that some of the properties in Definition 3, typically completeness, do not hold). From a practical perspective, when the preference relation ≿ on Ω of the DM is not complete, we would need a surrogate model that is able to handle the answer "I do not know" when the decision-maker cannot decide which, among two calibrations, he/she prefers.

In (7), ℓ : N → N and κ : N → N are two mapping functions that associate the indexes of the samples, contained inside X, to their respective preferences in B. The cardinalities of these sets are |X| = N and |B| = |S| = M. Also note that 1 ≤ M ≤ N·(N − 1)/2.

Figure 1: Examples of the IDW distance function z_N(x) in (13) for different numbers of samples (N = 2 on the left, N = 6 on the right) and −3 = l ≤ x ≤ u = 3. Notice how z_N(x) does not cover its whole range (−1, 0], at least inside the bound constraints, and how its absolute values decrease as the number of samples increases.

Figure 2: One-dimensional examples of the IDW distance function z_N(x) in (13) and its gradient ∇_x z_N(x) in (14) in the four analyzed cases. The red boxes mark the different clusters, while the red dashed lines highlight the values of x for which the first derivative of z_N(x) is zero. Finally, the black vertical lines mark the midpoints (either between points or between centroids of the clusters). Only a portion of all possible midpoints between centroids is reported in the general case. Notice that the midpoints in Fig. 2c and Fig. 2d are quite close to the local minimizers of z_N(x), while the midpoint in Fig. 2b is an exact local solution of Problem (18).

Algorithm 1 Computation of X_aug for min-max rescaling
Input: (i) Set of samples X in (5); (ii) number of clusters K_aug ∈ N; (iii) lower bounds l ∈ R^n and upper bounds u ∈ R^n of Problem (3).
Output: Augmented sample set X_aug ⊃ X.
1: if |X| > K_aug then
2:    Perform K-means clustering [22,17,5] to group the samples in X into K_aug clusters X^(1), . . ., X^(K_aug)
3:    Compute the set of centroids X_c using (20a)
4: else
5:    Set X_c = X
6: Add the bounds to X_c: X_c = X_c ∪ {l, u}
7: Group all possible couples of X_c (without repetition)
8: Compute the midpoints x_µ of each couple using (20b) and collect them inside the set X_µ
9: Build the augmented sample set X_aug = X ∪ X_µ ∪ {l, u}

Figure 6: Performances achieved by the considered preference-based optimization algorithms on the benchmark optimization problems: convergence plots on the left and data profiles (acc(N) > 95%) on the right. GLISp-r (with ∆_cycle = ⟨0.95, 0.7, 0.35, 0⟩) is depicted in red, GLISp [3] in blue and C-GLISp [41] in green. The dashed black line in the convergence plots represents f*. We also show the number of initial samples, N_init, with a black vertical line. The results obtained by GLISp-r with ∆_cycle = ⟨0.95⟩ (dashed red line) and GLISp-r with ∆_cycle = ⟨0⟩ (dotted red line) are shown only in the data profiles.

Table 1: Comparison between the considered preference-based optimization algorithms. Several indicators are taken into account: (a) efficiency indicator, i.e. the number of samples required to solve the benchmark optimization problems to an accuracy of 0.95 (median-wise), as defined in (38); (b) robustness indicator, i.e. the percentage of (instances of) problems solved; and (c) average execution times (in seconds). The acronym n.r. stands for "not reached". The best results are highlighted in bold font. We also report the average performances over all the considered benchmark optimization problems. Note that the averages for indicator (a) do not consider those benchmarks for which the algorithms did not reach the desired accuracy.