An Adaptive Consensus Based Method for Multi-objective Optimization with Uniform Pareto Front Approximation

In this work we are interested in stochastic particle methods for multi-objective optimization. The problem is formulated via scalarization using parametrized, single-objective sub-problems which are solved simultaneously. To this end, we propose a consensus-based multi-objective optimization method on the search space, combined with an additional heuristic strategy that adapts the parameters during the computation. The adaptive strategy aims to distribute the particles uniformly over the image space, and in particular over the Pareto front, by using energy-based measures to quantify the diversity of the system. The resulting gradient-free metaheuristic algorithm is mathematically analyzed using a mean-field approximation of the algorithm iteration, and convergence guarantees towards Pareto optimal points are rigorously proven. In addition, we analyze the dynamics when the Pareto front corresponds to the unit simplex, and show that the adaptive mechanism reduces to a gradient flow in this case. Several numerical experiments show the validity of the proposed stochastic particle dynamics, investigate the role of the algorithm parameters and validate the theoretical findings.


Introduction
Motivated by problems in which two or more objectives must be considered at the same time, even though they may conflict with each other, in this work we are interested in the design of stochastic algorithms for multi-objective optimization. This type of problem is commonly found in everyday life, for example in physics, engineering, the social sciences, economics, biology, and many other fields [15,20,39,40,45]. Investing in the financial market while maximizing profit and minimizing risk, or building a vehicle while maximizing performance and minimizing fuel consumption and pollutant emissions, are examples of multi-objective optimization problems.
From a mathematical viewpoint the problem can be formulated through a variable x ∈ R^d describing a possible decision, assuming that g_i(x) is the i-th objective for i = 1, . . . , m, with m ∈ N being the total number of objectives. A multi-objective problem then requires to solve

min_{x ∈ R^d} g(x), where g(x) = (g_1(x), . . . , g_m(x))^⊤. (1.1)

A solution to (1.1) corresponds to several optimal decisions. Here, we consider optimality in the sense of Pareto [40], i.e., no objective can be improved without necessarily degrading another objective. Without additional information about subjective preferences, there may be a (possibly infinite) number of Pareto optimal solutions, all of which are considered equally good. Therefore, the optimization task consists of providing a set of optimal decisions. To this end, it is also desirable to have a diverse set, that is, to address the problem not only by optimizing fitness, but also by aiming to cover a variety of user-defined features of interest, in order to best describe the (possibly) broad set of optimal decisions. Several methods have been proposed to numerically solve (1.1) and, as for single-objective optimization, they typically belong either to the class of metaheuristic algorithms or to mathematical programming methods [49]. Among metaheuristics [50], multi-objective evolutionary algorithms [15], such as NSGA-II [16] and MOEA/D [54], have gained popularity among practitioners due to their flexibility and ease of use. At the same time, they usually lack a convergence analysis compared to mathematical programming methods. For more details on mathematical programming methods and evolutionary algorithms in multi-objective optimization we refer to the recent surveys [14,20].
We are interested in a particular class of stochastic particle optimization methods, called consensus-based optimization (CBO), which has recently gained popularity due to the use of mean-field techniques that can provide them with a rigorous mathematical foundation. Such methods consider interacting particle systems described by stochastic differential equations (SDEs) that combine a drift towards the estimated minimum and random exploration of the search space [2,10,11,30,31,48,51]. These approaches have been extended also to optimization problems over hypersurfaces [27][28][29], constrained optimization [5,13] and multi-objective optimization [6]. From a mathematical viewpoint, this class of metaheuristic methods is inspired by the corresponding mean-field dynamics based on particle swarming and multi-agent social interactions, which have been widely used to study complex systems in the life sciences, social sciences and economics [17,43,44,46,53]. These techniques have proven fruitful in demonstrating convergence towards a global minimum for single-objective problems, not only in the case of CBO methods, but also for the popular Particle Swarm Optimization (PSO) algorithm [33,38], thus paving the way to provide a mathematical foundation for other metaheuristics.
In the same spirit, the authors proposed in [6] a multi-objective optimization algorithm (M-CBO) by prescribing a CBO-type dynamics among several particles making use of a scalarization strategy. Scalarization strategies are a common tool in multi-objective optimization [40] as they allow one to translate problem (1.1) into a set of parametrized single-objective problems, which can be solved simultaneously in the case of particle-based optimization methods. In this paper, we provide a convergence analysis for the method based on the mean-field description of the M-CBO dynamics. Furthermore, we improve the method in order to capture the shape of the Pareto front with uniform accuracy. This is done by iteratively updating the parameters of the method to minimize specific diversity measures. Mathematically, this last feature is achieved by enlarging the phase space of the particles. A detailed analysis of the extended model is also presented by studying a mean-field approximation of the particle dynamics, which allows us to recover convergence guarantees towards optimal points and also highlights a gradient-flow structure in the space of the parameters.
Recently, energy-based diversity measures have gained popularity in the multi-objective evolutionary optimization community due to their flexibility, scalability and theoretical properties [14]. In this formulation, a set of decisions is diverse if it corresponds to a minimal configuration of a suitable two-body energy potential, breaking the problem down into finding such configurations. In the proposed algorithm, we obtain this by inserting a Vlasov-type dynamics in the space of the parameters. We prove that the particle dynamics can be written in the more general framework of non-local interaction equations over bounded domains. The latter topic has been recently investigated e.g. in [12,25,26,47].
The rest of the paper is organized as follows. In Section 2 we formally introduce the concept of optimality for (1.1) and present the scalarization strategy. Next, in Section 3 we illustrate the particle dynamics both in the search space and in the space of parameters. Section 4 is devoted to the mathematical analysis of the system evolution using a mean-field description. Finally, in Section 5 numerical examples on convex and non-convex as well as disjoint Pareto fronts are presented which confirm the theoretical results as well as the performance of the new method. Some concluding remarks are discussed in the last section.

Problem definition and scalarization
We will use the following notation. For a ∈ R^n, |a| denotes its Euclidean norm and (a)_l its l-th component, while for a Borel set A ⊂ R^n, |A| denotes its Lebesgue measure. The symbols ≺ and ⪯ denote the partial orderings with respect to the cones R^m_{>0} and R^m_{≥0}, respectively.

Pareto optimality and diversity
When dealing with a vector-valued objective function g : R^d → R^m with m ≥ 2, the interpretation of the minimization problem (1.1) is not unique, as the image space R^m is not fully ordered. We consider the notions of strong and weak Edgeworth-Pareto (EP) optimality, which rely on the natural, component-wise partial ordering on R^m [40].
A point x̄ ∈ R^d is EP optimal if g(x̄) is a minimal element of the image set g(R^d) with respect to the natural partial ordering, that is, if there is no x ∈ R^d such that g(x) ⪯ g(x̄) with g(x) ≠ g(x̄). Likewise, x̄ is weakly EP optimal if there is no x ∈ R^d such that g(x) ≺ g(x̄). The set F_x = {x ∈ R^d | x is EP optimal} constitutes the set of EP optimal points, while its image F = g(F_x) is the Pareto front. The multi-objective optimization problem (1.1) consists of finding the set of EP optimal points. Unlike single-objective problems, this set is typically uncountable and the optimization task involves finding a finite subset of optimal points. These should ideally cover F, and the concept of diversity is introduced to distinguish between two approximations [15]. Intuitively, if points on the Pareto front are more spread out, the diversity is higher. In view of the minimization problem, having a diverse approximation is desirable as it provides, at the same cost, a broader variety of possible solutions.
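The dominance relations above translate directly into a filtering procedure. The following sketch (an illustration, not part of the proposed algorithm; function names are ours) keeps the points of a finite image set that are not strictly dominated, i.e. the candidates for weak EP optimality:

```python
import numpy as np

def strictly_dominates(ya, yb):
    # ya ≺ yb: ya is smaller than yb in every objective
    return np.all(ya < yb)

def pareto_filter(Y):
    """Return indices of points in Y (N x m image values) that are not
    strictly dominated by any other point of the set."""
    N = Y.shape[0]
    return [i for i in range(N)
            if not any(strictly_dominates(Y[j], Y[i]) for j in range(N) if j != i)]

Y = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [3.0, 3.0]])
print(pareto_filter(Y))  # -> [0, 1, 2]; the point [3., 3.] is dominated by [2., 2.]
```

For strong EP optimality one would instead exclude points dominated in the ⪯-with-inequality sense; the weak version above suffices to illustrate the ordering.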
The most diverse approximation possible is arguably given by a set of points which is uniformly distributed over the Pareto front. Quantifying the diversity of an optimal set is of paramount importance, both to assess the performance of optimization methods and to design them. Indeed, oftentimes the heuristic of a specific method is constructed to specifically minimize, or maximize, a specific measure [14]. Without knowledge of the exact Pareto front, popular diversity measures are given by the hypervolume contribution [55], the crowding distance [16] and, more recently, the Riesz s-energy [23,42]. Our proposed algorithm will aim to minimize the latter (or similar energy-based measures) as it can be embedded in a mean-field framework. The exact definitions are introduced later.
To sum up, the multi-objective optimization problem we consider is a two-objective task itself, as one needs to find a set of points which are both EP optimal and optimize a suitable diversity measure.

Scalarization strategy
A popular way to approach (1.1) is to use a scalarization strategy [19,40] which reduces the multi-objective problem to a (finite) number of single-objective sub-problems. Among the possible scalarization strategies, we consider approximation sub-problems with weighted Chebyshev semi-norms [40], where the single objectives are given by

G(x, w) = max_{i=1,...,m} (w)_i g_i(x),

and are parametrized by a vector of weights w which belongs to the unitary, or probability, simplex

Ω = { w ∈ R^m_{≥0} : Σ_{i=1}^m (w)_i = 1 }.

For each w ∈ Ω the sub-problems then read

min_{x ∈ R^d} G(x, w). (2.2)

The link between the scalarized problems and the original multi-objective problem is given by the following result. b) Assume all sub-problems (2.2) attain a unique minimum. Then, x̄ is EP optimal if and only if x̄ is the solution to (2.2) for some w ∈ Ω.
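As an illustration, here is a minimal sketch of the Chebyshev scalarization, assuming the form G(x, w) = max_i (w)_i g_i(x) given above, together with a deterministic uniform grid of weights vectors on Ω for m = 2 (the toy objective g and all names are ours):

```python
import numpy as np

def chebyshev(gvals, w):
    # weighted Chebyshev scalarization: G = max_i (w)_i * g_i(x)
    return np.max(w * gvals)

# deterministic uniform grid of N weights vectors on the simplex (m = 2)
N = 5
t = np.linspace(0.0, 1.0, N)
W = np.stack([t, 1.0 - t], axis=1)  # each row sums to 1

# toy bi-objective on R; each w selects one sub-problem min_x G(x, w)
g = lambda x: np.array([x**2, (x - 1.0) ** 2])
print([chebyshev(g(0.5), w) for w in W])
```

Sweeping w over Ω and minimizing each scalarized objective traces out (weakly) EP optimal points, in accordance with the theorem above.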
Theorem 2.1 shows the strength of the Chebyshev scalarization strategy, which allows one to find all the weakly EP optimal points, contrary to other strategies like linear scalarization [40]. We remark that the proposed algorithm can also be applied to solve any other scalarized problems of the form (2.2) where the parameters are taken from the unitary simplex Ω.
Even though solving N sub-problems with corresponding weights vectors {W^i}_{i=1}^N ⊂ Ω ensures that N optimal points are found, we note that there is no guarantee to obtain a diverse approximation. Therefore, scalarization targets only one of the two objectives of the problem, without addressing the diversity of the solution. In the following, we introduce an algorithm where the parameters W^i are dynamically changed during the computation to obtain a set of EP points which is also diverse.

Adaptive multi-objective consensus based optimization
We propose a dynamics where N ∈ N particles interact with each other to solve N scalar subproblems given in the form (2.2). We introduce the dynamics as a continuous-in-time process and leave the definition of the actual discrete optimization method to Section 5.
At a time t ≥ 0, every particle is described by its position X^i_t ∈ R^d and its vector of weights W^i_t ∈ Ω, which determines the optimization sub-problem the particle aims to solve. As a result, the particles are described by N tuples (X^i_t, W^i_t), i = 1, . . . , N. The initial configuration is generated by sampling the positions X^i_0 from a common distribution ρ_0 ∈ P(R^d) and by taking uniformly distributed weights vectors W^i_0 over Ω. The dynamics is prescribed to solve the multi-objective optimization task. We recall that (1.1) not only requires to find optimal points, but also points that are diverse, that is, well-distributed over the Pareto front. To this end, the optimization process is made of two mechanisms which address these two objectives separately.

A consensus based particle dynamics in the search space
The first mechanism prescribes the update of the positions {X^i_t}_{i=1}^N such that they converge towards EP optimal points. As in [6], this is done by introducing a CBO-type dynamics between the particles. To illustrate the CBO update rule, let us consider for the moment a fixed single-objective sub-problem parametrized by w ∈ Ω. Similar to Particle Swarm Optimization methods, in CBO dynamics at time t > 0 the particles instantaneously move towards an attracting point Y^α_t which is given by a weighted average of their positions:

Y^α_t(w) = Σ_{i=1}^N X^i_t e^{−α G(X^i_t, w)} / Σ_{j=1}^N e^{−α G(X^j_t, w)}. (3.1)
Due to the coefficients used in (3.1), if α ≫ 1, Y^α_t(w) is closer to the particles with low values of the objective function G(·, w) and, in the limiting case α → ∞, it holds

Y^α_t(w) → argmin { G(X^i_t, w) : i = 1, . . . , N },

provided the above minimum uniquely exists. This promotes the concentration of the particles in areas of the search space where the objective function G(·, w) attains low values and hence, more likely, a global minimum. We remark that the exponential coefficients correspond to the Gibbs distribution associated with the objective function and, moreover, that this choice is justified by the Laplace principle [18]. The latter is an essential result to study the convergence of CBO methods [30] and it states that for any absolutely continuous probability density ρ ∈ P(R^d) we have

lim_{α→∞} ( −(1/α) log ∫ e^{−α G(x,w)} dρ(x) ) = inf_{x ∈ supp(ρ)} G(x, w).
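For completeness, a numerically stable way to evaluate the attracting point of (3.1) for one sub-problem: shifting G by its minimum before exponentiating avoids underflow of e^{−αG} for the large values of α used in practice (the function below is our sketch, not the reference implementation):

```python
import numpy as np

def consensus_point(X, Gvals, alpha):
    """Gibbs-weighted average of positions for one sub-problem.
    X: (N, d) positions, Gvals: (N,) values G(X^i, w)."""
    # subtract the minimum: weights stay in (0, 1] and never all underflow
    wgt = np.exp(-alpha * (Gvals - Gvals.min()))
    wgt /= wgt.sum()
    return wgt @ X

X = np.array([[0.0], [1.0], [2.0]])
G = np.array([0.0, 1.0, 2.0])
# for large alpha the consensus point approaches the best particle
print(consensus_point(X, G, alpha=1e6))  # ~ [0.]
```

The shift leaves the weighted average unchanged, since the common factor e^{α G_min} cancels between numerator and denominator.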
Since in the multi-objective optimization dynamics each particle addresses a different subproblem, each of them moves towards a different attracting point given by Y α t (W i t ). The drift strength is given by λ > 0, while another parameter σ > 0 determines the strength of an additional stochastic component.
The time evolution of the particles positions is determined by a system of SDEs

dX^i_t = −λ (X^i_t − Y^α_t(W^i_t)) dt + σ D^i_t dB^i_t, i = 1, . . . , N, (3.2)

where B^i_t are d-dimensional independent Brownian processes. The matrices D^i_t characterize the random exploration process, which might be isotropic [48],

D^i_t = |X^i_t − Y^α_t(W^i_t)| Id, (3.3)

or anisotropic,

D^i_t = diag( X^i_t − Y^α_t(W^i_t) ). (3.4)

Both explorations depend on the distance between X^i_t and the corresponding attracting point, making the stochastic component larger if the particle is far from Y^α_t(W^i_t). The difference lies in the direction of the random component: while in the isotropic exploration all dimensions are equally explored, the anisotropic one explores each dimension with a different magnitude.
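The two exploration types can be sketched as follows, assuming the isotropic matrix |X − Y| Id and the anisotropic matrix diag(X − Y) applied to a standard normal draw z (this is our illustration):

```python
import numpy as np

def exploration(x, y, z, anisotropic=True):
    # returns D z with D = diag(x - y) (anisotropic) or D = |x - y| Id (isotropic)
    diff = x - y
    return diff * z if anisotropic else np.linalg.norm(diff) * z

x, y, z = np.array([1.0, 2.0]), np.zeros(2), np.ones(2)
print(exploration(x, y, z))                      # componentwise magnitudes
print(exploration(x, y, z, anisotropic=False))   # same magnitude in every direction
```

Since z is symmetric, the sign of the diagonal entries is irrelevant in distribution; only the per-dimension magnitude differs between the two choices.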
The expected outcome of the position update rule (3.2) is that every particle will find a minimizer of a sub-problem and hence, by Theorem 2.1, a weak EP optimal point. We have already mentioned that, if the weights vectors are fixed to the initial uniform distribution, there is no guarantee to obtain equidistant points on the front. Since it is impossible to determine beforehand the optimal distribution on Ω, we propose a heuristic strategy which updates the weights vectors promoting diversity.

Uniform approximation of the Pareto front
A popular diversity metric in multi-objective optimization is the hypervolume contribution metric [55], which has the drawbacks of being computationally expensive [3] and, by definition, dependent on an estimate of g. Motivated by this and by the objective of designing algorithms which perform well for any shape of the Pareto front [42], new energy-based diversity measures have recently gained popularity [14,23]. Such measures quantify the diversity of a given empirical distribution ρ^N ∈ P(R^d) by considering the pairwise interaction given by a two-body potential U : R^m \ {0} → R,

U[ρ^N] = ∫∫ U(z − z') d(g#ρ^N)(z) d(g#ρ^N)(z'), (3.5)
g#ρ^N being the push-forward measure of ρ^N. The problem of finding well-spread points over the Pareto front is then equivalent to finding a configuration supported on F_x, the set of EP optimal points, which is minimal with respect to the given energy U. Any energy U describing short-range repulsion between particles, like the Monge energy or repulsive-attractive power-law energies, is in principle a candidate to be a diversity measure. The Riesz s-energy, given by the potential

U_s(z) = |z|^{−s}, s > 0, (3.6)

is a popular choice [23] due to its theoretical guarantees of being a good measure of the uniformity of points. Indeed, if F is a (m − 1)-dimensional manifold, the minimal energy configuration ν^N converges to the uniform Hausdorff distribution over F as N → ∞. We refer to [36] for the precise statement of the result and more details. Inspired by the electrostatic potential between charged particles, the authors in [9] used a Newtonian potential, which is also empirically proven to be a suitable diversity measure [8,9]. See [23] for a numerical comparison between two-body potentials as diversity measures in evolutionary algorithms. We will also compare different energies in Section 5 and consider U ∈ C^1(R^m \ {0}) to be any of the above. Exact computation of minimal energy configurations of a system of N particles is a well-studied problem, as it is connected e.g. to crystallization phenomena [4]. We note that, in our setting, the configuration ρ^N is additionally mapped to the image space in (3.5), making the task even harder. Therefore, we propose a heuristic strategy that is expected to find only suboptimal configurations.
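A direct evaluation of the Riesz s-energy of a finite image-space configuration can be sketched as follows (our illustration; the 1/N² normalization is a convention matching the empirical-measure form of (3.5)):

```python
import numpy as np

def riesz_energy(Y, s=1.0):
    """Riesz s-energy of a point set Y (N, m): sum over pairs i != j of
    |y_i - y_j|^(-s), normalized by N^2."""
    N = Y.shape[0]
    E = 0.0
    for i in range(N):
        for j in range(N):
            if i != j:
                E += np.linalg.norm(Y[i] - Y[j]) ** (-s)
    return E / N**2

# equispaced points on a segment have lower energy than clustered ones
uniform = np.array([[0.0], [0.5], [1.0]])
clustered = np.array([[0.0], [0.1], [1.0]])
print(riesz_energy(uniform), riesz_energy(clustered))
```

The comparison illustrates why minimizing the energy promotes uniformity: the singular short-range repulsion heavily penalizes nearby pairs.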
To promote diversity, we let the particles follow a vector field associated with U. The movement will take place only in the parameter space, acting on {W^i_t}_{i=1}^N, in order not to interfere with the CBO optimization dynamics acting on the positions. Intuitively, if two particles are close to each other in the image space g(R^d), their weights vectors are pulled apart. This resembles the short-range repulsion of U. To ensure that W^i_t remains in the unitary simplex Ω, a projection to the tangent cone T(W^i_t, Ω) is required, see also [12] for more details. A parameter τ ≥ 0 determines the time scale of the weights adaptation process with respect to the CBO dynamics (3.2). The process can be turned off by setting τ = 0. In the case of bi-objective problems, where m = 2, we therefore obtain a Vlasov-type dynamics, which is well-defined as the parameter space is embedded in the image space R^m. If U has a singularity at 0, we set ∇U(0) = 0. We note that the additional minus sign in (3.7) is due to the explicit form of the relation, determined by Theorem 2.1, between the Pareto front and Ω.
This will become clear in the next section, as this choice gives a gradient flow structure to the parameters dynamics. For m > 2, the relation between a weight vector w ∈ Ω and the corresponding (weakly) EP optimal point is more involved. Nevertheless, we prescribe a suitable heuristic dynamics (3.8) for all i = 1, . . . , N, in which a term r(·) determines the strength and the sign of the interaction. As before, the projection step is needed due to the boundedness of Ω. Even though (3.8) can also be used when m = 2, in the next section we will consider (3.7) only.
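Keeping the weights in Ω requires a projection. One standard realization of such a projection Π_Ω is the Euclidean projection onto the probability simplex, computable by the classical sorting-based algorithm (e.g. Duchi et al.); the sketch below is ours and is one possible choice, not necessarily the paper's exact operator:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v in R^m onto {w : w >= 0, sum(w) = 1},
    via the sorting-based thresholding algorithm."""
    m = v.size
    u = np.sort(v)[::-1]            # sort in decreasing order
    css = np.cumsum(u)
    # largest index rho with u_rho * rho > (cumsum_rho - 1)
    rho = np.max(np.nonzero(u * np.arange(1, m + 1) > (css - 1.0))[0])
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

print(project_simplex(np.array([0.5, 0.5])))   # feasible point is unchanged
print(project_simplex(np.array([1.2, -0.2])))  # -> [1., 0.]
```

The same routine can be applied after each discrete weights update to replace the instantaneous tangent-cone projection.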
To the best of our knowledge, energy-based diversity metrics have so far only been used as a selection criterion between candidate approximations of the Pareto front [24,42]; this is the first time the vector field associated with U is used to guide the particle dynamics in a metaheuristic multi-objective optimization method.

Mean-field analysis of the particle dynamics
In this section, we give a statistical description of the optimization dynamics by presenting the corresponding mean-field model, which allows us to analyze the convergence of the method towards a solution to the multi-objective optimization problem. We restrict ourselves to the case where m = 2 and the dynamics in Ω is given by (3.7). The particle dynamics is given by (3.2) and (3.7), respectively.
Similar to [48], we formally derive the mean-field equation of the large system (3.2), (3.7) by making the so-called propagation of chaos assumption on the marginals. In particular, let F^N(t) be the particles' probability distribution over (R^d × Ω)^N at a time t ≥ 0. We assume that F^N(t) ≈ f(t)^{⊗N} for some f(t) ∈ P(R^d × Ω). In the following, we indicate with ρ(t) ∈ P(R^d) the first marginal of f(t) and with µ(t) ∈ P(Ω) the second marginal on the parameter space Ω. As a consequence of the propagation of chaos assumption, the dynamics (3.2), (3.7) becomes independent of the index i and we obtain a process (X_t, W_t), t > 0, whose law f satisfies the nonlinear partial differential equation (4.2), with initial conditions f(0, x, w) = ρ_0 ⊗ µ_0, µ_0 being the uniform distribution over the unit simplex Ω, µ_0 = Unif(Ω). Equation (4.2) is a mean-field description of the microscopic dynamics generated by the optimization method described in Section 3. We note that the rigorous mean-field limit for single-objective CBO dynamics, which are similar to (3.2), was proven in [37]. Following previous works, see e.g. [10,30], we consider such an approximation and mathematically analyze the proposed optimization method by studying a solution f to (4.2).

Convergence to the Pareto front
In the following, we assume f ∈ C([0, ∞), P_2(R^d × Ω)) to be a solution to (4.2) with initial data given by ρ_0 ∈ P_2(R^d), µ_0 ∈ P(Ω). We assess the performance of a multi-objective algorithm by the average distance from the Pareto front F, and we use the Generational Distance (GD) [52], see (4.3), where ρ(t) is the first marginal of f(t). In the following, we state conditions such that GD[ρ(t)] decays up to a given accuracy ε > 0.

Assumption 4.1 (Uniqueness). Every sub-problem (2.2), w ∈ Ω, admits a unique solution x̄(w).

The uniqueness requirement is common in the analysis of CBO methods [30]. This is due to the difficulty of controlling the attractive term y^α whenever there are two or more minimizers. For example, assume ν is a measure concentrated in two different global minimizers of a sub-problem w, ν = (δ_{x̄^1} + δ_{x̄^2})/2: the attractive term could be located in the middle between them, which is obviously not a minimizer. The regularity assumption on x̄ follows from the transport term in (4.2) with respect to w, and may be dropped if the interaction in the weights space is not present. The next assumption requires that all scalar objective functions G(x, w), w ∈ Ω, have common lower and upper bounds in a neighborhood of the minimizer. See also [32] and the references therein for more details on the following conditions.

Assumption 4.2 (Stability at the minimizer). In a neighborhood of their minimizer, the functions G(·, w), w ∈ Ω, are p-conditioned and satisfy a growth condition: there exist a radius R > 0, exponents p > 2, q > 1 and constants c_1, c_2 > 0 such that the corresponding bounds hold for all w ∈ Ω. Moreover, outside such a neighborhood, the function cannot be arbitrarily close to the minimum: there exists c_3 > 0 such that a uniform lower bound holds for all w ∈ Ω. Finally, we assume the optimal EP points to be bounded. As in other CBO methods, we also prescribe a condition on the initial data ρ_0 and µ_0.
Then the following holds: for any accuracy ε with 0 < ε < E_{f_0}|x − x̄(w)|^2, if the parameters are suitably chosen and α is sufficiently large, there exists a time T > 0 at which the error E_{f(t)}|x − x̄(w)|^2 reaches the accuracy ε, and the error decays exponentially for all t ∈ [0, T]. We remark that the choice of α depends on the estimates given in Assumption 4.2 and in particular on the accuracy ε. As a consequence, the generational distance GD[ρ(t)] also decays up to the accuracy ε for all t ∈ [0, T], where ρ(t) ∈ P(R^d) is the first marginal of f(t).
Proof. Since every sub-problem admits a unique solution (Assumption 4.1), by Theorem 2.1 every solution x̄(w) is EP optimal and therefore its image g(x̄(w)) belongs to the Pareto front F. It follows that the generational distance GD is bounded by the average ℓ²-error.
Theorem 4.1 and Corollary 4.1 show that the CBO mechanism is able to successfully solve all sub-problems (2.2) simultaneously. In the next section, we will analyze the dynamics in the parameter space Ω to investigate the diversity of the computed solution.

Decay of diversity measure
The aim of the interaction (3.8) is to improve the distribution of the parameters {W^i_t}_{i=1}^N so that, in view of Theorem 2.1, the corresponding (weak) EP optimal points are well-distributed in the image space. Under suitable assumptions, such dynamics corresponds to a gradient flow on the unitary simplex Ω.
For any sub-problem (2.2) parametrized by w ∈ Ω, let x̄(w) be one of its global minimizers, which we assume exists. As we are interested in the relation between w and its corresponding point on the Pareto front g(x̄(w)) ∈ F, let us formally insert in the mean-field model (4.2) solutions of the form

f(t, x, w) = δ_{x̄(w)}(x) ⊗ µ(t, w), (4.7)

where µ(t) ∈ P(Ω). In ansatz (4.7), the location x of the particle (x, w) corresponds exactly to a solution x̄(w) of its sub-problem w ∈ Ω. This is justified by the convergence result (Theorem 4.1) and by the fact that the position dynamics takes place on a faster time scale than the parameter adaptation, see Remark 4.1.
The reduced mean-field equation in strong form is then given by (4.8), and the marginal µ(t) over Ω fulfills a nonlocal transport equation, where for simplicity we introduced ḡ := g ∘ x̄.
Assumption 4.4. The Pareto front F is exactly the unitary simplex Ω and the potential energy U is radially symmetric.
Under Assumption 4.4 and thanks to the relation (4.10) between Ω and F, Theorem 4.2 states that the energy over the front is decreasing. We note that the flow may converge to stationary points of (4.8) that are not minimal configurations, as observed in [25] even for simple domains.
Clearly, without ansatz (4.7), there is no guarantee that the potential decreases along the evolution of the algorithm. Quite the opposite: by Theorem 4.1 particles are expected to concentrate on the Pareto front, leading to an increased potential U. Nevertheless, by Theorem 4.1 there exists a time T > 0 where ∫ |x − x̄(w)|^2 df(T, x, w) < ε and hence x ≈ x̄(w), making ansatz (4.7) valid. Therefore, we claim that the reduced model (4.11) describes the dynamics for t > T. We will numerically investigate the two phases of the algorithm: the first one, when concentration over the Pareto front happens, and the second, when the potential U decays, leading to an improved diversity of the solution.

Numerical experiments
In this section, we numerically investigate the performance of the proposed method by testing it against several benchmark multi-objective problems.
The adaptive multi-objective consensus based optimization (AM-CBO) algorithm is obtained from an Euler-Maruyama time-discretization of (3.2) and (3.7) (or (3.8) if m > 2). Let ∆t > 0 be a fixed time step. For k = 0, 1, . . . , the particles positions are iteratively updated according to

X^i_{k+1} = X^i_k − λ∆t (X^i_k − Y^α_k(W^i_k)) + σ √∆t D^i_k ξ^i_k, ξ^i_k ∼ N(0, Id). (5.1)

The update rule (5.1) is overparametrized, and in CBO optimization schemes typically λ = 1 is used.
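One position update of the discretized dynamics can be sketched as follows, assuming the Chebyshev scalarization form used earlier and anisotropic exploration (this is our illustrative implementation, not the reference MATLAB code):

```python
import numpy as np

def amcbo_position_step(X, W, g, alpha, lam, sigma, dt, rng):
    """One Euler-Maruyama position update: each particle i drifts towards the
    consensus point of its own sub-problem W[i] and explores anisotropically.
    X: (N, d) positions, W: (N, m) weights, g: maps (d,) -> (m,)."""
    N, d = X.shape
    gX = np.array([g(x) for x in X])            # (N, m) image-space values
    Xnew = np.empty_like(X)
    for i in range(N):
        Gi = np.max(W[i] * gX, axis=1)          # G(X^j, W^i) for all j
        wgt = np.exp(-alpha * (Gi - Gi.min()))  # stabilized Gibbs weights
        Y = (wgt / wgt.sum()) @ X               # consensus point Y^alpha(W^i)
        drift = -lam * dt * (X[i] - Y)
        noise = sigma * np.sqrt(dt) * np.abs(X[i] - Y) * rng.standard_normal(d)
        Xnew[i] = X[i] + drift + noise
    return Xnew
```

With σ = 0 and λ∆t ∈ (0, 1], each particle contracts geometrically towards its consensus point, which matches the deterministic part of the scheme.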
Similar to the projected gradient flow scheme used in [47], we replace the instantaneous projection to the tangent cone T(w, Ω) by the projection Π_Ω onto Ω, and discretize the dynamics in Ω accordingly, obtaining (5.2) for m = 2, while for m > 2 the update is given by (5.3). The complete optimization method is described by Algorithm 1. A remark on the computational complexity follows.

Algorithm 1: AM-CBO
Set parameters: α, λ, σ, τ, ∆t. Initialize the positions and the weights vectors. For the sake of reproducible research, a MATLAB implementation of the proposed algorithm is made available in the GitHub repository https://github.com/borghig/AM-CBO.
Remark 5.1. Even though in every iteration the objective function g is evaluated only N times, the overall computational complexity is O(N^2), because the computation of Y^α_k(W^i_k) requires O(N) operations, as does the parameters update (3.8), which is particularly costly.
One can reduce the computational complexity by considering only a random subset I^M_k ⊂ {1, . . . , N} of M ≪ N particles when computing (3.1) and (5.2), substituting the sums over all particles with sums over I^M_k. Inspired by Monte Carlo particle simulations [1,41], this random mini-batch technique allows to lower the complexity to O(NM).
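The mini-batch evaluation of the consensus point can be sketched as follows (hedged: batch resampling schedules vary across implementations; this version draws a fresh batch per call):

```python
import numpy as np

def batch_consensus(X, Gvals, alpha, M, rng):
    """Consensus point estimated from a random mini-batch of M << N particles,
    reducing the per-particle cost from O(N) to O(M)."""
    idx = rng.choice(X.shape[0], size=M, replace=False)
    Gb = Gvals[idx]
    wgt = np.exp(-alpha * (Gb - Gb.min()))
    return (wgt / wgt.sum()) @ X[idx]
```

For M = N the estimator coincides with the full consensus point; for M < N it is a random approximation whose bias vanishes as M grows.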
We also note that Fast Multipole Methods (FMM) [34] may additionally be used to speed up the computation of the potential field; then, the computational complexity of (5.2), (5.3) is further reduced.

Performance metrics
Denote by {X^i_k}_{i=1}^N the set of particle positions at the k-th algorithm iteration and by ρ^N_k ∈ P(R^d) their empirical distribution. We employ three different energies, based on the Riesz potential (3.6), the Newtonian potential (5.4) and the Morse potential (5.5), both to measure the solution diversity and to determine the dynamics of the weights vectors. All considered potentials describe short-range repulsion between the particles. While the Morse potential is λ-geodesically convex, the Newtonian and Riesz repulsions are not. Since we will also employ the corresponding energies U_R, U_N, U_M to define the interaction between parameters, the constant C can be considered as an algorithm parameter when the Morse repulsion is used.
To show the validity of the energy-based diversity metrics, we additionally consider the hypervolume contribution metric S [55]. Let g* ∈ R^m be a maximal element with respect to the natural partial ordering, y ≺ g* for all y ∈ F; the hypervolume measure is given by the Lebesgue measure of the set of points between the computed solution and the maximal point g*. Maximizing S has been shown to lead to a diverse approximation of the Pareto front [21]. In Section 4.1, the convergence of the mean-field dynamics towards the Pareto front is shown by studying the evolution of the Generational Distance GD (4.3). In the experiments, we approximate this quantity by considering a reference approximation {y^j}_{j=1}^M of the front with M = 100 points y^j ∈ F, j = 1, . . . , M, for every test problem. More details on the reference solution are given in Appendix A. For simplicity, we indicate the numerical approximation of the Generational Distance again by GD; it is defined as the average distance between the computed points and the reference set. The Inverted Generational Distance IGD is also considered: it consists of the average distance between the points of the reference solution {y^j}_{j=1}^M and the computed front. Contrary to GD, which only measures the distance from the Pareto front, IGD also takes into account the diversity of the computed solution. Hence, IGD is also a suitable indicator of the overall quality of the solution.
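The two discrete metrics can be sketched as follows (our implementation of the standard average-nearest-distance definitions; the reference set `ref` is a toy stand-in for the M = 100 reference points):

```python
import numpy as np

def gd(Yc, Yref):
    """Generational distance: average distance from computed image points
    Yc (N, m) to their nearest reference point in Yref (M, m)."""
    D = np.linalg.norm(Yc[:, None, :] - Yref[None, :, :], axis=2)
    return D.min(axis=1).mean()

def igd(Yc, Yref):
    """Inverted generational distance: average distance from each reference
    point to the computed set; low IGD requires convergence AND diversity."""
    D = np.linalg.norm(Yref[:, None, :] - Yc[None, :, :], axis=2)
    return D.min(axis=1).mean()

ref = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
clustered = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])  # on the front, not diverse
print(gd(clustered, ref), igd(clustered, ref))  # GD is 0, IGD is not
```

The clustered example shows why GD alone cannot detect a poorly spread approximation, while IGD penalizes it.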

Test problems
Test problems with diverse Pareto front geometries are selected to show the performance of the proposed method. In the Lamé problems [22] the parameter γ controls the front curvature: we use γ = 0.25, 1, 3 to obtain convex, linear and concave fronts, respectively. We also consider the DO2DK problems [7] with k = 2, s = 1 and k = 4, s = 2. Here, the Pareto fronts have more complex geometries, as they are not symmetric and, in one case, discontinuous. All the above problems are scalable in the dimension d of the search space and in the dimension m of the image space. For presentation purposes, we restrict ourselves to bi-objective optimization problems by setting m = 2, but consider possibly large d. In this case, the analytical descriptions of the fronts are known, allowing us to obtain reference solutions. The problem definitions are recalled in Appendix B for completeness.
In this section, we use Algorithm 1 in four different scenarios:
1. no parameters interaction, τ = 0;
2. Riesz potential (3.6), with τ = 10^{−5};
3. Newtonian potential (5.4), with τ = 10^{−3};
4. Morse potential (5.5), with τ = 10^{−1}, C = 20.
The first scenario clearly corresponds to the standard M-CBO approximation, while the others correspond to different AM-CBO strategies. To validate model (4.11) and Theorem 4.2, we update the parameters according to (5.2). The initial weights vectors {W^i_0}_{i=1}^N are taken (deterministically) uniformly distributed over Ω, while the particle positions are uniformly sampled over [0, 1]^d, d = 10. We employ N = 100 particles, which evolve for a maximum of k_max = 5000 steps. The remaining parameters are set to λ = 1, σ = 4, α = 10^6. This parameter choice represents a compromise between the optimal parameters of each problem. Anisotropic diffusion (3.4) is used, and a projection step ensures the particle positions remain in the search space [0, 1]^d, which is the same for all considered problems. Fig. 1 shows the computed solutions, in the image space, in the four different scenarios. Regardless of the interaction on Ω, the particles always converge towards EP optimal points and hence to the Pareto front. By definition of the Chebyshev sub-problems (2.2), a uniform distribution in Ω leads to a uniform distribution of the particles over the front only when F is linear (as in the Lamé problem with γ = 1). Indeed, Fig. 1 shows that the particles are well distributed even when there is no weights interaction (τ = 0). If the front geometry differs from this straight segment, the optimal parameter distribution on Ω differs from the uniform one. In particular, subsets of the Pareto front which are almost parallel to the axes are difficult to approximate without any interaction in the parameter space, see for instance Lamé γ = 0.25 and the DO2DK problems in Fig. 1.
When τ > 0 is used, the solutions improve as the particles are better distributed over the entire front.
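For orientation, the consensus-based building blocks common to all four scenarios can be sketched as follows. This is a generic CBO iteration, not the exact update (5.1) of the paper; in the actual method each particle carries its own weight vector W^i and hence its own consensus point, while the sketch below treats a single sub-problem. The objective values are shifted by their minimum so that the exponential weights stay finite for α = 10^6.

```python
import numpy as np

def consensus_point(X, Gvals, alpha=1e6):
    """Weighted mean y_alpha = sum_i w_i x_i / sum_i w_i with
    w_i = exp(-alpha * G(x_i)); shifting by min(G) avoids underflow."""
    w = np.exp(-alpha * (Gvals - Gvals.min()))
    return (w[:, None] * X).sum(axis=0) / w.sum()

def cbo_step(X, Gvals, y, lam=1.0, sigma=4.0, dt=0.01, rng=None):
    """One anisotropic CBO step (sketch): drift toward the consensus
    point y plus coordinate-wise Gaussian noise scaled by |x - y|,
    followed by projection onto the search space [0, 1]^d."""
    rng = np.random.default_rng() if rng is None else rng
    drift = -lam * (X - y) * dt
    noise = sigma * np.abs(X - y) * np.sqrt(dt) * rng.standard_normal(X.shape)
    return np.clip(X + drift + noise, 0.0, 1.0)
```

For large α the consensus point concentrates on the current best particle, which is the mechanism driving the exponential GD decay observed later.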

Table 1 reports the performance metrics for all the problems. For most problems, the strategy τ = 0, with no interaction in Ω, attains lower values of GD. This is consistent with the analytical results of Theorem 4.1 and Remark 4.1, which suggest that the additional dynamics may interfere with the CBO mechanism and, as a consequence, slow down the convergence towards EP optimal points. If the diversity metrics U_R, U_N, U_M and S are considered, the dynamics including interaction of the parameters yield more diverse solutions. Interestingly, using the Morse binary potential in the interaction leads to a lower final Newtonian energy in some cases. We investigate the role of the potential choice and of τ in the next section.
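The diversity metrics U_R, U_N and U_M are pairwise interaction energies of the computed image-space points: more uniform point sets carry lower energy. A minimal sketch follows; the kernel forms below (power law, logarithmic, decaying exponential) are illustrative assumptions and need not coincide with the exact potentials (3.6), (5.4) and (5.5) of the paper.

```python
import numpy as np

def _pairwise_dists(F):
    """Distances between all unordered pairs of image-space points F."""
    diff = F[:, None, :] - F[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    return D[np.triu_indices(len(F), k=1)]

def riesz_energy(F, a=2.0):
    """Riesz-type energy: mean of r^(-a) over all pairs."""
    return np.mean(_pairwise_dists(F) ** (-a))

def newtonian_energy(F):
    """Newtonian-type energy in the plane: mean of -log(r)."""
    return np.mean(-np.log(_pairwise_dists(F)))

def morse_energy(F, C=20.0):
    """Repulsive Morse-type energy: mean of exp(-C * r)."""
    return np.mean(np.exp(-C * _pairwise_dists(F)))
```

All three kernels blow up (or saturate) as particles cluster, so lower energy indicates a better-spread approximation of the front.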
The IGD performance in Fig. 1 shows that letting particles interact in the parameter space improves the overall quality of the solution. While the improvement is more substantial for problems with complex Pareto fronts (see for instance Lamé γ = 0.25, or DO2DK k = 2), we remark that the additional mechanism yields better solutions in all cases. This holds even if the parameter distribution is already optimal from the beginning (see Lamé γ = 1). We conjecture that this is due to the additional stochasticity introduced by the potential. We will also study this aspect in the next subsection.
Figs. 2a and 2b show the time evolution of GD, U_R, U_N, U_M and IGD for two of the considered test problems. As suggested by the analysis of the mean-field model, in particular Theorem 4.1, GD decays exponentially, up to a maximum accuracy, within the first iterations of the algorithm. This is due to the CBO dynamics driving the particles around EP optimal points. At the same time, the potential energies increase as the particles concentrate towards the front in the image space. Another consequence of Theorem 4.1 is that assumption (4.7) is fulfilled and, consequently, the gradient-flow description (4.11) is valid. This is also observed in Figs. 2a and 2b, where the potentials start decreasing once relatively low GD values are attained.
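For concreteness, the GD and IGD metrics can be evaluated against a reference front as follows. Definitions vary slightly in the literature (some authors use root-mean-square averaging or a normalizing exponent); the plain-mean variants are assumed in this sketch.

```python
import numpy as np

def gd(F, ref):
    """Generational distance: mean distance from each computed point in F
    to its nearest point on the reference front ref (measures convergence)."""
    d = np.linalg.norm(F[:, None, :] - ref[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def igd(F, ref):
    """Inverted generational distance: mean distance from each reference
    point to the nearest computed point (also penalizes uncovered regions,
    hence measures both convergence and diversity)."""
    d = np.linalg.norm(F[:, None, :] - ref[None, :, :], axis=-1)
    return d.min(axis=0).mean()
```

Note the asymmetry: a few well-converged points can achieve a small GD while leaving most of the front uncovered, which is why IGD is the metric sensitive to the uniformity the adaptive strategy targets.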

Effect of the parameter τ and scalability
By looking at the computational results, it becomes clear that the two phases of the algorithm, one characterized by the CBO dynamics and one by the gradient-flow dynamics, act on different scales. Typically, the former dynamics is much faster than the latter. This is consistent with the assumptions of Theorem 4.1, where τ needs to be taken of order o(√ε). To investigate the importance of τ experimentally, we test the algorithm for various values of τ, keeping the remaining parameters fixed. Figs. 3a and 3b show the final GD and IGD metrics when different binary potentials are used during the computation. As expected, relatively large values of τ lead to a strong interaction in the parameter space that interferes with the CBO mechanism. As a result, the GD metric increases for large values of τ. Interestingly, the lowest GD values are not always attained for the smallest values of τ, suggesting that the additional weight-vector dynamics might help the CBO mechanism in optimizing the sub-problems.
The IGD metrics in Fig. 3b show that the optimal value of τ is different for each test case. In particular, the DO2DK problems benefit from a strong interaction in the parameter space. This might be explained by the front geometry (Fig. 1): the front is long and, as a consequence, the particles tend to be further apart in the image space, making the binary potential interaction weaker. Larger values of τ mitigate this effect, leading to better algorithm performance. If the extrema of the Pareto front are known in advance, one could address this issue by estimating the front length and choosing the parameter τ accordingly. We also note that the algorithm seems to perform better when the Morse potential is used during the computation.
As already mentioned, the dynamics in Ω adds stochasticity to the evolution of the particle positions. Hence, the additional diffusive term σD^i_k B^i_k in (5.1) might not be necessary. Yet, taking σ = 0 yields poor approximations of the Pareto front, see Fig. 6b, suggesting that the diffusive term is still of paramount importance for the exploration behavior of the particles and their statistical independence. From Fig. 6a, it is clear that the optimal diffusion parameter σ is larger the smaller τ is. In particular, if τ = 0 the particles diverge from the optimal EP points only when σ > 10, which is consistent with other CBO methods for single-objective optimization, see for instance [2]. At the same time, for some problems, if σ is too small, larger values of τ improve the convergence towards optimal points.
Finally, we test the algorithm performance for different dimensions d of the search space, keeping the same parameter choice. If the same number N = 100 of particles is used, the IGD of the computed solutions increases as the space dimension d becomes larger, see

Conclusions
In this work, we proposed an adaptive stochastic particle dynamics based on consensus to solve multi-objective optimization problems. The method makes use of a scalarization strategy to break down the original problem into N parametrized single-objective sub-problems. The proposed algorithm, AM-CBO, extends prior work on multi-objective consensus based optimization by an additional adaptive dynamics in the parameter space, in order to ensure that the particles distribute uniformly over the Pareto front. This is achieved by exploiting energy-based diversity measures. A rigorous mathematical analysis and numerical evidence are provided to validate this behavior. We theoretically investigated the long-time behavior of the particle dynamics under the propagation of chaos assumption and established convergence towards optimal points. Indeed, under appropriate assumptions, the particles are capable of solving several single-objective problems at the same time, with a remarkable saving in computational cost with respect to a naive approach. The additional dynamics on the parameter space is also analyzed based on results on non-linear aggregation equations. Numerical experiments show that the proposed method is capable of solving multi-objective problems with very different Pareto fronts. The algorithm scales well with the problem dimension, even when using a relatively small number of particles.

A Construction of reference solutions
Even if we assume that there exists an analytical representation of the Pareto front F, finding an M-approximation of F which also minimizes a given two-body potential is a computationally expensive task, which is related to the already mentioned crystallization problem in physics [4].
In [9], this was achieved by using mathematical programming techniques, while in [24] the authors proposed the following heuristic strategy: generate N ≫ M points on the front and iteratively delete the point subject to the highest potential energy until only M are left. In this appendix, we propose a different heuristic strategy which not only generates low-energy approximations of the front, but also provides more insight into the choice of the proposed update strategy (3.8).
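The delete-until-M heuristic of [24] described above can be sketched as follows. The repulsive Morse-type kernel exp(−C r) is an illustrative assumption, not necessarily the potential used in the cited work.

```python
import numpy as np

def thin_by_energy(P, M, C=20.0):
    """Start from N >= M points P sampled on the front and repeatedly
    delete the point carrying the highest potential energy until only
    M points remain (heuristic of [24], sketched with a Morse-type kernel)."""
    P = P.copy()
    while len(P) > M:
        D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
        U = np.exp(-C * D)
        np.fill_diagonal(U, 0.0)               # ignore self-interaction
        P = np.delete(P, U.sum(axis=1).argmax(), axis=0)
    return P
```

Since clustered points dominate the energy of their neighbors, the deletion greedily removes redundancy and leaves an approximately uniform subset.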
In the following, we assume F to be an (m − 1)-dimensional manifold with known chart ḡ ∈ C^2(V, F), where V = [0, 1]^{m−1} or V = Ω. We also assume the tangent space T(y, F) to be well-defined for all y ∈ F. Let {G^i_t}_{i=1}^M ⊂ F describe the positions at time t ≥ 0 of M particles interacting over the front under a potential U, that is, evolving according to (A.1) with some given initial conditions G^i_0 = ḡ(Z^i_0), for all i = 1, . . . , M. Our heuristic strategy is based on the conjecture that, as t → ∞, the system will eventually converge towards a low-energy configuration. Rather than solving (A.1) in R^m, where F is embedded, we consider the

B Problem definition
We report here the problem definitions, together with the penalization strategy and the known parametrization of F. The Lamé [22] and DO2DK [7] problems are originally formulated as constrained multi-objective optimization problems where the feasible domain is given by H = [0, 1]^d. Moreover, the set of EP optimal points corresponds to the edge [0, 1] × {0}^{d−1}. Adding a projection step onto H has a relevant impact on the algorithm dynamics, as any point belonging to the cone R × R^{d−1}_{≤0} is projected to an EP optimal point. Therefore, we make use of an exact penalization strategy to ensure the particles remain in the feasible region, adding an ℓ1-penalty term of the form β dist(x, H), β > 0, to the original objective functions.
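The exact ℓ1 penalization can be sketched as follows; the box H = [0, 1]^d and the factor β follow the text, while the objective g and the value β = 10 are placeholders for illustration.

```python
import numpy as np

def dist_to_box(x, lo=0.0, hi=1.0):
    """l1 distance from x to the box H = [lo, hi]^d (zero inside H)."""
    return np.sum(np.maximum(lo - x, 0.0) + np.maximum(x - hi, 0.0))

def penalized(g, x, beta=10.0):
    """Exact-penalty sketch: every objective g_i is replaced by
    g_i(x) + beta * dist(x, H), leaving values inside H unchanged."""
    return np.asarray(g(x)) + beta * dist_to_box(x)
```

Because the penalty vanishes on H and grows linearly outside, for β large enough the unconstrained minimizers of the penalized objectives coincide with the constrained ones, while avoiding the spurious projections onto EP optimal points mentioned above.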