Consensus-Based Optimization for Multi-Objective Problems: A Multi-Swarm Approach

We propose a multi-swarm approach for approximating the Pareto front of general multi-objective optimization problems that is based on the Consensus-based Optimization method (CBO). The algorithm is motivated step by step, beginning with a simple extension of CBO based on fixed scalarization weights. To overcome the issue of choosing the weights, we propose an adaptive weight strategy in the second modelling step. The modelling process is concluded with the incorporation of a penalty strategy that avoids clusters along the Pareto front and a diffusion term that prevents collapsing swarms. Altogether, the proposed K-swarm CBO algorithm is tailored for a diverse approximation of the Pareto front and, simultaneously, of the efficient set of general non-convex multi-objective problems. The feasibility of the approach is justified by analytic results, including convergence proofs, and a performance comparison to the well-known non-dominated sorting genetic algorithm (NSGA2) and the recently proposed one-swarm CBO approach for multi-objective problems.


Introduction
Multiple conflicting objective functions occur in a variety of applications ranging from engineering design to economic and financial decisions. Economical goals are often in conflict with ecological criteria, we have to trade off between expected return and risk, and we aim at affordable yet high-quality products. We refer to the textbooks [Ehr05] and [Mie99] for a general introduction to the topic of multi-objective optimization. In this paper, we focus on continuous multi-objective optimization problems

(MOP)    min_{x ∈ X} f(x) = (f_1(x), ..., f_p(x)),

with a non-empty feasible set X ⊂ R^d, d ∈ N, and with p ≥ 2 continuous objective functions f_i : X → R, each of which has a unique global minimum on X. We consider unconstrained problems, i.e., X = R^d, as well as box-constrained problems where X = [ℓ, u] with lower and upper bounds ℓ, u ∈ R^d satisfying ℓ_j ≤ u_j for j = 1, ..., d.
We refer to R^d as the decision space and to R^p as the objective space of (MOP). The set Y := f(X) ⊂ R^p is called the feasible outcome set of (MOP). Two feasible solutions x^1, x^2 ∈ X are compared based on their respective outcome vectors z^1 = f(x^1) and z^2 = f(x^2): we say that z^1 dominates z^2 (and x^1 dominates x^2), denoted by z^1 ≼ z^2, if and only if z^1_i ≤ z^2_i for all i = 1, ..., p and z^1 ≠ z^2. A feasible solution x ∈ X is called Pareto optimal or efficient if there is no other solution x' ∈ X such that f(x') ≼ f(x). The corresponding image in the objective space is called non-dominated in this case. The set of all Pareto optimal solutions is denoted by X_P and referred to as the efficient set, and the set of all non-dominated outcome vectors, i.e., Y_P = f(X_P), is called the non-dominated set or the Pareto front of (MOP). Note that since we assume that each objective function has a unique global minimum on X, the Pareto front of (MOP) is non-empty, as it contains the images of these individual minima.
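The dominance relation defined above translates directly into code. The following sketch (illustrative only, not part of the paper's algorithms) checks dominance between outcome vectors and filters a finite set of outcomes down to its non-dominated subset:

```python
import numpy as np

def dominates(z1, z2):
    """True if outcome vector z1 dominates z2, i.e. z1 <= z2 componentwise
    and z1 != z2 (the definition used in the text)."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return bool(np.all(z1 <= z2) and np.any(z1 < z2))

def non_dominated(Z):
    """Filter a finite set of outcome vectors down to its non-dominated subset."""
    Z = np.asarray(Z, dtype=float)
    keep = [i for i in range(len(Z))
            if not any(dominates(Z[j], Z[i]) for j in range(len(Z)) if j != i)]
    return Z[keep]
```

The quadratic pairwise scan suffices for illustration; practical implementations (e.g., inside NSGA2) use faster non-dominated sorting.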
For later reference we phrase the following observations as a remark.
Remark 1 (see, e.g., [Ehr05]). When λ ∈ Λ_0, then an optimal solution x(λ) ∈ X of (1) is at least weakly efficient for (MOP), i.e., there is no x ∈ X such that f_i(x) < f_i(x(λ)) for all i = 1, ..., p. Note also that when Y + R^p_+ is non-convex, it cannot be guaranteed in general that all efficient solutions can be obtained as optimal solutions of a weighted-sum scalarization.
The goal of this paper is to develop a provably convergent, yet efficient algorithm for high-quality representations of Pareto fronts of multi-objective optimization problems. We choose the Consensus-based Optimization method (CBO) as the basis for the algorithm, as it is a particle method for single-objective global optimization problems which is easy to implement and allows for analytical studies. Its dynamics is governed by an interacting particle system that propagates information between the individuals through a weighted mean. The weighted mean has two advantages: (a) there is no need to label individuals as current best, and (b) there is no need for pairwise interactions, as the dynamics of one particle depends only on the weighted mean of the whole swarm. Advantage (a) allows for the derivation of a corresponding mean-field equation [HQ22], which can be employed for analytical studies such as the large-time behaviour and the convergence to the global minimizer. Indeed, in [CJLZ21, CCTT18] it is shown that the invariant solution of the mean-field dynamics is a Dirac delta located arbitrarily close to the global minimizer of the objective function.
For completeness we recall the CBO dynamics for N particles in the single-objective case, i.e., for p = 1. Hence, let f : R^d → R_+ denote the objective function. For λ, σ > 0 the dynamics of the j-th particle X^j : [0, T] → R^d, j = 1, ..., N, at time t ∈ [0, T] is given by

dX^j_t = -λ (X^j_t - v_t) dt + σ diag(X^j_t - v_t) dB^j_t,    X^j_0 ∼ ρ_0,

where the weighted mean v_t is given by

v_t = ( Σ_{j=1}^N X^j_t e^{-α f(X^j_t)} ) / ( Σ_{j=1}^N e^{-α f(X^j_t)} )

for some α > 0, ρ_0 is a probability distribution on the state space and B^j : [0, T] → R^d, j = 1, ..., N, are independent Brownian motions. Here and in the following we use the anisotropic noise term σ diag(X^j_t - v_t) dB^j_t as proposed in [CJLZ21], as it has been shown to be more robust in settings with a high-dimensional state space.
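As an illustration, a minimal Euler–Maruyama discretization of this single-objective CBO dynamics might look as follows. The parameter values and the test problem are placeholders, not the settings used in the paper; subtracting the minimal cost before exponentiating is a standard numerical stabilization that leaves v_t unchanged.

```python
import numpy as np

def cbo_step(X, f, lam=1.0, sigma=0.7, alpha=30.0, dt=0.01, rng=None):
    """One Euler-Maruyama step of CBO with anisotropic noise.
    X: (N, d) particle positions; f maps (N, d) -> (N,) costs."""
    rng = np.random.default_rng() if rng is None else rng
    fx = f(X)
    w = np.exp(-alpha * (fx - fx.min()))         # stabilized Gibbs weights e^{-alpha f}
    v = (w[:, None] * X).sum(axis=0) / w.sum()   # weighted mean v_t
    drift = -lam * (X - v) * dt                  # relaxation towards the weighted mean
    noise = sigma * (X - v) * np.sqrt(dt) * rng.standard_normal(X.shape)  # anisotropic
    return X + drift + noise, v

# usage: minimize f(x) = |x|^2 over R^2 (placeholder problem)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 2))
f = lambda X: (X ** 2).sum(axis=1)
for _ in range(500):
    X, v = cbo_step(X, f, rng=rng)
```

The swarm contracts towards the weighted mean, which the Laplace principle concentrates near the global minimizer for large α.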
The idea of the multi-swarm CBO algorithm we propose in Section 2.1 and further develop in Section 2.2 is based on Theorem 1. Indeed, each swarm is associated with a weight vector that yields a scalarization of the cost function. Following the CBO dynamics, the swarms globally minimize their respective scalarized problems, giving us an approximation of the Pareto front. For a diverse approximation we introduce interactions between the swarms in Section 2.2. The multi-swarm CBO algorithm with adaptive swarms is analyzed in Section 2.3, where we show under appropriate assumptions that each swarm clusters at a point along the Pareto front and that the approximation obtained by the points of all swarms is diverse in the sense that the clustering points admit a pairwise distance greater than a minimal distance.
As mentioned above, weighted-sum scalarization can have issues for non-convex problems. In particular, efficient solutions do not necessarily correspond to global minima of the weighted-sum scalarization. By introducing a penalization strategy (in Section 3) that avoids clustering along the Pareto front, we allow swarms to also converge towards unsupported efficient solutions, i.e., solutions which are not located on the convex hull of Y. In the modelling process we focus on the weighted means of the swarms and tailor the dynamics such that the means form a high-quality, i.e., representative and well-distributed, approximation of the Pareto front. In Consensus-based Optimization the swarms are forced to collapse in the large-time limit [CCTT18, PTTM17]. This is disadvantageous in our setting, as we would lose local information about the Pareto front. Hence, we prevent collapsing swarms by implementing the diffusion term from Consensus-based sampling [CHSV21].
Each modelling step is illustrated by simulation results. To not interrupt the flow, we present the details of the simulations in Section 4.2, where all parameters and implementation details are reported. In Section 5 we conduct a qualitative study comparing the proposed multi-swarm CBO with the recently proposed single-swarm CBO for multi-objective problems [BHP22] and the well-known NSGA2 algorithm implemented in [BD20], before we conclude the article and give an outlook on future work.

CBO for convex multi-objective problems
We begin with the presentation of a multi-swarm approach for multi-objective optimization based on CBO (MSCBO) with fixed scalarization weights and provide analytical results. Fixed weights come with the burden of choosing appropriate weights a priori, which may result in highly unbalanced Pareto front approximations [DD97]. To overcome this problem, we propose an adaptive weight strategy in the second part of this section.
2.1. MSCBO with fixed scalarization weights. We begin with a simple CBO method for multi-objective problems that is gradually extended in the following. Let K ∈ N be the number of swarms involved and let (λ^k)_{k=1,...,K} ⊂ Λ be their respective fixed scalarization weights. For each k we introduce the swarm size N_k ∈ N, and the positions of the individuals are denoted by X^{k,j}_t, j = 1, ..., N_k. The evolution of the swarms is given by the dynamics

(3)    dX^{k,j}_t = -(X^{k,j}_t - v^k_t) dt + σ diag(X^{k,j}_t - v^k_t) dB^{k,j}_t,    X^{k,j}_0 ∼ ρ_0,

where the weighted mean v^k_t is given by

(3c)    v^k_t = ( Σ_{j=1}^{N_k} X^{k,j}_t e^{-α f_{λ^k}(X^{k,j}_t)} ) / ( Σ_{j=1}^{N_k} e^{-α f_{λ^k}(X^{k,j}_t)} ),

σ > 0 is a diffusive strength parameter, ρ_0 ∈ P_2(R^d) is a probability measure with finite second moment, B^{k,j} are independent Brownian motions and α > 0 allows to scale the difference of the local and global minima in the objective function (Laplace principle). For notational convenience we define N = Σ_{k=1}^K N_k and collect all positions in the vector X^N_t = (X^{k,j}_t)_{k,j}. A pseudo code for the dynamics is given in Algorithm 1.
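With fixed weights the scheme amounts to K independent CBO runs, one per scalarization. A compact sketch (parameter values and the discretization details are illustrative assumptions, not the paper's Algorithm 1):

```python
import numpy as np

def multiswarm_cbo(fs, lams, N=100, d=1, T=500, dt=0.01,
                   sigma=0.5, alpha=50.0, seed=0):
    """K independent CBO swarms, one per fixed scalarization weight vector.
    fs: list of p objectives mapping (N, d) -> (N,); lams: list of weight vectors."""
    rng = np.random.default_rng(seed)
    means = []
    for lam in lams:                                  # swarm k with weights lam
        X = rng.uniform(-2.0, 2.0, size=(N, d))
        for _ in range(T):
            F = np.stack([f(X) for f in fs], axis=1)  # (N, p) objective values
            cost = F @ lam                            # weighted-sum scalarization
            w = np.exp(-alpha * (cost - cost.min()))
            v = (w[:, None] * X).sum(axis=0) / w.sum()
            X = X - (X - v) * dt \
                + sigma * (X - v) * np.sqrt(dt) * rng.standard_normal(X.shape)
        means.append(v)
    return np.array(means)                            # one representative per swarm
```

For instance, with f(x) = (x², (x − 1)²) on R, the weight vectors (1, 0) and (0, 1) should steer the two swarms towards the individual minimizers 0 and 1, respectively.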
Remark 2 (Initialization of the weights). In order to obtain admissible initial conditions for the weight vectors λ^k, k = 1, ..., K, we draw uniformly distributed random samples from the standard simplex S^p = Λ_0. Thus, each weight vector has p − 1 degrees of freedom and we can write λ^k = (λ^k_1, ..., λ^k_{p−1}, 1 − Σ_{j=1}^{p−1} λ^k_j). For the numerical tests we use Algorithm 2 from [OW09] to sample the initial weights λ^k for k = 1, ..., K.
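The cited sampling routine is not reproduced here; as a sketch, the classical spacings construction below also produces uniform samples from the standard simplex and may differ from Algorithm 2 of [OW09]:

```python
import numpy as np

def sample_simplex(K, p, rng=None):
    """Draw K points uniformly from the standard simplex in R^p via the
    classical sorted-uniforms (spacings) construction."""
    rng = np.random.default_rng() if rng is None else rng
    U = np.sort(rng.uniform(size=(K, p - 1)), axis=1)       # p-1 sorted uniforms per row
    padded = np.hstack([np.zeros((K, 1)), U, np.ones((K, 1))])
    return np.diff(padded, axis=1)                          # spacings: >= 0, sum to 1
```

Each row has non-negative entries summing to one, i.e., an admissible weight vector.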
Remark 3. In contrast to the dynamics proposed in [Bor22, BHP22], where the interacting particles in the swarm converge towards different efficient solutions, here each swarm (namely its weighted mean) represents one solution. That means each swarm is supposed to find one point along the Pareto front, and the swarms act independently. In Section 2.2 we implement interactions between the swarms through adaptive scalarization weights.
2.1.1. Analysis of the independent swarms. Note that in the dynamics proposed above the K swarms are independent and each globally minimizes its weighted sum (1) given by λ^k. Hence, we may employ the result in [TW20] to prove the well-posedness of the dynamics.
Theorem 2 (Well-posedness [TW20]). Let K ∈ N, (λ^k)_{k=1,...,K} ⊂ Λ_0 and N_k ∈ N be fixed, and let f_{λ^k} be locally Lipschitz continuous for every λ^k. Then system (3) admits a unique strong solution {X^N_t : t ≥ 0} for each initial condition X^N_0 with finite second moment.

Proof. By the independence of the swarms, we first apply Theorem 1 in [TW20] to every subsystem describing the evolution of each swarm and then collect the corresponding solutions in X^N for the full system.
As the convergence analysis on the particle level is highly nontrivial, we follow the steps in [CJLZ21, CCTT18] and derive statistical representations of the swarms in the form of mean-field equations. Under the standard assumption of propagation of chaos, i.e., that the particles decouple as N_k → ∞, we denote the probability of finding a member of the k-th swarm in position x at time t by ρ^k_t(x). Following the lines of [CJLZ21, CCTT18] we compute the evolution equation of these probabilities with the help of Itô calculus. Towards this end, let for simplicity N_k = N for all k = 1, ..., K. For N → ∞ the PDE system corresponding to (3) reads

∂_t ρ^k_t = ∇ · ( (x − v[ρ^k_t]) ρ^k_t ) + (σ²/2) Σ_{i=1}^d ∂²_{x_i x_i} ( (x − v[ρ^k_t])_i² ρ^k_t ),    k = 1, ..., K,

with the weighted mean given by

v[ρ^k_t] = ( ∫ x e^{−α f_{λ^k}(x)} dρ^k_t(x) ) / ( ∫ e^{−α f_{λ^k}(x)} dρ^k_t(x) ),

and σ > 0, ρ_0 ∈ P_2(R^d) as above. Again, using the independence of the K swarms we can directly apply the result in [CCTT18] to prove the convergence of each swarm towards a limit point arbitrarily close to the efficient set. To state the result properly we define the expectation and variance of the k-th swarm,

E^k(t) = ∫ x dρ^k_t(x),    V^k(t) = (1/2) ∫ |x − E^k(t)|² dρ^k_t(x).

The convergence theory for CBO [CCTT18] is based on the (strong) assumption that the minimizer of the objective function is unique. This can be guaranteed under a similarly strong uniqueness assumption for the minimizers of the scalarized objective functions, i.e.,

(A1) x̄^k := argmin_{x∈X} f_{λ^k}(x) is unique for every k = 1, ..., K.

In addition, we inherit a technical well-preparedness assumption on α, σ and the initial datum ρ_0 from the convergence theory of CBO [CCTT18], which we refer to as (A2). Note that (A1) is not implied by the uniqueness of the individual minimizers: consider, for instance, the biobjective problem f(x) = (x, 1 − x) on X = [0, 1]. Then Y = Y_P is the line segment connecting the two points (0, 1) and (1, 0) in R², and while the global minimizers of f_1 and of f_2 are unique, the complete feasible set X is optimal for f_λ with λ = (1/2, 1/2).

Theorem 3 (Convergence towards the efficient set [CJLZ21]). Let (A1) and (A2) hold. Further, let α and the initial condition ρ_0 be chosen such that the well-preparedness conditions of [CJLZ21] are satisfied for each k ∈ {1, ..., K}. Then for each k ∈ {1, ..., K} it holds that V^k(t) → 0 exponentially fast, and there exists x̃^k such that v[ρ^k_t] → x̃^k and E^k(t) → x̃^k exponentially fast. Moreover, for α → ∞ the limit points x̃^k are arbitrarily close to the efficient set. In particular, for every ε > 0 there exists α ≫ 1 such that x̃^k ∈ B_ε(x̄^k).
Proof. As the K swarms are independent, the result is a direct consequence of Theorem 3.1 in [CJLZ21].
Corollary 1. Let f_i be strictly convex functions for all i ∈ {1, ..., p} and let Assumption (A2) be satisfied. Then for each point of the efficient set there exists λ^k ∈ Λ_0 such that the swarm corresponding to λ^k and following the dynamics (3) concentrates arbitrarily close to that point of the efficient set.

Proof. The result follows from Theorem 1 in combination with Theorem 3.
The previous results are based on the fact that the weights are fixed. It therefore remains open to choose the weights appropriately. This is not a trivial task, as it is, for example, well-known that an equidistant choice of weights does not necessarily lead to an equidistant resolution of the Pareto front, even in the case of convex problems [DD97]. In order to circumvent the problem of choosing appropriate weights, we propose an adaptive procedure including dynamic weights in the next section.

MSCBO with dynamic weight adaption.
Already in the biobjective case the choice of scalarization weights is nontrivial. In fact, assuming that f_1 and f_2 are continuously differentiable, the first-order condition λ_1 ∇f_1(x(λ)) + λ_2 ∇f_2(x(λ)) = 0 shows that the optimal choice of the weights depends on the ratio of the gradients of f_1 and f_2 at the targeted efficient point. This observation motivates us to incorporate adaptive weights in the multi-objective CBO method. The adaption dynamics are written as an ODE. In order to circumvent the restriction λ^k_j ∈ [0, 1], we use a bijective transformation λ^k = λ(μ^k) with unconstrained variables μ^k and formulate the dynamic weight adaption in terms of μ^k. The scalarized cost function then reads f_{λ(μ^k)}(x) = Σ_{j=1}^p λ_j(μ^k) f_j(x).
Remark 5. Note that λ^k ∈ Λ instead of Λ_0 implies that only efficient solutions with bounded trade-off can be determined. Since efficient solutions with unbounded trade-off usually correspond to extreme cases that are not very interesting from a practical point of view, this does not impose a strong restriction.
The dynamics of the adaptive weights is based on a pairwise interaction of swarms k and ℓ: the swarms interact if their respective weight vectors μ^k and μ^ℓ are close to each other, or if the distance of their weighted means, v^k_t and v^ℓ_t, in the objective space is small, i.e., if the swarms map to similar points f(v^k_t) and f(v^ℓ_t) in the objective space. Let

d_{k,ℓ} = ‖μ^k − μ^ℓ‖  and  d^f_{k,ℓ} = ‖f(v^k_t) − f(v^ℓ_t)‖.

Then the repulsion is modelled by means of interaction potentials, where in the case c_rep < ∞ these potentials are assumed to have compact support.
For the numerical tests we use Morse potentials [DCBC06], which are given by

(4)    U(d) = R e^{−d/r} − A e^{−d/a}.

The parameters R and A denote the strength of the repulsion and attraction, respectively, while r and a allow to adjust the range of the repulsive and attractive force, respectively.
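In code, the Morse-type potential and the magnitude of the resulting radial force read as follows (the sign and normalization conventions are an assumption consistent with the reconstruction above):

```python
import numpy as np

def morse(d, A, a, R, r):
    """Morse-type interaction potential U(d) = R*exp(-d/r) - A*exp(-d/a):
    repulsion of strength R on range r, attraction of strength A on range a."""
    return R * np.exp(-d / r) - A * np.exp(-d / a)

def morse_force(d, A, a, R, r):
    """Magnitude of the radial force -U'(d); positive values repel."""
    return R / r * np.exp(-d / r) - A / a * np.exp(-d / a)
```

With R/r > A/a and r < a the force is repulsive at short range and attractive at medium range, while both terms decay to zero at long range.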

Remark 6. We choose to have strong repulsive forces on a very short range and attractive forces on a medium range. Hence on larger ranges interactions can be neglected and we consider only interactions with direct neighbors in our main theorem analysing the approximation of the Pareto front (Theorem 5).
For interacting particle systems, see for example [BPT+20], the force on μ^k that results from the interaction with μ^ℓ ≠ μ^k is given by the corresponding gradient

(5)    F^{k,ℓ} = −∇_{μ^k} U(d_{k,ℓ}) = −U'(d_{k,ℓ}) (μ^k − μ^ℓ) / d_{k,ℓ}.

The main goal we are pursuing with the adaptive weights is to obtain a diverse approximation of the Pareto front. We therefore add another interaction strategy that depends on the distance of the weighted means in the objective space. Following the spirit of (5), we replace the distance d_{k,ℓ} in the prefactor by the distance of the objective function values d^f_{k,ℓ}, leading to the second force given by

(6)    F^{k,ℓ}_f = −U_f'(d^f_{k,ℓ}) (μ^k − μ^ℓ) / d^f_{k,ℓ}.

Altogether, this leads to the following forces for the adaptive weight adjustment,

F^k = Σ_{ℓ≠k} ( F^{k,ℓ} + F^{k,ℓ}_f ).

More generally, one can use any force with similar properties to define an adaptive weight adjustment given by

(7)    d/dt μ^k_t = (1/τ) F^k.

Here τ > 0 is a parameter that allows us to control the time scale of the interaction dynamics. In fact, it will become important in Section 2.3, where we assume that the adaption of the scalarization weights happens much faster than the interaction dynamics and concentration process of the swarms. We recover the original weights via the inverse transformation, λ^k = λ(μ^k).
Combining (3) and (7) yields a CBO dynamic for multi-objective problems with self-adaptive scalarization weights. We want to emphasize that this combination introduces interdependencies between the swarms. Moreover, the dynamics of the weights lead to dynamically changing scalarized cost functions. A pseudo code of this method can be found in Algorithm 2.

Numerical comparison of static and dynamic weights.
To illustrate the effect of the dynamic weight adaption, we solve the test problem Schaffer1; see Section 4 for more details. The weight vectors are initialized equidistantly in (0, 1), as shown by the blue points in Figure 1a. The corresponding front found by solving (3) is shown in Figure 1b. The distribution of the adaptive λ^k is given by the orange points in Figure 1a and the corresponding front in Figure 1c. We note that equally distributed weights stress the left part of the front. The adaptive weighting counteracts this effect by shifting the weights to the right, which results in a better resolution of the front. All other details regarding the numerical results are given in Section 4.
The results indicate that the proposed method works well for convex problems. However, we note that we may run into problems in non-convex settings. Indeed, as discussed in Remark 1, we can only find convex parts of the front by globally minimizing weighted-sum scalarizations. For general and possibly non-convex problems we propose another extension in the next section. Before we turn to general problems, we analyse the method with adaptive weights in terms of well-posedness, mean-field approximation and convergence.

Well-posedness of the SDE system with adaptive weights.
To analyse the SDE system with adaptive weights we introduce the notation Y_t = [X_t, μ_t], where X^N_t = (X^k_t)_{k=1,...,K} and μ^N_t = (μ^k_t)_{k=1,...,K}. Moreover, we write the SDE system in vectorized form,

(8)    dY_t = G(Y_t) dt + Σ(Y_t) dB_t,    G = [G_X, G_μ],

where G_X collects the drift terms of the swarm dynamics (3) and G_μ the weight adjustment forces (7). To obtain a well-posedness result for the system with adaptive weights, we make the following assumption:

(A3) The interaction force K is locally Lipschitz and grows at most linearly in x, y, μ and ν.

Remark 7. As usual in particle interaction dynamics, the forces resulting from physical potentials such as the Morse potential proposed above do not satisfy the regularity assumptions of ODE theory. In order to comply with these assumptions we have to consider a smoothed version, for example a regularization of the potential that removes the singularity of the force at the origin.

Similar to the single-objective case, the proof of the well-posedness of the SDE system with adaptive weights is based on the following technical lemma.

Lemma 1. Let Assumption (A3) hold. Then the drift terms G_X and G_μ are locally Lipschitz continuous and grow at most linearly.
Proof. Similar to [CCTT18], the terms in the drift G_X can be bounded separately. To bound the terms I_4 and I_5 involving the weighted means we first note that the exponential weights are bounded and locally Lipschitz on compact sets. As these estimates are independent of j = 1, ..., N_k and k = 1, ..., K, this proves that G_X is locally Lipschitz. Note that G_μ is locally Lipschitz by Assumption (A3).
Moreover, it is easy to see that a corresponding growth estimate holds, which leads to the linear growth of G_X. Further, the linear growth of G_μ is ensured by Assumption (A3).
Based on this lemma we establish the following well-posedness and existence result.
Theorem 4. Let the assumptions of Lemma 1 be satisfied. For each K ∈ N and N_k ∈ N, k = 1, ..., K, system (8) admits a unique strong solution {Y_t : t ≥ 0} for any initial condition Y_0 with finite second moment.

Proof. The proof is based on [Dur96, Chapter 5.3, Theorem 3.1]. Note that due to Lemma 1 we only have to show that there exists b > 0 such that

2 ⟨y, G(y)⟩ + |Σ(y)|² ≤ b (1 + |y|²).

For the first term we obtain, using (A3) and (A4),

⟨y, G(y)⟩ ≤ C_2 (1 + |y|²)

for some constant C_2 > 0. Altogether, this proves the desired estimate, and Theorem 3.1 in [Dur96, Chapter 5.3] yields the result.
Remark 8. Note that the regularized potential proposed in Remark 7 satisfies the regularity assumptions of Theorem 4.

2.2.3. Mean-field limit with adaptive weights. Formally passing to the mean-field limit yields a coupled PDE/ODE system given by

∂_t ρ^k_t = ∇ · ( (x − v[ρ^k_t, μ^k_t]) ρ^k_t ) + (σ²/2) Σ_{i=1}^d ∂²_{x_i x_i} ( (x − v[ρ^k_t, μ^k_t])_i² ρ^k_t ),
d/dt μ^k_t = (1/τ) F^k,    k = 1, ..., K,

with the weighted mean given by

v[ρ^k_t, μ^k_t] = ( ∫ x e^{−α f_{λ(μ^k_t)}(x)} dρ^k_t(x) ) / ( ∫ e^{−α f_{λ(μ^k_t)}(x)} dρ^k_t(x) ),

and τ, σ > 0, ρ_0 ∈ P_2(R^d) as above. Note that the coupling through the weights makes the proof of convergence towards the efficient set considerably more difficult, as there is a trade-off between the concentration of the swarms and the balancing of the weight vectors. This is addressed in the next section.
2.3. Diverse approximation of the Pareto front. The aim of this section is to prove that the proposed scheme leads to a diverse approximation of the Pareto front in the case of strictly convex biobjective problems, i.e., f : R^d → R². In fact, we prove that each swarm concentrates at one point along the Pareto front in the long-time limit t → ∞. The proof builds on results for the single-objective CBO scheme in [CCTT18, FKR21]. Of course, we require additional assumptions that we motivate and formulate in the following.
For the sake of a simple presentation of the idea, we consider the smoothed version of the Morse interaction potential discussed in Remark 7. Moreover, we assume that the initial condition of the scalarization weights and the parameters of the interaction potentials are chosen such that the scalarization weights do not leave the domain of admissible weights. This can be ensured by choosing A_f, R_f, a_f and r_f such that we have short-range repulsion, mid-range attraction and negligible interaction in the long range. Note that this assumption allows us to work directly with normalized scalarization weights satisfying ‖λ^k‖_1 = 1 for all k = 1, ..., K. In fact, we do not require the reformulation in terms of μ that is used for the numerics.
To be more precise, we consider the weight dynamics (7) formulated directly in terms of the λ^k. Note that the interaction forces act along the directions λ^k − λ^ℓ, whose components sum to zero. As all other quantities on the right-hand side are scalar, the λ^k stay on the constraint λ^k_1 + λ^k_2 = 1 for all times t > 0 once they are normalized. As a consequence, the dynamics of the vectors λ^k are uniquely defined by the dynamics of one of their components. Without loss of generality we choose the first component here and obtain a scalar ODE for λ^k_1, where we use the notation λ^k = (λ^k_1, 1 − λ^k_1). Note that the unique representation by the first component allows us to sort the scalarization weights of the swarms. Without loss of generality, we assume in the following that λ^1_1 ≤ λ^2_1 ≤ ... ≤ λ^K_1. Moreover, due to negligible long-range interaction we only consider interactions with direct neighbors.
The strategy of the proof is as follows: first, we exploit the relationship between the non-dominated points f(x(λ)) corresponding to a given λ ∈ Λ (i.e., x(λ) is the unique optimum of (1) with weight λ) and the scalarization weight λ itself. In fact, we assume that the difference of two scalarization weights can be expressed in terms of the difference of the corresponding non-dominated points via a problem-dependent matrix S(λ^k, λ^ℓ). Second, using the chain rule, we notice that the time evolution of the non-dominated points f(x(λ^k)) is driven by the time evolution of the weights. Now, we employ a Lyapunov argument for all swarms at once, which will show that the non-dominated points f(x(λ^k)) spread along the Pareto front with ‖·‖_2-distance d_min in the long-time limit, where d_min is the unique root of u(d).
Let us state the assumptions and the theorem:

(A5) The Pareto front of the biobjective problem f : R^d → R² is strictly convex and connected.

(A6) The parameters of the interaction potential satisfy A, R = 0, and A_f, R_f, a_f and r_f are such that the interaction potential models short-range repulsion and mid-range attraction; long-range interactions are negligible. In particular, we assume that there is a unique root d_min of u(d) such that all K non-dominated points f(x(λ^k)), k = 1, ..., K, can spread along the front with Euclidean distance greater than d_min. Moreover, we assume that each swarm interacts only with its direct neighbors (in the sense of the sorted weight vectors).

(A7) The initial values of the scalarization weights are admissible in the sense described above.

(A8) For each pair λ^k, λ^ℓ with k, ℓ ∈ {1, ..., K}, k ≠ ℓ, there exists a negative definite matrix S(λ^k, λ^ℓ) ∈ R^{2×2} relating the difference of the weights to the difference of the corresponding non-dominated points.

(A9) For each λ with entries bounded away from zero and one there exists a negative definite matrix T(λ) relating the time evolution of f(x(λ)) to that of λ.

(A10) The product T(λ)S(λ) is symmetric, where T(λ^k) are as defined in (A9) and S(λ^k, λ^ℓ) as defined in (A8) for each k, ℓ = 1, ..., K, k ≠ ℓ.

(A11) The dynamics of the scalarization weights and the dynamics of the swarms have different time scales. In fact, the adaption of the scalarization weights is much faster than the dynamics of the swarms.

We first state the result for the limiting case α = ∞, as this ensures that f(v[ρ^k_t]) = f(x(λ)) by the Laplace principle [FKR21].

Theorem 5. Let (A1)-(A11) hold and α = ∞. Then the dynamics of the weighted means

v[ρ^k_t] = ( ∫ x e^{−α f_{λ^k}(x)} dρ^k_t(x) ) / ( ∫ e^{−α f_{λ^k}(x)} dρ^k_t(x) ),    k = 1, ..., K,

yields a diverse approximation of the Pareto front of f in the long-time limit. In particular, each swarm concentrates at x(λ^k) with pairwise distance at least d_min in the objective space. In other words, the stationary solution of the dynamics consists of K Dirac measures located at the points x(λ^k) for k = 1, ..., K, which have pairwise Euclidean distance greater than or equal to d_min.
Proof. First note that due to α = ∞, we have v[ρ^k_t] = x(λ^k_t) by the Laplace principle. This allows us to focus only on the dynamics of the scalarization weights in the following. Indeed, we follow the steps motivated above and begin with some observations on the pairwise interactions. Using (A6), (A8) and (A9), we obtain an evolution equation for the non-dominated points f(x(λ^k)) that is driven only by the interactions with the direct neighbors ℓ = k − 1 and ℓ = k + 1, for k = 1, ..., K. For simplicity we set the corresponding summand to zero whenever ℓ < 1 or ℓ > K.
In our strictly convex setting, the sorting of the weights λ^k induces a sorting of the points f(x(λ^1)), ..., f(x(λ^K)) in descending order along the f_1-axis. To reconstruct the vectors f(x(λ^k)), k = 1, ..., K, at t > 0, it is sufficient to know their initial values and the relative positions from k to k + 1. This will be exploited in the following.
The positive definiteness of T(λ)S(λ) allows us to define a norm ‖·‖_{TS} induced by the scalar product ⟨·, T(λ)S(λ) ·⟩. The time evolution of V(q), where q collects the relative positions of neighboring non-dominated points, can now be computed and shown to be non-increasing. Hence V is a Lyapunov functional for q, and in the long-time limit the dynamics stabilizes such that ∇V = 0. In particular, by (A6) this means that the Euclidean distances of the non-dominated points are greater than d_min. Now, τ in (A11) allows us to balance the time scale of the stabilization of the scalarization weights and the time scale of the collapse of the swarms. For τ → 0 the scalarization weights converge very fast to a configuration corresponding to a diverse approximation of the Pareto front. The scalarization weights are then stationary and the swarms concentrate at their weighted averages as discussed in Theorem 3. This concludes the proof.
We want to emphasize that in this setting each of the swarms admits a stationary solution, which is the efficient point along the Pareto front at which the swarm concentrates. In contrast, the particles of the dynamics proposed in [BHP22] may move along the front for all times, as Theorem 4.1 in [BHP22] shows that the density of the particles moves towards the Pareto front, but no stationarity is shown. Before presenting an example satisfying (A5)-(A10) we comment on the case α < ∞.
Remark 9. We expect to obtain a similar result for α < ∞ using the quantitative Laplace principle [FKR21]. Clearly, the points f(v[ρ^k_t]), k = 1, ..., K, will then only be in an ε-neighborhood of the Pareto front, where 0 < ε ≪ 1 depends on α. The details of the proof are beyond the scope of this article.

Example for (A5)-(A10).
A priori it is not clear that any problem satisfies the assumptions (A5)-(A10). We therefore present an example with two swarms represented by the weight vectors λ^1, λ^2 ∈ Λ in the following. Note that biobjective convex quadratic optimization problems have been extensively studied in the literature; therefore parts of the following analysis, including optimality conditions and the subsequent sensitivity analysis, can also be found, e.g., in [BDGK21, Hil01, TABH19].
Let us consider a biobjective strictly convex quadratic problem with objectives f_1, f_2 : R → R. Let λ ∈ [0, 1] and consider the scalarized objective

f_λ(x) = λ f_1(x) + (1 − λ) f_2(x).

Note that this is equivalent to a weighted-sum scalarization (1) with (λ_1, λ_2) = (λ, 1 − λ). The efficient solutions parameterized by λ ∈ [0, 1] can be computed using the first-order optimality condition for f_λ with respect to the variable x ∈ R,

λ f_1'(x(λ)) + (1 − λ) f_2'(x(λ)) = 0.

The function P : [0, 1] → R² given by P(λ) = f(x(λ)) is continuous, therefore (A5) holds. Moreover, the difference of the non-dominated points of the two swarms, P(λ^1) − P(λ^2), can be computed explicitly. As we assume that all entries of λ^1, λ^2 are bounded away from zero and one, the matrix relating this difference to λ^1 − λ^2 is invertible. This allows us to define S(λ^1, λ^2) through its inverse. Concerning (A9), a direct computation shows that T(λ^1) is negative definite, and analogously for k = 2. As T(λ^1), T(λ^2) and S(λ^1, λ^2) are all diagonal, it is easy to check that (A10) holds.
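To make this concrete, one hypothetical instance (the specific quadratics are not fixed by the text above) is f_1(x) = x² and f_2(x) = (x − 1)², for which the first-order condition yields x(λ) = 1 − λ in closed form:

```python
import numpy as np

f1 = lambda x: x ** 2            # hypothetical first objective, minimizer 0
f2 = lambda x: (x - 1.0) ** 2    # hypothetical second objective, minimizer 1

def x_of_lambda(lam):
    """Unique minimizer of f_lam = lam*f1 + (1-lam)*f2: the condition
    lam*f1'(x) + (1-lam)*f2'(x) = 0 gives x(lam) = 1 - lam here."""
    return 1.0 - lam

def P(lam):
    """Parameterization of the Pareto front, P(lam) = f(x(lam))."""
    x = x_of_lambda(lam)
    return np.array([f1(x), f2(x)])
```

For this instance P is continuous with endpoints P(1) = (0, 1) and P(0) = (1, 0), i.e., the individual minima, consistent with (A5).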
Up to here, we mainly focused on convex objective functions, since only convex parts of the efficient set can be obtained by global minimization of weighted-sum scalarizations; see [DD97] for more details. In Section 3 we introduce a penalization of clusters in the objective space in order to obtain well-dispersed representations of the Pareto front and, as an intended side effect, reach into non-convex parts of the Pareto front.

MSCBO for general multi-objective problems
As discussed in the previous section, we cannot expect the present dynamics to work well in non-convex settings.We therefore propose a penalization strategy in order to obtain dynamics that lead to reasonable approximations for general multi-objective problems.

Penalization of clusters in the objective space.
To motivate the penalization strategy, we analyze the approximation obtained with the adaptive weight algorithm applied to a test problem with a dent; see Section 4.2 for more details.
Figure 2 shows the approximation of the Pareto front and of the efficient set that is obtained when using the algorithm with adaptive weights. We observe that the weighted means form clusters along the convex part of the Pareto front, while the non-convex part is not recovered.

Figure 2. Illustration of the clustering of the weighted averages in non-convex settings. For an idea of the shape of the true front see Figure 3.
To overcome this issue, we propose a penalization strategy that avoids clusters along the Pareto front and favours a uniform distribution of points along the Pareto front. Towards this end, we consider the potential leading to the interaction forces above, which is given by (4). As we want to penalize clusters in the objective space, we use the distance in the objective space, leading to the penalization term

p^k(x) = Σ_{ℓ≠k} U_f( ‖f(x) − f(v^ℓ_t)‖ ).

By construction, the penalty term is smaller the farther away the objective of a particle is from the objectives of the weighted means of the other swarms. The penalization term is added to the scalarization, leading to a new cost given by

f_{λ^k}(x) + β p^k(x).

Here β allows us to balance the objective and the penalization. Using the properties of the exponential function, we can rewrite the weighted average from (3c) as

(12)    v^k_t = ( Σ_{j=1}^{N_k} X^{k,j}_t e^{−α f_{λ^k}(X^{k,j}_t)} e^{−αβ p^k(X^{k,j}_t)} ) / ( Σ_{j=1}^{N_k} e^{−α f_{λ^k}(X^{k,j}_t)} e^{−αβ p^k(X^{k,j}_t)} ).
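The rewriting of the weighted average exploits that e^{−α(f_λ + β p)} factorizes into e^{−α f_λ} · e^{−αβ p}. A sketch of the penalized weighted mean (function names and the shape of the objective-space potential U_f are assumptions, not the paper's notation):

```python
import numpy as np

def penalized_weighted_mean(X, f_scalar, F, v_others_obj, alpha, beta, U_f):
    """Weighted mean of one swarm under the penalized cost f_lam + beta*penalty.
    X: (N, d) positions, f_scalar: (N,) scalarized costs, F: (N, p) objective
    values, v_others_obj: list of objective vectors f(v^l) of the other swarms."""
    pen = sum(U_f(np.linalg.norm(F - fv, axis=1)) for fv in v_others_obj)
    cost = f_scalar + beta * pen                 # exp(-alpha*cost) factorizes into
    w = np.exp(-alpha * (cost - np.min(cost)))   # e^{-a f_lam} * e^{-a b pen}
    return (w[:, None] * X).sum(axis=0) / w.sum()
```

For beta = 0 the penalty drops out and the plain weighted mean (3c) is recovered, mirroring the remark below.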
Clearly, for β = 0 we are in the previous case without penalization of clustering in the objective space. With increasing values of β > 0, the impact of this penalization on the optimization process increases. A pseudo code can be found in Algorithm 3.

Numerical results for adaptive weights and penalization.
The following results are obtained with the adaptive weight dynamics and the additional penalization given in (12). We observe that the penalization prevents the clustering, and the swarms are able to approximate also the non-convex part of the Pareto front. Moreover, the swarms spread along the convex parts, leading to a better result in both the decision and the objective space, as illustrated in Figure 3. The improvement compared to Figure 2 is obvious.
The influence of the penalization on the convex example of Section 2.2.1 is illustrated in Figure 4. The approximation of the Pareto front obtained with penalization nicely covers the whole efficient set. In particular, the tails are better resolved compared to the approximation without penalization, cf. Figure 1a. We emphasize that the penalization only affects the cost function of the problem. Therefore, all the analytical results obtained in Section 2 still apply. The only difference is that the weighted averages of the swarms approximate the minimizers of the penalized scalarized cost functions instead of the global minima of the (original) scalarized cost functions.
The theory of CBO provides us with a convergence result showing that the invariant state of the dynamics is a consensus near the global minimizer of the cost function. We adapted the dynamics to obtain a scheme that provides a uniform approximation of the Pareto front all at once. So far, we only considered the weighted means of the swarms. Now, we want to use the additional information provided by the individuals. In fact, for our purpose it is better if the swarms do not collapse but cluster around the weighted mean, in order to obtain a local approximation of the Pareto front. We achieve this by implementing an idea from Consensus-based sampling (CBS) [CHSV21]. Technically speaking, we replace the anisotropic diffusion factor in Algorithm 2 by a factor that counteracts the collapse of the swarm, see Algorithm 3.
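A minimal sketch of such an anti-collapse update, in the spirit of Consensus-based sampling [CHSV21]: the noise amplitude is tied to the empirical spread of the swarm around its weighted mean, so the swarm equilibrates around the mean instead of collapsing onto it. The exact scaling used in Algorithm 3 may differ; parameter names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cbs_style_step(X, v, tau=0.1, lam=1.0):
    """One update step with sampling noise counteracting swarm collapse.

    X : (N, d) particle positions of one swarm
    v : (d,)   weighted mean of the swarm
    """
    drift = -lam * tau * (X - v)          # contraction towards the mean
    # weighted empirical covariance of the swarm around v; the small ridge
    # keeps the Cholesky factorization well defined
    C = np.cov((X - v).T) + 1e-12 * np.eye(X.shape[1])
    L = np.linalg.cholesky(C)
    # noise scaled by the swarm spread rather than per-particle distance
    noise = np.sqrt(2.0 * lam * tau) * rng.standard_normal(X.shape) @ L.T
    return X + drift + noise
```

In contrast to the CBO diffusion σ(X − v) dW, this noise does not vanish as a particle approaches v, which is precisely what keeps the swarm spread out locally.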
Instead of the approximation consisting of the weighted means shown in Figure 2 and Figure 3, we now obtain an approximation based on the weighted means and the additional information of all individuals, as shown in Figure 5. The different colors of the individuals illustrate the allocation to the respective swarms, plotted in the decision space (Fig. 5a) and in the objective space (Fig. 5b), respectively.

Algorithms and parameter settings
In the following we provide implementation details such as the discretization, the handling of box constraints, and the parameter values for the simulations underlying the illustrations above.

4.1. Discretization. The stochastic differential equations modelling the multi-swarm dynamics are solved with the Euler-Maruyama scheme [KP92]. In each iteration, particles are projected onto the feasible set X (if necessary). Since we study only box-shaped feasible sets, the projection can be applied component-wise. A pseudo code for the simple multi-swarm CBO with fixed scalarization weights is given in Algorithm 1.
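One Euler-Maruyama step of the fixed-weight dynamics, including the component-wise projection onto a box, could be sketched as follows. Parameter names (`lam`, `sigma`, `tau`) are illustrative and do not claim to match the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(42)

def euler_maruyama_cbo_step(X, v, lo, hi, tau=0.05, lam=1.0, sigma=0.7):
    """One Euler-Maruyama step of CBO dynamics for a single swarm,
    followed by the component-wise projection onto the box [lo, hi].

    X : (N, d) particle positions,  v : (d,) weighted mean of the swarm
    """
    dW = np.sqrt(tau) * rng.standard_normal(X.shape)   # Brownian increments
    # drift towards v plus anisotropic (component-wise) diffusion
    X_new = X - lam * tau * (X - v) + sigma * (X - v) * dW
    # for box-shaped feasible sets the projection is just a clip per component
    return np.clip(X_new, lo, hi)
```

Since the projection acts component-wise, it costs O(Nd) per swarm and iteration and does not interfere with the consensus drift.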
The dynamic weight adaption proposed in Section 2.2 couples the dynamics of the different swarms. We proceed swarm by swarm and update the scalarization weight explicitly in each iteration. This leads to the pseudo code in Algorithm 2. The third modelling step implements the penalization strategy. Note that this affects only the cost function, hence leading to the adapted weighted mean (cf. (12) above). To avoid collapsing swarms, we replace the diffusion term with the sampling noise. The pseudo code with all modelling features is summarized in Algorithm 3.

Test problems.
The illustrative examples and also the numerical comparisons in the following sections use the following test problems.

Schaffer1 [Sch84].
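The problem definition was lost in extraction here; for reference, Schaffer's classical biobjective problem is commonly stated with a single decision variable (assuming the standard formulation f1(x) = x², f2(x) = (x − 2)²):

```python
def schaffer1(x):
    """Schaffer's biobjective test problem [Sch84] in its standard form:
    one decision variable, convex Pareto front traced out by x in [0, 2]."""
    return (x ** 2, (x - 2.0) ** 2)
```

Both objectives have unique global minimizers (at x = 0 and x = 2), and every x in between is efficient, which makes the problem a convenient convex benchmark.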
Algorithm 1: MSCBO with fixed weights
Data: K ∈ N, N_k ∈ N, initial positions {X_0^{k,j}} for k = 1, . . ., K and j = 1, . . ., N_k, scalarization weights λ_k ∈ Λ_0, time step τ > 0, diffusion coefficient σ > 0, terminal time T > τ, initial time t = 0 and independent standard-normally distributed W_t^k
Result: approximation of the Pareto front (f_{λ_k}(X_T^{k,j}))_{k,j} and efficient set (X_T^{k,j})_{k,j}
while t < T do
    for k ← 1 to K do

Algorithm 2: MSCBO with adaptive weights
Data: K ∈ N, N_k ∈ N, initial positions {X_0^{k,j}} for k = 1, . . ., K and j = 1, . . ., N_k, scalarization weights µ_k = ln(λ_k), time step τ > 0, diffusion coefficient σ > 0, terminal time T > τ, initial time t = 0 and independent standard-normally distributed W_t^k
Result: approximation of the Pareto front (f_{µ_k}(X_T^{k,j}))_{k,j} and efficient set (X_T^{k,j})_{k,j}
while t < T do
    for k ← 1 to K do

Biobjective test problem with a dent.

Algorithm 3: MSCBO with adaptive weights, penalization and sampling noise
Data: K ∈ N, N_k ∈ N, initial positions {X_0^{k,j}} for k = 1, . . ., K and j = 1, . . ., N_k, scalarization weights µ_k = ln(λ_k), time step τ > 0, diffusion coefficient σ > 0, terminal time T > τ, initial time t = 0 and independent standard-normally distributed W_t^k
Result: approximation of the Pareto front (f_{µ_k}(X_T^{k,j}))_{k,j} and efficient set (X_T^{k,j})_{k,j}
while t < T do
    for k ← 1 to K do

Three-objective test problem.

Parameters for illustrative examples. In the previous sections we illustrated the modelling ideas with the help of a convex optimization problem (Schaffer1) in Figures 1 and 4, and with a non-convex problem (dent) in Figures 2, 3 and 5.
Initial data is sampled independently from the uniform distribution on X. All potentials are chosen to be purely repulsive; therefore A and A_f are set to zero, and for completeness we set a and a_f to 1. In a post-processing step we eliminate dominated individuals, using a numerical offset dom = 10^{-5}. For the test problems with Y ⊂ R^2 we initialize the scalarization weights equidistantly on [0.001, 0.999]^2. For the three-objective test case, initial scalarization weights are randomly generated with Algorithm 2 in [OW09]. For simplicity, we set N_k = N for all k = 1, . . ., K in all simulations. For the illustrations in Figure 1 and Figure 5 we use K = 20 swarms with N = 50 individuals each. For all other biobjective test problems we set K = 30 and N = 20. For the test problem with three objectives we set K = 50 and N = 20. All other parameters are reported in Table 1.
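The post-processing filter could be sketched as follows. Note that this is one possible reading of the offset dom = 10^{-5} (a point is discarded only if another point is at least dom better in every objective); the paper's exact convention may differ.

```python
import numpy as np

def nondominated(F, eps=1e-5):
    """Post-processing: keep only (approximately) non-dominated points.

    F   : (N, p) array of objective vectors
    eps : numerical offset (dom = 1e-5 in the text) guarding against
          eliminating points that coincide up to floating-point noise
    Returns a boolean mask of the retained points.
    """
    N = F.shape[0]
    keep = np.ones(N, dtype=bool)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            # discard i if j is at least eps better in every objective
            if np.all(F[j] <= F[i] - eps):
                keep[i] = False
                break
    return keep
```

Since this runs only once after the simulation, its O(pN²) cost does not affect the per-iteration complexity discussed in Section 5.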
Table 1. Algorithmic parameters used in the numerical tests.

Comparison to other population-based algorithms
This section is devoted to a comparison to other multi-objective optimization methods. In particular, we address the well-known non-dominated Sorting Genetic Algorithm (NSGA2) [DPAM02] and the recently introduced single-swarm Consensus-based optimization approach for multi-objective problems [BHP22].

5.1. Computational effort. The computational complexity of population-based multi-objective optimization algorithms lies, on the one hand, in the evaluation of the objective functions and, on the other hand, in computing the dominance relations between the generations or agents. The improvement from NSGA [SD94] to NSGA2 [DPAM02] is mainly obtained by an efficient computation of the dominance relationship of the best N individuals. Note that we have refrained from using explicit multi-objective dominance information in the proposed algorithms and therefore save the complexity of O(pN²) which is incurred by the non-dominated sorting algorithms (see, e.g., [Lan22]).
The single-swarm multi-objective algorithm (SSCBO) proposed in [BHP22] is stabilized by a greedy approach which ensures that individuals only move if the attempted move leads to a better position. Computationally, this is cheaper than computing the dominance relationship in NSGA2; still, it requires N comparisons in each iteration. Our multi-swarm CBO approach (MSCBO) does not require these comparisons and therefore saves O(pN) compared to SSCBO.

Numerical results.
To obtain comparable results, we chose the parameters in the following simulations such that all algorithms use the same number of function evaluations. The time steps of SSCBO and MSCBO are chosen equal. We set the diffusion parameter of SSCBO to 10 as reported in [BHP22]. Moreover, we activate the greedy strategy. The reference solution is given by NSGA2 with the same number of iterations; in the following this will be T/τ = 50.
The comparison is based on different performance indicators: generational distance (GD), inverted generational distance (IGD) and hypervolume (HV) from the pymoo package are used for the comparison (see, e.g., [RLB15, ZTL+03] for a detailed description of performance indicators). GD gives us an indication of the precision of the approximation of the Pareto front. IGD detects clustering or sparse regions along the front, and HV gives us a flavour of how well the algorithms perform compared to NSGA2. As reference solution for GD and IGD we use the corresponding NSGA2 solution. The reference points for the hypervolume indicator are the upper bounds of the Cartesian intervals containing Y given in Section 4.2. These choices allow us to analyse the performance without knowledge of the true Pareto front of the problems. As mentioned above, we do not compute dominance relations in each iteration, but we evaluate the number of non-dominated individuals (NI) at the end of the simulations.

5.2.1. Biobjective tests. We begin our tests with the convex test problem Schaffer1, with the results reported in Table 2. In the outputs, all individuals of all algorithms are non-dominated. The dominated hypervolumes of the three approximations are very similar. The GD values of MSCBO and SSCBO are very close as well, indicating that the approximations are precise. However, we see a difference in the IGD values. Further simulations and Figure 6 suggest that the approximation points of SSCBO tend to accumulate in the region of the knee of the Pareto front; therefore the distances of IGD at the tails of the front are higher, leading to a higher overall value.
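The GD and IGD indicators used above admit a compact numpy sketch (pymoo provides reference implementations, including power-mean variants; the plain Euclidean mean shown here is the simplest form):

```python
import numpy as np

def gd(A, R):
    """Generational distance: mean Euclidean distance from each point of
    the approximation A (n, p) to its nearest point in the reference set
    R (m, p). Small GD means the approximation lies close to the front."""
    D = np.linalg.norm(A[:, None, :] - R[None, :, :], axis=-1)
    return D.min(axis=1).mean()

def igd(A, R):
    """Inverted generational distance: GD with the roles of A and R
    swapped, so sparse or missing regions of the front are penalized."""
    return gd(R, A)
```

This makes the complementary roles visible: a clustered approximation can still achieve a small GD, but the reference points far from the cluster inflate the IGD.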
The second test yields approximations of the Pareto front of the test problem Dent. It turns out that many of the individuals of MSCBO are dominated; individuals of SSCBO are dominated as well. The GD values are very small for both methods. The IGD is better for SSCBO. Further investigations suggest that the approximations of the front differ mainly in the distribution of points, cf. Figure 7.

The third test is concerned with the Schaffer2 problem, which has a discontinuous Pareto front. The precision of the approximation of the front, indicated by the GD value, is again very good. The IGD values differ, and simulations showed that the upper part of the Pareto front is not well approximated by SSCBO, leading to this IGD behavior. This is visualized in Figure 8 as well. Also in the values of the hypervolume indicator we see a difference between the SSCBO and MSCBO approximations. Although many of the MSCBO individuals are dominated, the dominated hypervolume and the IGD values are promising.

Table 4. Simulation results for Schaffer2.

Three-objective test.
The results of the benchmark with three objectives are shown in Table 5. It is striking that many individuals of SSCBO are dominated. This is not expected, as the front is convex. For all tested indicators, MSCBO outperforms SSCBO. There seems to be a performance drop from two to three objectives in SSCBO. The approximation determined by MSCBO is reasonable. Note that we do expect the magnitude of the indicators to be higher due to the additional objective function.

Table 5. Simulation results for the three-objective problem.

Conclusion and outlook
We propose a versatile population-based algorithm for the approximation of the Pareto front and of the efficient set of multi-objective optimization problems that is based on the Consensus-based optimization and sampling methods, respectively. In the case of fixed scalarization weights, many analytical results of CBO can be easily generalized to this setting. For adaptive weights, we proved that the method yields a diverse approximation of the front. Moreover, with the prevention of collapsing swarms, we retain local information of the Pareto front close to the swarm means. The method is computationally cheap, as no dominance relations need to be computed.
The numerical results are promising and motivate further investigations and applications of the method. In fact, the algorithm is competitive with the widely-used NSGA2 and the recently proposed single-swarm CBO for multi-objective problems.

Figure 1. Illustration of adaptive weights. The orange points in the plot on the left show the adaptive weights at time t = 2.5. The corresponding front, given by f(v_t^k), k = 1, . . ., 10, is shown in the plot on the right. In the middle, the approximation obtained with equally distributed weights is shown.

Figure 3. Illustration of the effect of penalization. On the left-hand side we see the approximation of the efficient set by the weighted means of the swarms obtained with Algorithm 2. On the right-hand side the corresponding approximation of the Pareto front is shown.

Figure 4. Influence of penalization on the convex toy problem discussed in Section 2.2.1.

Figure 5. Approximation of the Pareto front using the information of the weighted means and all individuals. Individuals of the same swarm are plotted in the same color.

Table 3. Simulation results for Dent.
Figure 7. Approximation of the Pareto fronts for problem Dent.

Figure 8. Approximation of the Pareto fronts for Schaffer2.
the spectral radius and ∂_ii f_{λ_k} is the i-th element of the diagonal of ∇² f_{λ_k}. Since we assumed that all objective functions f_i, i = 1, . . ., p, are bounded from below, all weighted sums f_{λ_k} are bounded from below for λ_k ∈ Λ_0. Hence, F_k > 0 in Assumption (A1) can always be guaranteed after applying an appropriate linear transformation to the objective function values. The uniqueness of the global minimizers of the weighted-sum scalarizations f_{λ_k}, however, does in general not follow from the uniqueness of the global minimizers of the individual objective functions f_i, i = 1, . . ., p, even if p = 2. As an example, suppose that