An efficient descent method for locally Lipschitz multiobjective optimization problems

In this article, we present an efficient descent method for locally Lipschitz continuous multiobjective optimization problems (MOPs). The method is realized by combining a theoretical result regarding the computation of descent directions for nonsmooth MOPs with a practical method to approximate the subdifferentials of the objective functions. We show convergence to points which satisfy a necessary condition for Pareto optimality. Using a set of test problems, we compare our method to the multiobjective proximal bundle method by Mäkelä. The results indicate that our method is competitive while being easier to implement. While the number of objective function evaluations is larger, the overall number of subgradient evaluations is lower. Finally, we show that our method can be combined with a subdivision algorithm to compute entire Pareto sets of nonsmooth MOPs.


Introduction
In many scenarios in real life, the problem of optimizing multiple objectives at the same time arises. In engineering, for example, one often wants to steer a physical system as close as possible to a desired state while minimizing the required energy cost at the same time. These problems are called multiobjective optimization problems (MOPs) and generally do not possess a single optimal solution. Instead, the solution is the set of all optimal compromises, the so-called Pareto set containing all Pareto optimal points. Due to this, the numerical computation of solutions to MOPs is more challenging than for single-objective problems. On top of that, there are numerous applications where the objectives are nonsmooth, for example contact problems in mechanics, which adds to the difficulty. In this article, we will address both difficulties simultaneously by considering nonsmooth MOPs.
When addressing the above-mentioned difficulties, i.e., multiple objectives and nonsmoothness, separately, there exists a large number of solution methods. For smooth MOPs, the most popular methods include evolutionary [9,10] and scalarization methods [27]. Additionally, some methods from single-objective optimization have been generalized, like gradient descent methods [13,30,14] and Newton's method [12]. For the nonsmooth single-objective case, commonly used methods include subgradient methods [31], bundle methods [20] and gradient sampling methods [3]. More recently, in [22], a generalization of the steepest descent method to the nonsmooth case was proposed, which is based on an efficient approximation of the subdifferential of the objective function. For nonsmooth multiobjective optimization, the literature is a lot more scarce. Since classical scalarization approaches do not require the existence of gradients, they can still be used. In [1], a generalization of the steepest descent method was proposed for the case when the full subdifferentials of the objectives are known, which is rarely the case in practice. In [2,7], the subgradient method was generalized to the multiobjective case, but both articles report that their method is not suitable for real life application due to inefficiency. In [25] (see also [18,23]), a multiobjective version of the proximal bundle method was proposed, which currently appears to be the most efficient solver.
In this article, we develop a new descent method for locally Lipschitz continuous MOPs by combining the descent direction from [1] with the approximation of the subdifferentials from [22]. In [1] it was shown that the element with the smallest norm in the negative convex hull of the subdifferentials of the objective functions is a common descent direction for all objectives. In [22], the subdifferential of the objective function was approximated by starting with a single subgradient and then systematically computing new subgradients until the element with the smallest norm in the convex hull of all subgradients is a direction of (sufficient) descent. Combining both approaches yields a descent direction for locally Lipschitz MOPs and together with an Armijo step length, we obtain a descent method. We show convergence to points which satisfy a necessary condition for Pareto optimality. Using a set of test problems, we compare the performance of our method to the multiobjective proximal bundle method from [25]. The results indicate that our method is inferior in terms of function evaluations, but superior in terms of subgradient evaluations.
The structure of this article is as follows: we start with a short introduction to nonsmooth and multiobjective optimization in Section 2. In Section 3, we derive our descent method by replacing the Clarke subdifferential for the computation of the descent direction by the Goldstein ε-subdifferential and then showing how the latter can be efficiently approximated. In Section 4, we apply our descent method to numerical examples. We first visualize and discuss the typical behavior of our method before comparing it to the multiobjective proximal bundle method from [25] using a set of test problems. Afterwards, we show how our method can be combined with a subdivision algorithm to approximate entire Pareto sets. Finally, in Section 5, we draw a conclusion and discuss possible future work.

Nonsmooth multiobjective optimization
We consider the nonsmooth multiobjective optimization problem

min_{x ∈ R^n} f(x),    (MOP)
where f : R^n → R^k is the objective vector with components f_i : R^n → R, i ∈ {1, ..., k}, called objective functions. We assume the objective functions to be locally Lipschitz continuous, i.e., for each i ∈ {1, ..., k} and x ∈ R^n, there are some L_i > 0 and ε > 0 with

|f_i(y) − f_i(z)| ≤ L_i ‖y − z‖ for all y, z ∈ B_ε(x),

where ‖·‖ denotes the Euclidean norm in R^n and B_ε(x) := {y ∈ R^n : ‖y − x‖ ≤ ε} is the closed ε-ball around x. Since (MOP) is an optimization problem with a vector-valued objective function, the classical concept of optimality from the scalar case can not directly be conveyed. Instead, we are looking for the Pareto set, which is defined in the following way: a point x ∈ R^n is called Pareto optimal if there is no y ∈ R^n with f_i(y) ≤ f_i(x) for all i ∈ {1, ..., k} and f_j(y) < f_j(x) for at least one j ∈ {1, ..., k}. The set of all Pareto optimal points is the Pareto set.
In practice, to check if a given point is Pareto optimal, we need optimality conditions. In the smooth case, there are the well-known KKT conditions (cf. [27]), which are based on the gradients of the objective functions. In case the objective functions are merely locally Lipschitz, the KKT conditions can be generalized using the concept of subdifferentials. In the following, we will recall the required definitions and results from nonsmooth analysis. For a more detailed introduction, we refer to [6].
For x ∈ R^n, let Ω_{f_i} ⊆ R^n denote the set of points at which f_i is not differentiable. By Rademacher's theorem, Ω_{f_i} has Lebesgue measure zero, and the Clarke subdifferential of f_i in x can be defined as

∂f_i(x) := conv({ξ ∈ R^n : there is a sequence (x_j)_j in R^n \ Ω_{f_i} with x_j → x and ∇f_i(x_j) → ξ}).

It is easy to see that if f_i is continuously differentiable, then the Clarke subdifferential is the set containing only the gradient of f_i. We will later use the following technical result on some properties of the Clarke subdifferential (cf. [6], Prop. 2.1.2).
For all i ∈ {1, ..., k} and x ∈ R^n, the Clarke subdifferential ∂f_i(x) is nonempty, convex and compact.
Theorem 2.4. Let x ∈ R^n be Pareto optimal. Then

0 ∈ conv(∪_{i=1}^k ∂f_i(x)).    (1)

In the smooth case, (1) reduces to the classical multiobjective KKT conditions. Note that in contrast to the smooth case, the optimality condition (1) is numerically challenging to work with, as subdifferentials are difficult to compute. Thus, in numerical methods, (1) is only used implicitly.
The method we are presenting in this paper is a descent method, which means that, starting from a point x_1 ∈ R^n, we want to generate a sequence (x_j)_j in R^n in which each point is an improvement over the previous point. This is done by computing directions v_j ∈ R^n and step lengths t_j ∈ R_{>0} such that x_{j+1} = x_j + t_j v_j and

f_i(x_{j+1}) < f_i(x_j) for all i ∈ {1, ..., k} and j ∈ N.

For the computation of v_j, we recall the following basic result from convex analysis that forms the theoretical foundation for descent methods in the presence of multiple (sub)gradients. Let ‖·‖ be the Euclidean norm in R^n.
Theorem 2.5. Let W ⊆ R^n be convex and compact and let

v̄ := arg min_{v ∈ −W} ‖v‖².    (2)

Then either v̄ ≠ 0 and ⟨v̄, ξ⟩ ≤ −‖v̄‖² < 0 for all ξ ∈ W, or v̄ = 0 and there is no v ∈ R^n with ⟨v, ξ⟩ < 0 for all ξ ∈ W.
Proof. Since v̄ is the projection of the origin onto the closed and convex set −W, we have ⟨−ξ − v̄, −v̄⟩ ≤ 0, i.e., ⟨v̄, ξ⟩ ≤ −‖v̄‖², for all ξ ∈ W (cf. [4], Lem.). In particular, if v̄ ≠ 0 then ⟨v̄, ξ⟩ ≤ −‖v̄‖² < 0. Conversely, v̄ = 0 implies 0 ∈ W, so in this case there can not be any v ∈ R^n with ⟨v, ξ⟩ < 0 for all ξ ∈ W.
Roughly speaking, Theorem 2.5 states that the element of minimal norm in the convex and compact set −W is directionally opposed to all elements of W. To be more precise, v̄ is contained in the intersection of all half-spaces induced by elements of −W. In the context of optimization, this result has several applications:
(i) In the smooth, single-objective case, W = {∇f(x)} trivially yields the classical steepest descent method.
(ii) In the smooth multiobjective case, W = conv({∇f_1(x), ..., ∇f_k(x)}) yields the descent direction used in the multiobjective steepest descent method [13].
(iii) In the nonsmooth single-objective case, W = ∂f(x) yields a descent direction based on the Clarke subdifferential.
(iv) In the nonsmooth multiobjective case, W = conv(∪_{i=1}^k ∂f_i(x)) yields the descent direction from [1].
In (i) and (ii), the solution of problem (2) is straightforward, since W is a convex polytope with the gradients as vertices. In (iii), the solution of (2) is non-trivial due to the difficulty of computing the subdifferential. In subgradient methods [31], the solution is approximated by using a single subgradient instead of the entire subdifferential. As a result, it can not be guaranteed that the solution is a descent direction and in particular, (2) can not be used as a stopping criterion. In gradient sampling methods [3], the subdifferential is approximated by the convex hull of gradients of the objective function in randomly sampled points around the current point. Due to the randomness, it can not be guaranteed that the resulting direction yields sufficient descent. Additionally, a check for differentiability of the objective is required, which can pose a problem [17]. In (iv), the solution of (2) gets even more complicated due to the presence of multiple subdifferentials. So far, the only methods that deal with (2) in this case are multiobjective versions of the subgradient method [2,7], which were reported unsuitable for real life applications.
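For a finite set of (sub)gradients, problem (2) is a small quadratic program in the coefficients of a convex combination and can be solved with any standard QP solver. The following minimal sketch is our own illustration and not part of the article; the helper name min_norm_element and the use of SciPy's SLSQP solver are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_element(W):
    """Return v = -argmin_{w in conv(rows of W)} ||w||^2 (cf. Theorem 2.5 and (2)).

    W is an (m, n) array whose rows are (sub)gradients xi_1, ..., xi_m.
    The dual problem minimizes ||sum_i lam_i * xi_i||^2 over the unit simplex.
    """
    m = W.shape[0]
    Q = W @ W.T                                   # Gram matrix of the (sub)gradients
    objective = lambda lam: lam @ Q @ lam
    constraints = ({"type": "eq", "fun": lambda lam: np.sum(lam) - 1.0},)
    bounds = [(0.0, 1.0)] * m
    lam0 = np.full(m, 1.0 / m)
    res = minimize(objective, lam0, bounds=bounds, constraints=constraints)
    w_bar = res.x @ W                             # minimal-norm element of conv(W)
    return -w_bar                                 # common descent direction (or 0)

# Example for case (ii): two gradients pointing into different half-spaces.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
v = min_norm_element(W)                           # approx. (-0.5, -0.5)
```

In cases (i) and (ii), the rows of W are simply the gradients of the objective functions at x; the sketches further below reuse this helper for finite sets of ε-subgradients.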
In the following section, we will describe a new way to compute descent directions for nonsmooth MOPs by systematically computing an approximation of conv(∪_{i=1}^k ∂f_i(x)) that is sufficient to obtain a "good enough" descent direction from (2).

Descent method for nonsmooth MOPs
In this section, we will present a method to compute descent directions of nonsmooth MOPs that generalizes the method from [22] to the multiobjective case. As described in the previous section, when computing descent directions via Theorem 2.5, one has the problem of having to compute subdifferentials. Since these are difficult to come by in practice, we will instead replace W in Theorem 2.5 by an approximation of conv(∪_{i=1}^k ∂f_i(x)) such that the resulting direction is guaranteed to have sufficient descent. To this end, we will first replace the Clarke subdifferential by the so-called ε-subdifferential, and then take a finite approximation of the latter.

The ε-subdifferential
By definition, ∂f_i(x) is the convex hull of the limits of the gradient of f_i in all sequences near x that converge to x. Thus, if we evaluate ∇f_i in a number of points close to x (where it is defined) and take the convex hull, we expect the resulting set to be an approximation of ∂f_i(x). To formalize this, we introduce the following definition [16,21]: for ε ≥ 0, the (Goldstein) ε-subdifferential of f_i in x is

∂_ε f_i(x) := conv(∪_{y ∈ B_ε(x)} ∂f_i(y)),

and we write F_ε(x) := conv(∪_{i=1}^k ∂_ε f_i(x)).
To be able to choose W = F ε (x) in Theorem 2.5, we first need to establish some properties of F ε (x).
For ε ≥ 0, i ∈ {1, ..., k} and x ∈ R^n, the set ∂_ε f_i(x) is nonempty, convex and compact. In particular, the same holds for F_ε(x).

Proof.
For ∂ ε f i (x), this was shown in [16], Prop. 2.3. For F ε (x), it then follows directly from the definition.
We immediately get the following corollary from Theorems 2.4 and 2.5: for ε ≥ 0, a necessary condition for Pareto optimality of x is 0 ∈ F_ε(x), and the solution

v̄ := arg min_{v ∈ −F_ε(x)} ‖v‖²    (5)

satisfies either v̄ = 0 or

⟨v̄, ξ⟩ ≤ −‖v̄‖² < 0 for all ξ ∈ F_ε(x).    (6)
The previous corollary states that when working with the ε-subdifferential instead of the Clarke subdifferential, we still have a necessary optimality condition and a way to compute descent directions, although the optimality condition is weaker and the resulting direction yields a weaker descent. This is illustrated in the following example.
The following lemma shows that for the direction from (5), there is a lower bound for the step size up to which we have guaranteed descent in each objective function f_i.

Lemma 3.5. Let ε ≥ 0 and let v̄ be the solution of (5). Then

f_i(x + t v̄) ≤ f_i(x) − t ‖v̄‖² for all i ∈ {1, ..., k} and t ∈ (0, ε/‖v̄‖].

Proof. Let i ∈ {1, ..., k} and t ∈ (0, ε/‖v̄‖]. By the mean value theorem, there are s ∈ (0, 1) and ξ ∈ ∂f_i(x + s t v̄) with f_i(x + t v̄) − f_i(x) = t ⟨ξ, v̄⟩. Since ‖s t v̄‖ ≤ ε, we have ξ ∈ ∂_ε f_i(x) ⊆ F_ε(x). Combined with (6) we obtain f_i(x + t v̄) − f_i(x) ≤ −t ‖v̄‖².

In the following, we will describe how we can obtain a good approximation of (5) without requiring full knowledge of the ε-subdifferentials.

Efficient computation of descent directions
In this part, we will describe how the solution of (5) can be approximated when only a single subgradient can be computed at every x ∈ R^n. Similar to the gradient sampling approach, the idea behind our method is to replace F_ε(x) in (5) by the convex hull of a finite number of ε-subgradients ξ_1, ..., ξ_m ∈ F_ε(x), m ∈ N. Since it is impossible to know a priori how many and which ε-subgradients are required to obtain a good descent direction, we solve (5) multiple times in an iterative approach to enrich our approximation until a satisfying direction has been found. To this end, we have to specify how to enrich our current approximation conv({ξ_1, ..., ξ_m}) and how to characterize an acceptable descent direction. Let

W := {ξ_1, ..., ξ_m} ⊆ F_ε(x) and ṽ := arg min_{v ∈ −conv(W)} ‖v‖².    (7)

Let c ∈ (0, 1). Motivated by Lemma 3.5, we regard ṽ as an acceptable descent direction if

f_i(x + (ε/‖ṽ‖) ṽ) ≤ f_i(x) − c ε ‖ṽ‖ for all i ∈ {1, ..., k}.    (8)

If the set I ⊆ {1, ..., k} of indices for which (8) is violated is non-empty, then we have to find a new ε-subgradient ξ ∈ F_ε(x) such that W ∪ {ξ} yields a better descent direction. Intuitively, (8) being violated means that the local behavior of f_i, i ∈ I, in x in the direction ṽ is not sufficiently captured in W. Thus, for each i ∈ I, we expect that there exists some t ∈ (0, ε/‖ṽ‖] such that ξ ∈ ∂f_i(x + t ṽ) improves the approximation of F_ε(x). This is proven in the following lemma.
Lemma 3.6. Let ṽ be the solution of (7) and let i ∈ {1, ..., k} be an index for which (8) is violated. Then there are t ∈ (0, ε/‖ṽ‖] and ξ ∈ ∂f_i(x + t ṽ) with

⟨ṽ, ξ⟩ > −c ‖ṽ‖².    (9)

In particular, ξ ∈ F_ε(x) \ conv(W).

Proof. Assume that for all t ∈ (0, ε/‖ṽ‖] and ξ ∈ ∂f_i(x + t ṽ) we have

⟨ṽ, ξ⟩ ≤ −c ‖ṽ‖².    (10)

By applying the mean value theorem as in Lemma 3.5, we obtain f_i(x + (ε/‖ṽ‖) ṽ) − f_i(x) = (ε/‖ṽ‖) ⟨ζ, ṽ⟩ for some ζ ∈ ∂f_i(x + t ṽ) with t ∈ (0, ε/‖ṽ‖). By (10) it follows that f_i(x + (ε/‖ṽ‖) ṽ) − f_i(x) ≤ −c ε ‖ṽ‖, which is a contradiction to (8) being violated for i. Finally, since t ‖ṽ‖ ≤ ε, the definition of the ε-subdifferential yields ξ ∈ F_ε(x), and ξ ∉ conv(W) follows from (7) and (9), so ξ ∈ F_ε(x) \ conv(W).
The following example visualizes the previous lemma. Figure 3 shows the ε-subdifferentials, F_ε(x) and the resulting descent direction (cf. Figure 1). Let W := {ξ_1, ξ_2} and let conv(W) be the approximation of F_ε(x), shown as the solid line in Figure 3(a). Let ṽ be the solution of (7) for this W and choose c = 0.25. Checking (8), we see that it is violated for f_2. By Lemma 3.6, this means that there is some t ∈ (0, ε/‖ṽ‖] and ξ ∈ ∂f_2(x + t ṽ) such that ⟨ṽ, ξ⟩ > −c ‖ṽ‖². In this case, we can take for example t = (1/2) ε/‖ṽ‖. Adding the resulting ε-subgradient to W and recomputing (7) yields a new direction ṽ; by checking (8) for this new descent direction, we see that ṽ is acceptable. (Note that in general, a single improvement step like this will not be sufficient to reach an acceptable direction.)

Note that Lemma 3.6 only shows the existence of t and ξ without stating a way how to actually compute them. To this end, let i be the index of an objective function for which (8) is not satisfied, define (cf. [22])

h_i : [0, ε/‖ṽ‖] → R,   h_i(t) := f_i(x + t ṽ) + c t ‖ṽ‖²,

and consider Algorithm 1. If f_i is continuously differentiable around x + t ṽ, then (9) is equivalent to h_i'(t) > 0, i.e., to h_i being monotonically increasing around t. Thus, the idea of Algorithm 1 is to find some t such that h_i is monotonically increasing around t, while checking if (9) is satisfied for a subgradient ξ ∈ ∂f_i(x + t ṽ).
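Algorithm 1 itself is not reproduced in this excerpt. The following is a minimal sketch of one plausible bisection-type realization that is consistent with the properties used in Lemma 3.8 (it maintains h_i(a) < h_i(b) while checking (9)); the single-subgradient oracle subgrad_f_i, the iteration cap and all names are our own assumptions:

```python
import numpy as np

def find_new_subgradient(f_i, subgrad_f_i, x, v, eps, c, max_iter=50):
    """Search for t in (0, eps/||v||] and a subgradient xi of f_i at x + t*v
    satisfying <v, xi> > -c*||v||^2, i.e., condition (9).

    f_i evaluates the objective, subgrad_f_i(y) returns one arbitrary subgradient at y.
    This is a sketch consistent with Lemma 3.8, not the article's exact Algorithm 1.
    """
    nv2 = float(np.dot(v, v))
    h = lambda t: f_i(x + t * v) + c * t * nv2     # h_i from the text
    a, b = 0.0, eps / np.sqrt(nv2)                 # h(a) < h(b) holds if (8) is violated
    t, xi = b, subgrad_f_i(x + b * v)
    while np.dot(v, xi) <= -c * nv2 and max_iter > 0:   # check condition (9)
        t = 0.5 * (a + b)                          # bisect the current interval
        xi = subgrad_f_i(x + t * v)
        if h(t) < h(b):                            # keep the invariant h(a) < h(b)
            a = t
        else:
            b = t
        max_iter -= 1
    return t, xi
```

The invariant h_i(a) < h_i(b) guarantees, as in the proof of Lemma 3.8, that an increase of h_i (and hence a subgradient satisfying (9)) remains bracketed.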
Although in the general setting, we can not guarantee that Algorithm 1 yields a subgradient satisfying (9), we can at least show that after finitely many iterations, a factor t is found such that ∂f i (x + tṽ) contains a subgradient that satisfies (9).
Lemma 3.8. Let (t_k)_k be the sequence generated in Algorithm 1. If (t_k)_k is finite, then some ξ was found such that (9) is satisfied. If (t_k)_k is infinite, then it converges to some t̄ ∈ [0, ε/‖ṽ‖] such that there is some ξ ∈ ∂f_i(x + t̄ ṽ) which satisfies (9). Additionally, there is some N ∈ N such that for all k > N there is some ξ ∈ ∂f_i(x + t_k ṽ) satisfying (9).
Proof. Let (t_k)_k be finite with last element t̄ ∈ (0, ε/‖ṽ‖]. Then Algorithm 1 must have stopped in step 3, i.e., some ξ ∈ ∂f_i(x + t̄ ṽ) satisfying (9) was found. Now let (t_k)_k be infinite. By construction, (t_k)_k is a Cauchy sequence in the compact set [0, ε/‖ṽ‖], so it has to converge to some t̄ ∈ [0, ε/‖ṽ‖]. Additionally, since (8) is violated for the index i by assumption, we have h_i(ε/‖ṽ‖) > h_i(0). Let (a_k)_k and (b_k)_k be the sequences corresponding to a and b in Algorithm 1 (at the start of each iteration). Then h_i(a_k) < h_i(b_k) for all k ∈ N. Thus, by the mean value theorem, there have to be some r_k ∈ (a_k, b_k) and θ_k ∈ ∂h_i(r_k) with θ_k (b_k − a_k) = h_i(b_k) − h_i(a_k) > 0. In particular, lim_{k→∞} r_k = t̄ and, since a_k < b_k, ∂h_i(r_k) ∩ R_{>0} ≠ ∅ for all k ∈ N. By upper semicontinuity of ∂h_i there must be some θ ∈ ∂h_i(t̄) with θ > 0. By the chain rule, we have ∂h_i(t̄) ⊆ {⟨ξ, ṽ⟩ + c ‖ṽ‖² : ξ ∈ ∂f_i(x + t̄ ṽ)}. Thus, there must be some ξ ∈ ∂f_i(x + t̄ ṽ) with ⟨ξ, ṽ⟩ > −c ‖ṽ‖², i.e., ξ satisfies (9). By upper semicontinuity of ∂h_i it also follows that there is some N ∈ N such that ∂h_i(t_k) ∩ R_{>0} ≠ ∅ for all k > N. Applying the same argument as above completes the proof.
In the following remark, we will briefly discuss the implications of Lemma 3.8 for the practical use of Algorithm 1.
a) Note that if k > N and h_i is differentiable in t_k, then ∂h_i(t_k) = {h_i'(t_k)} and thus h_i'(t_k) > 0, i.e., the stopping criterion in step 3 is satisfied. Thus, if Algorithm 1 generates an infinite sequence, h_i must be nonsmooth in t_k for all k > N. In particular, f_i must be nonsmooth in x + t_k ṽ for all k > N.
b) If f_i is regular (cf. [6], Def. 2.3.4), then equality holds when applying the chain rule to h_i (cf. [6], Thm. 2.3.10), i.e.,

∂h_i(t) = {⟨ξ, ṽ⟩ + c ‖ṽ‖² : ξ ∈ ∂f_i(x + t ṽ)}.

Thus, if Algorithm 1 generates an infinite sequence, then for all k > N there must be both positive and negative elements in ∂h_i(t_k). By convexity of ∂h_i(t_k), this implies that 0 ∈ ∂h_i(t_k) for all k > N, i.e., h_i must have infinitely many (nonsmooth) stationary points in [0, ε/‖ṽ‖].
Motivated by the previous remark, we will from now on assume that Algorithm 1 stops after finitely many iterations and thus yields a new subgradient satisfying (9). We can use this method of finding new subgradients to construct an algorithm that computes descent directions of nonsmooth MOPs, namely Algorithm 2.
Algorithm 2 (Compute descent direction). Given x ∈ R^n, ε > 0, δ > 0 and c ∈ (0, 1):
1: Compute ξ_i^1 ∈ ∂f_i(x) for all i ∈ {1, ..., k}, set W_1 = {ξ_1^1, ..., ξ_k^1} and l = 1.
2: Compute v_l = arg min_{v ∈ −conv(W_l)} ‖v‖².
3: If ‖v_l‖ ≤ δ then stop.
4: Compute I_l = {i ∈ {1, ..., k} : (8) is violated for v_l}. If I_l = ∅ then stop.
5: For each j ∈ I_l, compute t ∈ (0, ε/‖v_l‖] and ξ_j^l ∈ ∂f_j(x + t v_l) such that ⟨v_l, ξ_j^l⟩ > −c ‖v_l‖² via Algorithm 1.
6: Set W_{l+1} = W_l ∪ {ξ_j^l : j ∈ I_l}, l = l + 1 and go to step 2.
The following theorem shows that Algorithm 2 stops after a finite number of iterations and produces an acceptable descent direction (cf. (8)).

Theorem 3.10. Algorithm 2 terminates. In particular, if ṽ is the last element of (v_l)_l, then either ‖ṽ‖ ≤ δ or ṽ is an acceptable descent direction, i.e., ṽ satisfies (8).

Proof. Assume that Algorithm 2 does not terminate, i.e., (v_l)_{l∈N} is an infinite sequence. Let l > 1 and j ∈ I_{l−1}. Since v_{l−1} ∈ −conv(W_{l−1}) ⊆ −conv(W_l) and −ξ_j^{l−1} ∈ −conv(W_l), we have

‖v_l‖² ≤ ‖s (−ξ_j^{l−1}) + (1 − s) v_{l−1}‖² for all s ∈ [0, 1].

Since j ∈ I_{l−1}, we must have ⟨v_{l−1}, ξ_j^{l−1}⟩ > −c ‖v_{l−1}‖² by step 5. Let L be a common Lipschitz constant of all f_i, i ∈ {1, ..., k}, on the closed ε-ball B_ε(x) around x. Then by [6], Prop. 2.1.2, and the definition of the ε-subdifferential, we must have ‖ξ‖ ≤ L for all ξ ∈ F_ε(x). Combining these three estimates (and minimizing the right-hand side over s) yields an upper bound for ‖v_l‖² that is smaller than ‖v_{l−1}‖² by a fixed positive amount depending only on c, δ and L, as long as ‖v_{l−1}‖ > δ. Since Algorithm 2 did not terminate, it holds ‖v_{l−1}‖ > δ for all l, so (‖v_l‖²)_l decreases by at least this fixed amount in every iteration. In particular, there is some l such that ‖v_l‖ ≤ δ, which is a contradiction.
Remark 3.11. The proof of Theorem 3.10 shows that for convergence of Algorithm 2, it would be sufficient to consider only a single j ∈ I_l in step 5. Similarly, for the initial approximation W_1, a single element from ∂_ε f_i(x) for any i ∈ {1, ..., k} would be enough. A modification of either step can potentially reduce the number of executions of step 5 (i.e., of Algorithm 1) in Algorithm 2 in case the ε-subdifferentials of multiple objective functions are similar. Nonetheless, we will restrict the discussion in this article to Algorithm 2 as it is, since both modifications also introduce a bias towards certain objective functions, which we want to avoid.
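For illustration, the following is a compact sketch of the direction-finding loop of Algorithm 2, reusing the hypothetical helpers min_norm_element and find_new_subgradient from the sketches above; the acceptance test implements (8) as stated above and all names are placeholders:

```python
import numpy as np

def descent_direction(fs, subgrads, x, eps, delta, c):
    """Compute a direction v with ||v|| <= delta or v acceptable in the sense of (8)
    (sketch of Algorithm 2).

    fs is a list of the objectives f_i, subgrads a list of oracles returning one
    subgradient of f_i at a given point.
    """
    W = [sg(x) for sg in subgrads]                 # W_1: one subgradient per objective
    while True:
        v = min_norm_element(np.array(W))          # solve (7) for the current W_l
        nv = np.linalg.norm(v)
        if nv <= delta:                            # step 3
            return v
        t_bar = eps / nv
        # step 4: indices for which the acceptance test (8) fails
        I = [i for i, f in enumerate(fs) if f(x + t_bar * v) > f(x) - c * eps * nv]
        if not I:
            return v                               # acceptable descent direction
        for i in I:                                # steps 5 and 6: enrich W
            _, xi = find_new_subgradient(fs[i], subgrads[i], x, v, eps, c)
            W.append(xi)
```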
To highlight the strengths of Algorithm 2, we will consider an example where standard gradient sampling approaches can fail to obtain a useful descent direction.
The set of nondifferentiable points separates R² into four smooth areas (cf. Figure 4(a)). For large a > 0, the two areas above the graph of λ ↦ a|λ| become small, making it difficult to compute the subdifferential close to (0, 0). Let a = 10, b = 0.5, ε = 10^{−3} and x = (10^{−4}, 10^{−4}). In this case, (0, 0) is the minimal point of f_2. In particular, 0 ∈ ∂_ε f_2(x), so the descent direction computed with the exact ε-subdifferentials in (5) is zero. When applying Algorithm 2 in x, after two iterations we obtain ṽ = v_2 ≈ (0.118, 1.185) · 10^{−9}, i.e., ‖ṽ‖ ≈ 1.191 · 10^{−11}. Thus, x is correctly identified as (potentially) Pareto optimal. The final approximation W_2 of F_ε(x) contains three ε-subgradients: the first two are the gradients of f_1 and f_2 in x from the first iteration of Algorithm 2, and the last one is the gradient of f_2 in x + t v_1 = (0.038, 0.596) · 10^{−3} ∈ B_ε(x) from the second iteration. The result is visualized in Figure 4.

Building on Algorithm 2, it is now straightforward to construct the descent method for locally Lipschitz continuous MOPs given in Algorithm 3. In step 4, the classical Armijo backtracking line search was used (cf. [13]) for the sake of simplicity. Note that it is well defined due to step 4 in Algorithm 2.
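A sketch of the resulting descent method (Algorithm 3), again building on the hypothetical helpers above. The Armijo backtracking with a fallback to the step length ε/‖v‖, at which (8) guarantees sufficient decrease, is our own simplification of the line search:

```python
import numpy as np

def nonsmooth_descent(fs, subgrads, x0, eps, delta, c, t0=1.0, max_iter=1000):
    """Descent method for locally Lipschitz MOPs (sketch of Algorithm 3)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        v = descent_direction(fs, subgrads, x, eps, delta, c)
        nv = np.linalg.norm(v)
        if nv <= delta:
            return x                               # x is (eps, delta)-critical
        # Armijo backtracking; (8) ensures the condition holds at t = eps/||v||
        t = t0
        while t > eps / nv and any(f(x + t * v) > f(x) - c * t * nv**2 for f in fs):
            t *= 0.5
        t = max(t, eps / nv)
        x = x + t * v
    return x
```

Here, fs and subgrads would contain the objectives and their (hand-computed) subgradient oracles of the respective example.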
Since we introduced the two tolerances ε (for the ε-subdifferential) and δ (as a threshold for when we consider ε-subgradients to be zero), we can not expect that Algorithm 3 computes points which satisfy the optimality condition (1). This is why we introduce the following definition, similar to the definition of ε-stationarity from [3]: a point x ∈ R^n is called (ε, δ)-critical if min_{ξ ∈ F_ε(x)} ‖ξ‖ ≤ δ.
It is easy to see that (ε, δ)-criticality is a necessary optimality condition for Pareto optimality, but a weaker one than (1). The following theorem shows that convergence in the sense of (ε, δ)-criticality is what we can expect from our descent method.
Theorem 3.14. Let (x j ) j be the sequence generated by Algorithm 3. Then either (f i (x j )) j is unbounded below for each i ∈ {1, ..., k}, or (x j ) j is finite with the last element being (ε, δ)-critical.
Proof. Assume that (x_j)_j is infinite. Then ‖v_j‖ > δ for all j ∈ N. By step 4 and Lemma 3.5, there is a constant κ > 0 (depending only on c, ε and δ) such that f_i(x_{j+1}) ≤ f_i(x_j) − κ for all i ∈ {1, ..., k} and j ∈ N. This implies that (f_i(x_j))_j is unbounded below for each i ∈ {1, ..., k}. Now assume that (x_j)_j is finite, with x̄ and v̄ being the last elements of (x_j)_j and (v_j)_j, respectively. Since the algorithm stopped, we must have ‖v̄‖ ≤ δ. From the application of Algorithm 2 in step 2, we know that there must be some W ⊆ F_ε(x̄) such that v̄ = arg min_{v ∈ −W} ‖v‖². This implies −v̄ ∈ W ⊆ F_ε(x̄) and hence min_{ξ ∈ F_ε(x̄)} ‖ξ‖ ≤ ‖v̄‖ ≤ δ, i.e., x̄ is (ε, δ)-critical.

Numerical examples
In this section we will apply our nonsmooth descent method (Algorithm 3) to several examples. We will begin by discussing its typical behavior before comparing its performance to the multiobjective proximal bundle method [25]. Finally, we will combine our method with the subdivision algorithm [11] in order to approximate the entire Pareto set of nonsmooth MOPs.

Typical behavior
In smooth areas, the behavior of Algorithm 3 is almost identical to the behavior of the multiobjective steepest descent method [13]. The only difference stems from the fact that, unlike the Clarke subdifferential, the ε-subdifferential does not reduce to the gradient when f is continuously differentiable. As a result, Algorithm 3 may behave differently in points x ∈ R^n where max{‖∇f_i(x) − ∇f_i(y)‖ : y ∈ B_ε(x), i ∈ {1, ..., k}} is large. (If f is twice differentiable, this can obviously be characterized in terms of second order derivatives.) Nevertheless, if ε is chosen small enough, this difference can be neglected. Thus, in the following, we will focus on the behavior with respect to the nonsmoothness of f.

To show the typical behavior of Algorithm 3, we consider the objective function from [25] (combining Crescent from [19] and Mifflin 2 from [26]). The set of nondifferentiable points is Ω_f = S^1 ∪ (S^1 + (0, 1)). We consider three starting points x_1, x_2 and x_3, the tolerances ε = 10^{−3}, δ = 10^{−3} and the Armijo parameters c = 0.25, t_0 = 1. The results are shown in Figure 5. We will briefly go over the result for each starting point:
• For x_1, the sequence moves through the smooth area like the steepest descent method until a point is found with a distance less than or equal to ε to the set of nondifferentiable points Ω_f. In that point, more than one ε-subgradient is required to obtain a sufficient approximation of the ε-subdifferentials. Since this part of Ω_f is Pareto optimal, no acceptable descent direction (cf. (8)) is found and the algorithm stops (in an (ε, δ)-critical point).
• For x_2, the sequence starts zig-zagging around the non-optimal part of Ω_f, since the points are too far away from Ω_f for the algorithm to notice the nondifferentiability. When a point is found with distance less than or equal to ε to Ω_f, a better descent direction is found, breaking the zig-zagging motion.
• For x_3, the sequence has a similar zig-zagging motion to the previous case. The difference is that this time, the sequence moves along Ω_f until a Pareto optimal point in Ω_f is found.
As described above, the zig-zagging behavior when starting in x_2 is caused by the fact that ε was too small for the method to notice the nondifferentiability. To circumvent problems like this and quickly move through problematic areas, it is possible to apply Algorithm 3 consecutively with decreasing values of ε. The result is Algorithm 4. (A similar idea was implemented in [22].)
1: Set y_1 = x_1.
2: for i = 1, ..., K do
3:   Apply Algorithm 3 with initial point y_i and tolerance ε = ε_i. Let y_{i+1} be the final element of the generated sequence.
4: end for
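In code, Algorithm 4 is a simple loop around the previous sketch; the list of tolerances is a placeholder:

```python
def nonsmooth_descent_decreasing_eps(fs, subgrads, x1, eps_list, delta, c):
    """Apply Algorithm 3 consecutively with decreasing tolerances (sketch of Algorithm 4)."""
    y = x1
    for eps in eps_list:                  # e.g. eps_list = [1e-1, 1e-2, 1e-3]
        y = nonsmooth_descent(fs, subgrads, y, eps, delta, c)
    return y
```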

Comparison to the multiobjective proximal bundle method
We will now compare Algorithms 3 and 4 to the multiobjective proximal bundle method (MPB) by Mäkelä, Karmitsa and Wilppu from [25] (see also [23]). As test problems, we consider the 18 MOPs in Table 1, which are created on the basis of the scalar problems from [25]. Problems 1 to 15 are convex (and were also considered in [28]) and problems 16 to 18 are nonconvex. Due to their simplicity, we are able to differentiate all test problems by hand to obtain explicit formulas for the subgradients. For each test problem, we choose 100 starting points on a 10 × 10 grid in the corresponding area given in Table 1.
To compare the performance of the three methods, we count the number of evaluations of the objectives f_i, the number of evaluations of their subgradients ξ ∈ ∂f_i and the number of iterations (i.e., descent steps) needed. (This means that one call of f accounts for k evaluations of objectives.) Since the MPB always evaluates all objectives and subgradients in a point, the values for the objectives and the subgradients are the same here. The results are shown in Table 2 and are discussed in the following.
• Function evaluations: When considering the number of function evaluations, it is clear that the MPB requires far fewer evaluations than both of our algorithms.

Combination with the subdivision algorithm
Note that so far, we have a method where we can put in some initial point from R n and obtain a single (ε, δ)-critical point (close to an actual Pareto optimal point) as a result. But ultimately, we are not interested in one, but all Pareto optimal points. The intuitive and straightforward approach to extend our method would be to just take a large set of well-spread initial points and apply our method to each of them. The problem with this is that we have no guarantee that this results in a good approximation of the Pareto set. To solve this issue, we combine our method with the subdivision algorithm which was developed for smooth problems in [11]. Since we only have to do minor adjustments for the nonsmooth case, we will only sketch the method here and refer to [11] for the details.
The idea is to interpret the nonsmooth descent method as the discrete dynamical system

x_{j+1} = g(x_j),    (17)

where g : R^n → R^n is the map that applies one iteration of Algorithm 3 to a point in R^n.
(For the sake of brevity, we have omitted the rest of the input of the algorithm here.) Since no information is carried over between iterations of the algorithm, the trajectory (i.e., the sequence) generated by the system (17) is the same as the one generated by Algorithm 3. In particular, this means that the Pareto set of the MOP is contained in the set of fixed points of the system (17). Thus, the subdivision algorithm (which was originally designed to compute attractors of dynamical systems) can be used to compute (a superset of) the Pareto set. The subdivision algorithm starts with a large hypercube (or box ) in R n that contains the Pareto set and mainly consists of two steps: 1. Subdivision: Divide each box in the current set of boxes into smaller boxes.

2. Selection: Compute the image of the union of the current set of boxes under g and remove all boxes that have an empty intersection with this image. Go to step 1.
In practice, we realize step 1 by evenly dividing each box into 2^n smaller boxes and step 2 by using the image of a set of sample points. The algorithm is visualized in Figure 6. Unfortunately, the convergence results of the subdivision algorithm only apply if g is a diffeomorphism. If the objective function f is smooth, then the descent direction is at least continuous (cf. [13]) and the resulting dynamical system g, while not being a diffeomorphism, still behaves well enough for the subdivision algorithm to work. If f is nonsmooth, then our descent direction is inherently discontinuous close to the nonsmooth points. Thus, the subdivision algorithm applied to (17) will (usually) fail to work. In practice, we were able to solve this issue by applying multiple iterations of Algorithm 3 in g at once instead of just one. Roughly speaking, this smoothes g by pushing the influence of the discontinuity further away from the Pareto set and was sufficient for convergence (in our tests).

Figures 7 to 9 show the result of the subdivision algorithm for some of the problems from Table 1. For each problem, we used 15 iterations of Algorithm 3 in g, [−3.1, 3]² as the starting box and applied 9 iterations of the subdivision algorithm. For the approximation of the Pareto front (i.e., the image of the Pareto set), we evaluated f in all points of the image of g from the last selection step in the subdivision algorithm. In all of these examples, the algorithm produced a tight approximation of the Pareto set.
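To make the two steps of the subdivision algorithm concrete, the following sketch performs one subdivision and selection step on a list of boxes given by their corner vectors; the sampling-based selection and all names are our own simplifications of the method from [11]:

```python
import numpy as np
from itertools import product

def subdivision_step(boxes, g, samples_per_box=32):
    """One subdivision + selection step on boxes given as (lower, upper) corner pairs.

    g maps a point to the result of (several iterations of) the descent method.
    A box is kept if the image of the sampled box collection intersects it.
    """
    # Step 1 (subdivision): split every box into 2^n smaller boxes.
    new_boxes = []
    for lo, up in boxes:
        mid = 0.5 * (lo + up)
        for corner in product(*zip(lo, mid)):       # lower corner of each sub-box
            c_lo = np.array(corner)
            new_boxes.append((c_lo, c_lo + 0.5 * (up - lo)))
    # Step 2 (selection): sample each box, map the samples with g, keep boxes hit by the image.
    samples = np.vstack([lo + np.random.rand(samples_per_box, lo.size) * (up - lo)
                         for lo, up in new_boxes])
    images = np.array([g(p) for p in samples])
    keep = [(lo, up) for lo, up in new_boxes
            if np.any(np.all((images >= lo) & (images <= up), axis=1))]
    return keep

# g as used above: several iterations of Algorithm 3 applied at once, e.g.
# g = lambda x: nonsmooth_descent(fs, subgrads, x, eps, delta, c, max_iter=15)
```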

Conclusion and outlook
In this article, we have developed a new descent method for locally Lipschitz continuous multiobjective optimization problems, which is based on the efficient approximation of the Clarke subdifferentials of the objective functions from [22]. In [1], it was shown that the element with the smallest norm in the negative convex hull of the union of the subdifferentials is a descent direction for all objectives at the same time. In practice, the entire subdifferentials which are required to compute this direction are rarely known and only single subgradients can be computed.
To solve this issue, we presented a method to obtain an approximation of the subdifferentials which is sufficient to obtain a descent direction. The idea is to start with a rough approximation of the subdifferentials by only a few subgradients and then systematically enrich the approximation with new subgradients until a direction of sufficient descent is found. By combining the descent direction with an Armijo step length, we obtained a descent method for nonsmooth MOPs and showed convergence to points which satisfy a necessary condition for Pareto optimality. We then compared the performance to the multiobjective proximal bundle method from [25]. For the 18 test problems we considered, the MPB was superior in terms of objective function evaluations, but our method required fewer subgradient evaluations and iterations. Finally, we showed that our descent method can be combined with the subdivision algorithm from [11] to compute approximations of entire Pareto sets.

For future work, we believe that it is straightforward to extend our method to constrained MOPs by adding constraints to the problem (7) which ensure that the descent direction maintains the feasibility of the descent sequence (similar to [14] for smooth problems). Additionally, in [8], the classical gradient sampling method for scalar nonsmooth optimization was generalized by allowing variable norms in the direction finding problem, increasing its efficiency. We expect that a similar generalization can be performed for problem (7), potentially yielding a similar increase in efficiency. Additional potential for increased performance lies in more advanced step length schemes as well as in descent directions with memory (for instance, conjugate-gradient-like directions). Furthermore, it might be possible to extend our method to infinite-dimensional nonsmooth MOPs [29,5]. Finally, in the context of nonsmooth many-objective optimization, we believe that considering subsets of objectives is a very promising and efficient approach (cf. [15] for smooth problems). However, theoretical advances are required for locally Lipschitz continuous problems.