Retraction based Direct Search Methods for Derivative Free Riemannian Optimization

Direct search methods represent a robust and reliable class of algorithms for solving black-box optimization problems. In this paper, we explore the application of these strategies to Riemannian optimization, wherein minimization is to be performed with respect to variables restricted to lie on a manifold. More specifically, we consider classic and linesearch extrapolated variants of direct search, and, by making use of retractions, we devise tailored strategies for the minimization of both smooth and nonsmooth functions. In doing so, we analyze, for the first time in the literature, a class of retraction based algorithms for minimizing nonsmooth objectives on a Riemannian manifold without having access to (sub)derivatives. Along with convergence guarantees, we provide a set of numerical performance illustrations on a standard set of problems.


Introduction
Riemannian optimization, i.e., solving minimization problems whose variables are constrained to lie on a Riemannian manifold embedded in a Euclidean space, is an important and active area of research, given the numerous problems in data science, robotics, and other settings in which an important geometric structure characterizes the allowable inputs. Derivative Free Optimization (DFO), or Zeroth Order Optimization, encompasses algorithms that only make use of function evaluations, rather than gradient computations, in their implementation. When the dynamics are subject to significant epistemic uncertainty, or a simulation must be run to compute a function evaluation, derivatives may be unavailable. This paper introduces a classic set of DFO algorithms, namely direct search, to the case of Riemannian optimization. For classic references on Riemannian optimization and DFO, see, e.g., [1] and [5,10,20], respectively.
Formally, let M be a smooth manifold embedded in R^n. We are interested here in the problem

min f(x) s.t. x ∈ M,     (1)

with f continuous and bounded below. We consider both the case of f(x) being continuously differentiable, as well as the more general nonsmooth case. Direct search methods (see, e.g., [19] and references therein) belong to the class of algorithms that are mesh based, rather than model based. This distinction suggests a binary taxonomy of DFO algorithms: on the one hand, those that approximate gradient information using function evaluations and construct approximate local models; on the other hand, those that sample a pre-defined grid of points at each iteration. Direct search is thus particularly suitable for black-box settings in which it is unknown whether any model would be accurate.
To the best of our knowledge, thorough studies of DFO on Riemannian manifolds have appeared only recently in the literature. In [21], the authors focus on a model based method using a two point function approximation for the gradient. The paper [27] presents a specialized Polak-Ribière-Polyak procedure for finding a zero of a tangent vector field on a Riemannian manifold. In [12], the author focuses on a specific class of manifolds (reductive homogeneous spaces, including several matrix manifolds) where, thanks to the properties of exponential maps, a straightforward extension of mesh adaptive direct search methods (see, e.g., [4,5]) and probabilistic direct search strategies [14] is possible. Some DFO methods for nonsmooth problems on Riemannian manifolds, without convergence analysis, can be found in [16] and references therein.
Thus our paper presents the first analysis of retraction based direct search strategies on Riemannian manifolds, and the first analysis of a DFO algorithm for minimizing nonsmooth objectives in Riemannian optimization. In particular, thanks to the use of retractions, we first adapt a classic direct search scheme (see, e.g., [10,19]) and a linesearch based scheme (see, e.g., [11,22,23,24] for further details on this class of methods) to the minimization of a given smooth function over a manifold. Then, inspired by the ideas in [13], we extend the two proposed strategies to the nonsmooth case.
The remainder of this paper is organized as follows. In Section 2, we present some definitions. In Section 3, we present and prove convergence for a direct search method applicable to continuously differentiable f. In Section 4, we consider the case of f not being continuously differentiable, but only Lipschitz continuous. We present some numerical results in Section 5 and conclude in Section 6.

Definitions and notation
We now introduce some notation for the formalism used in this article. We refer the reader to, e.g., [1,7] for an overview of the relevant background. Let TM be the tangent bundle of M, and for x ∈ M let T_xM be the tangent space to M at x. We assume that M is a Riemannian manifold, i.e., for x ∈ M we have a scalar product ⟨·, ·⟩_x : T_xM × T_xM → R depending smoothly on x. Let dist(·, ·) be the distance induced by the scalar product, so that for x, y ∈ M, dist(x, y) is the length of the shortest geodesic connecting x and y. Furthermore, let ∇_M be the Levi-Civita connection on M, and Γ : TM × M → TM the parallel transport with respect to ∇_M, with Γ_x^y(v) ∈ T_yM the transport of the vector v ∈ T_xM. We define P_x as the orthogonal projection from R^n to T_xM, and S(x, r) ⊂ R^n as the sphere centered at x with radius r. We write {a_k} as a shorthand for {a_k}_{k∈I} when the index set I is clear from the context. The distance dist* between vectors in different tangent spaces is defined in the standard way using parallel transport (see, for instance, [6]).

As is common in the Riemannian optimization literature (see, e.g., [2]), to define our tentative descent directions we use a retraction R : TM → M. We assume that the Lipschitz-type bound dist(x, R(x, d)) ≤ L_r ‖d‖_x (3) holds (true in any compact subset of TM given the C^1 regularity of R, without any further assumptions), and that the sufficient decrease property holds: for any L-Lipschitz smooth f,

f(R(x, d)) ≤ f(x) + ⟨grad f(x), d⟩_x + (L/2) ‖d‖_x^2.     (4)

Smooth optimization problems

In this section, we consider solving (1) with the objective satisfying f ∈ C^1(M). Recall that we can define the Riemannian gradient as grad f(x) = P_x(∇f(x)), (5) for given x ∈ M.
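As a concrete illustration, here is a minimal Python sketch of a retraction and a tangent projection on the unit sphere S^{n-1} embedded in R^n. The paper's experiments use MATLAB with MANOPT, so this is purely illustrative; the metric projection retraction R(x, d) = (x + d)/‖x + d‖ is one standard choice among many.

```python
import numpy as np

def tangent_projection(x, v):
    """Orthogonal projection P_x of an ambient vector v onto T_x S^{n-1}."""
    return v - np.dot(x, v) * x

def sphere_retraction(x, d):
    """Metric projection retraction on the unit sphere:
    R(x, d) = (x + d) / ||x + d||, for x on the sphere and d in T_x S^{n-1}."""
    y = x + d
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
d = tangent_projection(x, np.array([0.0, 0.2, 0.0]))
y = sphere_retraction(x, d)
```

The retracted point stays exactly on the manifold, and for small d it agrees with x + d up to O(‖d‖^2), which is all the convergence analysis requires.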

Preliminaries
First, we assume that the objective function f has a Lipschitz continuous gradient on the manifold.
Assumption 3.1. There exists L_g > 0 such that, for every x, y ∈ M,

‖Γ_x^y grad f(x) − grad f(y)‖_y ≤ L_g dist(x, y).     (6)

As in the unconstrained case, the Lipschitz gradient property implies the standard descent property.
Proposition 3.1. Assume that M is compact and R is a C^2 retraction. If condition (6) holds, then the sufficient decrease property (4) holds for some constant L > 0.
The proof can be found in the appendix. An analogous property, but under the stronger assumption that f has a Lipschitz gradient as a function on R^n, is proved in [8].
Another assumption we make in this context is that the gradient norm is globally bounded.

Assumption 3.2. There exists B_g > 0 such that

‖grad f(x)‖_x ≤ B_g     (7)

for every x ∈ M.
For each of the algorithms in this section, we further assume that, at each iteration k, we have a positive spanning basis {p^j_k}_{j∈[1:K]} of the tangent space T_{x_k}M at the iterate x_k (further details on how to obtain a positive spanning basis can be found, e.g., in [10]). More specifically, we assume that the basis stays bounded and does not become degenerate during the algorithm; that is, there is a constant B > 0 such that ‖p^j_k‖_{x_k} ≤ B (8) for every k ∈ N and j ∈ [1:K], and there is a constant τ > 0 such that max_{j∈[1:K]} ⟨r, p^j_k⟩_{x_k} ≥ τ ‖r‖_{x_k} (9) for every k ∈ N and r ∈ T_{x_k}M.
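One concrete way to build such a basis, used in the paper's own implementation (Section 7.1), is to project the positive spanning set (e_1, ..., e_n, −e_1, ..., −e_n) of the ambient space onto the tangent space. A minimal Python sketch on the unit sphere, with the tolerance for dropping zero projections our own choice:

```python
import numpy as np

def projected_spanning_set(x):
    """Positive spanning set of T_x S^{n-1}, obtained by projecting the
    ambient set (e_1, ..., e_n, -e_1, ..., -e_n) onto the tangent space;
    (numerically) zero projections are dropped."""
    n = x.size
    P = np.eye(n) - np.outer(x, x)          # orthogonal projector onto T_x S^{n-1}
    dirs = np.concatenate([P, -P], axis=0)  # rows are the projected +/- e_i
    return [p for p in dirs if np.linalg.norm(p) > 1e-12]

x = np.array([0.0, 0.0, 1.0])
basis = projected_spanning_set(x)
```

Every returned vector is tangent at x, and together they positively span the tangent plane.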

Direct search algorithm
We present here our Riemannian Direct Search method based on Spanning Bases (RDS-SB) for smooth objectives as Algorithm 1. This procedure resembles the standard direct search algorithm for unconstrained derivative free optimization (see, e.g., [10,19]) with two significant modifications. First, at every iteration a positive spanning basis is computed for the current tangent space T_{x_k}M. As this space is expected to change at every iteration, it is not possible to reuse the same standard positive spanning sets appearing in the classic algorithms. Second, the candidate point x^j_k is computed by retracting the step α_k p^j_k from the current tangent space T_{x_k}M to the manifold.
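The two steps above can be sketched as follows, specialized to the unit sphere. This is our own illustrative Python rendering, not the paper's MATLAB implementation: the sphere retraction, the projected spanning basis, and the acceptance test f(y) ≤ f(x) − γα² are assumptions about the scheme, while the parameter values follow the smooth-case tuning reported in the appendix.

```python
import numpy as np

def rds_sb(f, x0, alpha0=1.0, gamma=0.77, gamma1=0.61, gamma2=1.0, max_iter=200):
    """Sketch of RDS-SB (Algorithm 1) on the unit sphere S^{n-1}."""
    x = x0 / np.linalg.norm(x0)
    alpha = alpha0
    for _ in range(max_iter):
        P = np.eye(x.size) - np.outer(x, x)   # projector onto T_x S^{n-1}
        basis = [p for p in np.concatenate([P, -P]) if np.linalg.norm(p) > 1e-12]
        success = False
        for p in basis:
            y = x + alpha * p
            y = y / np.linalg.norm(y)              # retraction R(x, alpha * p)
            if f(y) <= f(x) - gamma * alpha ** 2:  # sufficient decrease test
                x, alpha, success = y, gamma2 * alpha, True
                break
        if not success:
            alpha *= gamma1                        # unsuccessful: shrink stepsize
    return x

# toy usage: minimize the Rayleigh quotient x^T A x over the unit sphere
A = np.diag([3.0, 1.0, 0.5])
x_star = rds_sb(lambda v: v @ A @ v, np.array([1.0, 1.0, 1.0]))
```

On this toy problem the iterates stay on the sphere by construction and the objective decreases monotonically toward the smallest eigenvalue of A.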

Convergence analysis
Now we show asymptotic global convergence of the method. Using a similar structure of reasoning as in standard convergence derivations for direct search, we prove that the gradient evaluated at iterates associated with unsuccessful steps must converge to zero, and then extend this property to the remaining iterates using the Lipschitz continuity of the gradient.
The first lemma states a bound on the scalar product between the gradient and the descent direction for an unsuccessful iteration.
Proof. To start with, we obtain a chain of inequalities using (4) in the second inequality and (8) in the third one. Given that α_k > 0, the resulting inequality can be rearranged to give the thesis.
From this we can infer a bound on the gradient with respect to the stepsize.
Proof. If iteration k is unsuccessful, inequality (10) must hold for every j ∈ [1 : K]. We obtain the thesis by applying the positive spanning property (9) to the right-hand side.

Finally, we are able to show convergence of the gradient norm using the lemmas above and appropriate arguments regarding the step sizes.
Theorem 3.1. For the sequence {x_k} generated by Algorithm 1, we have lim_{k→∞} ‖grad f(x_k)‖ = 0.

Proof. To start with, clearly α_k → 0: the objective is bounded below and {f(x_k)} is non increasing, so there can only be a finite number of successful steps with α_k ≥ ε for any ε > 0. For a fixed ε > 0, let k̄ be such that α_k ≤ ε for every k ≥ k̄. We now show that, for every ε > 0 and k ≥ k̄ large enough, (17) holds, which clearly implies the thesis, given that ε is arbitrary. First, (17) is satisfied for k ≥ k̄ if step k is unsuccessful, by Lemma 3.2, using α_k ≤ ε in the second inequality.
If the step k is successful, then let j be the minimum positive index such that the step k+j is unsuccessful.

Therefore we can bound dist(x_k, x_{k+j}), where we used (3) together with (8) in the second inequality, and (19) in the third one.
In turn, we can bound ‖grad f(x_k)‖, where we used (6) and (18), with k + j in place of k, for the first and second summand respectively in the second inequality, and (20) in the last one.

Incorporating an extrapolation linesearch
The works [23,24] introduced the use of an extrapolating linesearch that tests the objective at points farther away from the current iterate than the tentative point obtained by direct search along a given direction (i.e., an element of the positive spanning set). Such a thorough exploration of the search directions ultimately yields better performance in practice. We found that the same technique can be applied in the Riemannian setting to good effect. We present here our Riemannian Direct Search with Extrapolation method based on Spanning Bases (RDSE-SB) for smooth objectives. The scheme is presented in detail as Algorithm 2. As can easily be seen, the method uses a specific stepsize for each direction in the positive spanning basis, so that instead of α_k we have a set of stepsizes {α^j_k}_{j∈[1:K]} for every k ∈ N_0. Furthermore, a retraction based linesearch procedure (see Algorithm 3) is used to better explore a given direction whenever a sufficient decrease of the objective is obtained.
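The expansion loop at the heart of this idea can be sketched as follows. This is our own hedged Python reading of the retraction based linesearch (Algorithm 3), not the paper's code: the interface, the sphere retraction default, and the acceptance test f(R(x, βp)) ≤ f(x) − γβ² are assumptions; the parameter values γ = 0.11 and γ_2 = 3.12 follow the tuning reported in the appendix.

```python
import numpy as np

def linesearch_extrapolation(f, x, p, alpha, gamma=0.11, gamma2=3.12, retract=None):
    """Keep expanding the stepsize by gamma2 while the expanded retracted
    step still yields a sufficient decrease of f; return the last accepted
    stepsize. Assumes the initial alpha already gives sufficient decrease."""
    if retract is None:
        retract = lambda x, d: (x + d) / np.linalg.norm(x + d)  # sphere retraction
    while True:
        beta = gamma2 * alpha
        if f(retract(x, beta * p)) > f(x) - gamma * beta ** 2:
            return alpha          # further expansion no longer pays off
        alpha = beta

# toy usage: Rayleigh quotient on the sphere, expanding along a tangent direction
A = np.diag([3.0, 1.0, 0.5])
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
p = np.array([-1.0, -1.0, 2.0]) / 3.0     # projection of e_3 onto T_x S^2
res = linesearch_extrapolation(lambda v: v @ A @ v, x, p, alpha=0.1)
```

Since f is bounded below on a compact manifold while the threshold f(x) − γβ² decreases without bound, the loop always terminates; here it expands the initial stepsize twice before stopping.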
When analyzing the RDSE-SB method, we assume that the following continuity condition holds.
We refer the reader to [24] for a slightly weaker continuity condition in a Euclidean setting.
Algorithm 2 RDSE-SB

We now proceed to prove the asymptotic convergence of this method.

Lemma 3.3. We have, at every iteration k, that the following inequality holds.

Proof. It is immediate to check that the sufficient decrease bound must always hold for ∆_k = (1/γ_1) α^{j(k)}_{k+1} if the Linesearch procedure terminates at the second line, and for ∆_k = γ_2 α^{j(k)}_{k+1} if the Linesearch procedure terminates at the last line. Then, in both cases, the thesis follows, where we used Lemma 3.1 in the first inequality.
Theorem 3.2. For {x_k} generated by Algorithm 2, we have lim_{k→∞} ‖grad f(x_k)‖ = 0.

Proof. Let ᾱ_k = max_{j∈[1:K]} α^j_{k+1}, so that ᾱ_k → 0, since each α^j_k → 0 by the same reasoning as in the proof of Theorem 3.1. As a consequence of Lemma 3.3, we obtain a bound with constant c_1 = (γ_2/γ_1)(2LB^2 + γ), independent of j(k). It remains to bound ⟨grad f(x_k), p^i_k⟩ for i ≠ j(k). To start with, we have a first bound for h ≤ K such that k + h = j(i), where in the second inequality we used (27) with k + h in place of k. For the second summand appearing in the RHS of (28), we can write a further bound, where in the second inequality we used the Cauchy-Schwarz inequality together with the Lipschitz assumptions (6) and (22), while in the third inequality we used conditions (8) and (7). We can now bound dist(x_k, x_{k+h}), where we used (3) in the second inequality, (8) in the third one, and h ≤ K in the last one. Now let ∆_k = max_{l∈[0:K]} ᾱ_{k+l}, so that in particular ∆_k → 0. We apply (30) to the RHS of (29) and obtain a bound vanishing as k → ∞, with c_2 = KBL_r. Finally, combining these bounds for every i ∈ [1:K] and using the positive spanning property (9), the thesis follows, the convergence of the gradient norm to zero being a consequence of (32).

Nonsmooth objectives
Now we proceed to present and study direct search methods in the context where f is Lipschitz continuous and bounded from below, but not necessarily continuously differentiable. The algorithms we devise are built around the ideas given in [13], where the authors consider direct search methods for nonsmooth objectives in Euclidean spaces.

Clarke stationarity for nonsmooth functions on Riemannian manifolds
In order to perform our analysis, we first need to define the Clarke directional derivative at a point x ∈ M. The standard approach is to write the function in coordinate charts and take the standard Clarke derivative in a Euclidean space (see, e.g., [17] and [18]). Formally, given a chart (ϕ, U) at x ∈ M and v ∈ T_xM, we define the Clarke derivative (34) through f̃(y) = f(ϕ^{-1}(y)). The following lemma shows the relationship between definition (34) and a directional-derivative-like object defined with retractions.
The proof is rather technical and we defer it to the appendix.
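For the reader's convenience, we recall the classical Euclidean object that definition (34) applies in a chart: for a locally Lipschitz f, the Clarke directional derivative at x along v is

```latex
f^{\circ}(x; v) \;=\; \limsup_{\substack{y \to x \\ t \downarrow 0}} \frac{f(y + t v) - f(y)}{t}.
```

Clarke stationarity of x then means f°(x; v) ≥ 0 for every direction v, which is the notion of stationarity targeted by the nonsmooth convergence results below.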

Refining subsequences
We now adapt the definition of refining subsequence used in the analysis of direct search methods (see, e.g., [3,13]) to the Riemannian setting. Let {(x_k, d_k)} be a sequence in TM.
Definition 4.1. We say that the subsequence {x_{i(k)}} is refining if x_{i(k)} → x, and if for every d ∈ T_xM with ‖d‖_x = 1 there is a further subsequence {j(i(k))} such that d_{j(i(k))} → d.

We now give a sufficient condition for a sequence to be refining.
where in the second equality we used the continuity of P_x and of the norm ‖·‖_x, and in the third equality we used P_{x*}(d̃) = d̃, since d̃ ∈ T_{x*}M by construction.

Direct search for nonsmooth objectives
We present here our Riemannian Direct Search method based on Dense Directions (RDS-DD) for nonsmooth objectives. The scheme is presented in detail as Algorithm 4. The algorithm performs three simple steps at iteration k. First, a given search direction is suitably projected onto the current tangent space. Then a tentative point is generated by retracting the step α_k d_k from the tangent space to the manifold. Such a point is then accepted as the new iterate if a sufficient decrease condition on the objective function is satisfied (and the stepsize is expanded); otherwise the iterate stays the same (and the stepsize is reduced).
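The projection step can be sketched as follows, mirroring the construction of Proposition 4.1 on the unit sphere: draw a direction from a sequence that is dense in the unit sphere (a sequence of i.i.d. random unit vectors is dense with probability one), project it onto the tangent space, and normalize. This is our own illustrative Python sketch; the choice of random generator is an assumption, not the paper's.

```python
import numpy as np

def dense_direction(x, rng):
    """Search direction for RDS-DD on the sphere: a random unit vector
    d_tilde is projected onto T_x S^{n-1} via P_x and normalized; a zero
    projection maps to the zero direction, as in Proposition 4.1."""
    d_tilde = rng.standard_normal(x.size)
    d_tilde /= np.linalg.norm(d_tilde)
    d = d_tilde - np.dot(x, d_tilde) * x     # P_x projection
    nrm = np.linalg.norm(d)
    return d / nrm if nrm > 0 else np.zeros_like(x)

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0])
d = dense_direction(x, rng)
```

The returned direction is a unit tangent vector at x (almost surely), so the retracted trial point R(x, α_k d) probes the manifold in a direction that, over the iterations, becomes dense in the tangent sphere.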
Thanks to the theoretical tools previously introduced, we can easily prove that a suitable subsequence of unsuccessful iterations of the RDS-DD method converges to a Clarke stationary point.

Proof. Clearly, as in the smooth case, α_k → 0, and in particular α_{i(k)} → 0. Since by assumption i(k) is an unsuccessful step, the sufficient decrease condition fails for every i(k). Let {j(i(k))} be such that d_{j(i(k))} → d; then, thanks to (38) and by applying Lemma 4.1, we conclude that the Clarke directional derivative at the limit point is nonnegative along d, which implies the thesis since d is arbitrary.

Direct search with extrapolation for nonsmooth objectives
We present here our Riemannian Direct Search method with Extrapolation based on Dense Directions (RDSE-DD) for nonsmooth objectives. The detailed scheme is given in Algorithm 5. As can easily be seen, the algorithm performs just two simple steps at iteration k. First, a given search direction is suitably projected onto the current tangent space. Then a linesearch is performed using Algorithm 3, to hopefully obtain a new point that guarantees a sufficient decrease.
Algorithm 5 RDSE-DD

Once again, by exploiting the theoretical tools previously introduced, we can straightforwardly prove that a suitable subsequence of the RDSE-DD iterations converges to a Clarke stationary point. It is interesting to notice that, thanks to the use of the linesearch strategy, we are not restricted to considering unsuccessful iterations this time.
Theorem 4.2. Let {x_k} be generated by Algorithm 5. If {x_{i(k)}} is refining, with x_{i(k)} → x*, then x* is Clarke stationary.
Proof. Let β_k = α_k/γ_2 if the linesearch procedure exits before the loop, and β_k = γ_1 α_{k+1} otherwise. Clearly β_k → 0 and, by definition of the linesearch procedure, the sufficient decrease condition fails at stepsize β_k for every k. The rest of the proof is analogous to that of Theorem 4.1.

Numerical results
We now report the results of some numerical experiments with the algorithms described in this paper on a set of simple but illustrative example problems. The comparison among the algorithms is carried out by using data and performance profiles [25]. Specifically, let S be a set of algorithms and P a set of problems. For each s ∈ S and p ∈ P, let t_{p,s} be the number of function evaluations required by algorithm s on problem p to satisfy the condition

f(x_k) ≤ f_L + τ(f(x_0) − f_L),     (42)

where 0 < τ < 1 and f_L is the best objective function value achieved by any solver on problem p. Performance and data profiles of solver s are then defined from the t_{p,s}, where n_p is the dimension of problem p.
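The profiles can be computed with a few lines of code. The following Python sketch follows our reading of the Moré-Wild conventions [25]; the exact interface and the treatment of failures as infinite evaluation counts are our assumptions.

```python
import numpy as np

def profiles(T, n_p):
    """Performance and data profiles. T[p, s] = evaluations solver s needs
    on problem p to satisfy the convergence test (np.inf on failure);
    n_p[p] = dimension of problem p. Returns two functions:
    perf(s, a) = fraction of problems solver s solves within a factor a of
    the best solver; data(s, k) = fraction solved within k*(n_p+1) evals."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1)                                  # best solver per problem
    perf = lambda s, a: float(np.mean(T[:, s] <= a * best))
    data = lambda s, k: float(np.mean(T[:, s] <= k * (np.asarray(n_p) + 1)))
    return perf, data

# toy example: 2 problems, 2 solvers; solver 1 fails on problem 1
T = [[10, 30], [20, np.inf]]
perf, data = profiles(T, n_p=[4, 9])
```

For instance, perf(0, 1.0) = 1.0 here, since solver 0 needs the fewest evaluations on both problems.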
We used a budget of 100(n_p + 1) function evaluations in all cases and two different precisions for the condition (42), that is, τ ∈ {10^{-1}, 10^{-3}}. We consider randomly generated instances of well-known optimization problems over manifolds from [1,7,16]. A brief description of those problems, as well as the details of our implementation, can be found in the appendix (see Sections 7.2, 7.3 and 7.4). The size of the ambient space for the instances varies from 2 to 200. We would finally like to highlight that, in Section 7.5, we report further detailed numerical results, splitting the problems by ambient space dimension: between 2 and 15 for small instances, between 16 and 50 for medium instances, and between 51 and 200 for large instances.

Smooth problems
In Figure 1, we include the results for 8 smooth instances of problem (1) from [1,7], each with 15 different problem dimensions (from 2 to 200), for a total of 120 tested instances. We compared our methods, that is, RDS-SB and RDSE-SB, with the zeroth order gradient descent method (ZO-RGD, [21, Algorithm 1]).
The results clearly show that RDSE-SB performs better than RDS-SB and ZO-RGD, both in efficiency and in reliability, for both levels of precision. A look at the detailed results in Section 7.5 also shows that the gap between RDSE-SB and the other two algorithms widens as the problem dimension grows.

Nonsmooth problems
We finally report a preliminary comparison between a direct search strategy and a linesearch strategy on two nonsmooth instances of (1) from [16], each with 15 different problem sizes (from 2 to 200), for a total of 30 tested instances.
In the direct search strategy (RDS-DD+), we apply the RDS-SB method until α_{k+1} ≤ α_ε, at which point we switch to the nonsmooth version RDS-DD. Analogously, in the linesearch strategy (RDSE-DD+), we apply the RDSE-SB method until max_{j∈[1:K]} α^j_{k+1} ≤ α_ε, at which point we switch to the nonsmooth version RDSE-DD. Both strategies use a threshold parameter α_ε > 0 to switch from the smooth to the nonsmooth DFO algorithm. We refer the reader to [13] and references therein for other direct search strategies combining coordinate and dense directions. In Figure 2, we report the comparison between the two considered strategies. As in the smooth case, the linesearch based strategy outperforms the simple direct search one. A look at the detailed results in Section 7.5 once again shows that the gap between the algorithms widens as the problem dimension gets large enough.

Conclusion
In this paper, we presented direct search algorithms, with and without an extrapolation linesearch, for minimizing functions over a Riemannian manifold. We found that, modulo modifications to account for the vector space structure changing across iterations, direct search strategies provide convergence guarantees for both smooth and nonsmooth objectives. We also found that, in our numerical experiments, the extrapolation linesearch speeds up direct search in both cases, and that it even outperforms a gradient approximation based zeroth order Riemannian algorithm in the smooth case. As a natural extension for future work, considering the stochastic case would be a reasonable next step.

Proofs
In order to prove Proposition 3.1 we first need the following lemma.
Lemma 7.1. For a Lipschitz continuous function h : R^m → R and ỹ, ṽ ∈ R^m, if ỹ_k → ỹ, ṽ_k → ṽ and t_k ↓ 0, then

lim sup_{k→∞} (h(ỹ_k + t_k ṽ_k) − h(ỹ_k))/t_k ≤ h°(ỹ; ṽ).

Proof. We first bound the difference between the quotients along ṽ_k and along ṽ, with L_h the Lipschitz constant of h. Then the lim sup can be bounded, where we used (44) in the first equality, with the final inequality true by definition of the Clarke derivative.
Proof of Proposition 3.1. Let (ϕ, U) be a chart defined in a neighborhood U of x ∈ M. We use the notation (x̃, d̃) = (ϕ(x), dϕ(x)d) for (x, d) ∈ TM. We push forward the manifold and the related structure with the chart ϕ, i.e., for φ = ϕ^{-1} we define the corresponding metric, retraction and transport. With a slight abuse of notation, we use dist(x̃, ỹ) to denote dist(x, y). We also define grad f̃ as the gradient of f̃ with respect to the scalar product g, so that g(grad f̃(x̃), d̃) = ⟨∇f̃(x̃), d̃⟩ for any d̃ ∈ R^m. Importantly, by the equivalence of norms in R^m, we can use O(‖d̃‖_x̃) and O(‖d̃‖) interchangeably. We first prove (4) at x̃ for some constant L > 0 and any d̃ with ‖d̃‖ ≤ B for some B > 0; equivalently, we want to prove (46). First, since R̃ is a retraction, we have a first-order expansion, and by smoothness of the parallel transport a corresponding estimate. Furthermore, grad f̃(x̃ + q̃) can be compared with its parallel transport by the Lipschitz continuity assumption (6), and consequently we obtain a bound where we used (3) in the last equality. Finally, since d/dt R̃(x̃, t d̃) is C^1 regular, we also have an expansion where we used (48) in the third equality and (3) in the last one. Again by compactness, for ỹ ∈ Ũ_1, t ≤ 1, ‖q̃‖, ‖d̃‖ ≤ B, the implicit constants can be taken independent of the variables. Now, for d̃ such that ‖d̃‖ ≤ B, define q̃ = B d̃/‖d̃‖, so that d̃ = t q̃ for t = ‖d̃‖/B. Then we obtain (46), where we used (50) and (51) in the fourth inequality. To conclude, notice that the above argument does not depend on the choice of x̃ ∈ Ũ_1, so that it can be extended to every ỹ ∈ Ũ_1 and then, by compactness, to every y ∈ M.
Proof of Lemma 4.1. With the notation introduced in the proof of Proposition 3.1, without loss of generality we assume that U is bounded and that ϕ can be extended to a neighborhood containing the closure of U.
First, since the pushforward R̃ of a C^2 retraction R is itself a C^2 retraction of T R^m on R^m, we have the Taylor expansion R̃(ỹ, ṽ) = ỹ + ṽ + O(‖ṽ‖^2), with the implicit constant uniform for ỹ varying in Ũ and ṽ chosen in R^m. Second, for any fixed constant B > 0, by continuity we have an estimate for k → ∞ and q̃ ∈ R^m with ‖q̃‖ ≤ B, with a uniform implicit constant. Therefore we obtain a bound, where in the second inequality we used (54), and in the last equality we used (53) and (55) for the first and the second summand respectively. In other words, ṽ_k → d̃. To conclude, we bound the lim sup, where in the inequality we were able to apply Lemma 7.1 because ṽ_k → d̃ by (56).

Implementation details
For all the problems, we used the manifold structures available in the MANOPT library [9]. After a basic tuning phase, we set the algorithm parameters as follows: γ_1 = 0.61, γ_2 = 1 and γ = 0.77 for Algorithm 1; γ_1 = 0.81, γ_2 = 3.12 and γ = 0.11 for Algorithm 2; and stepsize 1.64/n (recall that n is the dimension of the ambient space) for the ZO-RGD method.
For the nonsmooth strategies RDS-DD+ and RDSE-DD+, we kept the same parameters as in the smooth case for RDS-SB and RDSE-SB, setting α_ε = 10^{-3}, and for both RDS-DD and RDSE-DD we used γ_1 = 0.95, γ_2 = 2 and γ = 1. In both Algorithm 1 and Algorithm 2, the positive spanning basis was obtained by projecting the positive spanning basis (e_1, ..., e_n, −e_1, ..., −e_n) of the ambient space R^n onto the tangent space. The initial stepsize was set to 1 for all the direct search methods, with no fine tuning. We generated the starting points and the instance parameters either with the MATLAB rand function or with the random element generators implemented in the MANOPT library.

Largest eigenvalue, singular value, and top singular values problem
In the largest eigenvalue problem [7, Section 2.3], given a symmetric matrix A ∈ R^{n×n}, we are interested in

max x^T A x s.t. ‖x‖ = 1.     (58)

The largest singular value problem [7, Section 2.3] can be formulated by generalizing (58): given A ∈ R^{m×h}, we are interested in

max u^T A v s.t. ‖u‖ = ‖v‖ = 1.     (59)

Notice how the domains in (58) and (59) are a sphere and the product of two spheres, respectively. Finally, to compute the sum of the top r singular values, as explained in [7, Section 2.5], it suffices to solve

max tr(U^T A V) s.t. U ∈ St(m, r), V ∈ St(h, r),     (60)

for St(a, b) the Stiefel manifold with dimensions (a, b).
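The formulation (58) is just the Rayleigh quotient characterization of the top eigenvalue: for symmetric A, the maximum of x^T A x over the unit sphere equals the largest eigenvalue and is attained at a leading eigenvector. A quick numerical check in Python (our own sanity-check snippet, not part of the paper's test suite):

```python
import numpy as np

# symmetric 2x2 matrix with eigenvalues 1 and 3
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam_max = np.linalg.eigvalsh(A).max()          # top eigenvalue via LAPACK
x = np.array([1.0, 1.0]) / np.sqrt(2.0)        # leading eigenvector of this A
rayleigh = x @ A @ x                           # objective of (58) at the maximizer
```

Here the Rayleigh quotient at the leading eigenvector matches the top eigenvalue, confirming that a solver for (58) is implicitly an eigenvalue solver.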

Dictionary learning
The dictionary learning problem [7, Section 2.4] can be formulated, for a fixed Y ∈ R^{d×k} and λ > 0, as the minimization of ‖Y − DC‖_F^2 + λ‖C‖_1 over coefficient matrices C and dictionaries D with unit-norm columns D_1, ..., D_h, where ‖·‖_1 denotes the ℓ_1 norm.
In our implementation we replace the ℓ_1 norm in the objective with a smoothed version depending on a parameter ε. In our tests, we generated the solution C using the MATLAB sprand function with a density of 0.3, and set the regularization parameter λ to 0.01 and ε to 0.001.

Synchronization of rotations
Let SO(d) = {R ∈ R^{d×d} : R^T R = I_d, det(R) = 1} be the special orthogonal group. In the synchronization of rotations problem [7, Section 2.6], we need to find rotations R_1, ..., R_h ∈ SO(d) from noisy measurements of relative rotations R_i R_j^{-1}. In our tests, we considered the case h = 2 for simplicity.

Low-rank matrix completion
The low rank matrix completion problem [7, Section 2.7] can be written, for a fixed matrix M ∈ R^{m×h}, a positive integer r > 0 and a subset of indices Ω ⊂ [1 : m] × [1 : h], as the minimization of Σ_{(i,j)∈Ω} (X_{ij} − M_{ij})^2 over matrices X of rank r. It can be proven that the optimization domain, that is, the set of matrices in R^{m×h} with fixed rank r, can be given a Riemannian manifold structure (see, e.g., [26]).

Gaussian mixture models
In the Gaussian mixture model problem [7, Section 2.8], Sym(d)^+ is the manifold of positive definite matrices, and ∆^+_{K−1} is the subset of strictly positive elements of the simplex ∆_{K−1}, which can be given a manifold structure. In our tests, we considered the case K = 2 and the reformulation proposed in [15], which does not use the unconstrained variables (û_1, ..., û_K).

Procrustes problem
The Procrustes problem [1] is the following linear regression problem, for fixed A ∈ R^{l×n} and B ∈ R^{l×p}: minimize ‖AX − B‖_F over X. In our tests, we restricted the variable X ∈ R^{n×p} to the Stiefel manifold St(n, p), a choice leading to the so-called unbalanced orthogonal Procrustes problem.

Nonsmooth problems
We report two nonsmooth problems taken from [16].

Sparsest vector in a subspace
Given an orthonormal matrix Q ∈ R^{m×n}, the problem of finding the sparsest vector in the subspace generated by the columns of Q can be relaxed as the minimization of ‖Qx‖_1 over unit-norm vectors x.

Nonsmooth low-rank matrix completion
In the nonsmooth version of the low rank matrix completion problem (65), the Euclidean norm is replaced by the ℓ_1 norm, so that the objective becomes a sum of absolute values: Σ_{(i,j)∈Ω} |X_{ij} − M_{ij}|. (70)

Additional numerical results
We include here the performance and data profiles split by problem size.

Proposition 4.1.
If x_{i(k)} → x*, {d̃_{i(k)}} is dense in the unit sphere, and d_{i(k)} = P_k(d̃_{i(k)})/‖P_k(d̃_{i(k)})‖_k for P_k(d̃_{i(k)}) ≠ 0, with d_{i(k)} = 0 otherwise, then the subsequence {x_{i(k)}} is refining. Proof. Fix d ∈ T_{x*}M with ‖d‖_{x*} = 1, and let d̃ = d/‖d‖. By density, we have d̃_{j(i(k))} → d̃ for a proper choice of the subsequence {j(i(k))}. Then lim_{k→∞} d_{j(i(k))} = d.

Theorem 4.1.
Let {x_k} be generated by Algorithm 4. If {x_{i(k)}} is refining, with x_{i(k)} → x*, and i(k) is an unsuccessful iteration for every k ∈ N ∪ {0}, then x* is Clarke stationary.

Figure 2: Nonsmooth case: results for all the instances.

for d̃ s.t. ‖d̃‖ ≤ B. By compactness, we can choose (ϕ, U) and B > 0 in such a way that, for every ỹ ∈ Ũ_1 ⊂ Ũ and d̃ with ‖d̃‖_ỹ ≤ B, we have R̃(ỹ, d̃) ∈ Ũ_2 ⊂ Ũ, with Ũ_2 compact and B > 0 independent of x̃, ỹ and d̃.

Figure 3: From top to bottom: results for small, medium and large instances in the smooth case.

Figure 4: From top to bottom: results for small, medium and large instances in the nonsmooth case.
In the Gaussian mixture model problem [7, Section 2.8], we are interested in computing a maximum likelihood estimate for a given set of observations x_1, ..., x_h.