Abstract
In this paper we propose a heuristic to improve the performance of the recently proposed derivative-free method for nonsmooth optimization CS-DFN. The heuristic is based on a clustering-type technique to compute an estimate of Clarke’s generalized gradient of the objective function, obtained via calculation of the (approximate) directional derivative along a certain set of directions. A search direction is then calculated by applying a nonsmooth Newton-type approach. As the numerical experiments show, this direction is a good descent direction for the objective function. We report numerical results and a comparison with the original CS-DFN method on a set of well-known test problems to show the utility of the proposed improvement.
1 Introduction
We consider the following unconstrained minimization problem
$$\begin{aligned} \min _{x\in \Re ^n}\ f(x). \end{aligned}$$(1)
We assume that the objective function f, though nonsmooth, is Lipschitz continuous and that first-order information is unavailable or impractical to obtain. We require the following assumption.
Assumption 1
The function f(x) is coercive, i.e. every level set is compact.
Needless to say, there are plenty of problems with the above features, especially in the engineering context. In the literature, many approaches have been proposed to tackle the nonsmooth problem (1) in the derivative-free framework. They can be roughly subdivided into two main classes: direct-type algorithms and model-based algorithms.

Direct-type methods. The algorithms belonging to this class make use of suitable samplings of the objective function. They may occasionally use modeling techniques heuristically, but the convergence theory hinges on the sampling technique. In this class of methods, we cite the mesh adaptive direct search algorithm implemented in the software package NOMAD [2, 10], the linesearch derivative-free algorithm CS-DFN proposed in [6] and the discrete gradient method [3].

Model-based methods. This class comprises all those algorithms whose convergence is based on the strategy used to build the approximating models. Within this class we can surely cite the recent trust-region derivative-free method proposed in [11].
In the relatively recent paper [6], a method for the optimization of nonsmooth black-box problems, namely CS-DFN, has been proposed. CS-DFN is able to solve problems more general than problem (1) above, since it can also handle nonlinear and bound constraints. It is based on a penalization approach, namely the nonlinear constraints are penalized by an exact penalization mechanism, whereas (possible) bound constraints on the variables are handled explicitly.
In this paper, we propose an improvement of CS-DFN obtained by incorporating into its main algorithmic scheme a clustering heuristic to compute efficient search directions. Starting from an approximation of the directional derivatives along a certain set of directions, we construct a polyhedral approximation of the subdifferential, which in turn is used to calculate a search direction in the steepest-descent fashion. Along such a direction we implement a linesearch procedure with extrapolation, just like the one adopted by CS-DFN to explore its directions.
To assess the potential of the proposed improvement, we carry out an experimental comparison of CS-DFN with and without the proposed heuristic. The results, in our opinion, clearly show the advantages of the improved method over the original one.
The paper is organized as follows. In Sect. 2 we extend to a nonsmooth setting the steepest-descent direction and a kind of Newton-type direction. In Sect. 3 we propose a heuristic to compute possibly efficient directions in a derivative-free context. In Sect. 4 we describe an improved version of the CS-DFN algorithm, which is obtained by suitably employing the improved directions just described. In Sect. 5 we report the results of a numerical comparison between CS-DFN and the proposed improved version on a set of well-known test problems. Finally, Sect. 6 is devoted to some discussion and conclusions.
1.1 Definitions and notations
Definition 1
Given a point \(x\in \Re ^n\) and a direction \(d\in \Re ^n\), the Clarke directional derivative of f at x along d is defined as [4]
$$\begin{aligned} f^\circ (x;d) = \limsup _{y\rightarrow x,\ t\downarrow 0} \frac{f(y+td)-f(y)}{t}. \end{aligned}$$
Moreover, the Clarke generalized gradient (or subdifferential) \(\partial _C f(x)\) is defined as
$$\begin{aligned} \partial _C f(x) = conv\left\{ \xi \in \Re ^n:\ \xi = \lim _{j\rightarrow \infty }\nabla f(x_j),\ x_j\rightarrow x,\ x_j\notin \varOmega _f\right\} , \end{aligned}$$
\(\varOmega _f\) being the set (of zero measure) where f is not differentiable.
The following property holds:
$$\begin{aligned} f^\circ (x;d) = \max _{\xi \in \partial _C f(x)} \xi ^Td. \end{aligned}$$
Definition 2
A point \(x^*\in \Re ^n\) is Clarke stationary for Problem (1) when \(f^\circ (x^*,d) \ge 0\), for all \(d\in \Re ^n\).
In the following, we denote by \(e_i\), \(i=1,\dots ,n\), the ith column of the canonical basis in \(\Re ^n\) and by e a vector of all ones of appropriate dimension.
2 Descent-type directions
In the context of nonsmooth optimization, efficient search directions can be computed by using the information provided by the subdifferential of the objective function. In the following subsections, we describe how such directions can be obtained.
2.1 Steepest descent direction \(g_k^S\)
In this subsection we recall a classic approach [14] to compute a generalization to nonsmooth functions of the steepest descent direction for continuously differentiable functions.
Let us consider the vector which minimizes the following “first order-type” model of the objective function.
$$\begin{aligned} \min _{d\in \Re ^n}\ f^\circ (x_k;d)+\frac{1}{2}\Vert d\Vert ^2 \end{aligned}$$(3)
Note that, in the case of continuously differentiable functions, we have \(f^\circ (x_k;\,d)=\nabla f(x_k)^Td\) and that the solution of Problem (3) is given by \(d^*=-\nabla f(x_k)\).
For nonsmooth functions, standard results [14] lead to the following proposition.
Proposition 1
Let \(d^S\) be the solution of Problem (3). Then

(i)
The vector \(d^S\) is given by
$$\begin{aligned} d^S=-g_k^S \end{aligned}$$where
$$\begin{aligned} \begin{array}{l} g_k^S=\quad \hbox {argmin}\ \Vert \xi \Vert ^2\\ \qquad \qquad s.t.\ \xi \in \partial f(x_k) \end{array} \end{aligned}$$(4) 
(ii)
The vector \(d^S\) satisfies \(f^\circ (x_k; -g_k^S)=-\Vert g_k^S\Vert ^2\).

(iii)
For any \(\gamma \in (0,1)\) there exists a \(\bar{\alpha }>0\) such that
$$\begin{aligned} f(x_k-\alpha g_k^S)\le f(x_k) - \alpha \gamma \Vert g_k^S\Vert ^2 \end{aligned}$$with \(\alpha \in (0, \bar{\alpha }]\)
The above direction \(d_k^S\) is a first-order direction which (closely) resembles the steepest-descent direction for the continuously differentiable case.
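As an illustration, the min-norm problem (4) can be solved as a small quadratic program over the convex combinations of a finite set of (approximate) subgradients. The sketch below assumes such a set is available as the rows of a matrix; all names are ours, not those of the DFL implementation.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_element(V):
    """Solve problem (4) over conv(rows of V): the min-norm point
    g = argmin ||g||^2 with g = V^T lam, lam >= 0, sum(lam) = 1."""
    p = V.shape[0]
    obj = lambda lam: float(np.dot(lam @ V, lam @ V))
    cons = ({'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0},)
    res = minimize(obj, np.full(p, 1.0 / p),
                   bounds=[(0.0, None)] * p, constraints=cons)
    return res.x @ V

# Two subgradients e_1 and e_2 in R^2: the min-norm point of their convex
# hull is (0.5, 0.5); the steepest-descent-type direction is its negative.
g = min_norm_element(np.eye(2))
d_S = -g
```

Any quadratic-programming solver would do here; the simplex-constrained reformulation above is what makes the polytope structure of \(\partial f(x_k)\) computationally convenient.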
2.2 Newton-type direction \(d_k^N\)
In the nonsmooth case, obtaining a Newton-type direction is much more involved than in the differentiable case, where it suffices to premultiply the antigradient by the inverse of the Hessian of the objective function. In the nonsmooth case, instead of simply premultiplying direction \(g_k^S\) by any positive definite matrix, we resort to minimizing the following “second order-type” model.
$$\begin{aligned} \min _{d\in \Re ^n}\ f^\circ (x_k;d)+\frac{1}{2}d^TB_kd \end{aligned}$$(5)
where \(B_k\) is a positive definite matrix. Let us denote by \(d_k^N\) the solution of Problem (5).
For problem (5) the following proposition can be proved.
Proposition 2
Let \(B_k\) be positive definite and let \(d_k^N\) be the solution of Problem (5). Then

(i)
The vector \(d_k^N\) is given by
$$\begin{aligned} d_k^N=-B_k^{-1}g_k^N \end{aligned}$$where
$$\begin{aligned} \begin{array}{l} g_k^N=\quad \hbox {argmin}\ \xi ^TB_k^{-1}\xi \\ \qquad \qquad s.t. \ \xi \in \partial f(x_k) \end{array} \end{aligned}$$(6) 
(ii)
The vector \(d_k^N\) satisfies \(f^\circ (x_k; d_k^N)=-(g_k^N)^TB_k^{-1}g_k^N=(g_k^N)^Td_k^N\).

(iii)
For any \(\gamma \in (0,1)\) there exists a \(\bar{\alpha }>0\) such that
$$\begin{aligned} f(x_k-\alpha B_k^{-1}g_k^N)\le f(x_k) - \alpha \gamma (g_k^N)^TB_k^{-1}g_k^N \end{aligned}$$with \(\alpha \in (0, \bar{\alpha }]\)
Proof
By arguments similar to those in the proof of Theorem 5.2.8 in [14], the function \(\phi (d)=f^\circ (x_k; d)+\frac{1}{2}d^TB_kd\) is strictly convex. Therefore Problem (5) has a unique minimizer \(d^*\) such that:
Recalling Lemma 5.2.7 of [14] we have:
The relations (7) and (8) imply that a vector \(g_k^N\) exists such that:
and, hence,
which proves point (ii) by setting \(d_k^N=d^*\).
Now the definition of \(f^\circ (x_k; -B_k^{-1}g_k^N)\) and (9) give:
which implies
Therefore, (11) shows that the vector \(g_k^N\) is the unique solution of Problem (6).
Finally, point (iii) again follows from the definition of \(f^\circ (x_k; -B_k^{-1}g_k^N)\) and (9). \(\square\)
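To make the construction concrete, the following sketch computes \(g_k^N\) of problem (6) and \(d_k^N=-B_k^{-1}g_k^N\) of Proposition 2 when \(\partial f(x_k)\) is replaced by the convex hull of finitely many (approximate) subgradients; the function and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def newton_type_direction(V, B):
    """Problem (6) over conv(rows of V): g = argmin xi^T B^{-1} xi,
    then the Newton-type direction d = -B^{-1} g of Proposition 2(i)."""
    p = V.shape[0]
    Binv = np.linalg.inv(B)
    obj = lambda lam: float((lam @ V) @ Binv @ (lam @ V))
    cons = ({'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0},)
    res = minimize(obj, np.full(p, 1.0 / p),
                   bounds=[(0.0, None)] * p, constraints=cons)
    g = res.x @ V
    return g, -Binv @ g

# Subgradients e_1, e_2 and B = diag(1, 4): the metric B^{-1} shifts the
# minimizer toward e_2, giving g = (0.2, 0.8) and d = (-0.2, -0.2).
g_N, d_N = newton_type_direction(np.eye(2), np.diag([1.0, 4.0]))
```

With \(B=I\) the routine reduces to the steepest-descent-type construction of Sect. 2.1.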
3 A heuristic approach to define efficient directions
At the basis of the proposed heuristic is the hypothesis that the nonsmoothness of the objective function is due to a finite \(\max\) structure. Such a hypothesis appears realistic, as a wide range of nonsmooth optimization problems coming from practical applications are of the \(\min \max\) type. The essence of our method is the attempt to construct an approximation of the subdifferential by estimating a certain set of subgradients (the generators, in the following) starting from an estimate of the directional derivatives along a sufficiently large set of directions.
The only assumptions on the function f are Lipschitz continuity and Assumption 1. Nevertheless, drawing inspiration from the paper [13] (see also [1]), given points \(y_j\in \Re ^n\), \(j\in \{1,2,\dots ,p\}\), sufficiently close to x, we approximate f(x) by using the following piecewise quadratic model function,
with
where \(g_j\in \partial f(y_j)\) and \(H_j = H(y_j)\), \(j=1,\dots ,p\). We remark that, while we assume that the model structure of f is a \(\max\) of a finite number of functions, the number p of such functions is unknown and has to be estimated via a trial-and-error calculation process.
We can write,
Furthermore, by assuming that \(f(x)\approx f^\Box (x)\), we have
In the case at hand, C(x) is the convex hull of a given number of generator vectors \(v_j\), \(j=1,\dots ,p\). We can try and estimate those generators by using the quantities computed by the algorithm.
In particular, let \(x_k\) be the current iterate of the algorithm. Assume that a certain set of directions \(d_i\in \Re ^n\), \(i=1,\dots ,r\), along with their respective stepsizes \(\alpha _i > 0\), is available. These can be either directions along which a failure in the descent has previously occurred or even predefined ones, e.g. the unit vectors (\(d_i=\pm e_i\)). Define
By using (12), for \(i=1,\dots ,r\),
It is then possible to compute estimates of the generators \(v_j\), \(j=1,\dots ,p\), as those which provide the best approximation of the \(s_i\)’s, hence we solve the problem
The above problem is a hard, nonsmooth, nonconvex problem of the clustering type. It can, however, be put in DC (Difference of Convex) form as in [9]. Since it has to be solved many times during the proposed algorithm, in our implementation we prefer to resort to a greedy heuristic of the k-means type [7, 12, 16]. It works as follows. An initial set of p tentative generators is defined. Then each couple \((d_i,s_i)\) is assigned to the generator which ensures the best approximation of \(s_i\). Once the couples have been clustered, the generators are updated in a least-squares fashion and the procedure is repeated.
Then, we can compute an estimate of direction \(d_k^N\) by solving problem (6), where \(\partial f(x_k)\) is approximated by \(conv(\hat{v}_1,\dots ,\hat{v}_p)\). More precisely, we define the following algorithm that computes a search direction.
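The greedy k-means-type heuristic described above can be sketched as follows, under the assumption that the clustering problem asks for generators \(v_j\) minimizing \(\sum _i \min _j (d_i^Tv_j-s_i)^2\); all names are ours and purely illustrative.

```python
import numpy as np

def cluster_generators(D, s, p, iters=20, seed=0):
    """Greedy k-means-type heuristic: find generators v_1,...,v_p such
    that each s_i is well approximated by d_i^T v_j for its best j.
    D: (r, n) array of directions d_i (rows); s: (r,) quotients s_i."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((p, D.shape[1]))      # tentative generators
    for _ in range(iters):
        # assignment step: couple (d_i, s_i) -> generator best fitting s_i
        labels = ((D @ V.T - s[:, None]) ** 2).argmin(axis=1)
        # update step: refit each generator on its cluster, least squares
        for j in range(p):
            idx = labels == j
            if idx.any():
                V[j] = np.linalg.lstsq(D[idx], s[idx], rcond=None)[0]
    return V
```

With a single cluster (p = 1) the heuristic reduces to one plain least-squares fit, so it recovers a generator exactly when the quotients are generated by one linear form.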
In the following we give an example of how the heuristic works.
Example 1
Consider the (convex) nonsmooth function maxl [13], defined as
Take the point \(\bar{x}\), \(\bar{x}_i=1\), \(i=1,\ldots ,n\), where f exhibits a kink and \(f(\bar{x})=1\). Observe that none among the 2n (signed) coordinate directions \(\pm e_i\) is a descent one at \(\bar{x}\) (it is in fact \(f^\circ (\bar{x};-e_i)=0\) and \(f^\circ (\bar{x};e_i)=1\), \(i=1,\ldots ,n\)). Calculation of the 2n ratios \(s_i\) as in (13), along the directions \(e_i\) and \(-e_i\), leads to \(s_i=1\) and \(s_i=0\), respectively, for \(i=1,\ldots , n\). It is easy to verify that, letting \(p=n\) in Algorithm 1, an optimal solution to problem (14) is \(\hat{v}_j=e_j\), \(j=1,\ldots ,n\). Finally, solving
we obtain \(\bar{d}=-\displaystyle \frac{e}{n}\), which is indeed a descent direction at \(\bar{x}\).
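The example can be checked numerically; the snippet below assumes, consistently with the kink behavior described above, that maxl is \(f(x)=\max _i |x_i|\) (its definition in [13]).

```python
import numpy as np

maxl = lambda x: np.abs(x).max()   # assumed definition of maxl
n, t = 4, 1e-3
xbar = np.ones(n)

# none of the 2n signed coordinate directions decreases f at xbar ...
for i in range(n):
    for sign in (1.0, -1.0):
        d = sign * np.eye(n)[i]
        assert maxl(xbar + t * d) >= maxl(xbar)

# ... while the direction -e/n produced by the heuristic strictly does
d_bar = -np.ones(n) / n
assert maxl(xbar + t * d_bar) < maxl(xbar)
```

Indeed, a step t along \(-e/n\) scales every component by \(1-t/n\), so the max-value decreases, whereas along \(-e_i\) the other components keep the max at 1.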
4 The improved CS-DFN algorithm
This section is devoted to the definition of the improved version of algorithm CS-DFN, which we call Fast-CS-DFN. The method is basically the CS-DFN algorithm introduced in reference [6], a derivative-free linesearch-type algorithm for the minimization of black-box (possibly) nonsmooth functions. It works by performing derivative-free line searches along the coordinate directions and resorting to a further search direction when the stepsizes used to explore the coordinate directions are sufficiently small. The rationale behind this choice is that the coordinate directions might not be descent directions near a nonstationary point of nonsmoothness. In such situations, a richer set of directions must be used to (at least asymptotically) be able to improve the nonstationary point. The convergence analysis of CS-DFN carried out in [6] hinges on the use of asymptotically dense sequences of search directions, so that, at nonstationary points, for sufficiently large k a direction of descent is used.
The algorithm that we propose, namely Fast-CS-DFN, is a modification of CS-DFN. The relevant differences between the two methods are:

1.
for the sake of simplicity, problem (1) is unconstrained; hence in Fast-CS-DFN no control to enforce feasibility with respect to the bound constraints is needed;

2.
after the deployment of the direction \(d_k\), Fast-CS-DFN makes use of Algorithm 2 to compute a direction that tries to exploit the information gathered during the optimization process to heuristically improve the last produced point.
The Fast-CS-DFN algorithm is reported in Algorithm 3.
Some comments about Algorithm Fast-CS-DFN are in order.

1.
Except for steps 14–18 and for the mechanism used to produce \(G_{k+1}\) starting from \(G_k\), Fast-CS-DFN is exactly the CS-DFN method as described in [6];

2.
The new direction \(\hat{d}_k^N\) is used when the stepsizes \(\alpha _k^i\) and \(\tilde{\alpha }_k^i\), \(i=1,\dots ,n\), are sufficiently small and after the deployment of the direction \(d_k\);

3.
The computation of the new direction \(\hat{d}_k^N\) performed at step 15 hinges (a) on the matrix \(B_k\) and (b) on the set of couples \(G_k^{n+2}\).

(a)
To build \(B_k\), we maintain a set of points \(Y_k\) which is managed in just the same way as described in [6];

(b)
As for the set \(G_k^{n+2}\), it stores information on the consecutive failures encountered up to the current point, i.e. in the deployment of the coordinate directions and of the direction \(d_k\). The direction \(d_k\) is the one (possibly) used at step 11 of Algorithm Fast-CS-DFN. Note also that the set \(G_k^{n+1}\) is emptied every time a nonnull step is computed by the algorithm along any direction;


4.
The asymptotic convergence properties of Fast-CS-DFN are analogous to those of CS-DFN. The theoretical analysis follows quite easily from the results proved for CS-DFN in [6]. As can be noted, the new iterate \(x_{k+1}\) defined by Algorithm Fast-CS-DFN is such that \(f(x_{k+1})\le f(y_k^{n+2})\). In fact, when step 17 is executed, \(x_{k+1} = y_k^{n+2} + \check{\alpha }\check{d}_k\) and \(f(x_{k+1})\le f(y_k^{n+2})\). When step 21 is executed, then \(x_{k+1}=y_k^{n+2}\) and \(f(x_{k+1})=f(y_k^{n+2})\).
5 Numerical results
The proposed Fast-CS-DFN algorithm has been implemented in Python 3.9 and compared with CS-DFN [6] (available through the DFL library http://www.iasi.cnr.it/~liuzzi/dfl as package FASTDFN). In the implementation of Fast-CS-DFN we adopted all the choices of CS-DFN, and we set \(h_{\max } = 10\) in Algorithm 1 and \(\epsilon =1\) in Algorithm 2. The comparison has been carried out on a set of 47 nonsmooth problems. In the following subsections we briefly describe the test problem collection, the metrics adopted in the comparison and, finally, the obtained results.
5.1 Test problem collection
In Table 1 a description of the test problems is reported. In particular, each table entry gives the problem name, the number n of variables and the reference where the problem definition can be found.
5.2 Metrics
To compare our derivative-free algorithms we resort to the use of the well-known performance and data profiles (proposed in [5] and [15], respectively). In particular, let \(\mathcal P\) be a set of problems and \(\mathcal S\) a set of solvers used to tackle the problems in \(\mathcal P\). Let \(\tau > 0\) be a required precision level and denote by \(t_{ps}\) the performance index, that is the number of function evaluations required by solver \(s\in \mathcal S\) to solve problem \(p\in \mathcal P\). Problem p is claimed to be solved when a point x has been obtained such that the following criterion is satisfied
$$\begin{aligned} f(x) \le f_L + \tau \left( f(x_0) - f_L\right) , \end{aligned}$$
where \(f(x_0)\) is the initial function value and \(f_L\) denotes the best function value found by any solver on problem p itself. Then, the performance ratio \(r_{ps}\) is
$$\begin{aligned} r_{ps} = \frac{t_{ps}}{\min _{s'\in \mathcal S}\ t_{ps'}}. \end{aligned}$$
Finally, the performance and data profiles of solver s are defined as
$$\begin{aligned} \rho _s(\alpha ) = \frac{1}{|\mathcal P|}\left| \left\{ p\in \mathcal P:\ r_{ps}\le \alpha \right\} \right| , \qquad d_s(\kappa ) = \frac{1}{|\mathcal P|}\left| \left\{ p\in \mathcal P:\ t_{ps}\le \kappa (n_p+1)\right\} \right| , \end{aligned}$$
where \(n_p\) is the number of variables of problem p. In particular, the performance profile \(\rho _s(\alpha )\) gives the fraction of problems that solver s solves with a number of function evaluations which is at most \(\alpha\) times the number of function evaluations required by the best-performing solver on that problem. On the other hand, the data profile \(d_s(\kappa )\) indicates the fraction of problems solved by s with a number of function evaluations which is at most equal to \(\kappa (n_p+1)\), that is, the number of function evaluations required to compute \(\kappa\) simplex gradients.
When using performance and data profiles for benchmarking derivative-free algorithms, it is quite usual to consider (at least) three different levels of precision (low, medium and high), corresponding to \(\tau = 10^{-1},10^{-3},10^{-5}\), respectively.
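The two profiles above can be computed directly from the table of performance indices \(t_{ps}\); the sketch below is our own minimal rendering of those definitions, not the benchmarking code used in the paper.

```python
import numpy as np

def profiles(T, nvar, alphas, kappas):
    """Performance and data profiles from a table T[p, s] of function
    evaluations (use np.inf for failures); nvar[p] = dimension of problem p.
    Returns rho[a, s] (fraction of problems with r_ps <= alphas[a]) and
    data[k, s] (fraction with t_ps <= kappas[k] * (n_p + 1))."""
    R = T / T.min(axis=1, keepdims=True)          # performance ratios r_ps
    rho = np.array([(R <= a).mean(axis=0) for a in alphas])
    data = np.array([(T <= k * (nvar[:, None] + 1)).mean(axis=0)
                     for k in kappas])
    return rho, data

# Two solvers on two problems: solver 0 is best on problem 0, tie on problem 1,
# so rho(1) = (1.0, 0.5) and both curves reach 1.0 at alpha = 2.
T = np.array([[2.0, 4.0], [3.0, 3.0]])
rho, data = profiles(T, np.array([2, 2]), alphas=[1.0, 2.0], kappas=[1.0])
```

The value \(\rho _s(1)\) read off the first row of `rho` is the efficiency measure used in Sect. 5.3.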
5.3 Results
Figure 1 reports the results of the comparison, by means of performance and data profiles, between Fast-CS-DFN and CS-DFN.
As we can see, the new algorithm Fast-CS-DFN is invariably more robust, namely it is able to solve the largest portion of problems within any given amount of computational effort. More particularly, from the performance profiles we can also say that the new method is invariably more efficient than the original one, since the profile curves always have higher values at \(\alpha = 1\).
6 Conclusions
In this paper, we propose a strategy to compute (possibly) good descent directions that can be heuristically exploited within derivative-free algorithms for nonsmooth optimization. In fact, we show that the use of the proposed direction within the CS-DFN algorithm [6] improves the performance of the method. Numerical results on a set of nonsmooth optimization problems from the literature show the efficiency of the proposed direction computation strategy.
As a final remark, we point out that the proposed strategy could be embedded in virtually any optimization algorithm as a heuristic to try and produce improving points.
Data availability
The data sets generated and/or analyzed during the current study are available in the DFL repository, http://www.iasi.cnr.it/~liuzzi/dfl, as package FASTDFN.
References
Astorino, A., Frangioni, A., Gaudioso, M., Gorgone, E.: Piecewise quadratic approximations in convex numerical optimization. SIAM J. Optim. 21(4), 1418–1438 (2011)
Audet, C., Le Digabel, S., Rochon Montplaisir, V., Tribes, C.: NOMAD version 4: nonlinear optimization with the MADS algorithm. arXiv:2104.1167 (2021)
Bagirov, A.M., Karasözen, B., Sezer, M.: Discrete gradient method: derivative-free method for nonsmooth optimization. J. Optim. Theory Appl. 137(2), 317–334 (2008)
Clarke, F.H.: Optimization and nonsmooth analysis. SIAM (1990)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Fasano, G., Liuzzi, G., Lucidi, S., Rinaldi, F.: A linesearch-based derivative-free approach for nonsmooth constrained optimization. SIAM J. Optim. 24(3), 959–992 (2014)
Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21(3), 768–769 (1965)
Karmitsa, N.: Test problems for large-scale nonsmooth minimization. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing, 4/2007 (2007)
Khalaf, W., Astorino, A., d’Alessandro, P., Gaudioso, M.: A DC optimization-based clustering technique for edge detection. Optim. Lett. 11, 627–640 (2017)
Le Digabel, S.: Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37(4), 44:1–44:15 (2011)
Liuzzi, G., Lucidi, S., Rinaldi, F., Vicente, L.N.: Trust-region methods for the derivative-free optimization of nonsmooth black-box functions. SIAM J. Optim. 29(4), 3012–3035 (2019)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Lukšan, L., Vlček, J.: A bundle-Newton method for nonsmooth unconstrained minimization. Math. Program. 83, 373–391 (1998)
Mäkelä, M.M., Neittaanmäki, P.: Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control. World Scientific Press (1992)
Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009)
Selim, S.Z., Ismail, M.A.: K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 1, 81–87 (1984)
Acknowledgements
We are indebted to the anonymous Reviewer for their useful comments and suggestions, which greatly helped us improve the manuscript.
Funding
Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUICARE Agreement.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Gaudioso, M., Liuzzi, G. & Lucidi, S. A clustering heuristic to improve a derivative-free algorithm for nonsmooth optimization. Optim. Lett. 18, 57–71 (2024). https://doi.org/10.1007/s11590-023-02042-4