Abstract
This paper proposes an algorithm for the unconstrained minimization of a class of nonsmooth and nonconvex functions that can be written as finite-max functions. A gradient and function-based sampling method is proposed which, under special circumstances, either moves superlinearly to a minimizer of the problem of interest or improves the optimality certificate. Global and local convergence analyses are presented, as well as examples that illustrate the theoretical results obtained.
Notes
For example, setting \(s_j\) as the orthogonal projection of \(v_j\) onto the hyperplane generated by \(\{ v_i : i\in \mathcal I(x)\text{, }~i\ne j\}\), one can consider \(d_j = (v_j - s_j)/\Vert v_j - s_j\Vert \).
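As a minimal numerical sketch of this construction (our illustration; the function name and array conventions are assumptions, not the authors' code), one can compute \(s_j\) as the least-squares projection of \(v_j\) onto the affine hull of the remaining vectors:

import numpy as np

def escape_direction(V, j):
    # V: (m, n) array whose rows are the vectors v_i, i in I(x).
    # Returns d_j = (v_j - s_j) / ||v_j - s_j||, with s_j the orthogonal
    # projection of v_j onto the hyperplane generated by the other rows.
    others = np.delete(V, j, axis=0)
    base = others[0]
    A = (others[1:] - base).T            # directions spanning the hyperplane
    # Least-squares coefficients of V[j] - base over those directions;
    # base + A @ c is the point of the hyperplane closest to v_j.
    c, *_ = np.linalg.lstsq(A, V[j] - base, rcond=None)
    s_j = base + A @ c
    residual = V[j] - s_j
    return residual / np.linalg.norm(residual)

The direction is well defined whenever \(v_j\) does not already lie on that hyperplane, so that the denominator is nonzero.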
The GS code can be found at http://cs.nyu.edu/overton/papers/gradsamp/alg/.
Acknowledgements
We are deeply grateful to the anonymous referees, who carefully read a previous version of this paper and contributed valuable suggestions that improved our manuscript.
Additional information
Elias S. Helou was partially supported by FAPESP Grants 2013/16508-3 and 2013/07375-0 and by CNPq Grant 311476/2014-7. Sandra A. Santos was partially supported by CNPq Grant 302915/2016-8, FAPESP Grants 2013/05475-7 and 2013/07375-0 and PRONEX Optimization. Lucas E. A. Simões was supported by FAPESP Grants 2013/14615-7 and 2016/22989-2.
Appendix
The aim of this appendix is to show that the assumption \((\lambda _{k,{\overline{l}}_k})_i = 0\) whenever \(i \notin \mathcal I(x_*)\) and \(k\in \mathcal K\), with \(\mathcal K\) defined in Corollary 2, is not necessary. To this end, we show that, even without such an assumption, the results of the local convergence subsection remain valid.
We divide our reasoning into two cases, reminding the reader that we have assumed \(\mathcal I(x_*) = \{1,\ldots ,r+1\}\):
(A1) The cardinality of \(\mathcal I(x_*)\) is \(n+1\);

(A2) The cardinality of \(\mathcal I(x_*)\) is \(r+1\), with \(r < n\).
Suppose first that A1 holds and consider an iterate \(x_k\) sufficiently close to \(x_*\). Moreover, assume that \(k\in \mathcal K\), where \(\mathcal K\) is the index set defined in Corollary 2. Then, looking at the optimization problem in (22), we see that any additional active index generates an additional active constraint in (22) that is a linear combination of the first \(n+1\) active constraints (by Remark 2, and because the rank of \({\tilde{J}}_{k}\) remains constant in a small neighborhood of \(x_*\)). Hence, the solution obtained with or without this additional constraint is the same, so the results presented in the local convergence subsection do not change in this special case.
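As a generic illustration of why such a dependent constraint is redundant (our notation, assuming for simplicity that the extra gradient is a convex combination of the first \(n+1\)): if the active constraints read \(\langle v_i, d\rangle \le z\) for \(i = 1,\ldots ,n+2\), and \(v_{n+2} = \sum _{i=1}^{n+1} \alpha _i v_i\) with \(\alpha _i \ge 0\) and \(\sum _{i=1}^{n+1} \alpha _i = 1\), then

\[ \langle v_{n+2}, d\rangle = \sum _{i=1}^{n+1} \alpha _i \langle v_i, d\rangle \le \Big (\sum _{i=1}^{n+1} \alpha _i\Big ) z = z, \]

so the \((n+2)\)-th constraint is implied by the others and the solution of the quadratic program is unaffected.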
So, let us consider the more intricate case A2. Moreover, let us assume that there is only one additional constraint, i.e., that the number of active constraints is \(r+2\) (the occurrence of more than one additional constraint is a straightforward generalization of this simpler case). In other words, we are saying that solving (5) is equivalent to minimizing
where we assume the constraints have been rearranged so that the additional constraint is the \((r+2)\)-th one, with associated sampled point \(x_{k,r+2}^{{\overline{l}}_k}\). Therefore, for an iterate \(x_k\) sufficiently close to the solution and a sufficiently small sampling radius, the continuity of the functions \(\phi _i\) ensures that only the functions \(\phi _1,\ldots ,\phi _{r+1}\) can attain the maximum at any sampled point. Hence, there exists \(j\in \{1,\ldots ,r+1\}\) such that \(f(x_{k,r+2}^{{\overline{l}}_k}) = \phi _j(x_{k,r+2}^{{\overline{l}}_k})\). Consequently, recalling that \(k\in \mathcal K\), the dual of the above minimization problem can be written as
Therefore, we can turn this last constrained maximization problem into an unconstrained one by making the substitution \(\lambda _{r+2} = 1 - \sum _{i=1}^{r+1} \lambda _i\). So, we have
Since the above problem is concave and smooth, its solution \({\overline{\lambda }}\in {\mathbb {R}}^{r+1}\) can be obtained by setting the derivative of the objective function equal to the null vector. Consequently, assuming without loss of generality that the function \(\phi _j\) involved in the additional constraint is \(\phi _{r+1}\), we have
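In schematic form (our notation only; \(q\) stands here for the dual objective and is not a symbol of the paper), the elimination and the resulting stationarity condition read

\[ \tilde{q}(\lambda _1,\ldots ,\lambda _{r+1}) = q\Big (\lambda _1,\ldots ,\lambda _{r+1},\, 1 - \sum _{i=1}^{r+1} \lambda _i\Big ), \qquad \frac{\partial \tilde{q}}{\partial \lambda _i} = \frac{\partial q}{\partial \lambda _i} - \frac{\partial q}{\partial \lambda _{r+2}} = 0, \quad i = 1,\ldots ,r+1. \]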
Now, exchanging the point \(x_{k,r+2}^{{\overline{l}}_k}\) for \(x_{k,r+1}^{{\overline{l}}_k}\) and redefining
we get
This last linear system yields
Therefore, following the same reasoning used so far, one can see that the first \(r\) components of the dual variable \({{\hat{\lambda }}} \in {\mathbb {R}}^{r+1}\) associated with problem (21) must satisfy the linear system obtained above (disregarding the remaining error vector) and, moreover,
Therefore, taking \(\lambda ^*\in {\mathbb {R}}^{r+2}\) as the solution of (36) and using equation (37), we must have
To complete our reasoning, we write the following relation between the primal and dual variables
Hence, \(d_{k,{\overline{l}}_k}\) is exactly the search direction obtained in (21), up to an additional error vector. Therefore, the term \(O\left( \tau _{k,{\overline{l}}_k}\right) \) is absorbed by the other error vectors in Theorem 3, and the result remains valid.
Finally, recall that we have considered just one constraint in addition to the other \(r+1\) active constraints. However, exactly the same reasoning proves the result for any number of additional constraints.