Abstract
This paper proposes an algorithm for the unconstrained minimization of a class of nonsmooth and nonconvex functions that can be written as finite-max functions. A gradient and function-based sampling method is proposed which, under special circumstances, either moves superlinearly to a minimizer of the problem of interest or improves the optimality certificate. Global and local convergence analyses are presented, as well as examples that illustrate the theoretical results obtained.
Notes
For example, setting \(s_j\) as the orthogonal projection of \(v_j\) onto the hyperplane generated by \(\{ v_i : i\in \mathcal I(x)\text{, }~i\ne j\}\), one can consider \(d_j = (v_j - s_j)/\Vert v_j - s_j\Vert \).
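As a minimal numerical sketch of this construction (our illustration; the function name and array conventions are assumptions, not the authors' code), one can compute \(s_j\) as the least-squares projection of \(v_j\) onto the affine hull of the remaining vectors:

import numpy as np

def escape_direction(V, j):
    # V: (m, n) array whose rows are the vectors v_i, i in I(x).
    # Returns d_j = (v_j - s_j) / ||v_j - s_j||, with s_j the orthogonal
    # projection of v_j onto the hyperplane generated by the other rows.
    others = np.delete(V, j, axis=0)
    base = others[0]
    A = (others[1:] - base).T            # directions spanning the hyperplane
    # Least-squares coefficients of V[j] - base over those directions;
    # base + A @ c is the point of the hyperplane closest to v_j.
    c, *_ = np.linalg.lstsq(A, V[j] - base, rcond=None)
    s_j = base + A @ c
    residual = V[j] - s_j
    return residual / np.linalg.norm(residual)

The direction is well defined whenever \(v_j\) does not already lie on that hyperplane, so that the denominator is nonzero.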
The GS code can be found at http://cs.nyu.edu/overton/papers/gradsamp/alg/.
Acknowledgements
We are deeply grateful to the anonymous referees, who carefully read a previous version of this paper and contributed valuable suggestions that improved our manuscript.
Additional information
Elias S. Helou was partially supported by FAPESP Grants 2013/16508-3 and 2013/07375-0 and by CNPq Grant 311476/2014-7. Sandra A. Santos was partially supported by CNPq Grant 302915/2016-8, FAPESP Grants 2013/05475-7 and 2013/07375-0 and PRONEX Optimization. Lucas E. A. Simões was supported by FAPESP Grants 2013/14615-7 and 2016/22989-2.
Appendix
The aim of this appendix is to show that the assumption \((\lambda _{k,{\overline{l}}_k})_i = 0\) whenever \(i \notin \mathcal I(x_*)\) and \(k\in \mathcal K\), with \(\mathcal K\) defined in Corollary 2, is not necessary. To this end, we show that, even without such an assumption, the results of the local convergence subsection remain valid.
We divide our reasoning into two cases, reminding the reader that we have assumed \(\mathcal I(x_*) = \{1,\ldots ,r+1\}\):
(A1) The cardinality of \(\mathcal I(x_*)\) is \(n+1\);

(A2) The cardinality of \(\mathcal I(x_*)\) is \(r+1\), with \(r < n\).
Suppose first that A1 holds and consider an iterate \(x_k\) sufficiently close to \(x_*\). Moreover, assume that \(k\in \mathcal K\), where \(\mathcal K\) is the index set defined in Corollary 2. Then, looking at the optimization problem in (22), we see that any additional active index generates an additional active constraint in (22) that is a linear combination of the first \(n+1\) active constraints (by Remark 2, and because the rank of \({\tilde{J}}_{k}\) remains constant in a small neighborhood of \(x_*\)). Hence, the solution obtained with or without this additional constraint is the same, so the results presented in the local convergence subsection do not change in this special case.
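As a generic illustration of why such a dependent constraint is redundant (our notation, assuming for simplicity that the extra gradient is a convex combination of the first \(n+1\)): if the active constraints read \(\langle v_i, d\rangle \le z\) for \(i = 1,\ldots ,n+2\), and \(v_{n+2} = \sum _{i=1}^{n+1} \alpha _i v_i\) with \(\alpha _i \ge 0\) and \(\sum _{i=1}^{n+1} \alpha _i = 1\), then

\[ \langle v_{n+2}, d\rangle = \sum _{i=1}^{n+1} \alpha _i \langle v_i, d\rangle \le \Big (\sum _{i=1}^{n+1} \alpha _i\Big ) z = z, \]

so the \((n+2)\)-th constraint is implied by the others and the solution of the quadratic program is unaffected.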
So, let us consider the more intricate case A2. Moreover, let us assume that there is only one additional constraint, i.e., that the number of active constraints is \(r+2\) (the occurrence of more than one additional constraint is a straightforward generalization of this simpler case). In other words, we are saying that solving (5) is equivalent to minimizing
where we assume the constraints have been rearranged so that the additional constraint is the \((r+2)\)-th one, with associated sampled point \(x_{k,r+2}^{{\overline{l}}_k}\). Therefore, for an iterate \(x_k\) sufficiently close to the solution and a sufficiently small sampling radius, the continuity of the functions \(\phi _i\) ensures that only the functions \(\phi _1,\ldots ,\phi _{r+1}\) can attain the maximum at any sampled point. Hence, there exists \(j\in \{1,\ldots ,r+1\}\) such that \(f(x_{k,r+2}^{{\overline{l}}_k}) = \phi _j(x_{k,r+2}^{{\overline{l}}_k})\). Consequently, recalling that \(k\in \mathcal K\), the dual of the above minimization problem can be written as
Therefore, we can turn this last constrained maximization problem into an unconstrained one by making the substitution \(\lambda _{r+2} = 1 - \sum _{i=1}^{r+1} \lambda _i\). So, we have
Since the above problem is concave and smooth, its solution \({\overline{\lambda }}\in {\mathbb {R}}^{r+1}\) can be obtained by setting the derivative of the objective function equal to the null vector. Consequently, assuming without loss of generality that the function \(\phi _j\) involved in the additional constraint is \(\phi _{r+1}\), we have
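In schematic form (our notation only; \(q\) stands here for the dual objective and is not a symbol of the paper), the elimination and the resulting stationarity condition read

\[ \tilde{q}(\lambda _1,\ldots ,\lambda _{r+1}) = q\Big (\lambda _1,\ldots ,\lambda _{r+1},\, 1 - \sum _{i=1}^{r+1} \lambda _i\Big ), \qquad \frac{\partial \tilde{q}}{\partial \lambda _i} = \frac{\partial q}{\partial \lambda _i} - \frac{\partial q}{\partial \lambda _{r+2}} = 0, \quad i = 1,\ldots ,r+1. \]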
Now, exchanging the point \(x_{k,r+2}^{{\overline{l}}_k}\) for \(x_{k,r+1}^{{\overline{l}}_k}\) and redefining
we get
This last linear system yields
Therefore, following the same reasoning used so far, one can see that the first \(r\) components of the dual variable \({{\hat{\lambda }}} \in {\mathbb {R}}^{r+1}\) associated with problem (21) must satisfy the linear system obtained above (disregarding the remaining error vector) and, moreover,
Therefore, taking \(\lambda ^*\in {\mathbb {R}}^{r+2}\) as the solution of (36) and using equation (37), we must have
To complete our reasoning, we write the following relation between the primal and dual variables
Hence, \(d_{k,{\overline{l}}_k}\) is exactly the search direction obtained in (21), up to an additional error vector. Therefore, the term \(O\left( \tau _{k,{\overline{l}}_k}\right) \) is absorbed by the other error vectors in Theorem 3, and the result remains valid.
Finally, recall that we have considered just one constraint in addition to the other \(r+1\) active constraints. However, exactly the same reasoning proves the result for any number of additional constraints.