
A fast gradient and function sampling method for finite-max functions


Abstract

This paper proposes an algorithm for the unconstrained minimization of a class of nonsmooth and nonconvex functions that can be written as finite-max functions. A gradient and function-based sampling method is proposed which, under special circumstances, either moves superlinearly to a minimizer of the problem of interest or improves the optimality certificate. Global and local convergence analyses are presented, as well as examples that illustrate the theoretical results obtained.


Notes

  1. For example, setting \(s_j\) as the orthogonal projection of \(v_j\) over the hyperplane generated by \(\{ v_i : i\in \mathcal I(x)\text{, }~i\ne j\}\), one can consider \(d_j = (v_j - s_j)/\Vert v_j - s_j\Vert \) (see the sketch after these notes).

  2. The GS code can be found at http://cs.nyu.edu/overton/papers/gradsamp/alg/.
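As a concrete illustration of the construction in Note 1, the snippet below is a minimal NumPy sketch (ours, not the authors' code; the name footnote_direction and the reading of "hyperplane generated by" as the affine hull of the remaining vectors are our assumptions): it projects \(v_j\) orthogonally onto that affine hull and normalizes the residual.

```python
# Sketch only (not the authors' code): one way to realize the construction in
# Note 1, reading "hyperplane generated by" as the affine hull of the
# remaining vectors.  Function and variable names are ours.
import numpy as np

def footnote_direction(V, j):
    """V: (m, n) array whose rows are the vectors v_i; returns
    d_j = (v_j - s_j) / ||v_j - s_j||, with s_j the orthogonal projection of
    v_j onto the affine hull of the other rows."""
    v_j = V[j]
    others = np.delete(V, j, axis=0)
    base = others[0]
    A = (others[1:] - base).T                # directions spanning the affine hull
    if A.size:
        coef, *_ = np.linalg.lstsq(A, v_j - base, rcond=None)
        s_j = base + A @ coef                # orthogonal projection of v_j
    else:
        s_j = base                           # hull reduces to a single point
    diff = v_j - s_j
    return diff / np.linalg.norm(diff)       # assumes v_j is off the hyperplane
```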


Acknowledgements

We are greatly thankful to the anonymous referees, who carefully read a previous version of this paper and contributed valuable suggestions that improved our manuscript.

Author information

Corresponding author

Correspondence to Lucas E. A. Simões.

Additional information

Elias S. Helou was partially supported by FAPESP Grants 2013/16508-3 and 2013/07375-0 and by CNPq Grant 311476/2014-7. Sandra A. Santos was partially supported by CNPq Grant 302915/2016-8, FAPESP Grants 2013/05475-7 and 2013/07375-0 and PRONEX Optimization. Lucas E. A. Simões was supported by FAPESP Grants 2013/14615-7 and 2016/22989-2.

Appendix

The aim of this appendix is to show that the assumption \((\lambda _{k,{\overline{l}}_k})_i = 0\), whenever \(i \notin \mathcal I(x_*)\) and \(k\in \mathcal K\), with \(\mathcal K\) defined in Corollary 2, is not necessary. To this end, we show that, even without such an assumption, the results of the local convergence subsection remain unchanged.

We divide our reasoning into two cases, reminding the reader that we have assumed \(\mathcal I(x_*) = \{1,\ldots ,r+1\}\):

  1. (A1) The cardinality of \(\mathcal I(x_*)\) is \(n+1\);

  2. (A2) The cardinality of \(\mathcal I(x_*)\) is \(r+1\) with \(r < n\).

Suppose first that A1 holds, and consider an iterate \(x_k\) sufficiently close to \(x_*\). Moreover, assume that \(k\in \mathcal K\), where \(\mathcal K\) is the index set defined in Corollary 2. Then, looking at the optimization problem in (22), any additional active constraint yields an additional active constraint in (22) that is a linear combination of the first \(n+1\) active constraints (by Remark 2 and because the rank of \({\tilde{J}}_{k}\) remains constant in a neighborhood of \(x_*\)). Hence, the solution obtained with, or without, this additional constraint is the same, so the results presented in the local convergence subsection do not change in this special case; the sketch below illustrates this numerically.
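The following self-contained check is our own sketch of this argument (not the paper's implementation; the helper solve_kkt and the random data are illustrative, and the redundant row is taken as an affine combination so that it is consistent with the \(z\)-term): it solves an equality-constrained QP of the form of (22) through its KKT system, appends a dependent constraint, and verifies that the primal pair \((d,z)\) is unchanged.

```python
# Our own numerical sanity check of the argument above (not the paper's code):
# appending a constraint that is an affine combination of the existing active
# constraints of a QP like (22) leaves the primal solution (d, z) unchanged.
import numpy as np

def solve_kkt(G, c, H):
    """Solve  min_{d,z} z + 0.5 d^T H d  s.t.  c_i + g_i^T d = z  (rows g_i of G)
    through its KKT system; lstsq also copes with redundant constraints."""
    m, n = G.shape
    K = np.zeros((n + 1 + m, n + 1 + m))
    K[:n, :n] = H                      # stationarity in d:  H d + G^T lam = 0
    K[:n, n + 1:] = G.T
    K[n, n + 1:] = -1.0                # stationarity in z:  1 - e^T lam = 0
    K[n + 1:, :n] = G                  # feasibility:        G d - z e = -c
    K[n + 1:, n] = -1.0
    rhs = np.concatenate((np.zeros(n), [-1.0], -c))
    sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return sol[:n], sol[n]

rng = np.random.default_rng(0)
n = 3
H = np.eye(n)
G = rng.standard_normal((n + 1, n))    # gradients of the n + 1 active pieces
c = rng.standard_normal(n + 1)         # corresponding affine terms

alpha = np.full(n + 1, 1.0 / (n + 1))  # weights summing to one
G_red = np.vstack([G, alpha @ G])      # redundant extra constraint
c_red = np.append(c, alpha @ c)

d1, z1 = solve_kkt(G, c, H)
d2, z2 = solve_kkt(G_red, c_red, H)
print(np.allclose(d1, d2), np.isclose(z1, z2))   # expected: True True
```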

Let us now consider the more intricate case A2. Moreover, assume that there is only one additional constraint, i.e., the number of active constraints is \(r+2\) (the occurrence of more than one additional constraint is a straightforward generalization of this simpler case). In other words, solving (5) is equivalent to solving

$$\begin{aligned} \begin{aligned} \min _{\left( d,z\right) \in {\mathbb {R}}^{n+1}~~}&z + \frac{1}{2}d^TH_kd \\ \text {s.t. }&f\left( x_{k,i}^{{\overline{l}}_k}\right) + \nabla f\left( x_{k,i}^{{\overline{l}}_k}\right) ^T\left( x_k + d - x_{k,i}^{{\overline{l}}_k}\right) = z_\text{, }~~1\le i\le r+2\text{, } \end{aligned} \end{aligned}$$

where we assume that the constraints have been rearranged so that the additional constraint is the \((r+2)\)-th one, with associated sampled point \(x_{k,r+2}^{{\overline{l}}_k}\). Therefore, for an iterate \(x_k\) sufficiently close to the solution and a sufficiently small sampling radius, the continuity of the functions \(\phi _i\) guarantees that only the functions \(\phi _1,\ldots ,\phi _{r+1}\) can attain the maximum at any sampled point. Hence, there exists \(j\in \{1,\ldots ,r+1\}\) such that \(f(x_{k,r+2}^{{\overline{l}}_k}) = \phi _j(x_{k,r+2}^{{\overline{l}}_k})\). Consequently, recalling that \(k\in \mathcal K\), the dual of the above minimization problem can be written as

$$\begin{aligned} \begin{aligned}&\max _{\lambda \in {\mathbb {R}}^{r+2}~~} \sum _{i=1}^{r+1} \lambda _i\left[ \phi _i\left( x_{k,i}^{{\overline{l}}_k}\right) + \nabla \phi _i\left( x_{k,i}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,i}^{{\overline{l}}_k}\right) \right] \\&\quad + \lambda _{r+2}\left[ \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) + \nabla \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+2}^{{\overline{l}}_k}\right) \right] \\&\quad - \frac{1}{2}\left\| \sum _{i=1}^{r+1} \lambda _i \nabla \phi _i(x_{k,i}^{{\overline{l}}_k}) + \lambda _{r+2} \nabla \phi _j(x_{k,r+2}^{{\overline{l}}_k}) \right\| _{H_k^{-1}}^2\\&\text {s.t. } e^T\lambda = 1\text{. } \end{aligned} \end{aligned}$$
(36)

Therefore, we can turn this last constrained maximization problem into an unconstrained one through the substitution \(\lambda _{r+2} = 1 - \sum _{i=1}^{r+1} \lambda _i\), which gives

$$\begin{aligned} \begin{aligned}&\max _{\lambda \in {\mathbb {R}}^{r+1}~~} \sum _{i=1}^{r+1} \lambda _i\left[ \phi _i\left( x_{k,i}^{{\overline{l}}_k}\right) + \nabla \phi _i\left( x_{k,i}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,i}^{{\overline{l}}_k}\right) - \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) \right. \\&\left. \quad - \nabla \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+2}^{{\overline{l}}_k}\right) \right] + \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) \\&\quad + \nabla \phi _j\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+2}^{{\overline{l}}_k}\right) \\&\quad - \frac{1}{2}\left\| \sum _{i=1}^{r+1} \lambda _i \left[ \nabla \phi _i(x_{k,i}^{{\overline{l}}_k}) - \nabla \phi _j(x_{k,r+2}^{{\overline{l}}_k}) \right] + \nabla \phi _j(x_{k,r+2}^{{\overline{l}}_k}) \right\| _{H_k^{-1}}^2\text{. }\\ \end{aligned} \end{aligned}$$

Since the above problem is concave and smooth, its solution \({\overline{\lambda }}\in {\mathbb {R}}^{r+1}\) can be obtained by setting the gradient of the objective function equal to the null vector. Consequently, assuming without loss of generality that the function \(\phi _j\) involved in the additional constraint is \(\phi _{r+1}\), we have

$$\begin{aligned} \begin{aligned}&\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T \end{array}\right) H_k^{-1} \left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T \end{array}\right) ^T{\overline{\lambda }} \\&\quad = \left( \begin{array}{c} \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) + \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,1}^{{\overline{l}}_k}\right) \\ \vdots \\ \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) + \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+1}^{{\overline{l}}_k}\right) \end{array}\right) \\&\qquad - \left( \begin{array}{c} \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) + \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+2}^{{\overline{l}}_k}\right) \\ \vdots \\ \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) + \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+2}^{{\overline{l}}_k}\right) \end{array}\right) \\&\qquad -\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) ^T \end{array}\right) H_k^{-1}\nabla \phi _{r+1}\left( x_{k,r+2}^{{\overline{l}}_k}\right) \text{. } \end{aligned} \end{aligned}$$
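For concreteness, the routine below is a minimal sketch (our notation, not the paper's code; the nonsingularity of the system matrix is assumed) that assembles and solves exactly this linear system: with \(D\) the matrix of gradient differences, it solves \((D H_k^{-1} D^T)\,{\overline{\lambda }} = \text{rhs}\) and recovers \(\lambda _{r+2}\) from the constraint \(e^T\lambda = 1\).

```python
# Sketch (ours, not the authors' code) of the optimality system displayed above.
import numpy as np

def dual_weights(grads, lin, g_extra, lin_extra, H):
    """grads: (r+1, n) rows nabla phi_i(x_{k,i});  lin: (r+1,) values
    phi_i(x_{k,i}) + nabla phi_i(x_{k,i})^T (x_k - x_{k,i});
    g_extra, lin_extra: the same data for phi_{r+1} at x_{k,r+2}."""
    Hinv = np.linalg.inv(H)
    D = grads - g_extra                 # rows: nabla phi_i - nabla phi_{r+1}(x_{k,r+2})
    M = D @ Hinv @ D.T                  # assumed nonsingular here
    rhs = (lin - lin_extra) - D @ Hinv @ g_extra
    lam_bar = np.linalg.solve(M, rhs)
    lam_extra = 1.0 - lam_bar.sum()     # lambda_{r+2} from e^T lambda = 1
    return lam_bar, lam_extra
```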

Now, replacing the point \(x_{k,r+2}^{{\overline{l}}_k}\) by \(x_{k,r+1}^{{\overline{l}}_k}\) and redefining

$$\begin{aligned} \tau _{k,{\overline{l}}_k} := \max _{1\le i\le r+2}\left\| x_{k,i}^{{\overline{l}}_k} - x_k\right\| \text{, } \end{aligned}$$

we get

$$\begin{aligned} \begin{aligned}&\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ 0^T \end{array}\right) H_k^{-1} \left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ 0^T \end{array}\right) ^T{\overline{\lambda }} \\&\quad = \left( \begin{array}{c} \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) + \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,1}^{{\overline{l}}_k}\right) - \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+1}^{{\overline{l}}_k}\right) \\ \vdots \\ \phi _{r}\left( x_{k,r}^{{\overline{l}}_k}\right) + \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r}^{{\overline{l}}_k}\right) - \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+1}^{{\overline{l}}_k}\right) \\ 0^T \end{array}\right) \\&\qquad -\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ 0^T \end{array}\right) H_k^{-1}\nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) + O\left( \tau _{k,{\overline{l}}_k}\right) \text{. } \end{aligned} \end{aligned}$$

This last linear system yields

$$\begin{aligned} \begin{aligned}&\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T \end{array}\right) H_k^{-1} \left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T \end{array}\right) ^T \left( \begin{array}{c} {{\overline{\lambda }}}_1\\ \vdots \\ {{\overline{\lambda }}}_r \end{array}\right) \\&\quad = \left( \begin{array}{c} \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) + \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,1}^{{\overline{l}}_k}\right) - \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+1}^{{\overline{l}}_k}\right) \\ \vdots \\ \phi _{r}\left( x_{k,r}^{{\overline{l}}_k}\right) + \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r}^{{\overline{l}}_k}\right) - \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\left( x_k - x_{k,r+1}^{{\overline{l}}_k}\right) \end{array}\right) \\&\qquad -\left( \begin{array}{c} \nabla \phi _1\left( x_{k,1}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T\\ \vdots \\ \nabla \phi _{r}\left( x_{k,r}^{{\overline{l}}_k} \right) ^T - \nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) ^T \end{array}\right) H_k^{-1}\nabla \phi _{r+1}\left( x_{k,r+1}^{{\overline{l}}_k}\right) + O\left( \tau _{k,{\overline{l}}_k}\right) \text{. } \end{aligned} \end{aligned}$$

Therefore, following the same reasoning used to arrive at this point, one can see that the first \(r\) components of the dual variable \({{\hat{\lambda }}} \in {\mathbb {R}}^{r+1}\) associated with problem (21) must satisfy the last linear system obtained above (disregarding the remaining error vector) and, moreover,

$$\begin{aligned} {\hat{\lambda }}_{r+1} = 1 - \sum _{i=1}^r {\hat{\lambda }}_i\text{. } \end{aligned}$$
(37)

Therefore, denoting by \(\lambda ^*\in {\mathbb {R}}^{r+2}\) the solution of (36) and using Eq. (37), we must have

$$\begin{aligned} \lambda ^* = \left( \begin{array}{c} {\hat{\lambda }}_1\\ \vdots \\ {\hat{\lambda }}_r\\ \lambda ^*_{r+1}\\ 1-\sum _{i=1}^r {\hat{\lambda }}_i - \lambda ^*_{r+1} \end{array}\right) + O\left( \tau _{k,{\overline{l}}_k}\right) = \left( \begin{array}{c} {\hat{\lambda }}_1\\ \vdots \\ {\hat{\lambda }}_r\\ \lambda ^*_{r+1}\\ {\hat{\lambda }}_{r+1} - \lambda ^*_{r+1} \end{array}\right) + O\left( \tau _{k,{\overline{l}}_k}\right) \text{. } \end{aligned}$$

So, to complete our reasoning, we write the following relation between the primal and dual variables:

$$\begin{aligned} \begin{aligned} d_{k,{\overline{l}}_k}&= -H_k^{-1}\left[ \sum _{i=1}^{r+1}\lambda _i^*\nabla \phi _i\Big (x_{k,i}^{{\overline{l}}_k}\Big ) + \lambda ^*_{r+2}\nabla \phi _{r+1}\Big (x_{k,r+2}^{{\overline{l}}_k}\Big ) \right] \\&= -H_k^{-1}\left[ \sum _{i=1}^{r}\lambda ^*_i\nabla \phi _i\Big (x_{k,i}^{{\overline{l}}_k}\Big ) + \left( \lambda ^*_{r+1} + \lambda ^*_{r+2}\right) \nabla \phi _{r+1}\Big (x_{k,r+1}^{{\overline{l}}_k}\Big ) \right] + O\left( \tau _{k,{\overline{l}}_k}\right) \\&= -H_k^{-1}\sum _{i=1}^{r+1}{\hat{\lambda }_i\nabla } \phi _i\Big (x_{k,i}^{{\overline{l}}_k}\Big ) + O\left( \tau _{k,{\overline{l}}_k}\right) \text{. } \end{aligned} \end{aligned}$$

Hence, \(d_{k,{\overline{l}}_k}\) is exactly the search direction obtained in (21), up to an additional error vector. Therefore, the term \(O\left( \tau _{k,{\overline{l}}_k}\right) \) is absorbed by the other error vectors in Theorem 3 and the result remains valid.
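As an illustration of this primal-dual recovery, the short helper below (a sketch in our own notation, not the paper's code) rebuilds a direction of the form \(-H_k^{-1}\sum _i \lambda _i \nabla \phi _i\) from any multipliers and paired gradients; merging the weights \(\lambda ^*_{r+1}+\lambda ^*_{r+2}\) on \(\nabla \phi _{r+1}(x_{k,r+1}^{{\overline{l}}_k})\) then reproduces, up to \(O(\tau _{k,{\overline{l}}_k})\), the direction of (21).

```python
# Sketch of the primal-dual relation above (our own helper, not the paper's code).
import numpy as np

def direction_from_duals(lam, grads, H):
    """lam: (m,) multipliers; grads: (m, n) rows are the gradients paired with
    each constraint; returns d = -H^{-1} (sum_i lam_i * grads_i)."""
    return -np.linalg.solve(H, grads.T @ lam)
```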

Finally, recall that we have considered just one additional active constraint besides the other \(r+1\) active constraints. However, it is straightforward to see that exactly the same reasoning proves the result for any number of additional constraints.


Cite this article

Helou, E.S., Santos, S.A. & Simões, L.E.A. A fast gradient and function sampling method for finite-max functions. Comput Optim Appl 71, 673–717 (2018). https://doi.org/10.1007/s10589-018-0030-2
