Finding Second-Order Stationary Points in Constrained Minimization: A Feasible Direction Approach

Abstract

This paper introduces a method for computing points satisfying the second-order necessary optimality conditions for nonconvex minimization problems subject to a closed and convex constraint set. The method comprises two independent steps corresponding to the first- and second-order conditions. The first-order step is a generic closed map algorithm, which can be chosen from a variety of first-order algorithms, making it adjustable to the given problem. The second-order step can be viewed as a second-order feasible direction step for nonconvex minimization subject to a convex set. We prove that any limit point of the resulting scheme satisfies the second-order necessary optimality condition, and establish the scheme’s convergence rate and complexity, under standard and mild assumptions. Numerical tests illustrate the proposed scheme.

Notes

  1. Provided at https://www.tau.ac.il/~becka/BB_Documentation.7z.

References

  1. Auslender, A.: Computing points that satisfy second order necessary optimality conditions for unconstrained minimization. SIAM J. Optim. 20(4), 1868–1884 (2010)

  2. Yuan, Y.: Recent advances in trust region algorithms. Math. Program. 151(1), 249–281 (2015)

  3. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2, Ser. A), 245–295 (2011)

  4. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods, vol. 1. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000)

  5. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  6. Facchinei, F., Lucidi, S.: Convergence to second order stationary points in inequality constrained optimization. Math. Oper. Res. 23(3), 746–766 (1998)

  7. Forsgren, A., Murray, W.: Newton methods for large-scale linear inequality-constrained minimization. SIAM J. Optim. 7(1), 162–176 (1997)

  8. Gould, N.I.M., Lucidi, S., Roma, M., Toint, P.L.: Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optim. Methods Softw. 14(1–2), 75–98 (2000)

  9. Gill, P.E., Murray, W.: Newton-type methods for unconstrained and linearly constrained optimization. Math. Program. 7(1), 311–350 (1974)

  10. Moré, J.J., Sorensen, D.C.: On the use of directions of negative curvature in a modified Newton method. Math. Program. 16(1), 1–20 (1979)

  11. Di Pillo, G., Lucidi, S., Palagi, L.: Convergence to second-order stationary points of a primal-dual algorithm model for nonlinear programming. Math. Oper. Res. 30(4), 897–915 (2005)

  12. Cartis, C., Gould, N.I.M., Toint, P.L.: Second-order optimality and beyond: characterization and evaluation complexity in convexly constrained nonlinear optimization. Found. Comput. Math. 18(5), 1073–1107 (2018)

  13. Cartis, C., Gould, N.I.M., Toint, P.L.: An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity. IMA J. Numer. Anal. 32(4), 1662–1695 (2012)

  14. Zoutendijk, G.: Methods of Feasible Directions: A Study in Linear and Non-linear Programming. Elsevier, Amsterdam (1960)

  15. Zangwill, W.I.: Nonlinear Programming: A Unified Approach, vol. 196. Prentice-Hall, Englewood Cliffs, NJ (1969)

  16. Fu, M., Luo, Z., Ye, Y.: Approximation algorithms for quadratic programming. J. Comb. Optim. 2(1), 29–50 (1998)

  17. Bienstock, D.: A note on polynomial solvability of the CDT problem. SIAM J. Optim. 26(1), 488–498 (2016)

  18. Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trust-region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)

  19. Bienstock, D., Michalka, A.: Polynomial solvability of variants of the trust-region sub-problem. In: Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 380–390. SIAM (2014)

  20. Beck, A., Pan, D.: A branch and bound algorithm for nonconvex quadratic optimization with ball and linear constraints. J. Glob. Optim. 69(2), 309–342 (2017)

  21. Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear constraints. Math. Program. 149(1–2), 253–264 (2015)

  22. Jeyakumar, V., Li, G.: Trust-region problems with linear inequality constraints: exact SDP relaxation, global optimality and robust optimization. Math. Program. 147(1–2), 171–206 (2014)

  23. Bomze, I.M., Jeyakumar, V., Li, G.: Extended trust-region problems with one or two balls: exact copositive and Lagrangian relaxations. J. Glob. Optim. 71(3), 551–569 (2018)

  24. Ho-Nguyen, N., Kilinc-Karzan, F.: A second-order cone based approach for solving the trust-region subproblem and its variants. SIAM J. Optim. 27(3), 1485–1512 (2017)

  25. Montanher, T., Neumaier, A., Domes, F.: A computational study of global optimization solvers on two trust region subproblems. J. Glob. Optim. 71(4), 915–934 (2018)

  26. Sakaue, S., Nakatsukasa, Y., Takeda, A., Iwata, S.: Solving generalized CDT problems via two-parameter eigenvalues. SIAM J. Optim. 26(3), 1669–1694 (2016)

  27. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, Hoboken (2006)

  28. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  29. Huard, P.: Optimization algorithms and point-to-set-maps. Math. Program. 8(1), 308–331 (1975)

  30. Beck, A.: Introduction to Nonlinear Optimization, MOS-SIAM Series on Optimization, vol. 19. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2014)

  31. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM, Philadelphia (1970)

  32. Hansen, E.R.: Global optimization using interval analysis: the one-dimensional case. J. Optim. Theory Appl. 29(3), 331–344 (1979)

  33. Arbenz, P.: Lecture Notes on Solving Large Scale Eigenvalue Problems (2016). https://people.inf.ethz.ch/arbenz/ewp/Lnotes/lsevp.pdf

  34. Griewank, A.: The Modification of Newton's Method for Unconstrained Optimization by Bounding Cubic Terms. Technical Report NA/12, Department of Applied Mathematics and Theoretical Physics, University of Cambridge (1981)

  35. Berge, C.: Topological Spaces: Including a Treatment of Multi-valued Functions, Vector Spaces, and Convexity. Courier Corporation, Chelmsford (1997)

  36. Fiacco, A.V.: Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Mathematics in Science and Engineering, vol. 165. Academic Press, Cambridge (1983)

  37. Hogan, W.W.: Point-to-set maps in mathematical programming. SIAM Rev. 15(3), 591–603 (1973)

  38. Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)

  39. Mei, S., Bai, Y., Montanari, A.: The landscape of empirical risk for nonconvex losses. Ann. Stat. 46(6A), 2747–2774 (2018)

  40. Yang, X.: Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press (2010)

  41. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery. In: Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 42–88. Cambridge University Press, Cambridge (2009)

Acknowledgements

The research of N. Hallak was conducted at Tel-Aviv University and was supported by a postdoctoral fellowship under ISF Grant 1844-16. The research of M. Teboulle was partially supported by the Israel Science Foundation, under ISF Grant 1844-16.

Corresponding author

Correspondence to Marc Teboulle.

Communicated by Alexey F. Izmailov.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Continuity of the Second-Order Optimality Measure

We start by recalling some well-known definitions and properties of multi-valued maps, which can be found, for example, in [37]. Assume hereafter that \(C\subseteq \mathbb {R}^n\) is a nonempty, closed, and convex set.

Definition A.1

(Basic properties of point-to-set maps) Let \(\Omega \) be a point-to-set map and let \(\bar{\mathbf{x}}\in C\). Then:

  1. \(\Omega \) is open at \(\bar{\mathbf{x}}\), if \(\{\mathbf{x}^k\}_{k\ge 0}\subseteq C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), and \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) imply the existence of an integer K and a sequence \(\{ \mathbf{y}^k \}_{k\ge 0}\subseteq C\) such that \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for \(k\ge K\) and \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\).

  2. \(\Omega \) is closed at \(\bar{\mathbf{x}}\), if \(\{\mathbf{x}^k\}_{k\ge 0}\subseteq C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\), and \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) imply that \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\).

  3. \(\Omega \) is continuous at \(\bar{\mathbf{x}}\) if it is both open and closed at \(\bar{\mathbf{x}}\).

  4. \(\Omega \) is uniformly compact near \(\bar{\mathbf{x}}\) if there is a neighborhood N of \(\bar{\mathbf{x}}\) such that the closure of the set \(\bigcup _{\mathbf{x}\in N}\Omega (\mathbf{x})\) is compact.
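
As a simple illustration (our own example, not taken from [37]), consider \(C=\mathbb {R}^n\) and the fixed-radius ball map \(\Omega (\mathbf{x}) = \{ \mathbf{y}\in \mathbb {R}^n : \Vert \mathbf{y}-\mathbf{x}\Vert _2 \le r \}\). This map is closed at every \(\bar{\mathbf{x}}\), since its defining constraint is continuous; open, since any \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) is tracked by the feasible sequence \(\mathbf{y}^k = \mathbf{x}^k + (\bar{\mathbf{y}}-\bar{\mathbf{x}})\rightarrow \bar{\mathbf{y}}\); and uniformly compact near \(\bar{\mathbf{x}}\), since \(\bigcup _{\mathbf{x}\in N}\Omega (\mathbf{x})\) is bounded whenever the neighborhood N is bounded.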

The continuity of the optimality measure Q follows from the next result, which we have mildly adjusted to our setting.

Theorem A.1

(Continuity of problems [37, Theorem 7]) Let \(\Omega \) be a point-to-set map, and let

$$\begin{aligned} \nu (\mathbf{x}) = \displaystyle \mathop {\mathrm{sup}}_{\mathbf{y}\in \Omega (\mathbf{x})} \phi (\mathbf{x},\mathbf{y}), \end{aligned}$$
(16)

where \( \phi : C \times C \rightarrow ]-\infty ,\infty [\). If \(\Omega \) is continuous at \(\bar{\mathbf{x}}\in C\) and uniformly compact near \(\bar{\mathbf{x}}\), and if \(\phi \) is continuous on \(\{\bar{\mathbf{x}}\}\times \Omega (\bar{\mathbf{x}})\), then \(\nu \) is continuous at \(\bar{\mathbf{x}}\).
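
To see why the continuity of \(\Omega \) cannot be dropped, consider the following textbook-style example of ours (not taken from [37]): let \(C=[0,1]\), \(\phi (x,y)=y\), \(\Omega (0)=[0,1]\), and \(\Omega (x)=\{0\}\) for \(x>0\). This map is closed and uniformly compact, but it is not open at \(\bar{x}=0\) (no feasible sequence tracks \(\bar{y}=1\) along \(x^k=1/k\)), and indeed \(\nu (0)=1\) while \(\nu (x)=0\) for all \(x>0\), so \(\nu \) fails to be continuous at 0.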

To apply Theorem A.1 in the case of Q, we will restrict the discussion to the optimization problem (16) with the point-to-set mapping

$$\begin{aligned} \Omega (\mathbf{x}) = \{ \mathbf{y}\in C : \mathbf{a}(\mathbf{x})^T (\mathbf{y}-\mathbf{x}) \le 0, \Vert \mathbf{y}- \mathbf{x}\Vert _2 \le r \}, \qquad \forall \mathbf{x}\in C, \end{aligned}$$
(17)

with \(\mathbf{a}:\mathbb {R}^n\rightarrow \mathbb {R}^n\) being a continuous vector mapping, while assuming that

$$\begin{aligned} \phi : C \times C \rightarrow ]-\infty ,\infty [ \end{aligned}$$

is a continuous function. Clearly, the optimality measure \(Q(\cdot )\) can be defined in the form of (16) by setting \(\mathbf{y}= \mathbf{x}+ \mathbf{d}\), \(\phi (\mathbf{x},\mathbf{y}) = -f''(\mathbf{x}; \mathbf{y}-\mathbf{x})\), and \(\mathbf{a}(\mathbf{x}) = \nabla f(\mathbf{x})\).
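
For intuition, the following minimal Python sketch (our own illustration, not the implementation used in the paper's numerical tests) estimates Q at a given point by Monte Carlo sampling of feasible directions, in the simplified case \(C=\mathbb {R}^n\), where \(f''(\mathbf{x};\mathbf{d}) = \mathbf{d}^T\nabla ^2 f(\mathbf{x})\mathbf{d}\); the helper name estimate_Q and all parameter values are hypothetical:

    import numpy as np

    # Monte Carlo lower bound on Q(x) = max { -d^T H d : a(x)^T d <= 0, ||d||_2 <= r }
    # in the simplified case C = R^n. Illustrative sketch only, not the paper's code.
    def estimate_Q(grad, hess, x, r=1.0, n_samples=20000, seed=0):
        rng = np.random.default_rng(seed)
        g, H = grad(x), hess(x)
        best = 0.0  # d = 0 is always feasible and gives the value 0
        for _ in range(n_samples):
            d = rng.standard_normal(x.shape)
            if g @ d > 0:                 # flip d into the half-space a(x)^T d <= 0
                d = -d
            d *= r / np.linalg.norm(d)    # -d^T H d is 2-homogeneous: scale to the boundary
            best = max(best, -d @ H @ d)
        return best

    # Example: the saddle point of f(x) = x_1^2 - x_2^2 at the origin.
    grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
    hess = lambda x: np.diag([2.0, -2.0])
    print(estimate_Q(grad, hess, np.zeros(2)))  # approx 2 = -lambda_min(H) * r^2 > 0

A strictly positive estimate, as in this saddle-point example, indicates a direction of feasible negative curvature, so the point is not second-order stationary.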

We will prove that \(\Omega \) given by (17) is uniformly compact and continuous, where the continuity of \(\Omega \) will be established using the following two results.

Theorem A.2

[37, Theorem 10] Let \(\Omega \) be a point-to-set map given by

$$\begin{aligned} \Omega (\mathbf{x}) = \{ \mathbf{y}\in C : g(\mathbf{x},\mathbf{y}) \le 0\}, \end{aligned}$$

where each component of \(g:C\times C \rightarrow [-\infty ,\infty ]^m\) is lower semicontinuous in \(\{\bar{\mathbf{x}}\}\times C\). Then, \(\Omega \) is closed at \(\bar{\mathbf{x}}\).

Noting that C is convex, we have the following (adjusted) result (credited to Geoffrion).

Theorem A.3

[37, Theorem 12] Let \(\Omega \) be a point-to-set map given by

$$\begin{aligned} \Omega (\mathbf{x}) = \{ \mathbf{y}\in C : g(\mathbf{x},\mathbf{y}) \le 0\}, \end{aligned}$$

where each component of \(g:C\times C \rightarrow [-\infty ,\infty ]^m\) is continuous on \(\{\bar{\mathbf{x}}\}\times \Omega (\bar{\mathbf{x}})\), and convex in \(\mathbf{y}\) for each fixed \(\mathbf{x}\in C\). If there exists a \(\bar{\mathbf{y}}\) such that \(g(\bar{\mathbf{x}},\bar{\mathbf{y}})<0\), then \(\Omega \) is open at \(\bar{\mathbf{x}}\).

We can now prove the continuity and uniform compactness of \(\Omega \).

Lemma A.1

Suppose that \(\Omega \) is given by (17). Then for any \(\bar{\mathbf{x}}\in C\), we have that \(\Omega \) is uniformly compact and continuous at \(\bar{\mathbf{x}}\).

Proof

The uniform compactness of \(\Omega \) follows directly from its definition: for every \(\mathbf{x}\), the set \(\Omega (\mathbf{x})\) is the intersection of C with a half-space and a ball of radius r, so \(\bigcup _{\mathbf{x}\in N}\Omega (\mathbf{x})\) is bounded for any bounded neighborhood N.

To prove that \(\Omega \) is continuous, we will show that it is both closed and open. The closedness of \(\Omega \) is implied by Theorem A.2, since for any \(\bar{\mathbf{x}}\in C\), all functions defining the point-to-set mapping \(\Omega \) are continuous.

We will now prove that \(\Omega \) is open. Let \(\bar{\mathbf{x}}\in C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), and \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\). We need to exhibit a sequence \(\{\mathbf{y}^k\}_{k\ge 0}\) such that \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) and \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for all \(k\ge 0\).

Suppose first that there exists \(\hat{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\hat{\mathbf{y}} - \bar{\mathbf{x}}) < 0\). If \(\Vert \hat{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 = r\), then there exist \(\mathbf{d}\in \mathbb {R}^n \) and a sufficiently small \(\alpha > 0\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\hat{\mathbf{y}} + \alpha \mathbf{d}- \bar{\mathbf{x}}) < 0\) and \(\Vert \hat{\mathbf{y}} + \alpha \mathbf{d}- \bar{\mathbf{x}} \Vert _2 < r\). Hence, in either case there exists \(\tilde{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\tilde{\mathbf{y}} - \bar{\mathbf{x}}) < 0\) and \(\Vert \tilde{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 < r\), and we can invoke Theorem A.3 to obtain the desired claim that \(\Omega \) is open at \(\bar{\mathbf{x}}\). Assume hereafter that \(\mathbf{a}(\bar{\mathbf{x}})^T (\mathbf{y}- \bar{\mathbf{x}}) = 0\) for any \(\mathbf{y}\in \Omega (\bar{\mathbf{x}})\).

Set \(\bar{r} = \Vert \bar{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 \le r\), and define for any \(k\ge 0\)

$$\begin{aligned} \mathbf{y}^k&= {\left\{ \begin{array}{ll} \bar{\mathbf{y}}, &{} \mathbf{x}^k = \bar{\mathbf{y}},\\ \mathbf{x}^k + \frac{\bar{r}}{\Vert \mathbf{d}^k \Vert _2 } \mathbf{d}^k, &{} \mathbf{x}^k \ne \bar{\mathbf{y}}, \end{array}\right. } \text { and } \mathbf{d}^k&= {\left\{ \begin{array}{ll} \bar{\mathbf{y}} - \mathbf{x}^k, &{} \mathbf{a}(\mathbf{x}^k) = 0, \\ \bar{\mathbf{y}} - \mathbf{x}^k - \frac{\mathbf{a}(\mathbf{x}^k)^T (\bar{\mathbf{y}} - \mathbf{x}^k)}{\Vert \mathbf{a}(\mathbf{x}^k)\Vert _2^2} \mathbf{a}(\mathbf{x}^k), &{} \mathbf{a}(\mathbf{x}^k) \ne 0 . \end{array}\right. } \end{aligned}$$

To see that indeed \(\mathbf{y}^k \rightarrow \bar{\mathbf{y}}\), note that by the continuity of all terms and the assumption that \(\mathbf{a}(\bar{\mathbf{x}})^T (\bar{\mathbf{y}} - \bar{\mathbf{x}}) = 0\) we have that

$$\begin{aligned} \mathbf{d}^k \rightarrow \bar{\mathbf{y}} - \bar{\mathbf{x}}, \text { and } \mathbf{y}^k \rightarrow \bar{\mathbf{y}} . \end{aligned}$$

The claim that \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for any \(k\ge 0\) follows from the fact that \(\mathbf{y}^k = \bar{\mathbf{y}} = \mathbf{x}^k \in \Omega (\mathbf{x}^k)\) whenever \(\mathbf{x}^k = \bar{\mathbf{y}}\), that both constraints defining \(\Omega (\mathbf{x}^k)\) hold trivially whenever \(\mathbf{a}(\mathbf{x}^k) = 0 \) (as then \(\Vert \mathbf{y}^k - \mathbf{x}^k \Vert _2 = \bar{r} \le r\)), and that otherwise

$$\begin{aligned} \mathbf{a}(\mathbf{x}^k)^T \mathbf{y}^k&= \mathbf{a}(\mathbf{x}^k)^T \mathbf{x}^k + \frac{\bar{r}}{\Vert \mathbf{d}^k \Vert _2} \mathbf{a}(\mathbf{x}^k)^T \left( \bar{\mathbf{y}} - \mathbf{x}^k - \frac{\mathbf{a}(\mathbf{x}^k)^T (\bar{\mathbf{y}} - \mathbf{x}^k)}{\Vert \mathbf{a}(\mathbf{x}^k)\Vert _2^2} \mathbf{a}(\mathbf{x}^k) \right) = \mathbf{a}(\mathbf{x}^k)^T \mathbf{x}^k,\\ \Vert \mathbf{y}^k - \mathbf{x}^k \Vert _2&=\bar{r} \le r. \end{aligned}$$

To summarize, for any \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\), there exists a sequence \(\{\mathbf{y}^k\}_{k\ge 0}\) such that \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) and \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for all \(k\ge 0\), meaning that \(\Omega \) is open at \(\bar{\mathbf{x}}\). \(\square \)
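
The construction above is easy to sanity-check numerically. The following small Python script (our own illustration; the mapping \(\mathbf{a}(\mathbf{x})=\mathbf{x}\), the points, and the sequence \(\mathbf{x}^k\) are arbitrary choices satisfying \(\mathbf{a}(\bar{\mathbf{x}})^T(\bar{\mathbf{y}}-\bar{\mathbf{x}})=0\)) verifies that \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for all k and that \(\Vert \mathbf{y}^k - \bar{\mathbf{y}}\Vert _2\rightarrow 0\):

    import numpy as np

    # Sanity check of the sequence construction in the proof of Lemma A.1.
    # a(x) = x and the points below are illustrative choices, not from the paper.
    a = lambda x: x
    r = 1.0
    x_bar = np.array([1.0, 0.0])
    y_bar = np.array([1.0, 1.0])       # a(x_bar)^T (y_bar - x_bar) = 0, ||.||_2 = 1 <= r
    r_bar = np.linalg.norm(y_bar - x_bar)

    for k in range(1, 6):
        x_k = x_bar + (0.3 / k) * np.array([1.0, 1.0])   # some sequence x^k -> x_bar
        ak, v = a(x_k), y_bar - x_k
        # d^k: projection of y_bar - x^k onto the hyperplane orthogonal to a(x^k)
        d_k = v if not ak.any() else v - ((ak @ v) / (ak @ ak)) * ak
        y_k = x_k + (r_bar / np.linalg.norm(d_k)) * d_k
        feasible = (ak @ (y_k - x_k) <= 1e-12) and (np.linalg.norm(y_k - x_k) <= r + 1e-12)
        print(k, feasible, np.linalg.norm(y_k - y_bar))  # feasible == True, distance -> 0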

Proof of Lemma 4.3

(Continuity of the second-order optimality measure Q) Since \(\Omega \) is continuous and uniformly compact at every point \(\mathbf{x}\in C\) (Lemma A.1), and \(f''(\mathbf{x};\mathbf{d})\) is continuous, Theorem A.1 yields that Q is continuous. \(\square \)

Appendix B: The Projected Gradient Descent Method with Backtracking

The Projected Gradient Descent (PGD) method is a classical algorithm with abundant references describing it in detail; see, for instance, the survey [41] and the extensive list of references therein. The complexity of the PGD for finding an \(\varepsilon \)-first-order stationary point (measured through the usual norm of the gradient mapping) is of the order of \(O(\varepsilon ^{-2})\); see, e.g., [30, Theorem 9.15]. Below, we describe the PGD variant used in our numerical tests, whose step size is determined by a backtracking policy (cf. [30, Section 9.4]).

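For concreteness, here is a minimal Python sketch of PGD with a backtracking step size in the spirit of [30, Section 9.4]; the parameter names, default values, and the stopping rule (the norm of the gradient mapping) are our own choices, not necessarily those of the paper's experiments:

    import numpy as np

    # A minimal sketch of projected gradient descent with backtracking.
    # proj: Euclidean projection onto the closed convex set C.
    def pgd_backtracking(f, grad, proj, x0, s=1.0, alpha=0.5, beta=0.5,
                         tol=1e-6, max_iter=1000):
        x = x0
        for _ in range(max_iter):
            g, t = grad(x), s
            x_new = proj(x - t * g)
            # shrink t until the sufficient-decrease condition holds
            while f(x) - f(x_new) < (alpha / t) * np.linalg.norm(x - x_new) ** 2:
                t *= beta
                x_new = proj(x - t * g)
            if np.linalg.norm(x - x_new) / t <= tol:  # gradient-mapping norm
                break
            x = x_new
        return x

    # Example: minimize ||x - b||_2^2 over the unit ball; projection is a rescale.
    b = np.array([2.0, 0.0])
    proj = lambda x: x / max(1.0, np.linalg.norm(x))
    x_star = pgd_backtracking(lambda x: np.sum((x - b) ** 2),
                              lambda x: 2.0 * (x - b), proj, np.zeros(2))
    print(x_star)  # approx [1., 0.]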

Cite this article

Hallak, N., Teboulle, M. Finding Second-Order Stationary Points in Constrained Minimization: A Feasible Direction Approach. J Optim Theory Appl 186, 480–503 (2020). https://doi.org/10.1007/s10957-020-01713-x
