Abstract
This paper introduces a method for computing points satisfying the second-order necessary optimality conditions for nonconvex minimization problems subject to a closed and convex constraint set. The method comprises two independent steps corresponding to the first- and second-order conditions. The first-order step is a generic closed map algorithm, which can be chosen from a variety of first-order algorithms, making it adjustable to the given problem. The second-order step can be viewed as a second-order feasible direction step for nonconvex minimization subject to a convex set. We prove that any limit point of the resulting scheme satisfies the second-order necessary optimality condition, and establish the scheme’s convergence rate and complexity, under standard and mild assumptions. Numerical tests illustrate the proposed scheme.
Notes
Provided at https://www.tau.ac.il/~becka/BB_Documentation.7z.
References
Auslender, A.: Computing points that satisfy second order necessary optimality conditions for unconstrained minimization. SIAM J. Optim. 20(4), 1868–1884 (2010)
Yuan, Y.: Recent advances in trust region algorithms. Math. Program. 151(1), 249–281 (2015)
Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2, Ser. A), 245–295 (2011)
Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods, vol. 1. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000)
Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Facchinei, F., Lucidi, S.: Convergence to second order stationary points in inequality constrained optimization. Math. Oper. Res. 23(3), 746–766 (1998)
Forsgren, A., Murray, W.: Newton methods for large-scale linear inequality-constrained minimization. SIAM J. Optim. 7(1), 162–176 (1997)
Gould, N.I.M., Lucidi, S., Roma, M., Toint, P.L.: Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optim. Methods Softw. 14(1–2), 75–98 (2000)
Gill, P.E., Murray, W.: Newton-type methods for unconstrained and linearly constrained optimization. Math. Program. 7(1), 311–350 (1974)
Moré, J.J., Sorensen, D.C.: On the use of directions of negative curvature in a modified Newton method. Math. Program. 16(1), 1–20 (1979)
Di Pillo, G., Lucidi, S., Palagi, L.: Convergence to second-order stationary points of a primal-dual algorithm model for nonlinear programming. Math. Oper. Res. 30(4), 897–915 (2005)
Cartis, C., Gould, N.I.M., Toint, P.L.: Second-order optimality and beyond: characterization and evaluation complexity in convexly constrained nonlinear optimization. Found. Comput. Math. 18(5), 1073–1107 (2018)
Cartis, C., Gould, N.I.M., Toint, P.L.: An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity. IMA J. Numer. Anal. 32(4), 1662–1695 (2012)
Zoutendijk, G.: Methods of Feasible Directions: A Study in Linear and Non-linear Programming. Elsevier, Amsterdam (1960)
Zangwill, W.I.: Nonlinear Programming: A Unified Approach. Prentice-Hall, Englewood Cliffs, NJ (1969)
Fu, M., Luo, Z., Ye, Y.: Approximation algorithms for quadratic programming. J. Comb. Optim. 2(1), 29–50 (1998)
Bienstock, D.: A note on polynomial solvability of the CDT problem. SIAM J. Optim. 26(1), 488–498 (2016)
Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trust-region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
Bienstock, D., Michalka, A.: Polynomial solvability of variants of the trust-region sub-problem. In: Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 380–390. SIAM (2014)
Beck, A., Pan, D.: A branch and bound algorithm for nonconvex quadratic optimization with ball and linear constraints. J. Glob. Optim. 69(2), 309–342 (2017)
Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear constraints. Math. Program. 149(1–2), 253–264 (2015)
Jeyakumar, V., Li, G.: Trust-region problems with linear inequality constraints: exact SDP relaxation, global optimality and robust optimization. Math. Program. 147(1–2), 171–206 (2014)
Bomze, I.M., Jeyakumar, V., Li, G.: Extended trust-region problems with one or two balls: exact copositive and Lagrangian relaxations. J. Glob. Optim. 71(3), 551–569 (2018)
Ho-Nguyen, N., Kilinc-Karzan, F.: A second-order cone based approach for solving the trust-region subproblem and its variants. SIAM J. Optim. 27(3), 1485–1512 (2017)
Montanher, T., Neumaier, A., Domes, F.: A computational study of global optimization solvers on two trust region subproblems. J. Glob. Optim. 71(4), 915–934 (2018)
Sakaue, S., Nakatsukasa, Y., Takeda, A., Iwata, S.: Solving generalized CDT problems via two-parameter eigenvalues. SIAM J. Optim. 26(3), 1669–1694 (2016)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, Hoboken (2006)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Huard, P.: Optimization algorithms and point-to-set-maps. Math. Program. 8(1), 308–331 (1975)
Beck, A.: Introduction to Nonlinear Optimization, MOS-SIAM Series on Optimization, vol. 19. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2014)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM, Philadelphia (1970)
Hansen, E.R.: Global optimization using interval analysis: the one-dimensional case. J. Optim. Theory Appl. 29(3), 331–344 (1979)
Arbenz, P.: Lecture Notes on Solving Large Scale Eigenvalue Problems (2016). https://people.inf.ethz.ch/arbenz/ewp/Lnotes/lsevp.pdf
Griewank, A.: The Modification of Newton’s Method for Unconstrained Optimization by Bounding Cubic Terms. Technical Report NA/12 (1981)
Berge, C.: Topological Spaces: Including a Treatment of Multi-valued Functions, Vector Spaces, and Convexity. Courier Corporation, Chelmsford (1997)
Fiacco, A.V.: Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Mathematics in Science and Engineering, vol. 165. Academic Press, Cambridge (1983)
Hogan, W.W.: Point-to-set maps in mathematical programming. SIAM Rev. 15(3), 591–603 (1973)
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Mei, S., Bai, Y., Montanari, A.: The landscape of empirical risk for nonconvex losses. Ann. Stat. 46(6A), 2747–2774 (2018)
Yang, X.: Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press (2010)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery. In: Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 42–88. Cambridge University Press, Cambridge (2009)
Acknowledgements
The research of N. Hallak was conducted at Tel-Aviv University and was supported by a postdoctoral fellowship under ISF Grant 1844-16. The research of M. Teboulle was partially supported by the Israel Science Foundation, under ISF Grant 1844-16.
Communicated by Alexey F. Izmailov.
Appendices
Appendix A: Continuity of the Second-Order Optimality Measure
We start by recalling some well-known definitions and properties of multi-valued maps that can be found, for example, in [37]. Assume hereafter that \(C\subseteq \mathbb {R}^n\) is a nonempty, closed, and convex set.
Definition A.1
(Basic properties of point-to-set maps) Let \(\Omega \) be a point-to-set map at \(\bar{\mathbf{x}}\in C\). Then:
1. \(\Omega \) is open at \(\bar{\mathbf{x}}\), if \(\{\mathbf{x}^k\}_{k\ge 0}\subseteq C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), and \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) imply the existence of an integer K and a sequence \(\{ \mathbf{y}^k \}_{k\ge 0}\subseteq C\) such that \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for \(k\ge K\) and \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\).
2. \(\Omega \) is closed at \(\bar{\mathbf{x}}\), if \(\{\mathbf{x}^k\}_{k\ge 0}\subseteq C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\), and \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) imply that \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\).
3. \(\Omega \) is continuous at \(\bar{\mathbf{x}}\) if it is both open and closed at \(\bar{\mathbf{x}}\).
4. \(\Omega \) is uniformly compact near \(\bar{\mathbf{x}}\) if there is a neighborhood N of \(\bar{\mathbf{x}}\) such that the closure of the set \(\bigcup _{\mathbf{x}\in N}\Omega (\mathbf{x})\) is compact.
The continuity property of Q follows from the next result (which we slightly adjusted to our setting).
Theorem A.1
(Continuity of problems [37, Theorem 7]) Let \(\Omega \) be a point-to-set map, and let \(\nu (\mathbf{x}) := \inf \{ \phi (\mathbf{x},\mathbf{y}) : \mathbf{y}\in \Omega (\mathbf{x}) \}\), where \( \phi : C \times C \rightarrow ]-\infty ,\infty [\). If \(\Omega \) is continuous at \(\bar{\mathbf{x}}\in C\) and uniformly compact near \(\bar{\mathbf{x}}\), and if \(\phi \) is continuous on \(\{\bar{\mathbf{x}}\}\times \Omega (\bar{\mathbf{x}})\), then \(\nu \) is continuous at \(\bar{\mathbf{x}}\).
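To make Theorem A.1 concrete, here is a small numerical sketch (entirely illustrative: the instances of \(\phi \) and \(\Omega \) and the helper names are our own, not from the paper) in which \(\Omega \) is a continuous, uniformly compact interval-valued map and \(\phi \) is jointly continuous, so the optimal-value function \(\nu \) should come out continuous:

```python
import numpy as np

# Hypothetical 1-D instance of Theorem A.1:
#   Omega(x) = [-|x| - 1, |x| + 1]   (continuous, uniformly compact),
#   phi(x, y) = (y - x)**2 + x * y   (jointly continuous),
# so nu(x) = min{ phi(x, y) : y in Omega(x) } should be continuous in x.

def phi(x, y):
    return (y - x) ** 2 + x * y

def nu(x, grid_size=10001):
    # Approximate the minimum over Omega(x) on a dense grid.
    ys = np.linspace(-abs(x) - 1.0, abs(x) + 1.0, grid_size)
    return phi(x, ys).min()

# Here the unconstrained minimizer y = x/2 always lies in Omega(x),
# so nu(x) = 3*x**2/4 exactly; the grid approximation recovers it.
xs = np.linspace(-2.0, 2.0, 401)
vals = np.array([nu(x) for x in xs])
print(np.abs(np.diff(vals)).max())  # small successive differences: nu varies continuously
```

Dropping, say, uniform compactness of \(\Omega \) is exactly what allows the minimum to jump in pathological examples, which is why both hypotheses appear in the theorem.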
To apply Theorem A.1 in the case of Q, we will restrict the discussion to the optimization problem (16) with the point-to-set mapping \(\Omega (\mathbf{x}) = \{ \mathbf{y}\in C : \mathbf{a}(\mathbf{x})^T (\mathbf{y}- \mathbf{x}) \le 0,\; \Vert \mathbf{y}- \mathbf{x}\Vert _2 \le r \}\), with \(\mathbf{a}:\mathbb {R}^n\rightarrow \mathbb {R}^n\) being a continuous vector mapping, while assuming that \(\phi \) is a continuous function. Clearly, the optimality measure \(Q(\cdot )\) can be defined in the form of (16) by setting \(\mathbf{y}= \mathbf{x}+ \mathbf{d}\), \(\phi (\mathbf{x},\mathbf{y}) = -f''(\mathbf{x}; \mathbf{y}-\mathbf{x})\), and \(\mathbf{a}(\mathbf{x}) = \nabla f(\mathbf{x})\).
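For intuition about the measure, it can be approximated crudely by sampling feasible directions. The sketch below is a hypothetical Monte-Carlo approximation, not the paper's subproblem solver: the helper `approx_Q`, the simplification \(C = \mathbb {R}^n\), and the identity \(f''(\mathbf{x};\mathbf{d}) = \mathbf{d}^T \nabla ^2 f(\mathbf{x}) \mathbf{d}\) (valid when f is twice differentiable) are all our assumptions.

```python
import numpy as np

# Hedged sketch: crude sampling approximation of a second-order measure of
# the form max{ -f''(x; d) : a(x)^T d <= 0, ||d||_2 <= r }, assuming f is
# twice differentiable (so f''(x; d) = d^T H d with H the Hessian at x) and
# C = R^n for simplicity. Names and setup are illustrative only.

def approx_Q(grad, hess, r=1.0, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = grad.shape[0]
    d = rng.standard_normal((n_samples, n))
    d *= r / np.linalg.norm(d, axis=1, keepdims=True)  # samples on the sphere ||d|| = r
    feasible = d @ grad <= 0.0                         # half-space constraint a(x)^T d <= 0
    curv = np.einsum("ij,jk,ik->i", d, hess, d)        # d^T H d for every sample
    return max(0.0, -curv[feasible].min())             # most negative feasible curvature

# At the saddle point of f(x) = x1**2 - x2**2: zero gradient, H = diag(2, -2);
# the measure should detect the negative curvature along the x2 axis.
print(approx_Q(np.zeros(2), np.diag([2.0, -2.0])))  # close to 2
print(approx_Q(np.zeros(2), np.eye(2)))             # 0.0 (no negative curvature)
```

A zero value of the measure is consistent with second-order stationarity, while a positive value exhibits a feasible direction of negative curvature.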
We will prove that \(\Omega \) given by (17) is uniformly compact and continuous, where the continuity of \(\Omega \) will be established using the following two results.
Theorem A.2
[37, Theorem 10] Let \(\Omega \) be a point-to-set map given by
where each component of \(g:C\times C \rightarrow [-\infty ,\infty ]^m\) is lower semicontinuous in \(\{\bar{\mathbf{x}}\}\times C\). Then, \(\Omega \) is closed at \(\bar{\mathbf{x}}\).
Noting that C is convex, we have the following (adjusted) result (credited to Geoffrion).
Theorem A.3
[37, Theorem 12] Let \(\Omega \) be a point-to-set map given by
where each component of \(g:C\times C \rightarrow [-\infty ,\infty ]^m\) is continuous on \(\{\bar{\mathbf{x}}\}\times \Omega (\bar{\mathbf{x}})\), and convex in \(\mathbf{y}\) for each fixed \(\mathbf{x}\in C\). If there exists a \(\bar{\mathbf{y}}\) such that \(g(\bar{\mathbf{x}},\bar{\mathbf{y}})<0\), then \(\Omega \) is open at \(\bar{\mathbf{x}}\).
We can now prove the continuity and uniform compactness of \(\Omega \).
Lemma A.1
Suppose that \(\Omega \) is given by (17). Then for any \(\bar{\mathbf{x}}\in C\), we have that \(\Omega \) is uniformly compact and continuous at \(\bar{\mathbf{x}}\).
Proof
The uniform compactness of \(\Omega \) follows trivially from its definition as the intersection of a half-space and a norm ball.
To prove that \(\Omega \) is continuous, we will show that it is both closed and open. The closedness of \(\Omega \) is implied by Theorem A.2, since for any \(\bar{\mathbf{x}}\in C\), all functions defining the point-to-set mapping \(\Omega \) are continuous.
We will now prove that \(\Omega \) is open. Let \(\bar{\mathbf{x}}\in C\), \(\mathbf{x}^k\rightarrow \bar{\mathbf{x}}\), and \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\). We need to exhibit a sequence \(\{\mathbf{y}^k\}_{k\ge 0}\) such that \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) and \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for all \(k\ge 0\).
Suppose that there exists \(\hat{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\hat{\mathbf{y}} - \bar{\mathbf{x}}) < 0\). If \(\Vert \hat{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 = r\), then there exist \(\mathbf{d}\in \mathbb {R}^n \) and a sufficiently small \(\alpha >0\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\hat{\mathbf{y}} + \alpha \mathbf{d}- \bar{\mathbf{x}}) < 0\) and \(\Vert \hat{\mathbf{y}} + \alpha \mathbf{d}- \bar{\mathbf{x}} \Vert _2 < r\) (e.g., \(\mathbf{d}= \bar{\mathbf{x}} - \hat{\mathbf{y}}\) with \(\alpha \in ]0,1[\), which scales both quantities by the factor \(1-\alpha \)). Hence, there exists \(\tilde{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) such that \(\mathbf{a}(\bar{\mathbf{x}})^T (\tilde{\mathbf{y}} - \bar{\mathbf{x}}) < 0\) and \(\Vert \tilde{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 < r\). Subsequently, we can invoke Theorem A.3 to obtain the desired claim that \(\Omega \) is open at \(\bar{\mathbf{x}}\). Assume hereafter that \(\mathbf{a}(\bar{\mathbf{x}})^T (\mathbf{y}- \bar{\mathbf{x}}) = 0\) for any \(\mathbf{y}\in \Omega (\bar{\mathbf{x}})\).
Set \(\bar{r} = \Vert \bar{\mathbf{y}} - \bar{\mathbf{x}} \Vert _2 \le r\), and define for any \(k\ge 0\)
To see that indeed \(\mathbf{y}^k \rightarrow \bar{\mathbf{y}}\), note that by the continuity of all terms and the assumption that \(\mathbf{a}(\bar{\mathbf{x}})^T (\bar{\mathbf{y}} - \bar{\mathbf{x}}) = 0\) we have that
The claim \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for any \(k\ge 0\) follows from the fact that \(\mathbf{y}^k = \bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\) whenever \(\mathbf{x}^k = \bar{\mathbf{y}}\) or \(\mathbf{a}(\mathbf{x}^k) = 0 \), and otherwise
To summarize, for any \(\bar{\mathbf{y}}\in \Omega (\bar{\mathbf{x}})\), there exists a sequence \(\{\mathbf{y}^k\}_{k\ge 0}\) with \(\mathbf{y}^k\rightarrow \bar{\mathbf{y}}\) and \(\mathbf{y}^k\in \Omega (\mathbf{x}^k)\) for all \(k\ge 0\), meaning that \(\Omega \) is open at \(\bar{\mathbf{x}}\). \(\square \)
Proof of Lemma 4.3
(Continuity of the second-order optimality measure Q) Since \(\Omega \) is continuous and uniformly compact at every point \(\mathbf{x}\in C\) (Lemma A.1), and \(f''(\mathbf{x};\mathbf{d})\) is continuous, we have by Theorem A.1 that Q is continuous. \(\square \)
Appendix B: The Projected Gradient Descent Method with Backtracking
The Projected Gradient Descent (PGD) method is a classical algorithm with abundant references describing it in detail; see, for instance, the survey [41] and the extensive list of references therein. The complexity of the PGD for finding an \(\varepsilon \) first-order stationary point (measured through the usual norm of the gradient map) is of the order \(O(\varepsilon ^{-2})\); see, e.g., [30, Theorem 9.15]. Below, we describe the PGD used in our numerical tests, with step size determined by a backtracking policy (cf. [30, Section 9.4]).
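A minimal sketch of such a scheme (our own illustrative code: the function names are hypothetical, the projector onto C is specialized to the unit ball, and the sufficient-decrease test is one common backtracking variant, not necessarily the exact policy of [30, Section 9.4]) could look as follows:

```python
import numpy as np

# Hedged sketch of projected gradient descent with a backtracking step size.
# `proj` may be any Euclidean projector onto the closed convex set C; here we
# use the unit ball as an assumed example.

def proj_unit_ball(x):
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

def pgd_backtracking(f, grad, proj, x0, s=1.0, beta=0.5, sigma=1e-4,
                     tol=1e-8, max_iter=1000):
    x = proj(x0)
    for _ in range(max_iter):
        g = grad(x)
        t = s
        while True:  # backtrack until a sufficient-decrease condition holds
            x_new = proj(x - t * g)
            if f(x_new) <= f(x) - sigma / t * np.linalg.norm(x_new - x) ** 2:
                break
            t *= beta
            if t < 1e-16:  # step collapsed: x is (numerically) stationary
                return x
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Minimize f(x) = ||x - c||^2 over the unit ball, with c outside the ball:
c = np.array([2.0, 0.0])
f = lambda x: np.sum((x - c) ** 2)
grad = lambda x: 2.0 * (x - c)
x_star = pgd_backtracking(f, grad, proj_unit_ball, np.zeros(2))
print(x_star)  # close to [1, 0], the projection of c onto the ball
```

The stopping test on \(\Vert \mathbf{x}^{k+1} - \mathbf{x}^k \Vert \) mirrors the norm of the gradient map used in the complexity statement above.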
Hallak, N., Teboulle, M. Finding Second-Order Stationary Points in Constrained Minimization: A Feasible Direction Approach. J Optim Theory Appl 186, 480–503 (2020). https://doi.org/10.1007/s10957-020-01713-x
Keywords
- Feasible direction methods
- Second-order methods
- Constrained optimization
- Second-order necessary optimality conditions