Abstract
We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher’s augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches \(\varepsilon \)-approximate second-order critical points of the original optimization problem in at most \({\mathcal {O}}(\varepsilon ^{-3})\) iterations. This improves on current best theoretical bounds. Along the way, we show new properties of Fletcher’s augmented Lagrangian, which may be of independent interest.
Notes
We review various proposals that have been made, with their pros and cons, in Appendix A of the arXiv version of this paper [23].
References
Ablin, P., Peyré, G.: Fast and accurate optimization on the orthogonal manifold without retraction. In International Conference on Artificial Intelligence and Statistics, pp. 5636–5657. PMLR (2022)
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244
Andreani, R., Martínez, J.M., Schuverdt, M.L.: On second-order optimality conditions for nonlinear programming. Optimization 56(5–6), 529–542 (2007). https://doi.org/10.1080/02331930701618617
Bai, Y., Mei, S.: Analysis of sequential quadratic programming through the lens of Riemannian optimization. arXiv preprint arXiv:1805.08756 (2018)
Bai, Y., Duchi, J., Mei, S.: Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs. arXiv preprint arXiv:1903.00184 (2019)
Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Cambridge (1982). https://doi.org/10.1016/C2013-0-10366-2
Birgin, E.G., Martínez, J.M.: Complexity and performance of an augmented Lagrangian algorithm. Optim. Methods Softw. 35(5), 885–920 (2020). https://doi.org/10.1080/10556788.2020.1746962
Boumal, N., Absil, P.-A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. 39(1), 1–33 (2019). https://doi.org/10.1093/imanum/drx080
Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge (2023). https://doi.org/10.1017/9781009166164
Burer, S., Monteiro, R.D.C.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003). https://doi.org/10.1007/s10107-002-0352-8
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Complexity bounds for second-order optimality in unconstrained optimization. J. Complex. 28(1), 93–108 (2012). https://doi.org/10.1016/j.jco.2011.06.001
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization. J. Complex. 53, 68–94 (2019). https://doi.org/10.1016/j.jco.2018.11.001
Cifuentes, D., Moitra, A.: Polynomial time guarantees for the Burer–Monteiro method. Adv. Neural Inf. Process. Syst. 35, 23923–23935 (2022)
Di Pillo, G.: Exact penalty methods. In: Algorithms for Continuous Optimization, pp. 209–253. Springer, Dordrecht (1994). https://doi.org/10.1007/978-94-009-0369-2_8
Di Pillo, G., Grippo, L.: An exact penalty function method with global convergence properties for nonlinear programming problems. Math. Program. 36(1), 1–18 (1986). https://doi.org/10.1007/BF02591986
Di Pillo, G., Grippo, L.: Exact penalty functions in constrained optimization. SIAM J. Control. Optim. 27(6), 1333–1360 (1989). https://doi.org/10.1137/0327068
Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for equality-constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1809–A1835 (2020). https://doi.org/10.1137/19M1238265
Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for general constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1836–A1859 (2020). https://doi.org/10.1137/19M1255069
Fletcher, R.: A class of methods for nonlinear programming with termination and convergence properties. In: Integer and Nonlinear Programming, pp. 157–173. North-Holland, Amsterdam (1970)
Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), A1949–A1983 (2019). https://doi.org/10.1137/18M1221679
Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points—online stochastic gradient for tensor decomposition. In: Proceedings of The 28th Conference on Learning Theory, pp. 797–842. PMLR (2015)
Goyens, F., Eftekhari, A., Boumal, N.: Computing second-order points under equality constraints: revisiting Fletcher’s augmented Lagrangian. arXiv preprint arXiv:2204.01448 (2022)
Grapiglia, G.N., Yuan, Y.-X.: On the complexity of an augmented Lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2), 1508–1530 (2021). https://doi.org/10.1093/imanum/draa021
Grubišić, I., Pietersz, R.: Efficient rank reduction of correlation matrices. Linear Algebra Appl. 422(2), 629–653 (2007). https://doi.org/10.1016/j.laa.2006.11.024
He, C., Lu, Z., Pong, T. K.: A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees. arXiv preprint arXiv:2301.03139 (2023)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991). https://doi.org/10.1017/CBO9780511840371
Lee, J.M.: Introduction to Riemannian Manifolds, vol. 2. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91755-9
Ling, S.: Solving orthogonal group synchronization via convex and low-rank optimization: tightness and landscape analysis. Math. Program. 200(1), 589–628 (2023). https://doi.org/10.1007/s10107-022-01896-3
Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria, pp. 115–117 (1982)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Springer, New York (2004). https://doi.org/10.1007/978-1-4419-8853-9
Polyak, B.T.: Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 3(4), 643–653 (1963). https://doi.org/10.1016/0041-5553(63)90382-3
Polyak, R.A.: On the local quadratic convergence of the primal-dual augmented Lagrangian method. Optim. Methods Softw. 24(3), 369–379 (2009). https://doi.org/10.1080/10556780802699433
Rosen, D.M., Doherty, K.J., Terán Espinoza, A., Leonard, J.J.: Advances in inference and representation for simultaneous localization and mapping. Annu. Rev. Control Robot. Auton. Syst. 4(1), 215–242 (2021). https://doi.org/10.1146/annurev-control-072720-082553
Royer, C.W., O’Neill, M., Wright, S.J.: A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math. Program. 180(1), 451–488 (2020). https://doi.org/10.1007/s10107-019-01362-7
Schechtman, S., Tiapkin, D., Muehlebach, M., Moulines, E.: Orthogonal directions constrained gradient method: from non-linear equality constraints to Stiefel manifold. arXiv preprint arXiv:2303.09261 (2023)
Wright, S.J., Recht, B.: Optimization for Data Analysis. Cambridge University Press, Cambridge (2022). https://doi.org/10.1017/9781009004282
Xiao, N., Liu, X.: Solving optimization problems over the Stiefel manifold by smooth exact penalty function. arXiv preprint arXiv:2110.08986 (2021)
Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. 37(4), 1205–1241 (2022). https://doi.org/10.1080/10556788.2020.1852236
Xie, Y., Wright, S.J.: Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3), 1–30 (2021). https://doi.org/10.1007/s10915-021-01409-y
Zhang, H., Sra, S.: First-order methods for geodesically convex optimization. In: Conference on Learning Theory, pp. 1617–1638. PMLR (2016)
Additional information
Communicated by Aram Arutyunov.
Proof of Proposition 2.8
Proof
Define \(\varphi (x) = \dfrac{1}{2}\left\| h(x)\right\| ^2\) and take any \(x_0 \in {\mathcal {C}}= \lbrace x\in {\mathcal {E}}: \varphi (x) \le R^2/2\rbrace \). Consider the following differential system:

$$\begin{aligned} z'(t) = -\nabla \varphi (z(t)) = -\mathrm {D}h(z(t))^*[h(z(t))], \qquad z(0) = x_0. \end{aligned} \tag{A.1}$$
The fundamental theorem of flows [28, Theorem A.42] guarantees the existence of a unique maximal integral curve of (A.1) starting at \(x_0\). Let \(z(\cdot ):I \rightarrow {\mathcal {E}}\) denote this maximal integral curve and let \(T>0\) be the supremum of the interval I on which \(z(\cdot )\) is defined. We rely on the Escape Lemma [28, Lemma A.43] to show that z(t) is defined for all times \(t\ge 0\). For \(t< T\), we write \(\ell = \varphi \circ z\) and find

$$\begin{aligned} \ell '(t) = \langle \nabla \varphi (z(t)), z'(t)\rangle = -\left\| \nabla \varphi (z(t))\right\| ^2 \le 0. \end{aligned}$$
This implies that \(z(t) \in {\mathcal {C}}\) for all \(0\le t < T\). We now show that the trajectory z(t) has finite length. To that end, we note that

$$\begin{aligned} \left\| \nabla \varphi (x)\right\| = \left\| \mathrm {D}h(x)^*[h(x)]\right\| \ge {\underline{\sigma }} \left\| h(x)\right\| = {\underline{\sigma }} \sqrt{2\varphi (x)} \end{aligned}$$

for all \(x\in {\mathcal {C}}\). The length of the trajectory from time \(t=0\) to \(t=T\) is bounded as follows, using a classical argument [30]:

$$\begin{aligned} \int _0^T \left\| z'(t)\right\| \mathrm {d}t = \int _0^T \left\| \nabla \varphi (z(t))\right\| \mathrm {d}t \le \frac{1}{{\underline{\sigma }}} \int _0^T -\frac{\mathrm {d}}{\mathrm {d}t}\sqrt{2\ell (t)}\, \mathrm {d}t = \frac{\sqrt{2\ell (0)} - \sqrt{2\ell (T)}}{{\underline{\sigma }}} \le \frac{R}{{\underline{\sigma }}}. \end{aligned} \tag{A.6}$$
The length is bounded independently of T, so the trajectory has finite length and remains in a compact subset of \({\mathcal {E}}\). The Escape Lemma states that if the domain I of a maximal integral curve \(z(\cdot ):I \rightarrow {\mathcal {E}}\) has a finite upper bound, then the curve must leave every compact set. By contraposition, since \(z(\cdot )\) is contained in a compact set by (A.6), the interval I has no finite upper bound and therefore \(I={\mathbb {R}}_+\). Since the trajectory z(t) is bounded for \(t\ge 0\), it has an accumulation point \({\bar{z}}\). From A1, we have \(\sigma _\textrm{min}(\textrm{D}h(z(t))) \ge {\underline{\sigma }}>0\) for all \(t \ge 0\). This gives the bound \(\ell '(t) \le - {\underline{\sigma }}^2 \left\| h(z(t))\right\| ^2 = -2{\underline{\sigma }}^2 \ell (t)\). Gronwall’s inequality then yields

$$\begin{aligned} \ell (t) \le \ell (0)\, e^{-2{\underline{\sigma }}^2 t} \qquad \text {for all } t \ge 0. \end{aligned}$$
Therefore \(\ell (t) \rightarrow 0 \) as \(t \rightarrow \infty \), which implies \(h(z(t))\rightarrow 0 \) as \(t\rightarrow \infty \). By continuity of h, the accumulation point satisfies \(h({\bar{z}}) = 0\). Since \({\mathcal {C}}\) is closed, the point \({{\bar{z}}}\) is in \({\mathcal {C}}\). Therefore, \({{\bar{z}}}\) is both in \({\mathcal {M}}\) and in the connected component of \({\mathcal {C}}\) that contains \(z(0) = x_0\). \(\square \)
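The proof builds a gradient flow of \(\varphi\) that carries any point of \({\mathcal {C}}\) to the constraint manifold without leaving the connected component it started in. As an illustration outside the scope of the paper, the following sketch (assuming, for concreteness, the hypothetical constraint \(h(x) = \tfrac{1}{2}(\Vert x\Vert^2 - 1)\), whose zero set is the unit sphere) integrates the flow \(z' = -\nabla \varphi (z)\) with an explicit Euler scheme and observes the residual \(|h(z(t))|\) decay, consistent with the Gronwall bound:

```python
import numpy as np

# Example constraint h : R^3 -> R whose zero set M is the unit sphere.
# (The proof applies to any smooth h satisfying the regularity assumption A1.)
def h(x):
    return 0.5 * (np.dot(x, x) - 1.0)

def grad_phi(x):
    # phi(x) = 0.5 * h(x)^2, hence grad phi(x) = Dh(x)^*[h(x)] = h(x) * x here.
    return h(x) * x

def integrate_flow(x0, dt=0.01, n_steps=2000):
    # Explicit Euler discretization of the flow z'(t) = -grad phi(z(t)).
    z = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = z - dt * grad_phi(z)
    return z

x0 = np.array([1.5, 0.5, -0.3])   # starting point with h(x0) != 0
z_bar = integrate_flow(x0)
print(abs(h(z_bar)))              # residual |h| is driven toward 0
```

Since the flow here is purely radial, the limit point lies on the sphere on the same ray as \(x_0\), matching the claim that \({\bar{z}}\) stays in the connected component of \(x_0\).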
Cite this article
Goyens, F., Eftekhari, A. & Boumal, N. Computing Second-Order Points Under Equality Constraints: Revisiting Fletcher’s Augmented Lagrangian. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02421-6