
Computing Second-Order Points Under Equality Constraints: Revisiting Fletcher’s Augmented Lagrangian

Published in: Journal of Optimization Theory and Applications

Abstract

We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher’s augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches \(\varepsilon \)-approximate second-order critical points of the original optimization problem in at most \({\mathcal {O}}(\varepsilon ^{-3})\) iterations. This improves on current best theoretical bounds. Along the way, we show new properties of Fletcher’s augmented Lagrangian, which may be of independent interest.
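To make the penalty function concrete, here is a small numerical sketch (not taken from the paper) of one common form of Fletcher's augmented Lagrangian, in which the multiplier estimate is the least-squares solution of the stationarity condition. The toy problem, the function name `fletcher_penalty`, and the penalty weight `beta` are illustrative assumptions; the paper's exact definition may differ in signs and scaling.

```python
import numpy as np

# Toy problem (illustrative): minimize f(x) = 0.5*||x - c||^2
# subject to the single equality constraint h(x) = ||x||^2 - 1 = 0.
c = np.array([2.0, 0.0])
f      = lambda x: 0.5 * np.sum((x - c) ** 2)
grad_f = lambda x: x - c
h      = lambda x: np.array([x @ x - 1.0])   # constraint value, shape (m,)
Jh     = lambda x: 2.0 * x[None, :]          # m x n Jacobian of h

def fletcher_penalty(x, beta):
    """One common form of Fletcher's augmented Lagrangian:
        phi_beta(x) = f(x) - h(x)^T lam(x) + (beta/2) ||h(x)||^2,
    where lam(x) is the least-squares multiplier estimate
        lam(x) = argmin_lam ||grad_f(x) - Jh(x)^T lam||."""
    A = Jh(x)
    lam, *_ = np.linalg.lstsq(A.T, grad_f(x), rcond=None)
    hx = h(x)
    return f(x) - hx @ lam + 0.5 * beta * (hx @ hx)

# At a feasible point the penalty terms vanish, so phi_beta agrees with f there.
x_feas = np.array([1.0, 0.0])
assert abs(fletcher_penalty(x_feas, beta=10.0) - f(x_feas)) < 1e-12
```

The key feature this illustrates is smoothness: unlike non-smooth exact penalties, the multiplier estimate varies smoothly with x (wherever the constraint Jacobian has full rank), so the penalty function itself is smooth and amenable to unconstrained second-order methods.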

Notes

  1. We review various proposals that have been made, with their pros and cons, in Appendix A of the arXiv version of this paper [23].

References

  1. Ablin, P., Peyré, G.: Fast and accurate optimization on the orthogonal manifold without retraction. In: International Conference on Artificial Intelligence and Statistics, pp. 5636–5657. PMLR (2022)

  2. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244

  3. Andreani, R., Martínez, J.M., Schuverdt, M.L.: On second-order optimality conditions for nonlinear programming. Optimization 56(5–6), 529–542 (2007). https://doi.org/10.1080/02331930701618617

  4. Bai, Y., Mei, S.: Analysis of Sequential Quadratic Programming Through the Lens of Riemannian Optimization. arXiv preprint arXiv:1805.08756 (2018)

  5. Bai, Y., Duchi, J., Mei, S.: Proximal Algorithms for Constrained Composite Optimization, with Applications to Solving Low-Rank SDPs. arXiv preprint arXiv:1903.00184 (2019)

  6. Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4

  7. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Cambridge (1982). https://doi.org/10.1016/C2013-0-10366-2

  8. Birgin, E.G., Martínez, J.M.: Complexity and performance of an augmented Lagrangian algorithm. Optim. Methods Softw. 35(5), 885–920 (2020). https://doi.org/10.1080/10556788.2020.1746962

  9. Boumal, N., Absil, P.-A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. 39(1), 1–33 (2019). https://doi.org/10.1093/imanum/drx080

  10. Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge (2023). https://doi.org/10.1017/9781009166164

  11. Burer, S., Monteiro, R.D.C.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003). https://doi.org/10.1007/s10107-002-0352-8

  12. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Complexity bounds for second-order optimality in unconstrained optimization. J. Complex. 28(1), 93–108 (2012). https://doi.org/10.1016/j.jco.2011.06.001

  13. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization. J. Complex. 53, 68–94 (2019). https://doi.org/10.1016/j.jco.2018.11.001

  14. Cifuentes, D., Moitra, A.: Polynomial time guarantees for the Burer–Monteiro method. Adv. Neural Inf. Process. Syst. 35, 23923–23935 (2022)

  15. Di Pillo, G.: Exact penalty methods. In: Algorithms for Continuous Optimization, pp. 209–253. Springer, Dordrecht (1994). https://doi.org/10.1007/978-94-009-0369-2_8

  16. Di Pillo, G., Grippo, L.: An exact penalty function method with global convergence properties for nonlinear programming problems. Math. Program. 36(1), 1–18 (1986). https://doi.org/10.1007/BF02591986

  17. Di Pillo, G., Grippo, L.: Exact penalty functions in constrained optimization. SIAM J. Control. Optim. 27(6), 1333–1360 (1989). https://doi.org/10.1137/0327068

  18. Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for equality-constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1809–A1835 (2020). https://doi.org/10.1137/19M1238265

  19. Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for general constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1836–A1859 (2020). https://doi.org/10.1137/19M1255069

  20. Fletcher, R.: A class of methods for nonlinear programming with termination and convergence properties. In: Integer and Nonlinear Programming, pp. 157–173. North-Holland, Amsterdam (1970)

  21. Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), A1949–A1983 (2019). https://doi.org/10.1137/18M1221679

  22. Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points—online stochastic gradient for tensor decomposition. In: Proceedings of the 28th Conference on Learning Theory, pp. 797–842. PMLR (2015)

  23. Goyens, F., Eftekhari, A., Boumal, N.: Computing second-order points under equality constraints: revisiting Fletcher’s augmented Lagrangian. arXiv preprint arXiv:2204.01448 (2022)

  24. Grapiglia, G.N., Yuan, Y.-X.: On the complexity of an augmented Lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2), 1508–1530 (2021). https://doi.org/10.1093/imanum/draa021

  25. Grubišić, I., Pietersz, R.: Efficient rank reduction of correlation matrices. Linear Algebra Appl. 422(2), 629–653 (2007). https://doi.org/10.1016/j.laa.2006.11.024

  26. He, C., Lu, Z., Pong, T. K.: A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees. arXiv preprint arXiv:2301.03139 (2023)

  27. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991). https://doi.org/10.1017/CBO9780511840371

  28. Lee, J.M.: Introduction to Riemannian Manifolds, vol. 2. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91755-9

  29. Ling, S.: Solving orthogonal group synchronization via convex and low-rank optimization: tightness and landscape analysis. Math. Program. 200(1), 589–628 (2023). https://doi.org/10.1007/s10107-022-01896-3

  30. Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di geometria, pp. 115–117 (1982)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization. Springer, New York (2004). https://doi.org/10.1007/978-1-4419-8853-9

  32. Polyak, B.T.: Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 3(4), 643–653 (1963). https://doi.org/10.1016/0041-5553(63)90382-3

  33. Polyak, R.A.: On the local quadratic convergence of the primal-dual augmented Lagrangian method. Optim. Methods Softw. 24(3), 369–379 (2009). https://doi.org/10.1080/10556780802699433

  34. Rosen, D.M., Doherty, K.J., Terán Espinoza, A., Leonard, J.J.: Advances in inference and representation for simultaneous localization and mapping. Annu. Rev. Control Robot. Auton. Syst. 4(1), 215–242 (2021). https://doi.org/10.1146/annurev-control-072720-082553

  35. Royer, C.W., O’Neill, M., Wright, S.J.: A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math. Program. 180(1), 451–488 (2020). https://doi.org/10.1007/s10107-019-01362-7

  36. Schechtman, S., Tiapkin, D., Muehlebach, M., Moulines, E.: Orthogonal Directions Constrained Gradient Method: From non-linear equality constraints to Stiefel manifold. arXiv preprint arXiv:2303.09261 (2023)

  37. Wright, S.J., Recht, B.: Optimization for Data Analysis. Cambridge University Press, Cambridge (2022). https://doi.org/10.1017/9781009004282

  38. Xiao, N., Liu, X.: Solving optimization problems over the Stiefel manifold by smooth exact penalty function. arXiv preprint arXiv:2110.08986 (2021)

  39. Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. 37(4), 1205–1241 (2022). https://doi.org/10.1080/10556788.2020.1852236

  40. Xie, Y., Wright, S.J.: Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3), 1–30 (2021). https://doi.org/10.1007/s10915-021-01409-y

  41. Zhang, H., Sra, S.: First-order methods for geodesically convex optimization. In: Conference on Learning Theory, pp. 1617–1638. PMLR (2016)

Author information

Correspondence to Florentin Goyens.

Communicated by Aram Arutyunov.


Proof of Proposition 2.8

Proof

Define \(\varphi (x) = \dfrac{1}{2}\left\| h(x)\right\| ^2\) and take any \(x_0 \in {\mathcal {C}}= \lbrace x\in {\mathcal {E}}: \varphi (x) \le R^2/2\rbrace \). Consider the following differential system:

$$\begin{aligned} \left\{ \begin{aligned} \dfrac{\textrm{d}}{\textrm{d} t}x(t)&= - \nabla \varphi (x(t)) \\ x(0)&= x_0. \end{aligned} \right. \end{aligned}$$
(A.1)

The fundamental theorem of flows [28, Theorem A.42] guarantees the existence of a unique maximal integral curve starting at \(x_0\) for (A.1). Let \(z( \cdot ):I \rightarrow {\mathcal {E}}\) denote this maximal integral curve and \(T>0\) be the supremum of the interval I on which \(z(\cdot )\) is defined. We rely on the Escape Lemma [28, Lemma A.43] to show that z(t) is defined for all times \(t\ge 0\). For \(t< T\), we write \(\ell = \varphi \circ z\) and find

$$\begin{aligned} \ell '(t)&= \textrm{D}\varphi (z(t))\left[ \frac{\textrm{d}}{\textrm{d}t}z(t)\right] = \left\langle {\nabla \varphi (z(t))},{\frac{\textrm{d}}{\textrm{d}t}z(t)}\right\rangle \end{aligned}$$
(A.2)
$$\begin{aligned}&= - \left\| \nabla \varphi (z(t))\right\| ^2 \end{aligned}$$
(A.3)
$$\begin{aligned}&= - \left\| \textrm{D}h(z(t))^*[h(z(t))]\right\| ^2 \le 0. \end{aligned}$$
(A.4)

This implies that \(z(t) \in {\mathcal {C}}\) for all \(0\le t < T\). We show that the trajectory z(t) has finite length. To that end, we note that

$$\begin{aligned} \dfrac{1}{2}\left\| \nabla \varphi (x)\right\| ^2 = \dfrac{1}{2} \left\| \textrm{D}h(x)^* [h(x)] \right\| ^2 \ge {\underline{\sigma }}^2 \dfrac{1}{2} \left\| h(x)\right\| ^2 = {\underline{\sigma }}^2 \varphi (x), \end{aligned}$$
(A.5)

for all \(x\in {\mathcal {C}}\). The length of the trajectory from time \(t=0\) to \(t=T\) is bounded as follows, using a classical argument [30]:

$$\begin{aligned} \int _0^T \left\| \frac{\textrm{d}}{\textrm{d}t}z(t)\right\| \textrm{d} t&= \int _0^T \left\| - \nabla \varphi (z(t))\right\| \textrm{d} t \nonumber \\&= \int _0^T \dfrac{ \left\| \nabla \varphi (z(t))\right\| ^2}{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t \nonumber \\&= \int _0^T \dfrac{ \left\langle {- \nabla \varphi (z(t))},{\frac{\textrm{d}}{\textrm{d}t}z(t)}\right\rangle }{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t\nonumber \\&= \int _0^T \dfrac{ - (\varphi \circ z)'(t)}{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t\nonumber \\&\le \int _0^T \dfrac{ - (\varphi \circ z)'(t)}{{\underline{\sigma }}\sqrt{2(\varphi \circ z)(t)}} \textrm{d} t\nonumber \\&= \dfrac{-\sqrt{2}}{{\underline{\sigma }}} \left[ \sqrt{\varphi (z(T))} - \sqrt{\varphi (z(0))}\right] \nonumber \\&\le \dfrac{\sqrt{2 \varphi (z(0))}}{{\underline{\sigma }}}. \end{aligned}$$
(A.6)

The length is bounded independently of T, hence the flow has finite length. The Escape Lemma states that if the domain I of a maximal integral curve \(z(\cdot ) :I \rightarrow {\mathcal {E}}\) has a finite upper bound, then the curve \(z(\cdot )\) leaves every compact set. Since \(z(\cdot )\) has finite length by (A.6), it remains in a compact set; the contrapositive then ensures that the interval I does not have a finite upper bound, and therefore \(I={\mathbb {R}}_+\). Since the trajectory z(t) is bounded for \(t\ge 0\), it must have an accumulation point \({\bar{z}}\). From A1, we have \(\sigma _\textrm{min}(\textrm{D}h(z(t))) \ge {\underline{\sigma }}>0\) for all \(t \ge 0\). This gives the bound \(\ell '(t) \le - {\underline{\sigma }}^2 \left\| h(z(t))\right\| ^2 = -2{\underline{\sigma }}^2 \ell (t)\). Gronwall’s inequality then yields

$$\begin{aligned} \ell (t) \le \varphi (x_0) e^{-2{\underline{\sigma }}^2 t}. \end{aligned}$$
(A.7)

Therefore \(\ell (t) \rightarrow 0 \) as \(t \rightarrow \infty \), which implies \(h(z(t))\rightarrow 0 \) as \(t\rightarrow \infty \). We conclude that the accumulation point satisfies \(h({\bar{z}}) = 0\). Since \({\mathcal {C}}\) is closed, the point \({{\bar{z}}}\) is in \({\mathcal {C}}\). Therefore, \({{\bar{z}}}\) is both in \({\mathcal {M}}\) and in the connected component of \({\mathcal {C}}\) that contains \(z(0) = x_0\). \(\square \)
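The mechanism of the proof can be observed numerically. The following sketch (not from the paper; the toy constraint, step size, and horizon are illustrative assumptions) integrates the gradient flow (A.1) with forward Euler for the constraint \(h(x) = \Vert x\Vert ^2 - 1\), and checks that \(\ell (t) = \varphi (z(t))\) decreases along the flow and that the trajectory approaches the feasible set, as Proposition 2.8 predicts.

```python
import numpy as np

# Toy constraint (illustrative): h(x) = ||x||^2 - 1 (unit circle),
# phi(x) = 0.5 * h(x)^2, grad phi(x) = Dh(x)^*[h(x)] = 2 x h(x).
h        = lambda x: x @ x - 1.0
grad_phi = lambda x: 2.0 * x * h(x)
phi      = lambda x: 0.5 * h(x) ** 2

x = np.array([1.5, 0.5])          # infeasible starting point x0
dt, n_steps = 1e-3, 5000
values = [phi(x)]
for _ in range(n_steps):          # forward Euler on dx/dt = -grad phi(x)
    x = x - dt * grad_phi(x)
    values.append(phi(x))

# phi decreases along the flow (discrete analogue of (A.4)),
# and the limit point is feasible: h(z(t)) -> 0, as in (A.7).
assert all(a >= b for a, b in zip(values, values[1:]))
assert abs(h(x)) < 1e-6
```

Along this trajectory the Jacobian \(\textrm{D}h(x) = 2x^{\top }\!\) stays uniformly full rank, which is the discrete counterpart of assumption A1 and the reason the decay is geometric rather than merely monotone.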


Cite this article

Goyens, F., Eftekhari, A. & Boumal, N. Computing Second-Order Points Under Equality Constraints: Revisiting Fletcher’s Augmented Lagrangian. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02421-6
