
Computing Second-Order Points Under Equality Constraints: Revisiting Fletcher’s Augmented Lagrangian

Published in: Journal of Optimization Theory and Applications

Abstract

We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher’s augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches \(\varepsilon \)-approximate second-order critical points of the original optimization problem in at most \({\mathcal {O}}(\varepsilon ^{-3})\) iterations. This improves on current best theoretical bounds. Along the way, we show new properties of Fletcher’s augmented Lagrangian, which may be of independent interest.
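To make the penalty function concrete, here is a small numerical sketch (not taken from the paper) of one common form of Fletcher's augmented Lagrangian, in which the multiplier estimate is the least-squares solution of the stationarity condition. The toy problem, the function name `fletcher_penalty`, and the penalty weight `beta` are illustrative assumptions; the paper's exact definition may differ in signs and scaling.

```python
import numpy as np

# Toy problem (illustrative): minimize f(x) = 0.5*||x - c||^2
# subject to the single equality constraint h(x) = ||x||^2 - 1 = 0.
c = np.array([2.0, 0.0])
f      = lambda x: 0.5 * np.sum((x - c) ** 2)
grad_f = lambda x: x - c
h      = lambda x: np.array([x @ x - 1.0])   # constraint value, shape (m,)
Jh     = lambda x: 2.0 * x[None, :]          # m x n Jacobian of h

def fletcher_penalty(x, beta):
    """One common form of Fletcher's augmented Lagrangian:
        phi_beta(x) = f(x) - h(x)^T lam(x) + (beta/2) ||h(x)||^2,
    where lam(x) is the least-squares multiplier estimate
        lam(x) = argmin_lam ||grad_f(x) - Jh(x)^T lam||."""
    A = Jh(x)
    lam, *_ = np.linalg.lstsq(A.T, grad_f(x), rcond=None)
    hx = h(x)
    return f(x) - hx @ lam + 0.5 * beta * (hx @ hx)

# At a feasible point the penalty terms vanish, so phi_beta agrees with f there.
x_feas = np.array([1.0, 0.0])
assert abs(fletcher_penalty(x_feas, beta=10.0) - f(x_feas)) < 1e-12
```

The key feature this illustrates is smoothness: unlike non-smooth exact penalties, the multiplier estimate varies smoothly with x (wherever the constraint Jacobian has full rank), so the penalty function itself is smooth and amenable to unconstrained second-order methods.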

Notes

  1. We review various proposals that have been made, with their pros and cons, in Appendix A of the arXiv version of this paper [23].

References

  1. Ablin, P., Peyré, G.: Fast and accurate optimization on the orthogonal manifold without retraction. In: International Conference on Artificial Intelligence and Statistics, pp. 5636–5657. PMLR (2022)

  2. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244

  3. Andreani, R., Martínez, J.M., Schuverdt, M.L.: On second-order optimality conditions for nonlinear programming. Optimization 56(5–6), 529–542 (2007). https://doi.org/10.1080/02331930701618617

  4. Bai, Y., Mei, S.: Analysis of Sequential Quadratic Programming Through the Lens of Riemannian Optimization. arXiv preprint arXiv:1805.08756 (2018)

  5. Bai, Y., Duchi, J., Mei, S.: Proximal Algorithms for Constrained Composite Optimization, with Applications to Solving Low-Rank SDPs. arXiv preprint arXiv:1903.00184 (2019)

  6. Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4

  7. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Cambridge (1982). https://doi.org/10.1016/C2013-0-10366-2

  8. Birgin, E.G., Martínez, J.M.: Complexity and performance of an augmented Lagrangian algorithm. Optim. Methods Softw. 35(5), 885–920 (2020). https://doi.org/10.1080/10556788.2020.1746962

  9. Boumal, N., Absil, P.-A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. 39(1), 1–33 (2019). https://doi.org/10.1093/imanum/drx080

  10. Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge (2023). https://doi.org/10.1017/9781009166164

  11. Burer, S., Monteiro, R.D.C.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003). https://doi.org/10.1007/s10107-002-0352-8

  12. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Complexity bounds for second-order optimality in unconstrained optimization. J. Complex. 28(1), 93–108 (2012). https://doi.org/10.1016/j.jco.2011.06.001

  13. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization. J. Complex. 53, 68–94 (2019). https://doi.org/10.1016/j.jco.2018.11.001

  14. Cifuentes, D., Moitra, A.: Polynomial time guarantees for the Burer–Monteiro method. Adv. Neural Inf. Process. Syst. 35, 23923–23935 (2022)

  15. Di Pillo, G.: Exact penalty methods. In: Algorithms for Continuous Optimization, pp. 209–253. Springer, Dordrecht (1994). https://doi.org/10.1007/978-94-009-0369-2_8

  16. Di Pillo, G., Grippo, L.: An exact penalty function method with global convergence properties for nonlinear programming problems. Math. Program. 36(1), 1–18 (1986). https://doi.org/10.1007/BF02591986

  17. Di Pillo, G., Grippo, L.: Exact penalty functions in constrained optimization. SIAM J. Control. Optim. 27(6), 1333–1360 (1989). https://doi.org/10.1137/0327068

  18. Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for equality-constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1809–A1835 (2020). https://doi.org/10.1137/19M1238265

  19. Estrin, R., Friedlander, M.P., Orban, D., Saunders, M.A.: Implementing a smooth exact penalty function for general constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3), A1836–A1859 (2020). https://doi.org/10.1137/19M1255069

  20. Fletcher, R.: A class of methods for nonlinear programming with termination and convergence properties. In: Integer and Nonlinear Programming, pp. 157–173. North-Holland, Amsterdam (1970)

  21. Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), A1949–A1983 (2019). https://doi.org/10.1137/18M1221679

  22. Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points—online stochastic gradient for tensor decomposition. In: Proceedings of the 28th Conference on Learning Theory, pp. 797–842. PMLR (2015)

  23. Goyens, F., Eftekhari, A., Boumal, N.: Computing second-order points under equality constraints: revisiting Fletcher’s augmented Lagrangian. arXiv preprint arXiv:2204.01448 (2022)

  24. Grapiglia, G.N., Yuan, Y.-X.: On the complexity of an augmented Lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2), 1508–1530 (2021). https://doi.org/10.1093/imanum/draa021

  25. Grubišić, I., Pietersz, R.: Efficient rank reduction of correlation matrices. Linear Algebra Appl. 422(2), 629–653 (2007). https://doi.org/10.1016/j.laa.2006.11.024

  26. He, C., Lu, Z., Pong, T. K.: A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees. arXiv preprint arXiv:2301.03139 (2023)

  27. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991). https://doi.org/10.1017/CBO9780511840371

  28. Lee, J.M.: Introduction to Riemannian Manifolds, vol. 2. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91755-9

  29. Ling, S.: Solving orthogonal group synchronization via convex and low-rank optimization: tightness and landscape analysis. Math. Program. 200(1), 589–628 (2023). https://doi.org/10.1007/s10107-022-01896-3

  30. Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di geometria, pp. 115–117 (1982)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization. Springer, New York (2004). https://doi.org/10.1007/978-1-4419-8853-9

  32. Polyak, B.T.: Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 3(4), 643–653 (1963). https://doi.org/10.1016/0041-5553(63)90382-3

  33. Polyak, R.A.: On the local quadratic convergence of the primal-dual augmented Lagrangian method. Optim. Methods Softw. 24(3), 369–379 (2009). https://doi.org/10.1080/10556780802699433

  34. Rosen, D.M., Doherty, K.J., Terán Espinoza, A., Leonard, J.J.: Advances in inference and representation for simultaneous localization and mapping. Annu. Rev. Control Robot. Auton. Syst. 4(1), 215–242 (2021). https://doi.org/10.1146/annurev-control-072720-082553

  35. Royer, C.W., O’Neill, M., Wright, S.J.: A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math. Program. 180(1), 451–488 (2020). https://doi.org/10.1007/s10107-019-01362-7

  36. Schechtman, S., Tiapkin, D., Muehlebach, M., Moulines, E.: Orthogonal Directions Constrained Gradient Method: From non-linear equality constraints to Stiefel manifold. arXiv preprint arXiv:2303.09261 (2023)

  37. Wright, S.J., Recht, B.: Optimization for Data Analysis. Cambridge University Press, Cambridge (2022). https://doi.org/10.1017/9781009004282

  38. Xiao, N., Liu, X.: Solving optimization problems over the Stiefel manifold by smooth exact penalty function. arXiv preprint arXiv:2110.08986 (2021)

  39. Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. 37(4), 1205–1241 (2022). https://doi.org/10.1080/10556788.2020.1852236

  40. Xie, Y., Wright, S.J.: Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3), 1–30 (2021). https://doi.org/10.1007/s10915-021-01409-y

  41. Zhang, H., Sra, S.: First-order methods for geodesically convex optimization. In: Conference on Learning Theory, pp. 1617–1638. PMLR (2016)

Author information

Correspondence to Florentin Goyens.

Communicated by Aram Arutyunov.


Proof of Proposition 2.8

Proof

Define \(\varphi (x) = \dfrac{1}{2}\left\| h(x)\right\| ^2\) and take any \(x_0 \in {\mathcal {C}}= \lbrace x\in {\mathcal {E}}: \varphi (x) \le R^2/2\rbrace \). Consider the following differential system:

$$\begin{aligned} \left\{ \begin{aligned} \dfrac{\textrm{d}}{\textrm{d} t}x(t)&= - \nabla \varphi (x(t)) \\ x(0)&= x_0. \end{aligned} \right. \end{aligned}$$
(A.1)

The fundamental theorem of flows [28, Theorem A.42] guarantees the existence of a unique maximal integral curve starting at \(x_0\) for (A.1). Let \(z( \cdot ):I \rightarrow {\mathcal {E}}\) denote this maximal integral curve and \(T>0\) be the supremum of the interval I on which \(z(\cdot )\) is defined. We rely on the Escape Lemma [28, Lemma A.43] to show that z(t) is defined for all times \(t\ge 0\). For \(t< T\), we write \(\ell = \varphi \circ z\) and find

$$\begin{aligned} \ell '(t)&= \textrm{D}\varphi (z(t))\left[ \frac{\textrm{d}}{\textrm{d}t}z(t)\right] = \left\langle {\nabla \varphi (z(t))},{\frac{\textrm{d}}{\textrm{d}t}z(t)}\right\rangle \end{aligned}$$
(A.2)
$$\begin{aligned}&= - \left\| \nabla \varphi (z(t))\right\| ^2 \end{aligned}$$
(A.3)
$$\begin{aligned}&= - \left\| \textrm{D}h(z(t))^*[h(z(t))]\right\| ^2 \le 0. \end{aligned}$$
(A.4)

This implies that \(z(t) \in {\mathcal {C}}\) for all \(0\le t < T\). We show that the trajectory z(t) has finite length. To that end, we note that

$$\begin{aligned} \dfrac{1}{2}\left\| \nabla \varphi (x)\right\| ^2 = \dfrac{1}{2} \left\| \textrm{D}h(x)^* [h(x)] \right\| ^2 \ge {\underline{\sigma }}^2 \dfrac{1}{2} \left\| h(x)\right\| ^2 = {\underline{\sigma }}^2 \varphi (x), \end{aligned}$$
(A.5)

for all \(x\in {\mathcal {C}}\). The length of the trajectory from time \(t=0\) to \(t=T\) is bounded as follows, using a classical argument [30]:

$$\begin{aligned} \int _0^T \left\| \frac{\textrm{d}}{\textrm{d}t}z(t)\right\| \textrm{d} t&= \int _0^T \left\| - \nabla \varphi (z(t))\right\| \textrm{d} t \nonumber \\&= \int _0^T \dfrac{ \left\| \nabla \varphi (z(t))\right\| ^2}{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t \nonumber \\&= \int _0^T \dfrac{ \left\langle {- \nabla \varphi (z(t))},{\frac{\textrm{d}}{\textrm{d}t}z(t)}\right\rangle }{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t\nonumber \\&= \int _0^T \dfrac{ - (\varphi \circ z)'(t)}{ \left\| \nabla \varphi (z(t))\right\| } \textrm{d} t\nonumber \\&\le \int _0^T \dfrac{ - (\varphi \circ z)'(t)}{{\underline{\sigma }}\sqrt{2(\varphi \circ z)(t)}} \textrm{d} t\nonumber \\&= \dfrac{-\sqrt{2}}{{\underline{\sigma }}} \left[ \sqrt{\varphi (z(T))} - \sqrt{\varphi (z(0))}\right] \nonumber \\&\le \dfrac{\sqrt{2 \varphi (z(0))}}{{\underline{\sigma }}}. \end{aligned}$$
(A.6)

The length is bounded independently of T, hence the flow has finite length. The Escape Lemma states that if the domain I of a maximal integral curve \(z(\cdot ) :I \rightarrow {\mathcal {E}}\) has a finite upper bound, then the curve \(z(\cdot )\) leaves every compact set. Since \(z(\cdot )\) has finite length by (A.6), it remains in a compact set; the contrapositive then ensures that the interval I does not have a finite upper bound, and therefore \(I={\mathbb {R}}_+\). Since the trajectory z(t) is bounded for \(t\ge 0\), it must have an accumulation point \({\bar{z}}\). From A1, we have \(\sigma _\textrm{min}(\textrm{D}h(z(t))) \ge {\underline{\sigma }}>0\) for all \(t \ge 0\). This gives the bound \(\ell '(t) \le - {\underline{\sigma }}^2 \left\| h(z(t))\right\| ^2 = -2{\underline{\sigma }}^2 \ell (t)\). Gronwall’s inequality then yields

$$\begin{aligned} \ell (t) \le \varphi (x_0) e^{-2{\underline{\sigma }}^2 t}. \end{aligned}$$
(A.7)

Therefore \(\ell (t) \rightarrow 0 \) as \(t \rightarrow \infty \), which implies \(h(z(t))\rightarrow 0 \) as \(t\rightarrow \infty \). We conclude that the accumulation point satisfies \(h({\bar{z}}) = 0\). Since \({\mathcal {C}}\) is closed, the point \({{\bar{z}}}\) is in \({\mathcal {C}}\). Therefore, \({{\bar{z}}}\) is both in \({\mathcal {M}}\) and in the connected component of \({\mathcal {C}}\) that contains \(z(0) = x_0\). \(\square \)
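The mechanism of the proof can be observed numerically. The following sketch (not from the paper; the toy constraint, step size, and horizon are illustrative assumptions) integrates the gradient flow (A.1) with forward Euler for the constraint \(h(x) = \Vert x\Vert ^2 - 1\), and checks that \(\ell (t) = \varphi (z(t))\) decreases along the flow and that the trajectory approaches the feasible set, as Proposition 2.8 predicts.

```python
import numpy as np

# Toy constraint (illustrative): h(x) = ||x||^2 - 1 (unit circle),
# phi(x) = 0.5 * h(x)^2, grad phi(x) = Dh(x)^*[h(x)] = 2 x h(x).
h        = lambda x: x @ x - 1.0
grad_phi = lambda x: 2.0 * x * h(x)
phi      = lambda x: 0.5 * h(x) ** 2

x = np.array([1.5, 0.5])          # infeasible starting point x0
dt, n_steps = 1e-3, 5000
values = [phi(x)]
for _ in range(n_steps):          # forward Euler on dx/dt = -grad phi(x)
    x = x - dt * grad_phi(x)
    values.append(phi(x))

# phi decreases along the flow (discrete analogue of (A.4)),
# and the limit point is feasible: h(z(t)) -> 0, as in (A.7).
assert all(a >= b for a, b in zip(values, values[1:]))
assert abs(h(x)) < 1e-6
```

Along this trajectory the Jacobian \(\textrm{D}h(x) = 2x^{\top }\!\) stays uniformly full rank, which is the discrete counterpart of assumption A1 and the reason the decay is geometric rather than merely monotone.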


Cite this article

Goyens, F., Eftekhari, A. & Boumal, N. Computing Second-Order Points Under Equality Constraints: Revisiting Fletcher’s Augmented Lagrangian. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02421-6
