Abstract
We analyze the worst-case complexity of a proximal augmented Lagrangian (Proximal AL) framework for nonconvex optimization with nonlinear equality constraints. When an approximate first-order (second-order) optimal point is obtained in each subproblem, an \(\epsilon \)-first-order (\(\epsilon \)-second-order) optimal point for the original problem can be guaranteed within \({\mathcal {O}}(1/ \epsilon ^{2 - \eta })\) outer iterations (where \(\eta \) is a user-defined parameter, with \(\eta \in [0,2]\) for the first-order result and \(\eta \in [1,2]\) for the second-order result) when the proximal term coefficient \(\beta \) and the penalty parameter \(\rho \) satisfy \(\beta = {\mathcal {O}}(\epsilon ^\eta )\) and \(\rho = \varOmega (1/\epsilon ^\eta )\), respectively. We also investigate the total iteration complexity and operation complexity when a Newton-conjugate-gradient algorithm is used to solve the subproblems. Finally, we discuss an adaptive scheme for determining a value of the penalty parameter \(\rho \) that satisfies the requirements of the analysis.
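To make the framework concrete, the following Python sketch implements a basic Proximal AL outer loop of the kind analyzed above. It is a minimal illustration, not the authors' implementation: the toy problem, the solver choice (BFGS as a stand-in for the Newton-conjugate-gradient subproblem solver studied in the paper), the stopping test, and all parameter values are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import minimize

def proximal_al(f, c, x0, lam0, rho=10.0, beta=1e-2, eps=1e-4, max_outer=100):
    """Minimal Proximal AL loop (illustrative; parameter choices are assumptions).

    Outer iteration k approximately minimizes the proximal augmented Lagrangian
        f(x) + lam^T c(x) + (rho/2) ||c(x)||^2 + (beta/2) ||x - x_k||^2
    and then applies the first-order multiplier update lam <- lam + rho * c(x).
    """
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for _ in range(max_outer):
        xk = x.copy()
        def subproblem(z, xk=xk, lam=lam):
            cz = c(z)
            return (f(z) + lam @ cz + 0.5 * rho * (cz @ cz)
                    + 0.5 * beta * np.sum((z - xk) ** 2))
        # Approximate subproblem solve; the paper analyzes a Newton-CG solver
        # with first- or second-order guarantees, BFGS is a stand-in here.
        x = minimize(subproblem, xk, method="BFGS").x
        lam = lam + rho * c(x)  # first-order multiplier update
        # Crude stopping test: near-feasible and outer iterates stalled.
        if np.linalg.norm(c(x)) <= eps and np.linalg.norm(x - xk) <= eps:
            break
    return x, lam

# Toy instance (assumed for illustration): min x0 + x1  s.t.  x0^2 + x1^2 = 2,
# whose solution is (-1, -1) with multiplier 1/2.
f = lambda z: z[0] + z[1]
c = lambda z: np.array([z[0] ** 2 + z[1] ** 2 - 2.0])
x, lam = proximal_al(f, c, x0=[-0.5, -0.5], lam0=[0.0])
print(x, lam)
```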
Notes
Circumstances under which the penalty parameter sequence of ALGENCAN is bounded are discussed in [1, Section 5].
References
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2008). https://doi.org/10.1137/060654797
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: Second-order negative-curvature methods for box-constrained and general constrained optimization. Comput. Optim. Appl. 45(2), 209–236 (2010). https://doi.org/10.1007/s10589-009-9240-y
Andreani, R., Fazzio, N., Schuverdt, M., Secchin, L.: A sequential optimality condition related to the quasi-normality constraint qualification and its algorithmic consequences. SIAM J. Optim. 29(1), 743–766 (2019). https://doi.org/10.1137/17M1147330
Andreani, R., Haeser, G., Ramos, A., Silva, P.J.S.: A second-order sequential optimality condition associated to the convergence of optimization algorithms. IMA J. Numer. Anal. 37(4), 1902–1929 (2017)
Andreani, R., Martínez, J.M., Ramos, A., Silva, P.J.S.: A cone-continuity constraint qualification and algorithmic consequences. SIAM J. Optim. 26(1), 96–110 (2016). https://doi.org/10.1137/15M1008488
Andreani, R., Secchin, L., Silva, P.: Convergence properties of a second order augmented Lagrangian method for mathematical programs with complementarity constraints. SIAM J. Optim. 28(3), 2574–2600 (2018). https://doi.org/10.1137/17M1125698
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Cambridge (2014)
Bian, W., Chen, X., Ye, Y.: Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program. 149(1), 301–327 (2015). https://doi.org/10.1007/s10107-014-0753-5
Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125(1), 139–162 (2010). https://doi.org/10.1007/s10107-009-0264-y
Birgin, E.G., Gardenghi, J., Martínez, J.M., Santos, S.A., Toint, P.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1–2), 359–368 (2017)
Birgin, E.G., Haeser, G., Ramos, A.: Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl. 69(1), 51–75 (2018). https://doi.org/10.1007/s10589-017-9937-2
Birgin, E.G., Martínez, J.M.: Complexity and performance of an augmented Lagrangian algorithm. Optim. Methods Softw. (2020). https://doi.org/10.1080/10556788.2020.1746962
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). https://doi.org/10.1561/2200000016
Cartis, C., Gould, N., Toint, P.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)
Cartis, C., Gould, N., Toint, P.: Complexity bounds for second-order optimality in unconstrained optimization. J. Complex. 28(1), 93–108 (2012)
Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization. SIAM J. Optim. 23(3), 1553–1574 (2013). https://doi.org/10.1137/120869687
Cartis, C., Gould, N.I.M., Toint, P.L.: On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Program. Ser. A 144, 93–106 (2014)
Cartis, C., Gould, N.I.M., Toint, P.L.: Optimization of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization. J. Complex. 53, 68–94 (2019)
Curtis, F.E., Jiang, H., Robinson, D.P.: An adaptive augmented Lagrangian method for large-scale constrained optimization. Math. Program. 152(1), 201–245 (2015). https://doi.org/10.1007/s10107-014-0784-y
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1), 59–99 (2016). https://doi.org/10.1007/s10107-015-0871-8
Grapiglia, G.N., Nesterov, Y.: Regularized Newton methods for minimizing functions with Hölder continuous Hessians. SIAM J. Optim. 27(1), 478–506 (2017). https://doi.org/10.1137/16M1087801
Grapiglia, G.N., Yuan, Y.X.: On the complexity of an augmented Lagrangian method for nonconvex optimization. arXiv e-prints arXiv:1906.05622 (2019)
Haeser, G., Liu, H., Ye, Y.: Optimality condition and complexity analysis for linearly-constrained optimization without differentiability on the boundary. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1290-4
Hajinezhad, D., Hong, M.: Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization. Math. Program. (2019). https://doi.org/10.1007/s10107-019-01365-4
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303–320 (1969). https://doi.org/10.1007/BF00927673
Hong, M., Hajinezhad, D., Zhao, M.M.: Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. In: D. Precup, Y.W. Teh (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, pp. 1529–1538. PMLR (2017). http://proceedings.mlr.press/v70/hong17a.html
Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72(1), 115–157 (2019). https://doi.org/10.1007/s10589-018-0034-y
Liu, K., Li, Q., Wang, H., Tang, G.: Spherical principal component analysis. In: Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 387–395 (2019). https://doi.org/10.1137/1.9781611975673.44
Nouiehed, M., Lee, J.D., Razaviyayn, M.: Convergence to second-order stationarity for constrained non-convex optimization. arXiv e-prints arXiv:1810.02024 (2018)
O’Neill, M., Wright, S.J.: A log-barrier Newton-CG method for bound constrained optimization with complexity guarantees. IMA J. Numer. Anal. (2020). https://doi.org/10.1093/imanum/drz074
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Optimization (Sympos., Univ. Keele, Keele, 1968), pp. 283–298. Academic Press, London (1969)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976). https://doi.org/10.1287/moor.1.2.97
Royer, C.W., O’Neill, M., Wright, S.J.: A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math. Program. (2019)
Sun, J., Qu, Q., Wright, J.: Complete dictionary recovery over the sphere. In: 2015 International Conference on Sampling Theory and Applications (SampTA), pp. 407–410 (2015)
Zhang, J., Luo, Z.Q.: A proximal alternating direction method of multiplier for linearly constrained nonconvex minimization. SIAM J. Optim. 30(3), 2272–2302 (2020). https://doi.org/10.1137/19M1242276
Acknowledgements
Research supported by Award N660011824020 from the DARPA Lagrange Program; NSF Awards 1628384, 1634597, and 1740707; and Subcontract 8F-30039 from Argonne National Laboratory.
Appendix: Proofs of Elementary Results
Proof of Theorem 1
Since \(x^*\) is a local minimizer of (1), it is the unique global solution of
$$ \min_x \; f(x) + \frac{1}{4} \Vert x - x^* \Vert ^4 \quad \text {s.t.} \quad c(x) = 0, \;\; \Vert x - x^* \Vert \le \delta \qquad (68)$$
for \(\delta > 0\) sufficiently small. For the same \(\delta \), we define \(x_k\) to be the global solution of
$$ \min_x \; f(x) + \frac{\rho _k}{2} \Vert c(x) \Vert ^2 + \frac{1}{4} \Vert x - x^* \Vert ^4 \quad \text {s.t.} \quad \Vert x - x^* \Vert \le \delta \qquad (69)$$
for a given \(\rho _k\), where \(\{ \rho _k \}_{k \ge 1}\) is a positive sequence such that \(\rho _k \rightarrow +\infty \). Note that \(x_k\) is well defined because the feasible region of (69) is compact and its objective is continuous. Suppose that \(z\) is any accumulation point of \(\{ x_k \}_{k \ge 1}\), that is, \(x_k \rightarrow z\) for \(k \in {\mathcal {K}}\), for some subsequence \({\mathcal {K}}\). Such a \(z\) exists because \(\{ x_k \}_{k \ge 1}\) lies in a compact set, and moreover \(\Vert z - x^* \Vert \le \delta \). We want to show that \(z = x^*\). By the definition of \(x_k\) and the feasibility of \(x^*\) for (69), we have for any \(k \ge 1\) that
$$ f(x_k) + \frac{\rho _k}{2} \Vert c(x_k) \Vert ^2 + \frac{1}{4} \Vert x_k - x^* \Vert ^4 \le f(x^*) + \frac{\rho _k}{2} \Vert c(x^*) \Vert ^2 = f(x^*). \qquad (70)$$
By taking the limit over \({\mathcal {K}}\) in (70) and discarding the nonnegative penalty term, we have \(f(x^*) \ge f(z) + \frac{1}{4} \Vert z - x^* \Vert ^4\). From (70), we also have
$$ \Vert c(x_k) \Vert ^2 \le \frac{2 \left( f(x^*) - f(x_k) \right) }{\rho _k} \le \frac{2 \left( f(x^*) - \inf _{k \ge 1} f(x_k) \right) }{\rho _k}. \qquad (71)$$
Since \(\inf _{k \ge 1} f(x_k)\) is finite (by compactness and continuity of \(f\)) and \(\rho _k \rightarrow +\infty \), taking limits over \({\mathcal {K}}\) in (71) yields \(c(z) = 0\). Hence \(z\) is feasible for (68) with objective value \(f(z) + \frac{1}{4} \Vert z - x^* \Vert ^4 \le f(x^*)\), the optimal value of (68), so \(z\) is a global solution of (68), and therefore \(z = x^*\) by uniqueness.
Without loss of generality, suppose that \(x_k \rightarrow x^*\) and \(\Vert x_k - x^* \Vert < \delta \), so that the ball constraint in (69) is inactive at \(x_k\). By the first- and second-order optimality conditions for (69), we have
$$ \nabla f(x_k) + \rho _k \nabla c(x_k) c(x_k) + \Vert x_k - x^* \Vert ^2 (x_k - x^*) = 0, \qquad (72)$$
$$ \nabla ^2 f(x_k) + \rho _k \left( \nabla c(x_k) \nabla c(x_k)^T + \sum _i c_i(x_k) \nabla ^2 c_i(x_k) \right) + \Vert x_k - x^* \Vert ^2 I + 2 (x_k - x^*)(x_k - x^*)^T \succeq 0. \qquad (73)$$
Define \(\lambda _k \triangleq \rho _k c(x_k)\) and \(\epsilon _k \triangleq \max \{ \Vert x_k - x^* \Vert ^3, 3 \Vert x_k - x^* \Vert ^2, \sqrt{ 2( f(x^*) - \inf _{k \ge 1} f(x_k) )/\rho _k} \}\). Then by (71), (72), (73) and Definition 2, \(x_k\) is \(\epsilon _k\)-2o. Note that \(x_k \rightarrow x^*\) and \(\rho _k \rightarrow +\infty \), so \(\epsilon _k \rightarrow 0^+\). \(\square \)
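As a numerical sanity check on this construction (not part of the original proof), one can solve the penalized problem (69) for increasing \(\rho_k\) on a toy instance and observe \(x_k \rightarrow x^*\) together with the feasibility bound (71) taking effect. The instance, starting point, and solver in the sketch below are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance (an assumption for illustration): f(x) = x0 + x1 with the single
# constraint c(x) = x0^2 + x1^2 - 2 = 0; the local minimizer is x* = (-1, -1).
f = lambda z: z[0] + z[1]
c = lambda z: z[0] ** 2 + z[1] ** 2 - 2.0
x_star = np.array([-1.0, -1.0])

for rho in [1e1, 1e2, 1e3, 1e4]:
    # Penalized, proximally regularized problem (69); the ball constraint is
    # inactive near x*, so it is dropped from the numerical solve.
    obj = lambda z, rho=rho: (f(z) + 0.5 * rho * c(z) ** 2
                              + 0.25 * np.linalg.norm(z - x_star) ** 4)
    xk = minimize(obj, x_star + 0.1, method="BFGS").x
    # ||x_k - x*|| -> 0 and |c(x_k)| -> 0 as rho grows, consistent with (71).
    print(f"rho={rho:g}  dist={np.linalg.norm(xk - x_star):.2e}  |c|={abs(c(xk)):.2e}")
```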
Proof of Lemma 1
We argue by contradiction. If the claim fails for some \(\alpha \), we can select a sequence \(\{ x_k \}_{k \ge 1} \subseteq S_{\alpha }^0\) such that \(f(x_k) + \frac{\rho _0}{2} \Vert c(x_k) \Vert ^2 < - k\) for every \(k\). Let \(x^*\) be an accumulation point of \(\{ x_k \}_{k \ge 1}\), which exists by compactness of \(S_\alpha ^0\). Choosing an index \(K\) with \(f(x^*) + \frac{\rho _0}{2} \Vert c(x^*) \Vert ^2 \ge -K + 1\), we have \(f(x^*) + \frac{\rho _0}{2} \Vert c(x^*) \Vert ^2 \ge -K + 1 > f(x_k) + \frac{\rho _0}{2} \Vert c(x_k) \Vert ^2 + 1\) for all \(k \ge K\), which contradicts the continuity of \(f(x) + \frac{\rho _0}{2} \Vert c(x) \Vert ^2\) along the subsequence converging to \(x^*\). \(\square \)
Cite this article
Xie, Y., Wright, S.J. Complexity of Proximal Augmented Lagrangian for Nonconvex Optimization with Nonlinear Equality Constraints. J Sci Comput 86, 38 (2021). https://doi.org/10.1007/s10915-021-01409-y
Keywords
- Optimization with nonlinear equality constraints
- Nonconvex optimization
- Proximal augmented Lagrangian
- Complexity analysis
- Newton-conjugate-gradient
Mathematics Subject Classification
- 68Q25
- 90C06
- 90C26
- 90C30
- 90C60