Abstract
This work presents an adaptive superfast proximal augmented Lagrangian (AS-PAL) method for solving linearly constrained smooth nonconvex composite optimization problems. Each iteration of AS-PAL inexactly solves a possibly nonconvex proximal augmented Lagrangian (AL) subproblem, obtained through an aggressive/adaptive choice of prox stepsize aimed at substantially improving computational performance, and then performs a full Lagrange multiplier update. A major advantage of AS-PAL over other AL methods is that it requires no knowledge of parameters associated with the optimization problem (e.g., the size of the constraint matrix or the curvatures of the objective function), owing to its adaptive nature not only in choosing the prox stepsize but also in using a crucial adaptive accelerated composite gradient variant to solve the proximal AL subproblems. The speed and efficiency of AS-PAL are demonstrated through extensive computational experiments showing that it can solve many instances more than ten times faster than other state-of-the-art penalty and AL methods, particularly when high accuracy is required.
Data and code availability
The code and data used for the experiments in this paper are publicly available in the AS-PAL GitHub repository (see https://github.com/asujanani6/AS-PAL).
Notes
A resolvent evaluation of h is an evaluation of \((I+\gamma \partial h)^{-1}(\cdot )\) for some \(\gamma >0\).
See the MovieLens 100K dataset, containing 610 users and 9724 movies, which can be found at https://grouplens.org/datasets/movielens/.
References
Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)
Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)
Florea, M.I., Vorobyov, S.A.: An accelerated composite gradient method for large-scale composite objective problems. IEEE Trans. Signal Process. 67(2), 444–459 (2018)
Goncalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems. Pac. J. Optim. 15(3), 379–398 (2019)
Gu, Q., Wang, Z., Liu, H.: Sparse PCA with oracle property. In: Advances in Neural Information Processing Systems 27, pp. 1529–1537. Curran Associates, Inc. (2014)
Hajinezhad, D., Hong, M.: Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization. Math. Program. 176, 207–245 (2019)
He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25(4), 2182–2211 (2015)
He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016)
Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization algorithms and iteration complexity analysis. Comput. Optim. Appl. 72(3), 115–157 (2019)
Kong, W.: Accelerated inexact first-order methods for solving nonconvex composite optimization problems (2021). arXiv:2104.09685
Kong, W.: Complexity-optimal and curvature-free first-order methods for finding stationary points of composite optimization problems (2022). arXiv:2205.13055
Kong, W., Melo, J.G., Monteiro, R.D.C.: FISTA and extensions: review and new insights. Optimization Online (2021)
Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)
Kong, W., Melo, J.G., Monteiro, R.D.C.: An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems. Comput. Optim. Appl. 76(2), 305–346 (2019)
Kong, W., Melo, J.G., Monteiro, R.D.C.: Iteration-complexity of a proximal augmented Lagrangian method for solving nonconvex composite optimization problems with nonlinear convex constraints. Math. Oper. Res. (2023)
Kong, W., Melo, J.G., Monteiro, R.D.C.: Iteration complexity of an inner accelerated inexact proximal augmented Lagrangian method based on the classical lagrangian function. SIAM J. Optim. 33(1), 181–210 (2023)
Kong, W., Monteiro, R.D.C.: An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. SIAM J. Optim. 31(4), 2558–2585 (2021)
Kong, W., Monteiro, R.D.C.: An accelerated inexact dampened augmented Lagrangian method for linearly-constrained nonconvex composite optimization problems. Comput. Optim. Appl. (2023)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138(1), 115–139 (2013)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2016)
Li, Z., Chen, P.-Y., Liu, S., Lu, S., Xu, Y.: Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization (2020). arXiv:2007.01284
Li, Z., Xu, Y.: Augmented Lagrangian based first-order methods for convex and nonconvex programs: nonergodic convergence and iteration complexity (2020). arXiv e-prints
Lin, Q., Ma, R., Xu, Y.: Inexact proximal-point penalty methods for non-convex optimization with non-convex constraints (2019). arXiv:1908.11518
Lin, Q., Ma, R., Xu, Y.: Inexact proximal-point penalty methods for constrained non-convex optimization (2020). arXiv:1908.11518
Liu, Y.F., Liu, X., Ma, S.: On the nonergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. Math. Oper. Res. 44(2), 632–650 (2019)
Lu, Z., Zhou, Z.: Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming (2018). arXiv:1803.09941
Melo, J.G., Monteiro, R.D.C., Wang, H.: Iteration-complexity of an inexact proximal accelerated augmented Lagrangian method for solving linearly constrained smooth nonconvex composite optimization problems (2020). arXiv:2006.08048
Monteiro, R.D.C., Ortiz, C., Svaiter, B.F.: An adaptive accelerated first-order method for convex optimization. Comput. Optim. Appl. 64, 31–73 (2016)
Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Methods Softw. 1–31 (2017)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publisher, Amsterdam (2004)
Nesterov, Y.E.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Patrascu, A., Necoara, I., Tran-Dinh, Q.: Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization. Optim. Lett. 11(3), 609–626 (2017)
Sahin, M., Eftekhari, A., Alacaoglu, A., Latorre, F., Cevher, V.: An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints (2019). arXiv:1906.11357
Sun, K., Sun, A.: Dual descent ALM and ADMM (2022). arXiv:2109.13214
Xu, Y.: Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Math. Program. (2019)
Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179–1 (2017)
Zeng, J., Yin, W., Zhou, D.: Moreau envelope augmented Lagrangian method for nonconvex optimization with linear constraints. J. Sci. Comput. 91, 61 (2022)
Zhang, J., Luo, Z.-Q.: A global dual error bound and its application to the analysis of linearly constrained nonconvex optimization (2020). arXiv:2006.16440
Zhang, J., Luo, Z.-Q.: A proximal alternating direction method of multiplier for linearly constrained nonconvex minimization. SIAM J. Optim. 30(3), 2272–2302 (2020)
Zhang, J., Pu, W., Luo, Z.: On the iteration complexity of smoothed proximal ALM for nonconvex optimization problem with convex constraints (2022). arXiv:2207.06304
Acknowledgements
The authors were partially supported by AFOSR Grant FA9550-22-1-0088.
Ethics declarations
Conflict of interest
The authors declare they have no conflict of interest.
Appendices
ADAP-FISTA Algorithm
1.1 ADAP-FISTA Method
This subsection presents an adaptive ACG variant, called ADAP-FISTA, which is an important tool in the development of the AS-PAL method. We first introduce the assumptions on the problem it solves. ADAP-FISTA considers the following problem
where \(\psi _s\) and \(\psi _n\) are assumed to satisfy the following assumptions:
- (I) \(\psi _n:\Re ^n \rightarrow \Re \cup \{+\infty \}\) is a possibly nonsmooth convex function;
- (II) \(\psi _s: \Re ^n\rightarrow \Re \) is a differentiable function and there exists \({{\bar{L}}} \ge 0\) such that
$$\begin{aligned} \Vert \nabla \psi _s(z') - \nabla \psi _s(z)\Vert \le {{\bar{L}}} \Vert z'-z\Vert \quad \forall z,z' \in \Re ^n. \end{aligned}$$ (A.2)
We now describe the type of approximate solution that ADAP-FISTA aims to find.
Problem A: Given \(\psi \) satisfying the above assumptions, a point \(x_0 \in \textrm{dom}\,\psi _n\), a parameter \(\sigma \in (0,\infty )\), the problem is to find a pair \((y,u) \in \textrm{dom}\,\psi _n \times \Re ^n\) such that
We are now ready to present the ADAP-FISTA algorithm below.
We now make some remarks about ADAP-FISTA. First, usual FISTA methods for solving the strongly convex version of (A.1) consist of repeatedly invoking only steps 2 and 3 of ADAP-FISTA either with a static Lipschitz constant (of the gradient), namely, \(L_{j+1}=L\) for all \(j \ge 0\) for some \(L\ge {{\bar{L}}}\), or by adaptively searching for a suitable Lipschitz \(L_{j+1}\) (as in step 2 of ADAP-FISTA) satisfying a condition similar to (A.6). Second, the pair \((y_{j+1},u_{j+1})\) always satisfies the inclusion in (A.3) (see Lemma A.3 below) so if ADAP-FISTA stops successfully in step 5, or equivalently (A.12) holds, the pair solves Problem A above. Finally, if condition (A.10) in step 4 is never violated, ADAP-FISTA must stop successfully in step 5 (see Proposition A.1 below).
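The adaptive search for a suitable Lipschitz estimate \(L_{j+1}\) described above can be sketched as follows. Since condition (A.6) is not reproduced in this excerpt, the sketch substitutes the standard upper-curvature (descent) condition as a stand-in; the function names, the factor `beta`, and the quadratic example are illustrative assumptions, not the paper's code.

```python
import numpy as np

def backtracked_prox_grad_step(x_tilde, grad, prox, f, L0, beta=2.0, max_iter=100):
    """One adaptive proximal-gradient step in the spirit of step 2 of
    ADAP-FISTA: multiply the Lipschitz estimate L by beta until a
    sufficient-descent test holds. The test below is the standard
    upper-curvature condition, used here as a stand-in for (A.6).
    `prox(z, t)` evaluates the resolvent (I + t * d(psi_n))^{-1}(z)."""
    L = L0
    g = grad(x_tilde)
    for _ in range(max_iter):
        y = prox(x_tilde - g / L, 1.0 / L)  # one resolvent evaluation
        d = y - x_tilde
        if f(y) <= f(x_tilde) + g @ d + 0.5 * L * (d @ d):
            return y, L
        L *= beta
    raise RuntimeError("backtracking did not terminate")

# Example on a quadratic f(x) = 2 * ||x||^2 (gradient Lipschitz constant 4)
# with psi_n = 0, so the prox is the identity; starting from L0 = 1 the
# search stops at the first estimate reaching the true curvature.
y, L = backtracked_prox_grad_step(
    np.ones(2),
    grad=lambda x: 4.0 * x,
    prox=lambda z, t: z,
    f=lambda x: 2.0 * float(x @ x),
    L0=1.0,
)
```

Restarting each step from the previous estimate (rather than from \(L_0\)) is what makes the sequence \(\{L_j\}\) nondecreasing, as exploited in Lemma A.3(a).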
We now discuss how ADAP-FISTA compares with existing ACG variants for solving (A.1) under the assumption that \(\psi _s\) is \(\mu \)-strongly convex. Under this assumption, FISTA variants have been studied, for example, in [3, 11, 12, 28, 30], while other ACG variants have been studied, for example, in [7, 8, 31]. A crucial difference between ADAP-FISTA and these variants is that: (i) ADAP-FISTA stops based on a different relative criterion, namely, (A.12) (see Problem A above) and attempts to approximately solve (A.1) in this sense even when \(\psi _s\) is not \(\mu \)-strongly convex, and (ii) ADAP-FISTA provides a key and easy-to-check inequality whose validity at every iteration guarantees its successful termination. On the other hand, ADAP-FISTA shares similar features with these other methods in that: (i) it has a reasonable iteration complexity guarantee regardless of whether it succeeds or fails, and (ii) it successfully terminates when \(\psi _s\) is \(\mu \)-strongly convex (see Propositions A.1–A.2 below). Moreover, like the method in [3], ADAP-FISTA adaptively searches for a suitable Lipschitz estimate \(L_{j+1}\) that is used in (A.5).
We now present the main convergence results of ADAP-FISTA, which is invoked by AS-PAL for solving the sequence of subproblems (1.4). The first result, namely Proposition A.1 below, gives an iteration complexity bound regardless if ADAP-FISTA terminates with success or failure and shows that if ADAP-FISTA successfully stops, then it obtains a stationary solution of (A.1) with respect to a relative error criterion. The second result, namely Proposition A.2 below, shows that ADAP-FISTA always stops successfully whenever \(\psi _s\) is \(\mu \)-strongly convex.
Proposition A.1
The following statements about ADAP-FISTA hold:
- (a) if \(L_0 = {{{\mathcal {O}}}}({{\bar{L}}})\), it always stops (with either success or failure) in at most
$$\begin{aligned} {{\mathcal {O}}}_1\left( \sqrt{\frac{{{\bar{L}}}}{\mu }}\log ^+_1 ({{\bar{L}}}) \right) \end{aligned}$$
iterations/resolvent evaluations;
- (b) if it stops successfully, it terminates with a triple \((y,u,L) \in \textrm{dom}\,\psi _{n} \times \Re ^{n} \times \Re _{++}\) satisfying
$$\begin{aligned}&u \in \nabla \psi _s(y)+\partial \psi _n(y), \quad \Vert u\Vert \le \sigma \Vert y-x_0\Vert , \quad L \le \max \{L_0, \omega {{\bar{L}}}\} . \end{aligned}$$ (A.13)
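The two conditions in (A.13) are directly verifiable for concrete instances. As a hedged illustration (not the paper's code), take the hypothetical model \(\psi _s(x) = \tfrac{1}{2}x^{\top }Qx - b^{\top }x\) and \(\psi _n = \lambda \Vert \cdot \Vert _1\), for which membership in \(\partial \psi _n(y)\) can be tested componentwise; all names below are illustrative assumptions.

```python
import numpy as np

def in_l1_subdiff(g, y, lam, tol=1e-8):
    """Componentwise test of g in the subdifferential of lam*||.||_1 at y."""
    for gi, yi in zip(g, y):
        if abs(yi) > tol:
            if abs(gi - lam * np.sign(yi)) > tol:
                return False
        elif abs(gi) > lam + tol:
            return False
    return True

def satisfies_problem_A(y, u, x0, Q, b, lam, sigma, tol=1e-8):
    """True iff (y, u) satisfies the inclusion u in grad psi_s(y) + d psi_n(y)
    and the relative bound ||u|| <= sigma * ||y - x0|| from (A.13),
    for the quadratic-plus-l1 model described above."""
    g = u - (Q @ y - b)               # candidate element of d psi_n(y)
    incl = in_l1_subdiff(g, y, lam, tol)
    rel = np.linalg.norm(u) <= sigma * np.linalg.norm(y - x0) + tol
    return incl and rel
```

For instance, the exact minimizer \(y = 0\) of \(\tfrac{1}{2}\Vert x\Vert ^2 + \Vert x\Vert _1\) passes the check with \(u = 0\) for any \(x_0\) and \(\sigma > 0\).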
Proposition A.2
If \(\psi _s\) is \(\mu \)-convex, then ADAP-FISTA always terminates with success and its output (y, u, L), in addition to satisfying (A.13), also satisfies the inclusion \(u \in \partial (\psi _s+\psi _n)(y)\).
The rest of this section is broken up into two subsections which are dedicated to proving Propositions A.1 and A.2, respectively.
1.2 Proof of Proposition A.1
This subsection is dedicated to proving Proposition A.1. The first lemma below presents key definitions and inequalities used in the convergence analysis of ADAP-FISTA.
Lemma A.3
Define
Then, the following statements hold:
- (a) \(\{L_j\}\) is nondecreasing;
- (b) for every \(j \ge 0\), we have
$$\begin{aligned}&\tau _j = 1+A_j\mu , \quad \frac{\tau _j A_{j+1}}{ a_j^2}=L_{j+1}-\mu ; \end{aligned}$$ (A.15)
$$\begin{aligned}&L_0\le L_j\le \max \{L_0,\omega {{\bar{L}}}\}; \end{aligned}$$ (A.16)
$$\begin{aligned}&u_{j+1} \in \nabla \psi _s(y_{j+1}) + \partial \psi _n(y_{j+1}), \quad \Vert u_{j+1}\Vert \le \zeta \Vert y_{j+1}-{\tilde{x}}_{j}\Vert . \end{aligned}$$ (A.17)
Proof
(a) It is clear from the update rule in the beginning of Step 1 that \(\{L_j\}\) is nondecreasing.
(b) The first equality in (A.15) follows directly from both of the relations in (A.7). The second equality in (A.15) follows immediately from the definition of \(a_j\) in (A.4) and the first relation in (A.7).
We prove (A.16) by induction. It clearly holds for \(j=0\). Suppose now that (A.16) holds for some \(j \ge 0\) and let us show that it holds for \(j+1\). Note that if \(L_{j+1}=L_j\), then relation (A.16) immediately holds. Assume then that \(L_{j+1}>L_j\). It follows from the way \(L_{j+1}\) is chosen in step 1 that (A.6) is not satisfied with \(L_{j+1}/\beta \). This fact, together with inequality (A.2) applied at the points \((y_{j+1},{\tilde{x}}_j)\), implies that
The relation in (A.16) then immediately follows from the definition of \(\omega \) in (A.14).
Now, by the definition of \(u_{j+1}\) in (A.11), the triangle inequality, (A.2), the bound (A.16) on \(L_{j+1}\), and the definition of \(\zeta \), we have
which immediately implies the inequality in (A.17). It follows from (A.5) and its associated optimality condition that \(0 \in \nabla \psi _s({\widetilde{x}}_{j}) + \partial \psi _n(y_{j+1})-L_{j+1}({\tilde{x}}_j-y_{j+1})\), which in view of the definition of \(u_{j+1}\) in (A.11) implies the inclusion in (A.17). \(\square \)
The result below gives some estimates on the sequence \(\{A_j\}\), which will be important for the convergence analysis of the method.
Lemma A.4
Define
where \(\omega \) is as in (A.14). Then, for every \(j \ge 1\), we have
Proof
Let integer \(j \ge 1\) be given. Define \(\xi _{j}=1/(L_{j}-\mu )\). Using the first equality in (A.7) and the definition of \(a_j\) in (A.4), we have that for every \(i \le j\),
Passing the above inequality to its square root and using Lemma A.3(a) and the fact that (A.15) implies that \(\tau _{i-1} \ge \max \{1,\mu A_{i-1}\}\), we then conclude that for every \(i \le j\),
where the last inequality in (A.22) follows from the definition of \(\xi _j\), the relation in (A.16), and the definition of Q in (A.19). Summing the inequality in (A.21) from \(i=1\) to \(i=j\) and using the fact that \(A_0=0\), we conclude that \( \sqrt{A_j} \ge j \sqrt{\xi _j} /2\), and hence that the first bound in (A.20) holds in view of the fact that \(\xi _j \ge 1/L_j\). Now, multiplying the inequalities in (A.22) from \(i=2\) to \(i=j\) and using Lemma A.3(a) and the fact that \(A_1= \xi _1\), we conclude that \( \sqrt{A_j} \ge \sqrt{\xi _1} (1+Q^{-1})^{j-1} \ge \sqrt{\xi _j} (1+Q^{-1})^{j-1}\), and hence that the second bound in (A.20) holds in view of the fact that \(\xi _j \ge 1/L_j\). \(\square \)
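The growth of \(\{A_j\}\) established above can be checked numerically. The snippet below assumes the standard ACG updates \(A_{j+1} = A_j + a_j\) with \(a_j\) the positive root implied by the second identity in (A.15) (an assumption, since the precise definitions in (A.4) and (A.7) are not reproduced in this excerpt), and verifies the classical \(\mu = 0\), constant-\(L_j\) form of the first bound in (A.20), namely \(A_j \ge j^2/(4L)\).

```python
import math

def accumulate_A(L, mu, n_steps):
    """A_1, ..., A_n from the recursion tau_j = 1 + mu * A_j,
    (L - mu) * a_j^2 = tau_j * (A_j + a_j), A_{j+1} = A_j + a_j
    (assumed forms of (A.4) and (A.7)), with constant L_j = L."""
    A, out = 0.0, []
    for _ in range(n_steps):
        tau = 1.0 + mu * A
        # positive root of (L - mu) * a^2 - tau * a - tau * A = 0
        a = (tau + math.sqrt(tau * tau + 4.0 * (L - mu) * tau * A)) / (2.0 * (L - mu))
        A += a
        out.append(A)
    return out

# With mu = 0 the first bound in (A.20) reduces to the classical
# estimate A_j >= j^2 / (4 L), which the recursion indeed satisfies.
L = 2.0
for j, A in enumerate(accumulate_A(L, 0.0, 50), start=1):
    assert A >= j * j / (4.0 * L)
```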
Proposition A.5
Let \(\zeta \) and Q be as in (A.14) and (A.19), respectively. ADAP-FISTA always stops (with either success or failure) and does so by performing at most
iterations/resolvent evaluations.
Proof
Let l denote the first quantity in (A.23). Using this definition and the inequality \(\log (1+ \alpha ) \ge \alpha /(1+\alpha )\) for any \(\alpha >-1\), it is easy to verify that
We claim that ADAP-FISTA terminates with success or failure in at most l iterations. Indeed, it suffices to show that if ADAP-FISTA has not stopped with failure up to (and including) the l-th iteration, then it must stop successfully at the l-th iteration. So, assume that ADAP-FISTA has not stopped with failure up to the l-th iteration. In view of step 4 of ADAP-FISTA, it follows that (A.10) holds with \(j=l-1\).
This observation together with the inequality in (A.17) with \(j=l-1\), (A.20) with \(j=l\), and (A.24), then imply that
and hence that (A.12) is satisfied. In view of step 5 of ADAP-FISTA, the method must stop successfully at the end of the l-th iteration. This proves the above claim. Moreover, in view of (A.16), the second term in (A.23) bounds the total number of times \(L_j\) is multiplied by \(\beta \) and step 2 is repeated. Since exactly one resolvent evaluation occurs each time step 2 is executed, the desired conclusion follows. \(\square \)
We are now ready to give the proof of Proposition A.1.
Proof of Proposition A.1
(a) The result immediately follows from Proposition A.5 and the assumption that \(L_0 = {{{\mathcal {O}}}}({{\bar{L}}})\).
(b) This is immediate from the termination criterion (A.12) in step 5 of ADAP-FISTA, the inclusion in (A.17), and relation (A.16). \(\square \)
1.3 Proof of Proposition A.2
This subsection is dedicated to proving Proposition A.2. Thus, for the remainder of this subsection, assume that \(\psi _s\) is \(\mu \)-strongly convex. The first lemma below presents important properties of the iterates generated by ADAP-FISTA.
Lemma A.6
For every \( j\ge 0 \) and \(x \in \Re ^n\), define
where \(\psi :=\psi _s+\psi _n\) and \(s_{j+1}\) are as in (A.1) and (A.8), respectively. Then, for every \(j \ge 0\), we have:
Proof
Since \(\nabla \gamma _j(y_{j+1})=s_{j+1}\), it follows from (A.8) that \(y_{j+1}\) satisfies the optimality condition for (A.27), and thus the relation in (A.27) follows. Furthermore, we have that:
and thus (A.28) follows. \(\square \)
Before stating the next lemma, recall that if a closed function \(\varPsi :\Re ^n\rightarrow \Re \cup \{+\infty \}\) is \(\nu \)-convex with modulus \(\nu >0\), then it has a unique global minimum \(z^*\) and
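The display recalled here, presumably (A.29), did not survive extraction; its standard form is \(\varPsi (x) \ge \varPsi (z^*) + \frac{\nu }{2}\Vert x-z^*\Vert ^2\) for all \(x\). A quadratic sanity check of this assumed form (all names below are illustrative):

```python
import numpy as np

# Verify the assumed form of (A.29),
#   Psi(x) >= Psi(z_star) + (nu/2) * ||x - z_star||^2,
# on a nu-strongly-convex quadratic Psi(x) = 0.5 x'Qx - b'x with Q >= nu*I.
rng = np.random.default_rng(0)
n, nu = 5, 1.5
M = rng.standard_normal((n, n))
Q = M.T @ M + nu * np.eye(n)          # Hessian dominates nu * I
b = rng.standard_normal(n)

def Psi(x):
    return 0.5 * x @ Q @ x - b @ x

z_star = np.linalg.solve(Q, b)        # the unique global minimizer
for _ in range(100):
    x = rng.standard_normal(n)
    assert Psi(x) >= Psi(z_star) + 0.5 * nu * float((x - z_star) @ (x - z_star)) - 1e-9
```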
Lemma A.7
For every \( j\ge 0 \) and \(x \in \Re ^n\), we have
Proof
Using (A.28), the second identity in (A.7), and the fact that \(\varPsi _j:=a_j\gamma _j(\cdot )+\tau _j \Vert \cdot -x_j\Vert ^2/2\) is \((\tau _j+\mu a_j)\)-convex, it follows from (A.29) with \(\varPsi =\varPsi _j\) and \(\nu =\tau _{j+1}\) that
Using the convexity of \( \gamma _j \), the definitions of \(A_{j+1}\) and \( {\widetilde{x}}_j \) in (A.7) and (A.4), respectively, and the second equality in (A.15), we have
The conclusion of the lemma now follows by combining the above two relations. \(\square \)
Lemma A.8
For every \(j \ge 0\), we have \(\gamma _j \le \psi \).
Proof
Define:
It follows immediately from the fact that \(\psi _s\) is \(\mu \)-convex that \({{\tilde{\gamma }}}_j \le \psi \). Furthermore, immediately from the definition of \(y_{j+1}\) in (A.5), we can write:
Now, clearly from (A.32) and the definition of \(s_{j+1}\) in (A.8), we see that \(s_{j+1} \in \partial {{\tilde{\gamma }}}_j(y_{j+1})\). Furthermore, since \({{\tilde{\gamma }}}_j\) is \(\mu \)-convex, it follows from the subgradient rule for the sum of convex functions that the above inclusion is equivalent to \(s_{j+1} \in \partial \left( {{\tilde{\gamma }}}_j(\cdot )-\frac{\mu }{2}\Vert \cdot -y_{j+1}\Vert ^2\right) (y_{j+1}).\) Hence, the subgradient inequality and the fact that \({{\tilde{\gamma }}}_j(x) \le \psi (x)\) imply that for all \(x\in \Re ^{n}\):
and thus the statement of the lemma follows. \(\square \)
Lemma A.9
For every \(j \ge 0\) and \(x \in \textrm{dom}\,\psi _n\), we have
where
Proof
Subtracting \(A_{j+1} \psi (x)\) from both sides of the inequality in (A.30) and using Lemma A.8 we have
The result now follows from the first equality in (A.7) and the definition of \(\eta _j(x)\). \(\square \)
We now state a result that will be important for deriving complexity bounds for ADAP-FISTA.
Lemma A.10
For every \(j \ge 0\) and \(x \in \textrm{dom}\,\psi _n\), we have
Proof
Summing the inequality of Lemma A.9 over the iteration indices \(0,\ldots ,j-1\), using the facts that \(A_0=0\) and \(\tau _0=1\), and invoking the definition of \(\eta _j(\cdot )\) in Lemma A.9 yields the inequality of the lemma. \(\square \)
We are now ready to give the proof of Proposition A.2.
Proof of Proposition A.2
Since \(\psi _s\) is \(\mu \)-convex, Lemma A.10 holds. Thus, using (A.33) with \(x=y_j\), it follows that for all \(j\ge 0\):
Hence, for all \(j\ge 0\), relation (A.10) in step 4 of ADAP-FISTA is always satisfied and thus ADAP-FISTA never fails. In view of this observation and Proposition A.1, it follows that if \(\psi _s\) is \(\mu \)-convex then ADAP-FISTA always terminates successfully with a triple (y, u, L) satisfying relation (A.13) in a finite number of iterations. The inclusion \(u\in \partial (\psi _s+\psi _n)(y)\) then follows immediately from the inclusion in (A.13) and the subgradient rule for the sum of convex functions. \(\square \)
Technical Results for Proof of Lagrange Multipliers
The following basic result is used in Lemma B.3. Its proof can be found, for instance, in [4, Lemma A.4]. Recall that \(\nu ^+_A\) denotes the smallest positive singular value of a nonzero linear operator A.
Lemma B.1
Let \(A:\Re ^n \rightarrow \Re ^l\) be a nonzero linear operator. Then,
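The display stating Lemma B.1 is missing from this excerpt. A standard bound of this type, which we assume here, is \(\nu ^+_A\Vert q\Vert \le \Vert A^{*}q\Vert \) for every \(q \in A(\Re ^n)\); it can be checked numerically, including for a rank-deficient operator where the "positive" qualifier matters:

```python
import numpy as np

# Numerical check of the assumed content of Lemma B.1:
#   nu_plus * ||q|| <= ||A^T q||  for every q in the range of A,
# where nu_plus is the smallest *positive* singular value of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 7))
A[3] = A[0] + A[1]                     # force a zero singular value
sv = np.linalg.svd(A, compute_uv=False)
nu_plus = min(s for s in sv if s > 1e-10)

for _ in range(100):
    q = A @ rng.standard_normal(7)     # an arbitrary point in range(A)
    assert nu_plus * np.linalg.norm(q) <= np.linalg.norm(A.T @ q) + 1e-9
```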
The following technical result, whose proof can be found in Lemma 3.10 of [16], plays an important role in the proof of Lemma B.3 below.
Lemma B.2
Let h be a function as in (C1). Then, for every \(\delta \ge 0\), \(z\in {{\mathcal {H}}}\), and \(\xi \in \partial _{\delta } h(z)\), we have
where \(\partial {{{\mathcal {H}}}}\) denotes the boundary of \({{{\mathcal {H}}}}\).
Lemma B.3
Assume that h is a function as in condition (C1) and \(A:\Re ^n \rightarrow \Re ^l\) is a linear operator satisfying condition (C2). Assume also that the triple \((z,q,r) \in \Re ^{n} \times A(\Re ^{n}) \times \Re ^{n}\) satisfies \(r \in \partial h(z)+A^{*}q\). Then:
- (a) there holds
$$\begin{aligned} {\bar{d}}\nu _A^{+}\Vert q\Vert \le 2 D_h\left( M_h + \Vert r\Vert \right) - \langle q,Az-b\rangle ; \end{aligned}$$ (B.2)
- (b) if, in addition,
$$\begin{aligned} q=q^-+\chi (Az-b) \end{aligned}$$ (B.3)
for some \(q^-\in \Re ^l\) and \(\chi >0\), then we have
$$\begin{aligned} \Vert q\Vert \le \max \left\{ \Vert q^-\Vert ,\frac{2D_h(M_h+\Vert r\Vert )}{{{\bar{d}}} \nu ^{+}_A} \right\} . \end{aligned}$$ (B.4)
Proof
(a) The assumption on (z, q, r) implies that \(r-A^{*}q \in \partial h(z)\). Hence, using the Cauchy-Schwarz inequality, the definitions of \({{\bar{d}}}\) and \({{\bar{z}}}\) in (2.15) and (C2), respectively, and Lemma B.2 with \(\xi =r-A^{*}q\), \(u={{\bar{z}}}\), and \(\delta =0\), we have:
Now, using the above inequality, the triangle inequality, the definition of \(D_h\) in (C1), and the facts that \({{\bar{d}}} \le D_h\) and \(\Vert z-{{\bar{z}}}\Vert \le D_h\), we conclude that:
Noting the assumption that \(q \in A(\Re ^n)\), inequality (B.2) now follows from the above inequality and Lemma B.1.
(b) Relation (B.3) implies that \(\langle q,Az-b\rangle =\Vert q\Vert ^2/\chi -\langle q^-,q\rangle /\chi \), and hence that
where the last inequality is due to the Cauchy-Schwarz inequality. Now, letting K denote the right hand side of (B.4) and using (B.7), we conclude that
and hence that (B.4) holds. \(\square \)
About this article
Cite this article
Sujanani, A., Monteiro, R.D.C. An Adaptive Superfast Inexact Proximal Augmented Lagrangian Method for Smooth Nonconvex Composite Optimization Problems. J Sci Comput 97, 34 (2023). https://doi.org/10.1007/s10915-023-02350-y
Keywords
- First-order accelerated method
- Augmented Lagrangian method
- Smooth weakly convex function
- Linearly-constrained nonconvex composite optimization
- Iteration complexity
- Adaptive method