
Self-adaptive ADMM for semi-strongly convex problems

Full Length Paper, Mathematical Programming Computation

Abstract

In this paper, we develop a self-adaptive alternating direction method of multipliers (ADMM) that updates the penalty parameter adaptively. When one part of the objective function is strongly convex, i.e., the problem is semi-strongly convex, our algorithm can update the penalty parameter adaptively while retaining guaranteed convergence. We establish several types of convergence results, including an accelerated convergence rate of \(O(1/k^2),\) linear convergence, and convergence of the iterates. These results strengthen various previous ones because we allow the penalty parameter to change adaptively. We also develop a partial proximal point method whose subproblems are solved by our adaptive ADMM, which enables us to handle problems without the semi-strong convexity property. Numerical experiments are conducted to demonstrate the high efficiency and robustness of our method.
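
To make the setting concrete, here is a minimal sketch, ours rather than the authors' (their MATLAB code is linked under Code Availability below), of a two-block ADMM for \(\min _{y,z} f(y)+g(z)\) subject to \(By+Cz=b\) with an adaptively updated penalty parameter. The update shown is the classical residual-balancing heuristic [20, 43], not the self-adaptive rule with guarantees developed in this paper; the toy problem, starting values, and the factor of 10 are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (ours) of two-block ADMM for min_{y,z} f(y) + g(z)
# s.t. By + Cz = b, specialized to a toy ridge-regularized least-squares
# split: f(y) = ||Ay - c||^2 / 2, g(z) = (sigma/2)||z||^2 (strongly
# convex, so the problem is semi-strongly convex), B = I, C = -I, b = 0.
# The penalty update is the residual-balancing heuristic [20, 43], not
# the paper's self-adaptive rule.
rng = np.random.default_rng(0)
m, n, sigma = 50, 20, 1.0
A = rng.standard_normal((m, n))
c = rng.standard_normal(m)
AtA, Atc = A.T @ A, A.T @ c

beta = 1.0    # penalty parameter beta_k (illustrative starting value)
gamma = 1.5   # dual step size, inside (1, (1+sqrt(5))/2) as in the paper's analysis
y, z, lam = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(300):
    # y-step: minimize f(y) + <lam, y - z> + (beta/2)||y - z||^2
    y = np.linalg.solve(AtA + beta * np.eye(n), Atc - lam + beta * z)
    z_old = z
    # z-step: minimize (sigma/2)||z||^2 + <lam, y - z> + (beta/2)||y - z||^2
    z = (lam + beta * y) / (sigma + beta)
    # dual step: lam^{k+1} = lam^k + gamma * beta_k * (primal residual)
    lam = lam + gamma * beta * (y - z)
    r = np.linalg.norm(y - z)             # primal residual
    s = beta * np.linalg.norm(z - z_old)  # dual residual surrogate
    if max(r, s) < 1e-10:
        break
    # residual balancing: keep r and s within a factor of 10 (heuristic)
    if r > 10 * s:
        beta *= 2.0
    elif s > 10 * r:
        beta /= 2.0
print(f"iters: {k + 1}, primal residual: {r:.2e}")
```

The point of the adaptive schemes compared in this paper is precisely to replace the heuristic if-branch above with an update rule for \(\beta _k\) that retains provable convergence guarantees.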

Availability of data and materials

References for all datasets used are provided in this published article.

Code Availability

The code is available from https://github.com/ttang-nus/MATLAB-code-for-IADMMs/.

Notes

  1. A sequence \(\{a_k\}_{k\in {\mathbb {N}}^+}\) is said to be \(\Omega (k)\) if there exist a positive number c and an integer \(N_0\) such that \(a_k\ge ck\) for all \(k\ge N_0.\)

  2. The parameters used in acc1-ADMM are quite different from those of the traditional ADMM, so we omit the details here. Readers may refer to equation (31) (case 2) and Section 5.1 (case 2) of [41] for details.

  3. Note that acc1-ADMM is quite different from the traditional ADMM: its primal and dual feasibility residuals stay close to each other even when its penalty parameter increases rapidly.

References

  1. Bai, X., Li, Q.: A highly efficient adaptive-sieving-based algorithm for the high-dimensional rank lasso problem. arXiv preprint arXiv:2207.12753 (2022)

  2. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408. Springer, Berlin (2011)

  3. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  4. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  5. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)

  6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 1–27 (2011)

  7. Chen, L., Sun, D., Toh, K.C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66, 327–343 (2017)

  8. Condat, L.: A direct algorithm for 1-d total variation denoising. IEEE Signal Process. Lett. 20(11), 1054–1057 (2013)

  9. Cui, Y., Sun, D., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. 178, 381–415 (2019)

  10. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66, 889–916 (2016)

  11. Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)

  12. Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)

  13. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  14. Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2016)

  15. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(R2), 41–76 (1975)

  16. Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)

  17. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  18. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2023). URL https://www.gurobi.com

  19. Ha, C.D.: A generalization of the proximal point algorithm. SIAM J. Control. Optim. 28(3), 503–512 (1990)

  20. He, B., Yang, H., Wang, S.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106, 337–356 (2000)

  21. He, B., Yuan, X.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  22. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)

  23. Huang, L., Jia, J., Yu, B., Chun, B.G., Maniatis, P., Naik, M.: Predicting execution time of computer programs using sparse polynomial regression. In: Advances in Neural Information Processing Systems, vol. 23 (2010)

  24. Jiang, K., Sun, D., Toh, K.C.: Solving nuclear norm regularized and semidefinite matrix least squares problems with linear equality constraints. In: Discrete Geometry and Optimization, pp. 133–162 (2013)

  25. Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190(1–2), 57–87 (2021)

  26. Li, H., Lin, Z.: Accelerated alternating direction method of multipliers: an optimal O(1/K) nonergodic analysis. J. Sci. Comput. 79, 671–699 (2019)

  27. Li, M., Sun, D., Toh, K.C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)

  28. Li, X., Sun, D., Toh, K.C.: A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems. SIAM J. Optim. 28(1), 433–458 (2018)

  29. Liang, L., Sun, D., Toh, K.C.: An inexact augmented Lagrangian method for second-order cone programming with applications. SIAM J. Optim. 31(3), 1748–1773 (2021)

  30. Lin, M., Liu, Y.J., Sun, D., Toh, K.C.: Efficient sparse semismooth Newton methods for the clustered lasso problem. SIAM J. Optim. 29(3), 2026–2052 (2019)

  31. Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, vol. 24 (2011)

  32. Maros, I., Mészáros, C.: A repository of convex quadratic programming problems. Optim. Methods Softw. 11(1–4), 671–681 (1999)

  33. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352. PMLR (2015)

  34. Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)

  35. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  36. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media, Berlin (2009)

  37. Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1), 204–227 (2022)

  38. Tang, P., Wang, C., Jiang, B.: A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems. arXiv preprint arXiv:2106.13683 (2021)

  39. Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72, 1–43 (2019)

  40. Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)

  41. Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal-dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)

  42. Wang, L., Peng, B., Bradic, J., Li, R., Wu, Y.: A tuning-free robust and efficient approach to high-dimensional regression. J. Am. Stat. Assoc. 115(532), 1700–1714 (2020)

  43. Wohlberg, B.: ADMM penalty parameter selection by residual balancing. arXiv preprint arXiv:1704.06209 (2017)

  44. Xu, Y.: Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM J. Optim. 27(3), 1459–1484 (2017)

  45. Xu, Y., Akrotirianakis, I., Chakraborty, A.: Proximal gradient method for huberized support vector machine. Pattern Anal. Appl. 19, 989–1005 (2016)

  46. Xu, Y., Zhang, S.: Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. Comput. Optim. Appl. 70, 91–128 (2018)

  47. Xu, Z., Figueiredo, M., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection. In: Artificial Intelligence and Statistics, pp. 718–727. PMLR (2017)

  48. Xu, Z., Figueiredo, M.A., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7389–7398 (2017)

  49. Xu, Z., Taylor, G., Li, H., Figueiredo, M.A., Yuan, X., Goldstein, T.: Adaptive consensus ADMM for distributed optimization. In: International Conference on Machine Learning, pp. 3841–3850. PMLR (2017)

  50. Yang, L., Toh, K.C.: Bregman proximal point algorithm revisited: a new inexact version and its inertial variant. SIAM J. Optim. 32(3), 1523–1554 (2022)

Acknowledgements

We thank the reviewers and Associate Editor for many helpful suggestions to improve the quality of the paper.

Funding

The research of the second author is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).

Author information

Corresponding author

Correspondence to Tianyun Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proof details

1.1 Proof of Lemma 1

Proof

From the optimality conditions in steps 1 and 2, we have that

$$\begin{aligned}&-\Big (B^\top \lambda ^k+\beta _kB^\top (By^{k+1}+Cz^k-b)+P_k(y^{k+1}-y^k)\Big ) \in \partial f(y^{k+1}) \end{aligned}$$
(56)
$$\begin{aligned}&-\Big (C^\top \lambda ^k+\beta _kC^\top (By^{k+1}+Cz^{k+1}-b)+Q_k(z^{k+1}-z^k)\Big ) \in \partial g(z^{k+1}). \end{aligned}$$
(57)

From (56) and the convexity of f, we have that

$$\begin{aligned} f(y^{k+1})-f(y)&\le \langle B^\top \lambda ^k+\beta _kB^\top (By^{k+1}+Cz^k-b)+P_k(y^{k+1}\!-\!y^k),\,y\!-\!y^{k+1}\rangle \nonumber \\&=\langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),\,By-By^{k+1}\rangle +\eta _{P_k}(y,y^k,y^{k+1}). \end{aligned}$$
(58)

Similarly, from (57) and (2), we have that

$$\begin{aligned}&g(z^{k+1})-g(z) \nonumber \\&\quad \le \langle \lambda ^k+\beta _k(By^{k+1}+Cz^{k+1}-b),\,Cz-Cz^{k+1}\rangle \nonumber \\&\qquad +\eta _{Q_k}(z,z^k,z^{k+1})-\frac{\sigma _g}{2}\Vert z^{k+1}-z\Vert ^{2}\nonumber \\&\quad =\langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),Cz-Cz^{k+1}\rangle \nonumber \\&\qquad +\eta _{\beta _kC^\top C+Q_k}(z,z^k,z^{k+1})-\frac{\sigma _g}{2}\Vert z^{k+1}-z\Vert ^{2}. \end{aligned}$$
(59)

From (58), (59) we have that

$$\begin{aligned}&{\mathcal {F}}(x^{k+1})-{\mathcal {F}}(x)\nonumber \\&\quad \le \langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),\,b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\qquad +\eta _{\beta _kC^\top C+Q_k}(z,z^k,z^{k+1})+\eta _{P_k}(y,y^k,y^{k+1}) \nonumber \\&\quad \quad -\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^{2} \nonumber \\&\quad =\langle \lambda ^k+\beta _k({\mathcal {A}}x^{k+1}-b),b-{\mathcal {A}}x^{k+1}\rangle +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2 \nonumber \\&\quad =\langle \lambda ^{k+1},b-{\mathcal {A}}x^{k+1}\rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}\!-\!b\Vert ^2\!+\!\beta _k\langle C(z^k\!-\!z^{k+1}),b\!-\!{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2, \end{aligned}$$
(60)

where we have used step 3 to get the last equality. From (60), we have

$$\begin{aligned}&{\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\nonumber \\&\quad \le \langle \lambda ^{k+1}-\lambda ,b-{\mathcal {A}}x^{k+1}\rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\qquad +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1})+\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\nonumber \\&\quad =\Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k \gamma }\Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\qquad +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$
(61)

Now, we need to estimate \(\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \). From (57), we know that

$$\begin{aligned}&-C^\top \lambda ^k-\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)-Q_k(z^{k+1}-z^k)\in \partial g(z^{k+1})\\&-C^\top \lambda ^{k-1}-\beta _{k-1}C^\top ({\mathcal {A}}x^k-b)-Q_{k-1}(z^k-z^{k-1})\in \partial g(z^k). \end{aligned}$$

Combining the above two inclusions with the strong convexity of g (equivalently, the strong monotonicity of \(\partial g\): \(\langle s-t,\,z^{k+1}-z^k\rangle \ge \sigma _g\Vert z^{k+1}-z^k\Vert ^2\) for any \(s\in \partial g(z^{k+1})\) and \(t\in \partial g(z^k)\)), we get

$$\begin{aligned} \Big \langle \begin{array}{c} C^\top (\lambda ^{k-1}-\lambda ^k)-\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)-Q_k(z^{k+1}-z^k) \\ +\beta _{k-1}C^\top ({\mathcal {A}}x^k-b)+Q_{k-1}(z^k-z^{k-1}) \end{array},z^{k+1}-z^k \Big \rangle \ge \sigma _g \Vert z^k-z^{k+1}\Vert ^2, \end{aligned}$$

which, together with step 3, implies that

$$\begin{aligned}&\sigma _g \Vert z^k-z^{k+1}\Vert ^2 \\&\quad \le \langle \lambda ^{k-1}+\beta _{k-1}({\mathcal {A}}x^k-b)-\lambda ^k-\beta _k({\mathcal {A}}x^{k+1}-b),C(z^{k+1}-z^k)\rangle \\&\quad \quad +\langle -Q_k(z^{k+1}-z^k)+Q_{k-1}(z^k-z^{k-1}),z^{k+1}-z^k\rangle \\&\quad =\langle (1-\gamma )\beta _{k-1}({\mathcal {A}}x^k-b)-\beta _k({\mathcal {A}}x^{k+1}-b),C(z^{k+1}-z^k)\rangle \\&\quad \quad +\langle -Q_{k-1}(z^{k+1}-z^k)+Q_{k-1}(z^k-z^{k-1}),z^{k+1}-z^k\rangle \\&\qquad -(\beta _k-\beta _{k-1})\Vert z^{k+1}-z^k\Vert ^2_Q. \end{aligned}$$

The above inequality implies that

$$\begin{aligned}&\beta _k \langle {\mathcal {A}}x^{k+1}-b,C(z^{k+1}-z^k)\rangle \\&\quad \le (\gamma -1)\langle \beta _{k-1}(b-{\mathcal {A}}x^k),C(z^{k+1}-z^k)\rangle -\Vert z^{k+1}-z^k\Vert ^2_{Q_{k}} \\&\quad \quad +\beta _{k-1}\langle Q(z^k-z^{k-1}),z^{k+1}-z^k\rangle -\sigma _g\Vert z^k-z^{k+1}\Vert ^2 \\&\quad \le \frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k-b\Vert ^2 +\frac{(\gamma -1)\gamma \beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2-\Vert z^{k+1}-z^k\Vert ^2_{Q_k}\\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q+\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2 \\&\quad =\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k-b\Vert ^2 +\frac{(1-\delta )\beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q \\&\quad \quad -\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2, \end{aligned}$$

where we have used the fact that \(\gamma (\gamma -1) = 1-\delta \). Plugging the above inequality into (61), we get

$$\begin{aligned}&{\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\nonumber \\&\quad \le \Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k\gamma }\Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 +\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k\!-\!b\Vert ^2\nonumber \\&\quad \quad +\eta _{P_k}(y,y^k,y^{k+1})+\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\frac{(1-\delta )\beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q-\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2 -\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\nonumber \\&\quad \le \Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k\gamma } \Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 \!+\!\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k\!-\!b\Vert ^2\nonumber \\&\quad \quad +\eta _{P_k}(y,y^k,y^{k+1})+\xi _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) - \frac{\delta \beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$
(62)
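
As a quick algebraic check (ours, for the reader's convenience), the two identities involving \(\delta =1+\gamma -\gamma ^2\) that are used in this proof, namely \(\gamma (\gamma -1)=1-\delta \) above and \(\frac{\gamma -1}{2\gamma }=\frac{2-\gamma }{2}-\frac{\delta }{2\gamma }\) below, follow by direct expansion:

$$\begin{aligned} \gamma (\gamma -1)&=\gamma ^2-\gamma =1-(1+\gamma -\gamma ^2)=1-\delta ,\\ \frac{2-\gamma }{2}-\frac{\delta }{2\gamma }&=\frac{\gamma (2-\gamma )-(1+\gamma -\gamma ^2)}{2\gamma } =\frac{2\gamma -\gamma ^2-1-\gamma +\gamma ^2}{2\gamma }=\frac{\gamma -1}{2\gamma }. \end{aligned}$$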

Note that from step 4, we can derive that

$$\begin{aligned}{} & {} \beta _k\left( \xi _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1})-\frac{(1-\epsilon )\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\right) \nonumber \\{} & {} \quad \le \frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{ C^\top C+Q}. \end{aligned}$$
(63)

Multiplying (62) by \(\beta _k\) and using the above inequality, we obtain that

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\right) \\&\quad \le \frac{1}{\gamma }\langle \lambda ^{k+1}-\lambda , \lambda ^k-\lambda ^{k+1}\rangle +(\gamma -1)\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2+\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^k-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1}) +\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2 \\&\quad =\frac{1}{\gamma }\xi (\lambda ,\lambda ^k,\lambda ^{k+1}) -\frac{(2-\gamma )\beta _k^2}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2+\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^{k}-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1})+\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2, \end{aligned}$$

where we have used the fact that \(\langle \lambda ^{k+1}-\lambda ,\,\lambda ^k-\lambda ^{k+1}\rangle = \frac{1}{2}\Vert \lambda -\lambda ^k\Vert ^2 -\frac{1}{2}\Vert \lambda -\lambda ^{k+1}\Vert ^2 -\frac{1}{2}\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2\) and \(\lambda ^{k+1}-\lambda ^{k}=\gamma \beta _k ({\mathcal {A}}x^{k+1}-b).\) Note that since \(\gamma \in (1,\frac{1+\sqrt{5}}{2})\), we have \(\delta = 1+\gamma -\gamma ^2 > 0\). Using the identity \(\frac{\gamma -1}{2\gamma }= \frac{2-\gamma }{2}-\frac{\delta }{2\gamma }\), we deduce that

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\right) +\frac{\delta \beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^{k}-b\Vert ^2 \\&\quad \le \frac{1}{\gamma }\xi (\lambda ,\lambda ^k,\lambda ^{k+1})+ \frac{(2-\gamma )\beta _{k-1}^2}{2}\Vert {\mathcal {A}}x^k-b\Vert ^2 -\frac{(2-\gamma )\beta _{k}^2}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1}) +\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q -\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$

From here, one can readily get the required inequality in Lemma 1. \(\square \)

1.2 Proof of Lemma 2

Proof

Because \((x^*,\lambda ^*)\) is a KKT solution, we have that \(0\in \partial _x {\mathcal {L}}(x^*,\lambda ^*)\). Since \({\mathcal {L}}(x,\lambda ^*)\) is a convex function of x, we then have that

$$\begin{aligned} {\mathcal {L}}(x^*,\lambda ^*)\le {\mathcal {L}}(x,\lambda ^*)\ \mathrm{for\ any}\ x, \end{aligned}$$
(64)

from which we get

$$\begin{aligned} -\langle \lambda ^*,\,{\mathcal {A}}x^k-b\rangle \le {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*). \end{aligned}$$
(65)

Taking the maximum over all \(\lambda \in {\mathbb {R}}^m\) such that \(\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1\) in \({\mathcal {L}}(x^k,\lambda )-{\mathcal {L}}(x^*,\lambda )\le h(k)D(\lambda )\), we have

$$\begin{aligned} {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)+(\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda ). \end{aligned}$$
(66)

Using (65) in (66), we get

$$\begin{aligned} \Vert {\mathcal {A}}x^k-b\Vert \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda )=O(h(k)). \end{aligned}$$
(67)
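
To spell out the step above (our elaboration): (65) together with the Cauchy–Schwarz inequality gives \({\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)\ge -\Vert \lambda ^*\Vert \Vert {\mathcal {A}}x^k-b\Vert \), and substituting this lower bound into (66) leaves exactly one copy of \(\Vert {\mathcal {A}}x^k-b\Vert \):

$$\begin{aligned} \Vert {\mathcal {A}}x^k-b\Vert&=(\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert -\Vert \lambda ^*\Vert \Vert {\mathcal {A}}x^k-b\Vert \\&\le (\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert +{\mathcal {F}}(x^k)-{\mathcal {F}}(x^*) \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda ). \end{aligned}$$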

Now, using (67) in (65) and (66) respectively, we get \(| {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)|=O(h(k)).\) \(\square \)

1.3 Proof of Lemma 3

Proof

Substituting \((x^*,\lambda ^*)\) into (5), we get the following long inequality:

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda ^*)-{\mathcal {L}}(x^*,\lambda ^*) \right) +\overbrace{\frac{\delta \beta ^2_{k-1}}{2\gamma }\Vert {\mathcal {A}}x^k-b\Vert ^2}^{1}+\overbrace{\frac{\delta \beta ^2_k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2}^{0}\nonumber \\&\quad \quad +\overbrace{\frac{\beta _k}{2}\Vert y^k-y^{k+1} \Vert ^2_{P_k}}^{0}+\frac{\beta _k^2}{2}\Vert z^k-z^{k+1}\Vert ^2_Q+\frac{1}{2\gamma }\Vert \lambda ^*-\lambda ^{k+1}\Vert ^2+\frac{(2-\gamma )\beta ^2_{k}}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\quad \quad +\frac{\beta ^2_{k+1}}{2}\Vert z^*-z^{k+1}\Vert ^2_{C^\top C+Q}+\frac{\beta _k^2}{2}\Vert z^{k+1}-z^k\Vert ^2_Q+\overbrace{\frac{\beta _{k+1}}{2}\Vert y^*-y^{k+1} \Vert ^2_{P_{k+1}}}^{0}\nonumber \\&\quad \le \frac{1}{2\gamma } \Vert \lambda ^*-\lambda ^k\Vert ^2+\frac{(2-\gamma )\beta _{k-1}^2}{2}\Vert {\mathcal {A}}x^k-b\Vert ^2+\frac{\beta _k^2}{2}\Vert z^*-z^k\Vert ^2_{C^\top C+Q}+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q\nonumber \\&\quad \quad +\overbrace{\frac{\beta _k}{2}\Vert y^*-y^k\Vert ^2_{P_k}}^{0}-\overbrace{\sigma _g \beta _k \Vert z^k-z^{k+1}\Vert ^2}^{2}-\overbrace{\frac{\epsilon \sigma _g\beta _k}{2}\Vert z^*-z^{k+1}\Vert ^2}^{3}. \end{aligned}$$
(68)

Now, we apply several operations to the above inequality: (i) ignore the terms marked "0", since \(P=0\); (ii) move the term marked "1" to the right-hand side; (iii) move the term marked "2" to the left-hand side and apply \(\Vert z^{k+1}-z^k\Vert ^2_Q/\lambda _{\max }(Q)\le \Vert z^k-z^{k+1}\Vert ^2\); (iv) move one half of the term marked "3" to the left-hand side and apply \(\Vert z^{k+1}-z^*\Vert ^2_{C^\top C+Q}/\lambda _{\max }(C^\top C+Q)\le \Vert z^{k+1}-z^*\Vert ^2.\) After all these operations, we obtain the inequality (16). \(\square \)

1.4 Proof of Lemma 4

Proof

From steps 2 and 3 of the IADMM, we have that

$$\begin{aligned} 0=\nabla g(z^{k+1})+C^\top \lambda ^{k+1}+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k). \end{aligned}$$

Since \((y^*,z^*,\lambda ^*)\) is a KKT solution, we have \(0=\nabla g(z^*)+C^\top \lambda ^*.\) Combining these two equations together with the Lipschitz continuity of \(\nabla g\), we have

$$\begin{aligned}&\Vert C^\top (\lambda ^{k+1}-\lambda ^*)+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad =\Vert \nabla g(z^{k+1})-\nabla g(z^*)\Vert ^2\nonumber \\&\quad \le L_g^2\Vert z^{k+1}-z^*\Vert ^2. \end{aligned}$$
(69)

For \(0<\alpha <\frac{1}{2}\), by using the inequality \(\Vert u+v+w\Vert ^2 \ge (1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2\), we have that

$$\begin{aligned}&\Vert C^\top (\lambda ^{k+1}-\lambda ^*)+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k)\Vert ^2\\&\quad \ge (1-2\alpha )\Vert C^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2 -\frac{1}{\alpha }\Vert (1-\gamma ) \beta _k C^\top ({\mathcal {A}}x^{k+1}-b)\Vert ^2\nonumber \\&\qquad -\frac{1}{\alpha }\Vert Q_k(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \ge (1-2\alpha )\lambda _{\min }(CC^\top )\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2 -\frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\quad \quad -\frac{1}{\alpha }\Vert Q_k(z^{k+1}-z^k)\Vert ^2. \end{aligned}$$
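
For completeness (this verification is ours, not part of the original proof), the three-term inequality invoked above follows from Young's inequality \(2\langle u,v\rangle \ge -\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2\) and \(\Vert v+w\Vert ^2\ge 0\):

$$\begin{aligned} \Vert u+v+w\Vert ^2&=\Vert u\Vert ^2+2\langle u,v\rangle +2\langle u,w\rangle +\Vert v+w\Vert ^2\\&\ge \Vert u\Vert ^2-\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2 =(1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2. \end{aligned}$$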

Plugging the lower bound above into (69), we get

$$\begin{aligned}&(1-2\alpha )\lambda _{\min }(CC^\top ) \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2\\&\quad \le \frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\qquad + \frac{1}{\alpha } \Vert Q_k(z^{k+1}-z^k)\Vert ^2+L_g^2\Vert z^{k+1}-z^*\Vert ^2\\&\quad \le \frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\qquad +\frac{1}{\alpha }\lambda _{\max }(Q)\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q+L_g^2\Vert z^{k+1}-z^*\Vert ^2. \end{aligned}$$

This completes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, T., Toh, KC. Self-adaptive ADMM for semi-strongly convex problems. Math. Prog. Comp. 16, 113–150 (2024). https://doi.org/10.1007/s12532-023-00250-8
