
Self-adaptive ADMM for semi-strongly convex problems

Full Length Paper, Mathematical Programming Computation

Abstract

In this paper, we develop a self-adaptive alternating direction method of multipliers (ADMM) that updates the penalty parameter adaptively. When one part of the objective function is strongly convex, i.e., the problem is semi-strongly convex, our algorithm can update the penalty parameter adaptively while retaining guaranteed convergence. We establish several types of convergence results, including an accelerated convergence rate of \(O(1/k^2),\) linear convergence, and convergence of the iterates. These results strengthen various previous ones because we allow the penalty parameter to change adaptively. We also develop a partial proximal point method whose subproblems are solved by our adaptive ADMM, which enables us to handle problems without the semi-strong convexity property. Numerical experiments are conducted to demonstrate the high efficiency and robustness of our method.
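
To make the setting concrete, here is a minimal sketch, ours rather than the authors' (their MATLAB code is linked under Code Availability below), of a two-block ADMM for \(\min _{y,z} f(y)+g(z)\) subject to \(By+Cz=b\) with an adaptively updated penalty parameter. The update shown is the classical residual-balancing heuristic [20, 43], not the self-adaptive rule with guarantees developed in this paper; the toy problem, starting values, and the factor of 10 are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (ours) of two-block ADMM for min_{y,z} f(y) + g(z)
# s.t. By + Cz = b, specialized to a toy ridge-regularized least-squares
# split: f(y) = ||Ay - c||^2 / 2, g(z) = (sigma/2)||z||^2 (strongly
# convex, so the problem is semi-strongly convex), B = I, C = -I, b = 0.
# The penalty update is the residual-balancing heuristic [20, 43], not
# the paper's self-adaptive rule.
rng = np.random.default_rng(0)
m, n, sigma = 50, 20, 1.0
A = rng.standard_normal((m, n))
c = rng.standard_normal(m)
AtA, Atc = A.T @ A, A.T @ c

beta = 1.0    # penalty parameter beta_k (illustrative starting value)
gamma = 1.5   # dual step size, inside (1, (1+sqrt(5))/2) as in the paper's analysis
y, z, lam = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(300):
    # y-step: minimize f(y) + <lam, y - z> + (beta/2)||y - z||^2
    y = np.linalg.solve(AtA + beta * np.eye(n), Atc - lam + beta * z)
    z_old = z
    # z-step: minimize (sigma/2)||z||^2 + <lam, y - z> + (beta/2)||y - z||^2
    z = (lam + beta * y) / (sigma + beta)
    # dual step: lam^{k+1} = lam^k + gamma * beta_k * (primal residual)
    lam = lam + gamma * beta * (y - z)
    r = np.linalg.norm(y - z)             # primal residual
    s = beta * np.linalg.norm(z - z_old)  # dual residual surrogate
    if max(r, s) < 1e-10:
        break
    # residual balancing: keep r and s within a factor of 10 (heuristic)
    if r > 10 * s:
        beta *= 2.0
    elif s > 10 * r:
        beta /= 2.0
print(f"iters: {k + 1}, primal residual: {r:.2e}")
```

The point of the adaptive schemes compared in this paper is precisely to replace the heuristic if-branch above with an update rule for \(\beta _k\) that retains provable convergence guarantees.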

Availability of data and materials

References for all datasets used are provided in this published article.

Code Availability

The code is available from https://github.com/ttang-nus/MATLAB-code-for-IADMMs/.

Notes

  1. A sequence \(\{a_k\}_{k\in {\mathbb {N}}^+}\) is said to be \(\Omega (k)\) if there exist a positive number c and an integer \(N_0\) such that \(a_k\ge ck\) for all \(k\ge N_0.\)

  2. The parameters used in acc1-ADMM are quite different from those of the traditional ADMM, so we omit the details here. Readers may refer to equation (31) (case 2) and Section 5.1 (case 2) of [41] for details.

  3. Note that acc1-ADMM is quite different from the traditional ADMM: its primal and dual feasibility residuals stay close to each other even when its penalty parameter increases rapidly.

References

  1. Bai, X., Li, Q.: A highly efficient adaptive-sieving-based algorithm for the high-dimensional rank lasso problem. arXiv preprint arXiv:2207.12753 (2022)

  2. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408. Springer, Berlin (2011)

  3. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  4. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  5. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)

  6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 1–27 (2011)

  7. Chen, L., Sun, D., Toh, K.C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66, 327–343 (2017)

  8. Condat, L.: A direct algorithm for 1-d total variation denoising. IEEE Signal Process. Lett. 20(11), 1054–1057 (2013)

  9. Cui, Y., Sun, D., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. 178, 381–415 (2019)

  10. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66, 889–916 (2016)

  11. Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)

  12. Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)

  13. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  14. Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2016)

  15. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(R2), 41–76 (1975)

  16. Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)

  17. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  18. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2023). URL https://www.gurobi.com

  19. Ha, C.D.: A generalization of the proximal point algorithm. SIAM J. Control. Optim. 28(3), 503–512 (1990)

  20. He, B., Yang, H., Wang, S.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106, 337–356 (2000)

  21. He, B., Yuan, X.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  22. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)

  23. Huang, L., Jia, J., Yu, B., Chun, B.G., Maniatis, P., Naik, M.: Predicting execution time of computer programs using sparse polynomial regression. In: Advances in Neural Information Processing Systems, vol. 23 (2010)

  24. Jiang, K., Sun, D., Toh, K.C.: Solving nuclear norm regularized and semidefinite matrix least squares problems with linear equality constraints. In: Discrete Geometry and Optimization, pp. 133–162 (2013)

  25. Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190(1–2), 57–87 (2021)

  26. Li, H., Lin, Z.: Accelerated alternating direction method of multipliers: an optimal O(1/K) nonergodic analysis. J. Sci. Comput. 79, 671–699 (2019)

  27. Li, M., Sun, D., Toh, K.C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)

  28. Li, X., Sun, D., Toh, K.C.: A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems. SIAM J. Optim. 28(1), 433–458 (2018)

  29. Liang, L., Sun, D., Toh, K.C.: An inexact augmented Lagrangian method for second-order cone programming with applications. SIAM J. Optim. 31(3), 1748–1773 (2021)

  30. Lin, M., Liu, Y.J., Sun, D., Toh, K.C.: Efficient sparse semismooth Newton methods for the clustered lasso problem. SIAM J. Optim. 29(3), 2026–2052 (2019)

  31. Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, vol. 24 (2011)

  32. Maros, I., Mészáros, C.: A repository of convex quadratic programming problems. Optim. Methods Softw. 11(1–4), 671–681 (1999)

  33. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352. PMLR (2015)

  34. Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)

  35. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  36. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media, Berlin (2009)

  37. Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1), 204–227 (2022)

  38. Tang, P., Wang, C., Jiang, B.: A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems. arXiv preprint arXiv:2106.13683 (2021)

  39. Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72, 1–43 (2019)

  40. Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)

  41. Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal-dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)

  42. Wang, L., Peng, B., Bradic, J., Li, R., Wu, Y.: A tuning-free robust and efficient approach to high-dimensional regression. J. Am. Stat. Assoc. 115(532), 1700–1714 (2020)

  43. Wohlberg, B.: ADMM penalty parameter selection by residual balancing. arXiv preprint arXiv:1704.06209 (2017)

  44. Xu, Y.: Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM J. Optim. 27(3), 1459–1484 (2017)

  45. Xu, Y., Akrotirianakis, I., Chakraborty, A.: Proximal gradient method for huberized support vector machine. Pattern Anal. Appl. 19, 989–1005 (2016)

  46. Xu, Y., Zhang, S.: Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. Comput. Optim. Appl. 70, 91–128 (2018)

  47. Xu, Z., Figueiredo, M., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection. In: Artificial Intelligence and Statistics, pp. 718–727. PMLR (2017)

  48. Xu, Z., Figueiredo, M.A., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7389–7398 (2017)

  49. Xu, Z., Taylor, G., Li, H., Figueiredo, M.A., Yuan, X., Goldstein, T.: Adaptive consensus ADMM for distributed optimization. In: International Conference on Machine Learning, pp. 3841–3850. PMLR (2017)

  50. Yang, L., Toh, K.C.: Bregman proximal point algorithm revisited: a new inexact version and its inertial variant. SIAM J. Optim. 32(3), 1523–1554 (2022)

Acknowledgements

We thank the reviewers and Associate Editor for many helpful suggestions to improve the quality of the paper.

Funding

The research of the second author is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).

Author information

Corresponding author

Correspondence to Tianyun Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proof details

1.1 Proof of Lemma 1

Proof

From the optimality conditions in steps 1 and 2, we have that

$$\begin{aligned}&-\Big (B^\top \lambda ^k+\beta _kB^\top (By^{k+1}+Cz^k-b)+P_k(y^{k+1}-y^k)\Big ) \in \partial f(y^{k+1}) \end{aligned}$$
(56)
$$\begin{aligned}&-\Big (C^\top \lambda ^k+\beta _kC^\top (By^{k+1}+Cz^{k+1}-b)+Q_k(z^{k+1}-z^k)\Big ) \in \partial g(z^{k+1}). \end{aligned}$$
(57)

From (56) and the convexity of f, we have that

$$\begin{aligned} f(y^{k+1})-f(y)&\le \langle B^\top \lambda ^k+\beta _kB^\top (By^{k+1}+Cz^k-b)+P_k(y^{k+1}\!-\!y^k),\,y\!-\!y^{k+1}\rangle \nonumber \\&=\langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),\,By-By^{k+1}\rangle +\eta _{P_k}(y,y^k,y^{k+1}). \end{aligned}$$
(58)

Similarly, from (57) and (2), we have that

$$\begin{aligned}&g(z^{k+1})-g(z) \nonumber \\&\quad \le \langle \lambda ^k+\beta _k(By^{k+1}+Cz^{k+1}-b),\,Cz-Cz^{k+1}\rangle \nonumber \\&\qquad +\eta _{Q_k}(z,z^k,z^{k+1})-\frac{\sigma _g}{2}\Vert z^{k+1}-z\Vert ^{2}\nonumber \\&\quad =\langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),Cz-Cz^{k+1}\rangle \nonumber \\&\qquad +\eta _{\beta _kC^\top C+Q_k}(z,z^k,z^{k+1})-\frac{\sigma _g}{2}\Vert z^{k+1}-z\Vert ^{2}. \end{aligned}$$
(59)

From (58), (59) we have that

$$\begin{aligned}&{\mathcal {F}}(x^{k+1})-{\mathcal {F}}(x)\nonumber \\&\quad \le \langle \lambda ^k+\beta _k(By^{k+1}+Cz^k-b),\,b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\qquad +\eta _{\beta _kC^\top C+Q_k}(z,z^k,z^{k+1})+\eta _{P_k}(y,y^k,y^{k+1}) \nonumber \\&\quad \quad -\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^{2} \nonumber \\&\quad =\langle \lambda ^k+\beta _k({\mathcal {A}}x^{k+1}-b),b-{\mathcal {A}}x^{k+1}\rangle +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2 \nonumber \\&\quad =\langle \lambda ^{k+1},b-{\mathcal {A}}x^{k+1}\rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}\!-\!b\Vert ^2\!+\!\beta _k\langle C(z^k\!-\!z^{k+1}),b\!-\!{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2, \end{aligned}$$
(60)

where we have used step 3 to get the last equality. From (60), we have

$$\begin{aligned}&{\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\nonumber \\&\quad \le \langle \lambda ^{k+1}-\lambda ,b-{\mathcal {A}}x^{k+1}\rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\qquad +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1})+\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\nonumber \\&\quad =\Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k \gamma }\Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\qquad +\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \nonumber \\&\quad \quad +\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\eta _{P_k}(y,y^k,y^{k+1})-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$
(61)

Now, we need to estimate \(\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \). From (57), we know that

$$\begin{aligned}&-C^\top \lambda ^k-\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)-Q_k(z^{k+1}-z^k)\in \partial g(z^{k+1})\\&-C^\top \lambda ^{k-1}-\beta _{k-1}C^\top ({\mathcal {A}}x^k-b)-Q_{k-1}(z^k-z^{k-1})\in \partial g(z^k). \end{aligned}$$

Combining the above two inclusions with the strong convexity of g (equivalently, the strong monotonicity of \(\partial g\): \(\langle s-t,\,z^{k+1}-z^k\rangle \ge \sigma _g\Vert z^{k+1}-z^k\Vert ^2\) for any \(s\in \partial g(z^{k+1})\) and \(t\in \partial g(z^k)\)), we get

$$\begin{aligned} \Big \langle \begin{array}{c} C^\top (\lambda ^{k-1}-\lambda ^k)-\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)-Q_k(z^{k+1}-z^k) \\ +\beta _{k-1}C^\top ({\mathcal {A}}x^k-b)+Q_{k-1}(z^k-z^{k-1}) \end{array},z^{k+1}-z^k \Big \rangle \ge \sigma _g \Vert z^k-z^{k+1}\Vert ^2, \end{aligned}$$

which, together with step 3, implies that

$$\begin{aligned}&\sigma _g \Vert z^k-z^{k+1}\Vert ^2 \\&\quad \le \langle \lambda ^{k-1}+\beta _{k-1}({\mathcal {A}}x^k-b)-\lambda ^k-\beta _k({\mathcal {A}}x^{k+1}-b),C(z^{k+1}-z^k)\rangle \\&\quad \quad +\langle -Q_k(z^{k+1}-z^k)+Q_{k-1}(z^k-z^{k-1}),z^{k+1}-z^k\rangle \\&\quad =\langle (1-\gamma )\beta _{k-1}({\mathcal {A}}x^k-b)-\beta _k({\mathcal {A}}x^{k+1}-b),C(z^{k+1}-z^k)\rangle \\&\quad \quad +\langle -Q_{k-1}(z^{k+1}-z^k)+Q_{k-1}(z^k-z^{k-1}),z^{k+1}-z^k\rangle \\&\qquad -(\beta _k-\beta _{k-1})\Vert z^{k+1}-z^k\Vert ^2_Q. \end{aligned}$$

The above inequality implies that

$$\begin{aligned}&\beta _k \langle {\mathcal {A}}x^{k+1}-b,C(z^{k+1}-z^k)\rangle \\&\quad \le (\gamma -1)\langle \beta _{k-1}(b-{\mathcal {A}}x^k),C(z^{k+1}-z^k)\rangle -\Vert z^{k+1}-z^k\Vert ^2_{Q_{k}} \\&\quad \quad +\beta _{k-1}\langle Q(z^k-z^{k-1}),z^{k+1}-z^k\rangle -\sigma _g\Vert z^k-z^{k+1}\Vert ^2 \\&\quad \le \frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k-b\Vert ^2 +\frac{(\gamma -1)\gamma \beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2-\Vert z^{k+1}-z^k\Vert ^2_{Q_k}\\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q+\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2 \\&\quad =\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k-b\Vert ^2 +\frac{(1-\delta )\beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q \\&\quad \quad -\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2, \end{aligned}$$

where we have used the fact that \(\gamma (\gamma -1) = 1-\delta \). Plugging the above inequality into (61), we get

$$\begin{aligned}&{\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\nonumber \\&\quad \le \Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k\gamma }\Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 +\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k\!-\!b\Vert ^2\nonumber \\&\quad \quad +\eta _{P_k}(y,y^k,y^{k+1})+\eta _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) +\frac{(1-\delta )\beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q-\frac{\beta _k}{2}\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2 -\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\nonumber \\&\quad \le \Big \langle \lambda ^{k+1}-\lambda ,\frac{\lambda ^k-\lambda ^{k+1}}{\beta _k\gamma } \Big \rangle +(\gamma -1)\beta _k\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 \!+\!\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma \beta _k}\Vert {\mathcal {A}}x^k\!-\!b\Vert ^2\nonumber \\&\quad \quad +\eta _{P_k}(y,y^k,y^{k+1})+\xi _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1}) - \frac{\delta \beta _k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \quad +\frac{\beta _{k-1}^2}{2\beta _k}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k\Vert z^{k+1}-z^k\Vert ^2_Q-\sigma _g\Vert z^k-z^{k+1}\Vert ^2-\frac{\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$
(62)
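
As a quick algebraic check (ours, for the reader's convenience), the two identities involving \(\delta =1+\gamma -\gamma ^2\) that are used in this proof, namely \(\gamma (\gamma -1)=1-\delta \) above and \(\frac{\gamma -1}{2\gamma }=\frac{2-\gamma }{2}-\frac{\delta }{2\gamma }\) below, follow by direct expansion:

$$\begin{aligned} \gamma (\gamma -1)&=\gamma ^2-\gamma =1-(1+\gamma -\gamma ^2)=1-\delta ,\\ \frac{2-\gamma }{2}-\frac{\delta }{2\gamma }&=\frac{\gamma (2-\gamma )-(1+\gamma -\gamma ^2)}{2\gamma } =\frac{2\gamma -\gamma ^2-1-\gamma +\gamma ^2}{2\gamma }=\frac{\gamma -1}{2\gamma }. \end{aligned}$$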

Note that from step 4, we can derive that

$$\begin{aligned}{} & {} \beta _k\left( \xi _{\beta _k C^\top C+Q_k}(z,z^k,z^{k+1})-\frac{(1-\epsilon )\sigma _g}{2}\Vert z-z^{k+1}\Vert ^2\right) \nonumber \\{} & {} \quad \le \frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{ C^\top C+Q}. \end{aligned}$$
(63)

Multiplying (62) by \(\beta _k\) and using the above inequality, we obtain that

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\right) \\&\quad \le \frac{1}{\gamma }\langle \lambda ^{k+1}-\lambda , \lambda ^k-\lambda ^{k+1}\rangle +(\gamma -1)\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2+\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^k-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1}) +\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2 \\&\quad =\frac{1}{\gamma }\xi (\lambda ,\lambda ^k,\lambda ^{k+1}) -\frac{(2-\gamma )\beta _k^2}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2+\frac{(\gamma -1)\beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^{k}-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1})+\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q-\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2, \end{aligned}$$

where we have used the fact that \(\langle \lambda ^{k+1}-\lambda ,\,\lambda ^k-\lambda ^{k+1}\rangle = \frac{1}{2}\Vert \lambda -\lambda ^k\Vert ^2 -\frac{1}{2}\Vert \lambda -\lambda ^{k+1}\Vert ^2 -\frac{1}{2}\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2\) and \(\lambda ^{k+1}-\lambda ^{k}=\gamma \beta _k ({\mathcal {A}}x^{k+1}-b).\) Note that since \(\gamma \in (1,\frac{1+\sqrt{5}}{2})\), we have \(\delta = 1+\gamma -\gamma ^2 > 0\). Using the identity \(\frac{\gamma -1}{2\gamma }= \frac{2-\gamma }{2}-\frac{\delta }{2\gamma }\), we deduce that

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda )-{\mathcal {L}}(x,\lambda )\right) +\frac{\delta \beta _{k-1}^2}{2\gamma }\Vert {\mathcal {A}}x^{k}-b\Vert ^2 \\&\quad \le \frac{1}{\gamma }\xi (\lambda ,\lambda ^k,\lambda ^{k+1})+ \frac{(2-\gamma )\beta _{k-1}^2}{2}\Vert {\mathcal {A}}x^k-b\Vert ^2 -\frac{(2-\gamma )\beta _{k}^2}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2 \\&\quad \quad +\beta _k\eta _{P_k}(y,y^k,y^{k+1}) +\frac{\beta _k^2}{2}\Vert z-z^k\Vert ^2_{C^\top C+Q}-\frac{\beta _{k+1}^2}{2}\Vert z-z^{k+1}\Vert ^2_{C^\top C+Q} \\&\quad \quad -\frac{\delta \beta _k^2}{2}\Vert C(z^{k+1}-z^k)\Vert ^2+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q -\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q \\&\quad \quad -\sigma _g\beta _k\Vert z^k-z^{k+1}\Vert ^2-\frac{\epsilon \sigma _g\beta _k}{2}\Vert z-z^{k+1}\Vert ^2. \end{aligned}$$

From here, one can readily get the required inequality in Lemma 1. \(\square \)

1.2 Proof of Lemma 2

Proof

Because \((x^*,\lambda ^*)\) is a KKT solution, we have that \(0\in \partial _x {\mathcal {L}}(x^*,\lambda ^*)\). Since \({\mathcal {L}}(x,\lambda ^*)\) is a convex function of x, we then have that

$$\begin{aligned} {\mathcal {L}}(x^*,\lambda ^*)\le {\mathcal {L}}(x,\lambda ^*)\ \mathrm{for\ any}\ x, \end{aligned}$$
(64)

from which we get

$$\begin{aligned} -\langle \lambda ^*,\,{\mathcal {A}}x^k-b\rangle \le {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*). \end{aligned}$$
(65)

Taking the maximum over all \(\lambda \in {\mathbb {R}}^m\) such that \(\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1\) in \({\mathcal {L}}(x^k,\lambda )-{\mathcal {L}}(x^*,\lambda )\le h(k)D(\lambda )\), we have

$$\begin{aligned} {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)+(\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda ). \end{aligned}$$
(66)

Using (65) in (66), we get

$$\begin{aligned} \Vert {\mathcal {A}}x^k-b\Vert \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda )=O(h(k)). \end{aligned}$$
(67)
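
To spell out the step above (our elaboration): (65) together with the Cauchy–Schwarz inequality gives \({\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)\ge -\Vert \lambda ^*\Vert \Vert {\mathcal {A}}x^k-b\Vert \), and substituting this lower bound into (66) leaves exactly one copy of \(\Vert {\mathcal {A}}x^k-b\Vert \):

$$\begin{aligned} \Vert {\mathcal {A}}x^k-b\Vert&=(\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert -\Vert \lambda ^*\Vert \Vert {\mathcal {A}}x^k-b\Vert \\&\le (\Vert \lambda ^*\Vert +1)\Vert {\mathcal {A}}x^k-b\Vert +{\mathcal {F}}(x^k)-{\mathcal {F}}(x^*) \le h(k)\max _{\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1}D(\lambda ). \end{aligned}$$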

Now, using (67) in (65) and (66) respectively, we get \(| {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)|=O(h(k)).\) \(\square \)

1.3 Proof of Lemma 3

Proof

Substituting \((x^*,\lambda ^*)\) into (5), we get the following long inequality:

$$\begin{aligned}&\beta _k\left( {\mathcal {L}}(x^{k+1},\lambda ^*)-{\mathcal {L}}(x^*,\lambda ^*) \right) +\overbrace{\frac{\delta \beta ^2_{k-1}}{2\gamma }\Vert {\mathcal {A}}x^k-b\Vert ^2}^{1}+\overbrace{\frac{\delta \beta ^2_k}{2}\Vert C(z^{k+1}-z^k)\Vert ^2}^{0}\nonumber \\&\quad \quad +\overbrace{\frac{\beta _k}{2}\Vert y^k-y^{k+1} \Vert ^2_{P_k}}^{0}+\frac{\beta _k^2}{2}\Vert z^k-z^{k+1}\Vert ^2_Q+\frac{1}{2\gamma }\Vert \lambda ^*-\lambda ^{k+1}\Vert ^2+\frac{(2-\gamma )\beta ^2_{k}}{2}\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\nonumber \\&\quad \quad +\frac{\beta ^2_{k+1}}{2}\Vert z^*-z^{k+1}\Vert ^2_{C^\top C+Q}+\frac{\beta _k^2}{2}\Vert z^{k+1}-z^k\Vert ^2_Q+\overbrace{\frac{\beta _{k+1}}{2}\Vert y^*-y^{k+1} \Vert ^2_{P_{k+1}}}^{0}\nonumber \\&\quad \le \frac{1}{2\gamma } \Vert \lambda ^*-\lambda ^k\Vert ^2+\frac{(2-\gamma )\beta _{k-1}^2}{2}\Vert {\mathcal {A}}x^k-b\Vert ^2+\frac{\beta _k^2}{2}\Vert z^*-z^k\Vert ^2_{C^\top C+Q}+\frac{\beta _{k-1}^2}{2}\Vert z^k-z^{k-1}\Vert ^2_Q\nonumber \\&\quad \quad +\overbrace{\frac{\beta _k}{2}\Vert y^*-y^k\Vert ^2_{P_k}}^{0}-\overbrace{\sigma _g \beta _k \Vert z^k-z^{k+1}\Vert ^2}^{2}-\overbrace{\frac{\epsilon \sigma _g\beta _k}{2}\Vert z^*-z^{k+1}\Vert ^2}^{3}. \end{aligned}$$
(68)

Now, we apply several operations to the above inequality: (i) ignore the terms marked "0", since \(P=0\); (ii) move the term marked "1" to the right-hand side; (iii) move the term marked "2" to the left-hand side and apply \(\Vert z^{k+1}-z^k\Vert ^2_Q/\lambda _{\max }(Q)\le \Vert z^k-z^{k+1}\Vert ^2\); (iv) move one half of the term marked "3" to the left-hand side and apply \(\Vert z^{k+1}-z^*\Vert ^2_{C^\top C+Q}/\lambda _{\max }(C^\top C+Q)\le \Vert z^{k+1}-z^*\Vert ^2.\) After all these operations, we obtain the inequality (16). \(\square \)

1.4 Proof of Lemma 4

Proof

From steps 2 and 3 of the IADMM, we have that

$$\begin{aligned} 0=\nabla g(z^{k+1})+C^\top \lambda ^{k+1}+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k). \end{aligned}$$

Since \((y^*,z^*,\lambda ^*)\) is a KKT solution, we have \(0=\nabla g(z^*)+C^\top \lambda ^*.\) Combining these two equations together with the Lipschitz continuity of \(\nabla g\), we have

$$\begin{aligned}&\Vert C^\top (\lambda ^{k+1}-\lambda ^*)+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad =\Vert \nabla g(z^{k+1})-\nabla g(z^*)\Vert ^2\nonumber \\&\quad \le L_g^2\Vert z^{k+1}-z^*\Vert ^2. \end{aligned}$$
(69)

For \(0<\alpha <\frac{1}{2}\), by using the inequality \(\Vert u+v+w\Vert ^2 \ge (1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2\), we have that

$$\begin{aligned}&\Vert C^\top (\lambda ^{k+1}-\lambda ^*)+(1-\gamma )\beta _k C^\top ({\mathcal {A}}x^{k+1}-b)+Q_k(z^{k+1}-z^k)\Vert ^2\\&\quad \ge (1-2\alpha )\Vert C^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2 -\frac{1}{\alpha }\Vert (1-\gamma ) \beta _k C^\top ({\mathcal {A}}x^{k+1}-b)\Vert ^2\nonumber \\&\qquad -\frac{1}{\alpha }\Vert Q_k(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad \ge (1-2\alpha )\lambda _{\min }(CC^\top )\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2 -\frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\quad \quad -\frac{1}{\alpha }\Vert Q_k(z^{k+1}-z^k)\Vert ^2. \end{aligned}$$
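
For completeness (this verification is ours, not part of the original proof), the three-term inequality invoked above follows from Young's inequality \(2\langle u,v\rangle \ge -\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2\) and \(\Vert v+w\Vert ^2\ge 0\):

$$\begin{aligned} \Vert u+v+w\Vert ^2&=\Vert u\Vert ^2+2\langle u,v\rangle +2\langle u,w\rangle +\Vert v+w\Vert ^2\\&\ge \Vert u\Vert ^2-\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\alpha \Vert u\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2 =(1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2. \end{aligned}$$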

Plugging the lower bound above into (69), we get

$$\begin{aligned}&(1-2\alpha )\lambda _{\min }(CC^\top ) \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2\\&\quad \le \frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\qquad + \frac{1}{\alpha } \Vert Q_k(z^{k+1}-z^k)\Vert ^2+L_g^2\Vert z^{k+1}-z^*\Vert ^2\\&\quad \le \frac{1}{\alpha }\lambda _{\max }(CC^\top )(1-\gamma )^2\beta _k^2\Vert {\mathcal {A}}x^{k+1}-b\Vert ^2\\&\qquad +\frac{1}{\alpha }\lambda _{\max }(Q)\beta _k^2\Vert z^{k+1}-z^k\Vert ^2_Q+L_g^2\Vert z^{k+1}-z^*\Vert ^2. \end{aligned}$$

This completes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, T., Toh, KC. Self-adaptive ADMM for semi-strongly convex problems. Math. Prog. Comp. 16, 113–150 (2024). https://doi.org/10.1007/s12532-023-00250-8
