Abstract
In this paper, we develop a self-adaptive ADMM that updates the penalty parameter adaptively. When one part of the objective function is strongly convex, i.e., the problem is semi-strongly convex, our algorithm can update the penalty parameter adaptively with guaranteed convergence. We establish several types of convergence results, including an accelerated convergence rate of \(O(1/k^2),\) linear convergence, and convergence of the iteration points. This strengthens various previous results because we allow the penalty parameter to change adaptively. We also develop a partial proximal point method whose subproblems are solved by our adaptive ADMM. This enables us to solve problems without the semi-strong convexity property. Numerical experiments are conducted to demonstrate the high efficiency and robustness of our method.
Availability of data and materials
The references of all datasets are provided in this published article.
Code Availability
The code is available from https://github.com/ttang-nus/MATLAB-code-for-IADMMs/.
Notes
A sequence \(\{a_k\}_{k\in {\mathbb {N}}^+}\) is said to be \(\Omega (k)\) if there exist a positive number c and an integer \(N_0\) such that \(a_k\ge c k\) for any \(k\ge N_0.\)
The parameters used in acc1-ADMM are quite different from those of the traditional ADMM, so we omit the details here. Readers may refer to [41], (31) case 2 and Section 5.1 case 2, for details.
Note that the acc1-ADMM is quite different from the traditional ADMM. Its primal and dual feasibility residuals remain close to each other even when its penalty parameter increases rapidly.
References
Bai, X., Li, Q.: A highly efficient adaptive-sieving-based algorithm for the high-dimensional rank lasso problem. arXiv preprint arXiv:2207.12753 (2022)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 1–27 (2011)
Chen, L., Sun, D., Toh, K.C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66, 327–343 (2017)
Condat, L.: A direct algorithm for 1-d total variation denoising. IEEE Signal Process. Lett. 20(11), 1054–1057 (2013)
Cui, Y., Sun, D., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. 178, 381–415 (2019)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66, 889–916 (2016)
Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)
Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2016)
Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Revue française d'automatique, informatique, recherche opérationnelle. Analyse numérique 9(R2), 41–76 (1975)
Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2023). URL https://www.gurobi.com
Ha, C.D.: A generalization of the proximal point algorithm. SIAM J. Control. Optim. 28(3), 503–512 (1990)
He, B., Yang, H., Wang, S.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106, 337–356 (2000)
He, B., Yuan, X.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)
Huang, L., Jia, J., Yu, B., Chun, B.G., Maniatis, P., Naik, M.: Predicting execution time of computer programs using sparse polynomial regression. In: Advances in Neural Information Processing Systems, vol. 23 (2010)
Jiang, K., Sun, D., Toh, K.C.: Solving nuclear norm regularized and semidefinite matrix least squares problems with linear equality constraints. In: Discrete Geometry and Optimization, pp. 133–162 (2013)
Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190(1–2), 57–87 (2021)
Li, H., Lin, Z.: Accelerated alternating direction method of multipliers: an optimal O(1/K) nonergodic analysis. J. Sci. Comput. 79, 671–699 (2019)
Li, M., Sun, D., Toh, K.C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)
Li, X., Sun, D., Toh, K.C.: A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems. SIAM J. Optim. 28(1), 433–458 (2018)
Liang, L., Sun, D., Toh, K.C.: An inexact augmented Lagrangian method for second-order cone programming with applications. SIAM J. Optim. 31(3), 1748–1773 (2021)
Lin, M., Liu, Y.J., Sun, D., Toh, K.C.: Efficient sparse semismooth Newton methods for the clustered lasso problem. SIAM J. Optim. 29(3), 2026–2052 (2019)
Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, vol. 24 (2011)
Maros, I., Mészáros, C.: A repository of convex quadratic programming problems. Optim. Methods Softw. 11(1–4), 671–681 (1999)
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352. PMLR (2015)
Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media, Berlin (2009)
Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1), 204–227 (2022)
Tang, P., Wang, C., Jiang, B.: A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems. arXiv preprint arXiv:2106.13683 (2021)
Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72, 1–43 (2019)
Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)
Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal-dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)
Wang, L., Peng, B., Bradic, J., Li, R., Wu, Y.: A tuning-free robust and efficient approach to high-dimensional regression. J. Am. Stat. Assoc. 115(532), 1700–1714 (2020)
Wohlberg, B.: ADMM penalty parameter selection by residual balancing. arXiv preprint arXiv:1704.06209 (2017)
Xu, Y.: Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM J. Optim. 27(3), 1459–1484 (2017)
Xu, Y., Akrotirianakis, I., Chakraborty, A.: Proximal gradient method for huberized support vector machine. Pattern Anal. Appl. 19, 989–1005 (2016)
Xu, Y., Zhang, S.: Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. Comput. Optim. Appl. 70, 91–128 (2018)
Xu, Z., Figueiredo, M., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection. In: Artificial Intelligence and Statistics, pp. 718–727. PMLR (2017)
Xu, Z., Figueiredo, M.A., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7389–7398 (2017)
Xu, Z., Taylor, G., Li, H., Figueiredo, M.A., Yuan, X., Goldstein, T.: Adaptive consensus ADMM for distributed optimization. In: International Conference on Machine Learning, pp. 3841–3850. PMLR (2017)
Yang, L., Toh, K.C.: Bregman proximal point algorithm revisited: a new inexact version and its inertial variant. SIAM J. Optim. 32(3), 1523–1554 (2022)
Acknowledgements
We thank the reviewers and Associate Editor for many helpful suggestions to improve the quality of the paper.
Funding
The research of the second author is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Proof details
1.1 Proof of Lemma 1
Proof
From the optimality conditions in steps 1 and 2, we have that
From (56) and the convexity of f, we have that
Similarly, from (57) and (2), we have that
where we have used step 3 to get the last equality. From (60), we have
Now, we need to estimate \(\beta _k\langle C(z^k-z^{k+1}),b-{\mathcal {A}}x^{k+1}\rangle \). From (57), we know that
Combining the above two equations together with the strong convexity of g, we get
which, together with step 3, implies that
The above inequality implies that
In the above, we used the fact that \(\gamma (\gamma -1) = 1-\delta \). Plugging the above inequality into (61), we get
Note that from step 4, we can derive that
Multiplying (62) by \(\beta _k\) and using the above inequality, we obtain that
where we have used the fact that \(\langle \lambda ^{k+1}-\lambda ,\,\lambda ^k-\lambda ^{k+1}\rangle = \frac{1}{2}\Vert \lambda -\lambda ^k\Vert ^2 -\frac{1}{2}\Vert \lambda -\lambda ^{k+1}\Vert ^2 -\frac{1}{2}\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2\) and \(\lambda ^{k+1}-\lambda ^{k}=\gamma \beta _k ({\mathcal {A}}x^{k+1}-b).\) Note that since \(\gamma \in (1,\frac{1+\sqrt{5}}{2}),\) we have \(\delta = 1+\gamma -\gamma ^2 > 0\). Using the identity \(\frac{\gamma -1}{2\gamma }= \frac{2-\gamma }{2}-\frac{\delta }{2\gamma }\), we deduce that
From here, one can readily get the required inequality in Lemma 1. \(\square \)
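For completeness, the algebraic facts invoked in the proof above can be checked directly; the following is a routine verification using only the definition \(\delta = 1+\gamma -\gamma ^2\).

```latex
% Three-point identity: set a=\lambda^{k+1}-\lambda, b=\lambda^k-\lambda^{k+1},
% so a+b=\lambda^k-\lambda; expanding \|a+b\|^2=\|a\|^2+\|b\|^2+2\langle a,b\rangle
% and rearranging gives
\langle \lambda^{k+1}-\lambda,\;\lambda^{k}-\lambda^{k+1}\rangle
  = \tfrac{1}{2}\Vert\lambda-\lambda^{k}\Vert^2
   -\tfrac{1}{2}\Vert\lambda-\lambda^{k+1}\Vert^2
   -\tfrac{1}{2}\Vert\lambda^{k}-\lambda^{k+1}\Vert^2.
% The two scalar identities:
\gamma(\gamma-1)=\gamma^2-\gamma=1-(1+\gamma-\gamma^2)=1-\delta,
\qquad
\frac{2-\gamma}{2}-\frac{\delta}{2\gamma}
  =\frac{\gamma(2-\gamma)-(1+\gamma-\gamma^2)}{2\gamma}
  =\frac{\gamma-1}{2\gamma}.
% Positivity of \delta:
% \delta=1+\gamma-\gamma^2>0 \iff \gamma^2-\gamma-1<0 \iff \gamma<\tfrac{1+\sqrt{5}}{2},
% which is exactly the assumed range of \gamma.
```

The last observation also explains why the golden ratio \(\frac{1+\sqrt{5}}{2}\) appears as the upper bound of the admissible range for the dual step length \(\gamma\).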
1.2 Proof of Lemma 2
Proof
Because \((x^*,\lambda ^*)\) is a KKT solution, we have that \(0\in \partial _x {\mathcal {L}}(x^*,\lambda ^*)\). Since \({\mathcal {L}}(x,\lambda ^*)\) is a convex function of x, we then have that
from which we get
Considering all \(\lambda \in {\mathbb {R}}^m\) such that \(\Vert \lambda \Vert \le \Vert \lambda ^*\Vert +1\) in \({\mathcal {L}}(x^k,\lambda )-{\mathcal {L}}(x^*,\lambda )\le h(k)D(\lambda )\), we have
Now, using (67) in (65) and (66) respectively, we get \(| {\mathcal {F}}(x^k)-{\mathcal {F}}(x^*)|=O(h(k)).\) \(\square \)
1.3 Proof of Lemma 3
Proof
Substituting \((x^*,\lambda ^*)\) into (5), we get the following long inequality
Now, we apply several operations to the above inequality: (1) drop the terms marked "0", since \(P=0\); (2) move the term marked "1" to the right-hand side; (3) move the term marked "2" to the left-hand side and apply \(\Vert z^{k+1}-z^k\Vert ^2_Q/\lambda _{\max }(Q)\le \Vert z^k-z^{k+1}\Vert ^2\); (4) move one half of the term marked "4" to the left-hand side and apply \(\Vert z^{k+1}-z^*\Vert ^2\le \Vert z^{k+1}-z^*\Vert ^2_{C^\top C+Q}/\lambda _{\max }(C^\top C+Q).\) After all these operations, we obtain the inequality (16). \(\square \)
1.4 Proof of Lemma 4
Proof
From step 2 and step 3 in IADMM, we have that
Since \((y^*,z^*,\lambda ^*)\) is a KKT solution, we have \(0=\nabla g(z^*)+C^\top \lambda ^*.\) Combining these two equations together with the Lipschitz continuity of \(\nabla g\), we have
For \(0<\alpha <\frac{1}{2}\), by using the inequality \(\Vert u+v+w\Vert ^2 \ge (1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2\), we have that
Plugging this into (69), we get
This completes the proof. \(\square \)
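The elementary inequality \(\Vert u+v+w\Vert ^2 \ge (1-2\alpha )\Vert u\Vert ^2-\frac{1}{\alpha }\Vert v\Vert ^2-\frac{1}{\alpha }\Vert w\Vert ^2\) used in the proof above is a routine consequence of Young's inequality; a short derivation:

```latex
% Young's inequality 2\langle a,b\rangle \ge -t\Vert a\Vert^2-\tfrac{1}{t}\Vert b\Vert^2
% with t=2\alpha, applied to a=u, b=v+w, followed by
% \Vert v+w\Vert^2 \le 2\Vert v\Vert^2+2\Vert w\Vert^2:
\begin{aligned}
\Vert u+v+w\Vert^2
 &\ge (1-2\alpha)\Vert u\Vert^2+\Bigl(1-\frac{1}{2\alpha}\Bigr)\Vert v+w\Vert^2\\
 &\ge (1-2\alpha)\Vert u\Vert^2
   -\Bigl(\frac{1}{2\alpha}-1\Bigr)\cdot 2\bigl(\Vert v\Vert^2+\Vert w\Vert^2\bigr)
   && \Bigl(\text{since } 1-\tfrac{1}{2\alpha}<0 \text{ for } \alpha<\tfrac12\Bigr)\\
 &= (1-2\alpha)\Vert u\Vert^2-\Bigl(\frac{1}{\alpha}-2\Bigr)\bigl(\Vert v\Vert^2+\Vert w\Vert^2\bigr)
 \;\ge\; (1-2\alpha)\Vert u\Vert^2-\frac{1}{\alpha}\Vert v\Vert^2-\frac{1}{\alpha}\Vert w\Vert^2.
\end{aligned}
```

The final bound uses \(\frac{1}{\alpha}-2\le \frac{1}{\alpha}\) together with the nonnegativity of \(\Vert v\Vert^2\) and \(\Vert w\Vert^2\).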
Cite this article
Tang, T., Toh, KC. Self-adaptive ADMM for semi-strongly convex problems. Math. Prog. Comp. 16, 113–150 (2024). https://doi.org/10.1007/s12532-023-00250-8