
On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers


Abstract

The formulation

$$\begin{aligned} \min _{x,y} ~f(x)+g(y),\quad \text{ subject } \text{ to } Ax+By=b, \end{aligned}$$

where f and g are extended-value convex functions, arises in many application areas, such as signal processing, imaging and image processing, statistics, and machine learning, either naturally or after variable splitting. In many common problems, one of the two objective functions is strictly convex and has a Lipschitz continuous gradient. For this kind of problem, a very effective approach is the alternating direction method of multipliers (ADM or ADMM), which solves a sequence of f/g-decoupled subproblems. However, its effectiveness has not been matched by a provably fast rate of convergence; only sublinear rates such as \(O(1/k)\) and \(O(1/k^2)\) were recently established in the literature, though the \(O(1/k)\) rates do not require strong convexity. This paper shows that global linear convergence can be guaranteed under the assumptions of strong convexity and Lipschitz gradient on one of the two functions, along with certain rank assumptions on A and B. The result applies to various generalizations of ADM that allow the subproblems to be solved faster and less exactly in certain ways. The derived rate of convergence also provides theoretical guidance for optimizing the ADM parameters. In addition, this paper makes meaningful extensions to the existing global convergence theory of ADM generalizations.
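As a concrete illustration of this setting (a minimal sketch added here; it is not the paper's algorithm statement or experiments, and the update order, sign conventions, and the penalty and relaxation parameters \(\beta \) and \(\gamma \) are assumptions consistent with the optimality conditions quoted in the appendices), consider an instance with strongly convex quadratic f and g, so that each ADM subproblem reduces to a linear solve. Problems such as \(\ell _1\)-regularized least squares, \(\min _x \frac{1}{2}\Vert Dx-c\Vert ^2+\tau \Vert x\Vert _1\), also fit the displayed form after the splitting \(y=x\), with \(A=I\), \(B=-I\), \(b=0\).

```python
import numpy as np

# Toy instance of  min_{x,y} f(x) + g(y)  s.t.  Ax + By = b,
# with strongly convex quadratics f(x) = 0.5*x'Hf*x and g(y) = 0.5*y'Hg*y,
# so that every ADM subproblem is an explicit linear solve.
rng = np.random.default_rng(0)
n, m, p = 8, 6, 5
A = rng.standard_normal((p, n))
B = rng.standard_normal((p, m))
b = rng.standard_normal(p)
Hf = 2.0 * np.eye(n)
Hg = 1.0 * np.eye(m)

beta, gamma = 1.0, 1.0   # penalty and relaxation parameters (gamma = 1: classic ADM)
x, y, lam = np.zeros(n), np.zeros(m), np.zeros(p)
for k in range(100):
    # y-subproblem: argmin_y g(y) - <lam, By> + (beta/2)||Ax + By - b||^2
    y = np.linalg.solve(Hg + beta * B.T @ B, B.T @ (lam - beta * (A @ x - b)))
    # x-subproblem: argmin_x f(x) - <lam, Ax> + (beta/2)||Ax + By - b||^2
    x = np.linalg.solve(Hf + beta * A.T @ A, A.T @ (lam - beta * (B @ y - b)))
    # multiplier update, relaxed by gamma
    lam = lam - gamma * beta * (A @ x + B @ y - b)
    if k % 10 == 0:
        print(k, np.linalg.norm(A @ x + B @ y - b))
```

On such an instance the constraint residual \(\Vert Ax^k+By^k-b\Vert \) should decay geometrically, the kind of linear-rate behavior analyzed in the paper; taking \(\gamma \ne 1\), or adding proximal terms to the subproblems, gives the generalizations of ADM that the paper's theory also covers.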


Notes

  1. The results continue to hold in many cases when strong convexity is relaxed to strict convexity (e.g., \(-\log (x)\) is strictly convex but not strongly convex over \(x>0\)). ADM always generates a bounded sequence \((Ax^k, By^k, {\uplambda }^k)\), where the bound depends only on the starting point and the solution, even when the feasible set is unbounded. When restricted to a compact set, a strictly convex function such as \(-\log (x)\) becomes strongly convex: on \([a,b]\subset (0,\infty )\), for instance, \((-\log x)''=1/x^2\ge 1/b^2>0\).

  2. Suppose a sequence \(\{u^k\}\) converges to \(u^*\). We say the convergence is (in some norm \(\Vert \cdot \Vert \))

    • Q-linear, if there exists \(\mu \in (0,1)\) such that \(\frac{\Vert u^{k+1}-u^*\Vert }{\Vert u^{k}-u^*\Vert }\le \mu \) for all \(k\);

    • R-linear, if there exists a sequence \(\{\sigma ^k\}\) such that \(\Vert u^{k}-u^*\Vert \le \sigma ^k\) for all \(k\) and \(\sigma ^k\rightarrow 0\) Q-linearly (see the example below).
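For illustration (an example added here, not taken from the paper), consider a sequence with

$$\begin{aligned} \Vert u^{k}-u^*\Vert = \left\{ \begin{array}{ll} 2^{-k}, & k \text{ even},\\ 3^{-k}, & k \text{ odd}. \end{array}\right. \end{aligned}$$

It converges R-linearly (take \(\sigma ^k=2^{-k}\), which tends to zero Q-linearly with ratio \(\frac{1}{2}\)) but not Q-linearly, since the ratio \(\Vert u^{k+1}-u^*\Vert /\Vert u^{k}-u^*\Vert =3^{k}/2^{k+1}\) is unbounded over odd \(k\).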


Acknowledgments

The authors’ work is supported in part by ARL MURI Grant W911NF-09-1-0383 and NSF Grant DMS-1317602.

Author information

Correspondence to Wotao Yin.

Appendices

Appendix 1: Proof of Lemma 2.1

Proof

  i) The optimality conditions of the subproblems of Algorithm 2 yield (2.6) and (2.7).

  ii) By the convexity of f (1.16), combining the optimality conditions (1.13) and (2.6) yields (2.8). Similarly, (2.9) follows from the convexity of g (1.17) and the optimality conditions (1.14) and (2.7).

  iii) Equation (2.10) follows directly from (1.15) and (2.1).

  iv) By adding (2.8) and (2.9) and using (2.10), we have

    $$\begin{aligned}&\frac{1}{\beta }\langle {\uplambda }^{k}-\hat{{\uplambda }},~\hat{{\uplambda }}-{\uplambda }^*-\beta A(x^k-x^{k+1})\rangle + \langle x^{k+1}-x^*, ~\hat{P}(x^k-x^{k+1})\rangle \nonumber \\&\qquad + \langle y^{k+1}-y^*, ~Q(y^k-y^{k+1})\rangle \nonumber \\&\quad \ge \nu _f\Vert x^{k+1}-x^{*}\Vert ^2+\nu _g\Vert y^{k+1}-y^*\Vert ^2, \end{aligned}$$
    (7.1)

    which can be simplified as

    $$\begin{aligned} (\hat{u}-u^*)^TG_1(u^k-\hat{u})\ge & {} \langle A(x^k-x^{k+1}),~{\uplambda }^k-\hat{{\uplambda }}\rangle \nonumber \\&+\,\nu _f\Vert x^{k+1}-x^{*}\Vert ^2+\nu _g\Vert y^{k+1}-y^*\Vert ^2. \end{aligned}$$
    (7.2)

    By rearranging the terms, we have

    $$\begin{aligned} (u^k-u^*)^TG_1(u^k-\hat{u})\ge & {} \Vert u^k-\hat{u}\Vert _{G_1}^2+\langle A(x^k-x^{k+1}),{\uplambda }^k-\hat{{\uplambda }}\rangle \nonumber \\&+\,\nu _f\Vert x^{k+1}-x^{*}\Vert ^2+\nu _g\Vert y^{k+1}-y^*\Vert ^2. \end{aligned}$$
    (7.3)

    Then, from the identity \(\Vert a-c\Vert _G^2-\Vert b-c\Vert _G^2=2(a-c)^TG(a-b)-\Vert a-b\Vert _G^2\) (verified directly after this proof) and (2.5), it follows that

    $$\begin{aligned} \Vert u^k-u^*\Vert _G^2-\Vert u^{k+1}-u^*\Vert _G^2=2(u^k-u^*)^TG_1(u^k-\hat{u})-\Vert G_0(u^k-\hat{u})\Vert ^2_G. \end{aligned}$$
    (7.4)

    Substituting (7.3) into the right-hand side of the above equation, we have

    $$\begin{aligned} \Vert u^k-u^*\Vert _G^2-\Vert u^{k+1}-u^*\Vert _G^2\ge & {} 2\Vert u^k-\hat{u}\Vert _{G_1}^2-\Vert u^k-\hat{u}\Vert ^2_{G_1G_0}\nonumber \\&+\,2\langle A(x^k-x^{k+1}),{\uplambda }^k-\hat{{\uplambda }}\rangle \nonumber \\&+\,2\nu _f\Vert x^{k+1}-x^{*}\Vert ^2+2\nu _g\Vert y^{k+1}-y^*\Vert ^2, \end{aligned}$$
    (7.5)

    and thus (2.11) follows immediately.
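A direct verification of the norm identity used before (7.4), included here for the reader's convenience: writing \(a-b=(a-c)-(b-c)\) and expanding the \(G\)-norms,

$$\begin{aligned} 2(a-c)^TG(a-b)-\Vert a-b\Vert _G^2&=2\Vert a-c\Vert _G^2-2(a-c)^TG(b-c)\\&\quad -\left( \Vert a-c\Vert _G^2-2(a-c)^TG(b-c)+\Vert b-c\Vert _G^2\right) \\&=\Vert a-c\Vert _G^2-\Vert b-c\Vert _G^2. \end{aligned}$$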

Appendix 2: Proof of Theorem 2.1

Proof

  i) By the Cauchy–Schwarz inequality, we have

    $$\begin{aligned} 2({\uplambda }^k-\hat{{\uplambda }})^TA(x^k-x^{k+1})\ge -\frac{1}{\rho }\Vert A(x^k-x^{k+1})\Vert ^2-\rho \Vert {\uplambda }^k-\hat{{\uplambda }}\Vert ^2,~\forall \rho >0. \end{aligned}$$
    (8.1)

    Substituting (8.1) into (2.11) and using \(\frac{1}{\gamma }({\uplambda }^k-{\uplambda }^{k+1})={\uplambda }^k-\hat{{\uplambda }}\), we have

    $$\begin{aligned}&\Vert u^k-u^*\Vert _G^2-\Vert u^{k+1}-u^*\Vert _G^2\nonumber \\&\quad \ge ~\Vert x^k-x^{k+1}\Vert _{\hat{P}-\frac{1}{\rho }A^TA}^2+\Vert y^k-y^{k+1}\Vert _Q^2+\left( \frac{2-\gamma }{\beta }-\rho \right) \frac{1}{\gamma ^2}\Vert {\uplambda }^k-{\uplambda }^{k+1}\Vert ^2\nonumber \\&\qquad +\,2\nu _f\Vert x^{k+1}-x^{*}\Vert ^2+2\nu _g\Vert y^{k+1}-y^*\Vert ^2,\quad \forall \rho >0. \end{aligned}$$
    (8.2)

    To show that (2.14) holds for some \(\eta >0\), we only need \(\hat{P}-\frac{1}{\rho }A^TA\succ 0\) and \(\frac{2-\gamma }{\beta }-\rho >0\) for some \(\rho >0\). Such a \(\rho \) exists if and only if \(\hat{P}\succ \frac{\beta }{2-\gamma }A^T A\) (when the latter holds, by continuity \(\hat{P}-\frac{1}{\rho }A^TA\succ 0\) remains true for \(\rho \) slightly smaller than \(\frac{2-\gamma }{\beta }\)), or, equivalently, (2.13).

  ii) For \(P=\mathbf {0}\), we first derive a lower bound for the cross term \(({\uplambda }^k-\hat{{\uplambda }})^TA(x^k-x^{k+1})\). Applying (2.6) at two consecutive iterations with \(P=\mathbf {0}\) and in light of the definition of \(\hat{{\uplambda }}\), we have

    $$\begin{aligned} \left\{ \begin{array}{l} A^T[{\uplambda }^{k-1}-\beta (Ax^{k}+By^{k}-b)] \in \partial f(x^{k}),\\ A^T\hat{{\uplambda }} \in \partial f(x^{k+1}). \end{array} \right. \end{aligned}$$
    (8.3)

    The difference of the two terms on the left in (8.3) is

    $$\begin{aligned}&A^T\left[ {\uplambda }^{k-1}-\beta (Ax^{k}+By^{k}-b)-\hat{{\uplambda }}\right] \nonumber \\&\quad =A^T\left( {\uplambda }^k-\hat{{\uplambda }}\right) -(1-\gamma )\beta A^T\left( Ax^{k}+By^{k}-b\right) . \end{aligned}$$
    (8.4)

    By (8.3), (8.4) and (1.16), we get

    $$\begin{aligned}&\langle A^T({\uplambda }^k-\hat{{\uplambda }}),x^k-x^{k+1}\rangle - \langle (1-\gamma )\beta A^T(Ax^{k}+By^{k}-b),x^k-x^{k+1}\rangle \nonumber \\&\quad \ge \nu _f\Vert x^k-x^{k+1}\Vert ^2, \end{aligned}$$
    (8.5)

    to which applying the Cauchy–Schwarz inequality gives

    $$\begin{aligned}&({\uplambda }^k-\hat{{\uplambda }})^T A(x^k-x^{k+1})\nonumber \\&\quad \ge \langle \sqrt{\beta } (Ax^{k}+By^{k}-b), (1-\gamma )\sqrt{\beta }A(x^k-x^{k+1})\rangle +\nu _f\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\quad \ge -\frac{\beta }{2\rho }\Vert Ax^{k}+By^{k}-b\Vert ^2-\frac{(1-\gamma )^2\beta \rho }{2}\Vert A(x^k-x^{k+1})\Vert ^2\nonumber \\&\qquad +\nu _f\Vert x^k-x^{k+1}\Vert ^2,\quad \forall \rho >0. \end{aligned}$$
    (8.6)

    Substituting (8.6) into (2.11) and using \(\hat{P}=P+\beta A^T A=\beta A^T A\) and the definition of \(\hat{{\uplambda }}\), we have

    $$\begin{aligned}&\Vert u^k-u^*\Vert _G^2+\frac{\beta }{\rho }\Vert Ax^{k}+By^{k}-b\Vert ^2 \nonumber \\&\quad \ge ~\Vert u^{k+1}-u^*\Vert _G^2+\frac{\beta }{\rho }\Vert Ax^{k+1}+By^{k+1}-b\Vert ^2\nonumber \\&\qquad +\,\beta \left( 2-\gamma -\frac{1}{\rho }\right) \Vert Ax^{k+1} +By^{k+1}-b\Vert ^2 +\beta \left( 1-(1-\gamma )^2\rho \right) \Vert A(x^k-x^{k+1})\Vert ^2\nonumber \\&\qquad +\,\Vert y^k-y^{k+1}\Vert ^2_Q+2\nu _f\Vert x^k-x^{k+1}\Vert ^2+2\nu _f\Vert x^{k+1}-x^*\Vert ^2+2\nu _g\Vert y^{k+1}-y^*\Vert ^2. \end{aligned}$$
    (8.7)

    To prove that such an \(\eta >0\) exists for (2.16), we only need the existence of \(\rho >0\) such that \(2-\gamma -\frac{1}{\rho }>0\) and \(1-(1-\gamma )^2\rho >0\), which holds if and only if \(2-\gamma >(1-\gamma )^2\) or, equivalently, \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\) (this equivalence is spelled out after the proof). In the case of \(P=\mathbf {0}\), if we set \(\gamma =1\), then (8.6) reduces to \(({\uplambda }^k-\hat{{\uplambda }})^T A(x^k-x^{k+1})\ge \nu _f\Vert x^k-x^{k+1}\Vert ^2\), which, when substituted into (2.11), gives (2.17) with \(\eta = 1\).
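For completeness, the admissible range of the relaxation factor \(\gamma \) used above follows from a short calculation (spelled out here for the reader):

$$\begin{aligned} 2-\gamma >(1-\gamma )^2 \;\Longleftrightarrow \; \gamma ^2-\gamma -1<0 \;\Longleftrightarrow \; \frac{1-\sqrt{5}}{2}<\gamma <\frac{1+\sqrt{5}}{2}, \end{aligned}$$

which, intersected with \(\gamma >0\), gives \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\), the familiar golden-ratio upper bound on the ADM step length.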


Cite this article

Deng, W., Yin, W. On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers. J Sci Comput 66, 889–916 (2016). https://doi.org/10.1007/s10915-015-0048-x
