Abstract
In the present paper, we propose a novel convergence analysis of the alternating direction method of multipliers, based on its equivalence with the overrelaxed primal–dual hybrid gradient algorithm. We consider the smooth case, in which the objective function can be decomposed into a differentiable part with Lipschitz continuous gradient and a strongly convex part. Under these hypotheses, a convergence proof with an optimal parameter choice is given for the primal–dual method, which leads to convergence results for the alternating direction method of multipliers. An accelerated variant of the latter, based on a parameter relaxation, is also proposed and shown to converge linearly with the same asymptotic rate as the primal–dual algorithm.
Notes
that is, g is not identically equal to \(+\infty \)
References
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(2), 41–76 (1975)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)
Zhu, M., Chan, T.: An efficient primal–dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34 (2008)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford–Shah functional. In: IEEE International Conference on Computer Vision. pp. 1133–1140 (2009)
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352 (2015)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42, 783–805 (2017)
He, B., You, Y., Yuan, X.: On the convergence of primal–dual hybrid gradient algorithm. SIAM J. Imaging Sci. 7(4), 2526–2537 (2014)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)
Zhang, L., Mahdavi, M., Jin, R.: Linear convergence with condition number independent access of full gradients. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 980–988. Curran Associates, Inc., New York (2013)
Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93(2), 273–299 (1965)
Arrow, K., Hurwicz, L., Uzawa, H., Chenery, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Redwood City (1958)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Preprint arXiv:1504.06298 (2015)
Allaire, G.: Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole Polytechnique (2005)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Boston (2004)
O’Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Liang, J., Fadili, J., Peyré, G.: Local linear convergence analysis of primal–dual splitting methods. Preprint arXiv:1705.01926 (2017)
Communicated by Jalal M. Fadili.
Appendices
Appendix A: Proof of Theorem 3.1 and of Corollary 3.1
We proceed analogously to [7, Section 5.2], but we do not fix any parameter until needed. This proof is also inspired by the one found in [9], which does not allow \(\theta \ne 1\). For now, we only assume that \(0<\theta \le 1\).
We suppose that \(\mathcal {L}\) satisfies Assumption (S2). Let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. For any \(n\ge 0\), the points \(\xi _{n+1}\) and \(y_{n+1}\) can be seen as minimizers of convex problems; thus, the first-order optimality conditions yield, respectively, \(-(\xi _{n+1}-\xi _n)/\tau - K^*y_{n+1} \in \partial G(\xi _{n+1})\) and \(-(y_{n+1}-y_n)/\sigma + K\bar{\xi }_n \in \partial H^*(y_{n+1})\). Using the definition of strong convexity, we get for any \((\xi ,y)\in Z\times Y\) (after expanding the scalar products and summing the two inequalities for G and \(H^*\))
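The two inclusions above are just the proximal (resolvent) form of the updates; written out in the paper's notation, this standard equivalence reads:

```latex
\begin{aligned}
-\tfrac{1}{\tau}(\xi_{n+1}-\xi_n) - K^* y_{n+1} \in \partial G(\xi_{n+1})
&\iff \xi_n - \tau K^* y_{n+1} \in \xi_{n+1} + \tau\,\partial G(\xi_{n+1})\\
&\iff \xi_{n+1} = \operatorname{prox}_{\tau G}\bigl(\xi_n - \tau K^* y_{n+1}\bigr),\\[2pt]
-\tfrac{1}{\sigma}(y_{n+1}-y_n) + K\bar{\xi}_n \in \partial H^*(y_{n+1})
&\iff y_{n+1} = \operatorname{prox}_{\sigma H^*}\bigl(y_n + \sigma K\bar{\xi}_n\bigr).
\end{aligned}
```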
For any \(n\ge 0\) and \((\xi ,y)\in Z\times Y\), we set \(\Delta _n = \Vert \xi -\xi _n\Vert ^2/(2\tau ) + \Vert y-y_n\Vert ^2/(2\sigma )\). Let us first prove the following lemma:
Lemma A.1
Let \(\mathcal {L}\) satisfy Assumption (S2) and let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. Then, for any integer \(N\ge 1\), any \(\tau ,\sigma >0\) and \(0<\omega \le \theta \), we have for any \((\xi ,y)\in Z\times Y\)
Proof
Replacing \(\bar{\xi }_n\) by \(\xi _n+\theta \,(\xi _n-\xi _{n-1})\) in (65), we get after simplification
Now, we define \(\tau \tilde{\gamma }= \mu >0\) and \(\sigma \tilde{\delta }= \mu '>0\). Let us bound the scalar products in (67). Introducing \(0<\omega \le \theta \), we have
Let us have a closer look at the last two terms. Let \(\alpha >0\). Since \(\langle K\Xi ,Y\rangle \le L_K(\alpha \Vert \Xi \Vert ^2+\Vert Y\Vert ^2/\alpha )/2\), the bound (67) becomes, after simplification,
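The estimate \(\langle K\Xi ,Y\rangle \le L_K(\alpha \Vert \Xi \Vert ^2+\Vert Y\Vert ^2/\alpha )/2\) used here is Cauchy–Schwarz followed by Young's inequality \(ab\le (a^2+b^2)/2\):

```latex
\langle K\Xi, Y\rangle
\;\le\; \|K\Xi\|\,\|Y\|
\;\le\; L_K\,\|\Xi\|\,\|Y\|
\;=\; L_K\bigl(\sqrt{\alpha}\,\|\Xi\|\bigr)\Bigl(\tfrac{\|Y\|}{\sqrt{\alpha}}\Bigr)
\;\le\; \frac{L_K}{2}\Bigl(\alpha\,\|\Xi\|^2 + \frac{\|Y\|^2}{\alpha}\Bigr),
```

where \(\|K\|\le L_K\) and \(\alpha>0\) is arbitrary.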
Choose \(\alpha = \omega L_K\sigma \). Hence, \(\omega L_K/\alpha =1/\sigma \), so that the \(\Vert y_n-y_{n+1}\Vert ^2\) term cancels. Since \(1+\mu = 1/\omega + 1+\mu -1/\omega \), one gets
We can now set conditions on \(\omega \), \(\theta \), \(\tau \) and \(\sigma \). First, choose \(\theta \), \(\tau \) and \(\sigma \) so that \(\theta L_K^2\tau \sigma \le 1\). Then, choose \(\omega \) so that both parenthesized terms are nonpositive, which requires \(1/(\mu +1)\le \omega \) and \((\theta +1)/(\mu '+2) \le \omega \). Since \(\omega \le \theta \), the desired inequality follows. \(\square \)
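For reference (this is our reading of the constraints just derived, since the displayed equations are not reproduced here), the two lower bounds give the tightest admissible relaxation, and the requirement \(\omega \le \theta \) is feasible precisely when \(\theta \) is large enough:

```latex
\omega \;=\; \max\Bigl\{\frac{1}{1+\mu},\ \frac{1+\theta}{2+\mu'}\Bigr\},
\qquad
\frac{1+\theta}{2+\mu'} \le \theta \iff \theta \ge \frac{1}{1+\mu'},
\qquad
\frac{1}{1+\mu} \le \theta .
```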
Let us now prove Corollary 3.1 and then Theorem 3.1. Set \(\xi _{-1} = \xi _0\). Multiplying (15) by \(1/\omega ^n\) and summing from \(n=0\) to \(n=N-1\) yields
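Although (15) is not reproduced here, the mechanism is the standard geometric telescoping: assuming (15) has the schematic form \(\Delta _{n+1} + G_n \le \omega \,\Delta _n\) with nonnegative gap terms \(G_n\), one gets

```latex
\sum_{n=0}^{N-1}\frac{1}{\omega^{n}}\bigl(\Delta_{n+1}+G_n\bigr)
\;\le\; \sum_{n=0}^{N-1}\frac{\Delta_n}{\omega^{\,n-1}}
\quad\Longrightarrow\quad
\frac{\Delta_N}{\omega^{\,N-1}} \;+\; \sum_{n=0}^{N-1}\frac{G_n}{\omega^{n}}
\;\le\; \omega\,\Delta_0,
```

after cancelling the common terms \(\Delta _n/\omega ^{n-1}\), \(1\le n\le N-1\), on both sides; in particular \(\Delta _N \le \omega ^N \Delta _0\), which is the linear rate.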
One has, for any \(\beta >0\),
and thus the right-hand side of (71) is bounded from above by
Choosing \(\beta = 1/(L_K\tau )\), which cancels the \(\Vert \xi _{N-1}-\xi _N\Vert ^2\) term, and using that \(\omega L_K^2\tau \sigma \le \theta L_K^2\tau \sigma \le 1\) and \(\mathcal {G}(\xi _n,\xi ^*;y^*,y_n)\ge 0\), we have, for any \(N\ge 1\),
This inequality proves the linear convergence of the iterates (Corollary 3.1). We can now complete the proof of Theorem 3.1: dividing (74) by \(\omega ^NT_N\) and using convexity gives Theorem 3.1. \(\square \)
Appendix B: Equivalence Between Algorithm 4 and Algorithm 2
Let us prove that the iterations of Algorithm 4 are equivalent to (41). Let \(n\ge 0\). By optimality in (37), one has for any \(x\in X\)
For any \(\xi \in Y\), we define the marginal function \(g_A(\xi ) := \inf \,\{\,g(x) : x\in X,\ Ax=\xi \,\}\), with the convention \(\inf \emptyset = +\infty \).
Let \(\xi \in Y\). Suppose that \(\{x\in X : Ax=\xi \}\ne \emptyset \). Then, taking the infimum over \(\{x\in X : Ax=\xi \}\), we have
Let us define \(\xi _{n+1}:=Ax_{n+1}\). Since \(g_A(\xi _{n+1})\le g(x_{n+1})\), one has
for any \(\xi \in Y\), including those such that \(\{x\in X : Ax=\xi \}\) is empty. Thus
In particular, one can check that \(\xi _{n+1} = \mathrm {prox}_{\tau g_A}(z_n-\tau \,y_n)\). Let us write the y-update (39). We have
Let us define \(\bar{y}_{n+1}\) as in (30). Equation (80) implies that, for any \(n\ge 1\), one has \(z_n-\tau \,y_n = \xi _n - \tau '\,(y_n-y_{n-1})-\tau \,y_n = \xi _n-\tau \,(y_n + (\tau '/\tau )\,(y_n-y_{n-1}))\). Injecting the latter equality in the expression of \(\xi _{n+1}\) yields (28). The z-update (38) can be rewritten as a proximal step
Using Moreau’s decomposition [18], we eventually obtain (29).
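Moreau's decomposition [18], \(v = \operatorname{prox}_{\tau f}(v) + \tau \,\operatorname{prox}_{f^*/\tau }(v/\tau )\), is easy to check numerically. The sketch below verifies it for \(f=|\cdot |\), whose proximal map is soft-thresholding and whose conjugate is the indicator of \([-1,1]\); the helper names are ours, purely for illustration:

```python
# Numerical check of Moreau's decomposition for f = |.| in one dimension.
# Helper names (prox_abs, prox_conjugate, moreau_identity) are illustrative.

def prox_abs(v, t):
    # prox of t*|.| is soft-thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_conjugate(v):
    # f = |.| has conjugate f* = indicator of [-1, 1];
    # prox of (1/t)*f* is still the projection onto [-1, 1]
    return min(max(v, -1.0), 1.0)

def moreau_identity(v, t):
    # should return v exactly: v = prox_{t f}(v) + t * prox_{f*/t}(v / t)
    return prox_abs(v, t) + t * prox_conjugate(v / t)

checks = [moreau_identity(v, t)
          for v in (-3.0, -0.2, 0.0, 0.7, 5.0)
          for t in (0.5, 1.0, 2.0)]
```

Each entry of `checks` reproduces the corresponding input `v`, which is exactly the identity used to rewrite the z-update as a proximal step.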
Tan, P. Linear Convergence Rates for Variants of the Alternating Direction Method of Multipliers in Smooth Cases. J Optim Theory Appl 176, 377–398 (2018). https://doi.org/10.1007/s10957-017-1211-3
Keywords
- Alternating direction method of multipliers
- Primal–dual algorithm
- Strong convexity
- Linear convergence rate