Linear Convergence Rates for Variants of the Alternating Direction Method of Multipliers in Smooth Cases

Journal of Optimization Theory and Applications

Abstract

In the present paper, we propose a novel convergence analysis of the alternating direction method of multipliers, based on its equivalence with the overrelaxed primal–dual hybrid gradient algorithm. We consider the smooth case, in which the objective function can be decomposed into a part that is differentiable with Lipschitz continuous gradient and a part that is strongly convex. Under these hypotheses, a convergence proof with an optimal parameter choice is given for the primal–dual method, which leads to convergence results for the alternating direction method of multipliers. An accelerated variant of the latter, based on a parameter relaxation, is also proposed and shown to converge linearly with the same asymptotic rate as the primal–dual algorithm.

Notes

  1. that is, g is not identically equal to \(+\infty \)

References

  1. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  2. Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(2), 41–76 (1975)

  3. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  4. Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2012)

  5. Zhu, M., Chan, T.: An efficient primal–dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34 (2008)

  6. Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford–Shah functional. In: IEEE International Conference on Computer Vision. pp. 1133–1140 (2009)

  7. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  8. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)

  9. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)

  10. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)

  11. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352 (2015)

  12. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)

  13. Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42, 783–805 (2017)

  14. He, B., You, Y., Yuan, X.: On the convergence of primal–dual hybrid gradient algorithm. SIAM J. Imaging Sci. 7(4), 2526–2537 (2014)

  15. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  16. Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)

  17. Zhang, L., Mahdavi, M., Jin, R.: Linear convergence with condition number independent access of full gradients. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 980–988. Curran Associates, Inc., New York (2013)

  18. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93(2), 273–299 (1965)

  19. Arrow, K., Hurwicz, L., Uzawa, H., Chenery, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Redwood City (1958)

  20. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  21. Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Preprint arXiv:1504.06298 (2015)

  22. Allaire, G.: Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole Polytechnique (2005)

  23. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Boston (2004)

  24. O’Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)

  25. Liang, J., Fadili, J., Peyré, G.: Local linear convergence analysis of primal–dual splitting methods. Preprint arXiv:1705.01926 (2017)

Author information

Correspondence to Pauline Tan.

Additional information

Communicated by Jalal M. Fadili.

Appendices

Appendix A: Proof of Theorem 3.1 and of Corollary 3.1

We proceed analogously to [7, Section 5.2], but we do not specify any parameter unless needed. This proof is also inspired by the one found in [9], which does not allow \(\theta \ne 1\). For now, we only assume that \(0<\theta \le 1\).

We suppose that \(\mathcal {L}\) satisfies Assumption (S2). Let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. For any \(n\ge 0\), the points \(\xi _{n+1}\) and \(y_{n+1}\) can be seen as minimizers of convex problems; thus, the first-order optimality conditions yield, respectively, \(-(\xi _{n+1}-\xi _n)/\tau - K^*y_{n+1} \in \partial G(\xi _{n+1})\) and \(-(y_{n+1}-y_n)/\sigma + K\bar{\xi }_n \in \partial H^*(y_{n+1})\). Using the definition of strong convexity, we get for any \((\xi ,y)\in Z\times Y\) (after expanding the scalar products and summing the two inequalities for G and \(H^*\))

$$\begin{aligned} \mathcal {G}(\xi _{n+1},\xi ;y,y_{n+1})\le & {} \frac{\Vert \xi -\xi _n\Vert ^2}{2\tau } - (1+\tau \tilde{\gamma })\frac{\Vert \xi -\xi _{n+1}\Vert ^2}{2\tau } - \frac{\Vert \xi _n-\xi _{n+1}\Vert ^2}{2\tau }\quad \nonumber \\&+\, \frac{\Vert y-y_n\Vert ^2}{2\sigma } - (1+\sigma \tilde{\delta })\,\frac{\Vert y-y_{n+1}\Vert ^2}{2\sigma } - \frac{\Vert y_n-y_{n+1}\Vert ^2}{2\sigma }\nonumber \\&+\langle K(\xi _{n+1}-\xi ),y_{n+1}-y_n\rangle -\langle K(\xi _{n+1}-\bar{\xi }_n),y_{n+1}-y\rangle .\qquad \qquad \end{aligned}$$
(65)
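
For readers who prefer a computational view, the two optimality conditions above are exactly the resolvent (proximal) forms of the dual and primal updates of Algorithm 1. The following minimal Python sketch of one iteration is only an illustration of that reading and is not part of the proof; the proximal maps prox_tau_G and prox_sigma_Hstar, the operator K (assumed to behave like a NumPy array), and the parameters are placeholders to be supplied for a concrete instance.

def pdhg_step(xi, xi_bar, y, K, prox_tau_G, prox_sigma_Hstar, tau, sigma, theta):
    # Dual update: y_{n+1} = prox_{sigma H*}(y_n + sigma K xi_bar_n), i.e. the
    # resolvent form of -(y_{n+1} - y_n)/sigma + K xi_bar_n in dH*(y_{n+1}).
    y_next = prox_sigma_Hstar(y + sigma * (K @ xi_bar))
    # Primal update: xi_{n+1} = prox_{tau G}(xi_n - tau K* y_{n+1}), i.e. the
    # resolvent form of -(xi_{n+1} - xi_n)/tau - K* y_{n+1} in dG(xi_{n+1}).
    xi_next = prox_tau_G(xi - tau * (K.T @ y_next))
    # Overrelaxation step: xi_bar_{n+1} = xi_{n+1} + theta (xi_{n+1} - xi_n).
    xi_bar_next = xi_next + theta * (xi_next - xi)
    return xi_next, xi_bar_next, y_next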

For any \(n\ge 0\) and \((\xi ,y)\in Z\times Y\), we set \(\Delta _n = \Vert \xi -\xi _n\Vert ^2/(2\tau ) + \Vert y-y_n\Vert ^2/(2\sigma )\). Let us first prove the following lemma:

Lemma A.1

Let \(\mathcal {L}\) satisfy Assumption (S2) and let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. Assume that \(\tau ,\sigma >0\) and \(0<\omega \le \theta \) are such that \(\theta L_K^2\tau \sigma \le 1\), \((1+\tau \tilde{\gamma })\,\omega \ge 1\) and \((2+\sigma \tilde{\delta })\,\omega \ge 1+\theta \). Then, for any integer \(n\ge 1\) and any \((\xi ,y)\in Z\times Y\), we have

$$\begin{aligned} \mathcal {G}(\xi _{n+1},\xi ;y,y_{n+1}) \le \Delta _n -\frac{\Delta _{n+1}}{\omega } +\omega \,\frac{\Vert \xi _{n-1}-\xi _n\Vert ^2}{2\tau } -\frac{\Vert \xi _n-\xi _{n+1}\Vert ^2}{2\tau } \\ +\omega \,\langle K(\xi _{n-1}-\xi _n),y-y_n\rangle -\langle K(\xi _n-\xi _{n+1}),y-y_{n+1}\rangle .\nonumber \end{aligned}$$
(66)

Proof

Replacing \(\bar{\xi }_n\) by \(\xi _n+\theta \,(\xi _n-\xi _{n-1})\) in (65), we get after simplification

$$\begin{aligned} \mathcal {G}(\xi _{n+1},\xi ;y,y_{n+1})&\le \frac{\Vert \xi -\xi _n\Vert ^2}{2\tau } - (1+\tau \tilde{\gamma })\,\frac{\Vert \xi -\xi _{n+1}\Vert ^2}{2\tau } - \frac{\Vert \xi _n-\xi _{n+1}\Vert ^2}{2\tau }\quad \nonumber \\&\quad + \frac{\Vert y-y_n\Vert ^2}{2\sigma } - (1+\sigma \tilde{\delta })\,\frac{\Vert y-y_{n+1}\Vert ^2}{2\sigma } - \frac{\Vert y_n-y_{n+1}\Vert ^2}{2\sigma }\nonumber \\&\quad +\theta \,\langle K(\xi _{n-1}-\xi _n),y-y_{n+1}\rangle -\langle K(\xi _n-\xi _{n+1}),y-y_{n+1}\rangle .\nonumber \\ \end{aligned}$$
(67)

Now, we define \(\tau \tilde{\gamma }= \mu >0\) and \(\sigma \tilde{\delta }= \mu '>0\). Let us bound the scalar products in (67). Introducing \(0<\omega \le \theta \), we have

$$\begin{aligned} \theta \,\langle K(\xi _{n-1}-\xi _n),y-y_{n+1}\rangle&= \omega \,\langle K(\xi _{n-1}-\xi _n),y-y_n\rangle \nonumber \\&\quad +\,\omega \,\langle K(\xi _{n-1}-\xi _n),y_n-y_{n+1}\rangle \nonumber \\&\quad +\,(\theta -\omega )\,\langle K(\xi _{n-1}-\xi _n),y-y_{n+1}\rangle . \end{aligned}$$
(68)

Let us have a closer look at the last two terms. Let \(\alpha >0\). Since \(\langle K\Xi ,Y\rangle \le L_K(\alpha \Vert \Xi \Vert ^2+\Vert Y\Vert ^2/\alpha )/2\) for any \(\Xi \) and \(Y\), the bound (67) becomes, after simplification,

$$\begin{aligned}&\mathcal {G}(\xi _{n+1},\xi ;y,y_{n+1}) \le \Delta _n - (1+\mu )\,\Delta _{n+1} +\left( \frac{\omega L_K\sigma }{\alpha } - 1\right) \frac{\Vert y_n-y_{n+1}\Vert ^2}{2\sigma }\nonumber \\&\quad +\,\!\left( \!\!\frac{(\theta -\omega )L_K\sigma }{\alpha }+\mu -\mu '\!\right) \!\!\frac{\Vert y-y_{n+1}\Vert ^2}{2\sigma } +\theta L_K\alpha \tau \frac{\Vert \xi _{n-1}-\xi _n\Vert ^2}{2\tau } - \frac{\Vert \xi _n-\xi _{n+1}\Vert ^2}{2\tau }\nonumber \\&\quad +\,\omega \,\langle K(\xi _{n-1}-\xi _n),y-y_n\rangle -\langle K(\xi _n-\xi _{n+1}),y-y_{n+1}\rangle .\qquad \end{aligned}$$
(69)

Choose \(\alpha = \omega L_K\sigma \). Hence, \(\omega L_K/\alpha =1/\sigma \), so that the \(\Vert y_n-y_{n+1}\Vert ^2\) term cancels. Since \(1+\mu = 1/\omega + 1+\mu -1/\omega \), one gets

$$\begin{aligned}&\mathcal {G}(\xi _{n+1},\xi ;y,y_{n+1}) \le \Delta _n-\frac{\Delta _{n+1}}{\omega } +\omega \theta L_K^2\tau \sigma \frac{\Vert \xi _n-\xi _{n-1}\Vert ^2}{2\tau } - \frac{\Vert \xi _n-\xi _{n+1}\Vert ^2}{2\tau }\quad \nonumber \\&\quad +\, \left( \frac{1}{\omega }-\mu -1\right) \frac{\Vert \xi -\xi _{n+1} \Vert ^2}{2\tau } + {\left( \frac{\theta -\omega }{\omega }+\frac{1}{\omega }-\mu '-1\right) }\frac{\Vert y-y_{n+1} \Vert ^2}{2\sigma }\nonumber \\&\quad +\,\omega \,\langle K(\xi _{n-1}-\xi _n),y-y_n\rangle -\langle K(\xi _n-\xi _{n+1}),y-y_{n+1}\rangle . \end{aligned}$$
(70)

We can now set conditions on \(\omega \), \(\theta \), \(\tau \) and \(\sigma \). First, choose \(\theta \), \(\tau \) and \(\sigma \) so that \(\theta L_K^2\tau \sigma \le 1\). Then, choose \(\omega \) so that both parentheses in (70) are nonpositive, which amounts to \(1/(\mu +1)\le \omega \) and \((\theta +1)/(\mu '+2) \le \omega \). Since \(\omega \le \theta \), we get the desired inequality. \(\square \)
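
To make the parameter discussion concrete, the admissibility conditions gathered in this proof can be checked numerically. The small Python helper below is only an illustrative sketch, not taken from the paper; it uses \(\mu = \tau \tilde{\gamma }\) and \(\mu ' = \sigma \tilde{\delta }\) as above.

def parameters_admissible(tau, sigma, theta, omega, L_K, gamma_tilde, delta_tilde):
    # Conditions used at the end of the proof of Lemma A.1:
    #   (i)   theta * L_K**2 * tau * sigma <= 1
    #   (ii)  1/(mu + 1) <= omega             (first parenthesis in (70) nonpositive)
    #   (iii) (theta + 1)/(mu' + 2) <= omega  (second parenthesis in (70) nonpositive)
    #   (iv)  0 < omega <= theta <= 1
    mu = tau * gamma_tilde
    mu_prime = sigma * delta_tilde
    return (theta * L_K ** 2 * tau * sigma <= 1.0
            and 1.0 / (mu + 1.0) <= omega
            and (theta + 1.0) / (mu_prime + 2.0) <= omega
            and 0.0 < omega <= theta <= 1.0)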

Let us now prove Corollary 3.1 and then Theorem 3.1. Set \(\xi _{-1} = \xi _0\). Multiplying (15) by \(1/\omega ^n\) and summing between \(n=0\) and \(n=N-1\) yields

$$\begin{aligned} \sum _{n=1}^N\frac{1}{\omega ^{n-1}}\,\mathcal {G}(\xi _n,\xi ;y,y_n)&\le \Delta _0-\frac{\Delta _N}{\omega ^N} - \frac{1}{\omega ^{N-1}}\,\frac{\Vert \xi _{N-1}-\xi _N\Vert ^2}{2\tau }\nonumber \\&\quad -\frac{1}{\omega ^{N-1}}\,\langle K(\xi _{N-1}-\xi _N),y-y_N\rangle . \end{aligned}$$
(71)

One has, for any \(\beta >0\),

$$\begin{aligned} -\langle K(\xi _{N-1}-\xi _N),y-y_N\rangle \le L_K \left( \frac{\beta }{2}\,\Vert \xi _{N-1}-\xi _N \Vert ^2 +\frac{1}{2\beta }\,\Vert y-y_N\Vert ^2\right) \end{aligned}$$
(72)

and thus the right-hand side of (71) is bounded from above by

$$\begin{aligned} \Delta _0-\frac{1}{\omega ^N}\,\Delta _N + \frac{1}{\omega ^{N-1}}(L_K\beta \tau - 1)\,\frac{\Vert \xi _{N-1}-\xi _N\Vert ^2}{2\tau } + \frac{L_K}{\omega ^{N-1}}\frac{1}{2\beta }\,\Vert y_N-y\Vert ^2. \end{aligned}$$
(73)

Choosing \(\beta = 1/(L_K\tau )\), which cancels the \(\Vert \xi _{N-1}-\xi _N\Vert ^2\) term, and using \(\omega L_K^2\tau \sigma \le \theta L_K^2\tau \sigma \le 1\) together with \(\mathcal {G}(\xi _n,\xi ^*;y^*,y_n)\ge 0\) for every \(n\), we have, for any \(N\ge 1\),

$$\begin{aligned} 0&\le \frac{\Vert \xi ^*-\xi _N \Vert ^2}{2\tau } +(1-\omega L_K^2\tau \sigma )\frac{\Vert y^*-y_N\Vert ^2}{2\sigma } +\sum _{n=1}^N\frac{\omega ^N}{\omega ^{n-1}}\,\mathcal {G}(\xi _n,\xi ^*;y^*,y_n) \nonumber \\&\le \omega ^N\left( \frac{\Vert \xi ^*-\xi _0 \Vert ^2}{2\tau } + \frac{\Vert y^*-y_0 \Vert ^2}{2\sigma }\right) . \end{aligned}$$
(74)

This inequality proves the linear convergence of the iterates (Corollary 3.1). Dividing (74) by \(\omega ^N T_N\) and using convexity then gives Theorem 3.1. \(\square \)
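
Inequality (74) shows that the squared distance to the saddle point decreases like \(\omega ^N\). As a back-of-the-envelope illustration (not part of the proof), the number of iterations needed to bring \(\omega ^N\,\Delta _0\) below a tolerance \(\varepsilon \) follows directly from the rate:

import math

def iterations_for_tolerance(omega, delta_0, eps):
    # Smallest integer N with omega**N * delta_0 <= eps, i.e.
    # N >= log(delta_0 / eps) / log(1 / omega), assuming 0 < omega < 1.
    assert 0.0 < omega < 1.0 and delta_0 > 0.0 and eps > 0.0
    return math.ceil(math.log(delta_0 / eps) / math.log(1.0 / omega))

# For instance, omega = 0.9 and delta_0 / eps = 1e6 give N = 132.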

Appendix B: Equivalence Between Algorithm 4 and Algorithm 2

Let us prove that the iterations of Algorithm 4 are equivalent to (41). Let \(n\ge 0\). By optimality in (37), one has for any \(x\in X\)

$$\begin{aligned} g(x_{n+1}) + \langle Ax_{n+1},y_n\rangle + \frac{\Vert Ax_{n+1}-z_n\Vert ^2}{2\tau } \le g(x) + \langle Ax,y_n\rangle + \frac{\Vert Ax-z_n\Vert ^2}{2\tau } \end{aligned}$$
(75)

For any \(\xi \in Y\), we define

$$\begin{aligned} g_A(\xi ):=\left\{ \begin{array}{ll} \inf \{g(x) : Ax=\xi \}, &{} \quad \mathrm {if}\,\{x\in X : Ax=\xi \}\ne \emptyset ,\\ +\infty , &{} \quad \mathrm {otherwise}. \end{array}\right. \end{aligned}$$
(76)

Let \(\xi \in Y\). Suppose that \(\{x\in X : Ax=\xi \}\ne \emptyset \). Then, taking the infimum over \(\{x\in X : Ax=\xi \}\), we have

$$\begin{aligned} g(x_{n+1}) + \langle Ax_{n+1},y_n\rangle + \frac{\Vert Ax_{n+1}-z_n\Vert ^2}{2\tau } \le g_A(\xi ) + \langle \xi ,y_n\rangle + \frac{\Vert \xi -z_n\Vert ^2}{2\tau }. \end{aligned}$$
(77)

Let us define \(\xi _{n+1}:=Ax_{n+1}\). Since \(g_A(\xi _{n+1})\le g(x_{n+1})\), one has

$$\begin{aligned} g_A(\xi _{n+1}) + \langle \xi _{n+1},y_n\rangle + \frac{\Vert \xi _{n+1}-z_n\Vert ^2}{2\tau } \le g_A(\xi ) + \langle \xi ,y_n\rangle + \frac{\Vert \xi -z_n\Vert ^2}{2\tau } \end{aligned}$$
(78)

for any \(\xi \in Y\), including those such that \(\{x\in X : Ax=\xi \}\) is empty. Thus

$$\begin{aligned} g_A(\xi _{n+1}) + \langle \xi _{n+1},y_n\rangle + \frac{\Vert \xi _{n+1}-z_n\Vert ^2}{2\tau } = \min _{\xi \in Y}\left\{ g_A(\xi ) + \langle \xi ,y_n\rangle + \frac{\Vert \xi -z_n\Vert ^2}{2\tau }\right\} . \end{aligned}$$
(79)

In particular, one can check that \(\xi _{n+1} = \mathrm {prox}_{\tau g_A}(z_n-\tau \,y_n)\). Let us write the y-update (39). We have

$$\begin{aligned} y_{n+1} = y_n + \frac{\xi _{n+1} - z_{n+1}}{\tau '}&\Longleftrightarrow z_{n+1} = \xi _{n+1} - \tau '\,(y_{n+1}-y_n)\end{aligned}$$
(80)
$$\begin{aligned}&\Longleftrightarrow z_{n+1}+\tau '\,y_{n+1} = \xi _{n+1} + \tau '\,y_n. \end{aligned}$$
(81)

Let us define \(\bar{y}_{n+1}\) as in (30). Equation (80) implies that, for any \(n\ge 1\), one has \(z_n-\tau \,y_n = \xi _n - \tau '\,(y_n-y_{n-1})-\tau \,y_n = \xi _n-\tau \,(y_n + (\tau '/\tau )\,(y_n-y_{n-1}))\). Plugging the latter equality into the expression of \(\xi _{n+1}\) yields (28). The z-update (38) can be rewritten as a proximal step

$$\begin{aligned} z_{n+1} = \mathrm {prox}_{\tau ' h}(\xi _{n+1} + \tau '\,y_n) = \mathrm {prox}_{\tau ' h}(z_{n+1}+\tau '\,y_{n+1}). \end{aligned}$$
(82)

Using Moreau’s decomposition [18], we eventually obtain (29).
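
For completeness, Moreau's decomposition used in this last step reads \(v = \mathrm {prox}_{\lambda f}(v) + \lambda \,\mathrm {prox}_{f^*/\lambda }(v/\lambda )\) for every \(v\) and \(\lambda >0\). The short Python check below illustrates the identity on the toy choice \(f=|\cdot |\), so that \(\mathrm {prox}_{\lambda f}\) is soft-thresholding and \(\mathrm {prox}_{f^*/\lambda }\) is the projection onto \([-1,1]\); this example is ours and not part of the paper.

import numpy as np

def prox_l1(v, lam):
    # prox of lam * |.| : componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_l1_conjugate(v):
    # The conjugate of |.| is the indicator of [-1, 1]; its prox is the projection.
    return np.clip(v, -1.0, 1.0)

v = np.array([-2.3, -0.4, 0.0, 0.6, 1.9])
lam = 0.7
# Moreau's identity: v = prox_{lam f}(v) + lam * prox_{f*/lam}(v / lam).
assert np.allclose(prox_l1(v, lam) + lam * prox_l1_conjugate(v / lam), v)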

Cite this article

Tan, P. Linear Convergence Rates for Variants of the Alternating Direction Method of Multipliers in Smooth Cases. J Optim Theory Appl 176, 377–398 (2018). https://doi.org/10.1007/s10957-017-1211-3
