Abstract
In the present paper, we propose a novel convergence analysis of the alternating direction method of multipliers, based on its equivalence with the overrelaxed primal–dual hybrid gradient algorithm. We consider the smooth case, in which the objective function can be decomposed into a differentiable part with Lipschitz continuous gradient and a strongly convex part. Under these hypotheses, a convergence proof with an optimal parameter choice is given for the primal–dual method, which leads to convergence results for the alternating direction method of multipliers. An accelerated variant of the latter, based on a parameter relaxation, is also proposed and shown to converge linearly with the same asymptotic rate as the primal–dual algorithm.
Notes
that is, g is not identically equal to \(+\infty \)
References
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(2), 41–76 (1975)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)
Zhu, M., Chan, T.: An efficient primal–dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34 (2008)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford–Shah functional. In: IEEE International Conference on Computer Vision. pp. 1133–1140 (2009)
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: International Conference on Machine Learning, pp. 343–352 (2015)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42, 783–805 (2017)
He, B., You, Y., Yuan, X.: On the convergence of primal–dual hybrid gradient algorithm. SIAM J. Imaging Sci. 7(4), 2526–2537 (2014)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)
Zhang, L., Mahdavi, M., Jin, R.: Linear convergence with condition number independent access of full gradients. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 980–988. Curran Associates, Inc., New York (2013)
Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93(2), 273–299 (1965)
Arrow, K., Hurwicz, L., Uzawa, H., Chenery, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Redwood City (1958)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Preprint arXiv:1504.06298 (2015)
Allaire, G.: Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole Polytechnique (2005)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Boston (2004)
O’Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Liang, J., Fadili, J., Peyré, G.: Local linear convergence analysis of primal–dual splitting methods. Preprint arXiv:1705.01926 (2017)
Communicated by Jalal M. Fadili.
Appendices
Appendix A: Proof of Theorem 3.1 and of Corollary 3.1
We proceed analogously to [7, Section 5.2], but we do not fix any parameter until needed. This proof is also inspired by the one found in [9], which does not allow \(\theta \ne 1\). For now, we only assume that \(0<\theta \le 1\).
We suppose that \(\mathcal {L}\) satisfies Assumption (S2). Let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. For any \(n\ge 0\), the points \(\xi _{n+1}\) and \(y_{n+1}\) can be seen as minimizers of convex problems; thus, the first-order optimality conditions yield, respectively, \(-(\xi _{n+1}-\xi _n)/\tau - K^*y_{n+1} \in \partial G(\xi _{n+1})\) and \(-(y_{n+1}-y_n)/\sigma + K\bar{\xi }_n \in \partial H^*(y_{n+1})\). Using the definition of strong convexity, we get for any \((\xi ,y)\in Z\times Y\) (after expanding the scalar products and summing the two inequalities for G and \(H^*\))
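The two inclusions above are just the proximal (resolvent) form of the updates; written out in the paper's notation, this standard equivalence reads:

```latex
\begin{aligned}
-\tfrac{1}{\tau}(\xi_{n+1}-\xi_n) - K^* y_{n+1} \in \partial G(\xi_{n+1})
&\iff \xi_n - \tau K^* y_{n+1} \in \xi_{n+1} + \tau\,\partial G(\xi_{n+1})\\
&\iff \xi_{n+1} = \operatorname{prox}_{\tau G}\bigl(\xi_n - \tau K^* y_{n+1}\bigr),\\[2pt]
-\tfrac{1}{\sigma}(y_{n+1}-y_n) + K\bar{\xi}_n \in \partial H^*(y_{n+1})
&\iff y_{n+1} = \operatorname{prox}_{\sigma H^*}\bigl(y_n + \sigma K\bar{\xi}_n\bigr).
\end{aligned}
```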
For any \(n\ge 0\) and \((\xi ,y)\in Z\times Y\), we set \(\Delta _n = \Vert \xi -\xi _n\Vert ^2/(2\tau ) + \Vert y-y_n\Vert ^2/(2\sigma )\). Let us first prove the following lemma:
Lemma A.1
Let \(\mathcal {L}\) satisfy Assumption (S2) and let \((y_n,\xi _n,\bar{\xi }_n)_{n}\) be a sequence generated by Algorithm 1. Then, for any integer \(N\ge 1\), any \(\tau ,\sigma >0\) and \(0<\omega \le \theta \), we have for any \((\xi ,y)\in Z\times Y\)
Proof
Replacing \(\bar{\xi }_n\) by \(\xi _n+\theta \,(\xi _n-\xi _{n-1})\) in (65), we get after simplification
Now, we define \(\tau \tilde{\gamma }= \mu >0\) and \(\sigma \tilde{\delta }= \mu '>0\). Let us bound the scalar products in (67). Introducing \(0<\omega \le \theta \), we have
Let us have a closer look at the last two terms. Let \(\alpha >0\). Since \(\langle K\Xi ,Y\rangle \le L_K(\alpha \Vert \Xi \Vert ^2+\Vert Y\Vert ^2/\alpha )/2\), the bound (67) becomes, after simplification,
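The estimate \(\langle K\Xi ,Y\rangle \le L_K(\alpha \Vert \Xi \Vert ^2+\Vert Y\Vert ^2/\alpha )/2\) used here is Cauchy–Schwarz followed by Young's inequality \(ab\le (a^2+b^2)/2\):

```latex
\langle K\Xi, Y\rangle
\;\le\; \|K\Xi\|\,\|Y\|
\;\le\; L_K\,\|\Xi\|\,\|Y\|
\;=\; L_K\bigl(\sqrt{\alpha}\,\|\Xi\|\bigr)\Bigl(\tfrac{\|Y\|}{\sqrt{\alpha}}\Bigr)
\;\le\; \frac{L_K}{2}\Bigl(\alpha\,\|\Xi\|^2 + \frac{\|Y\|^2}{\alpha}\Bigr),
```

where \(\|K\|\le L_K\) and \(\alpha>0\) is arbitrary.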
Choose \(\alpha = \omega L_K\sigma \). Hence, \(\omega L_K/\alpha =1/\sigma \), so that the \(\Vert y_n-y_{n+1}\Vert ^2\) term cancels. Since \(1+\mu = 1/\omega + 1+\mu -1/\omega \), one gets
We can now set conditions on \(\omega \), \(\theta \), \(\tau \) and \(\sigma \). First, choose \(\theta \), \(\tau \) and \(\sigma \) so that \(\theta L_K^2\tau \sigma \le 1\). Then, choose \(\omega \) so that both parenthesized terms are nonpositive, which requires \(1/(\mu +1)\le \omega \) and \((\theta +1)/(\mu '+2) \le \omega \). Since \(\omega \le \theta \), the desired inequality follows. \(\square \)
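For reference (this is our reading of the constraints just derived, since the displayed equations are not reproduced here), the two lower bounds give the tightest admissible relaxation, and the requirement \(\omega \le \theta \) is feasible precisely when \(\theta \) is large enough:

```latex
\omega \;=\; \max\Bigl\{\frac{1}{1+\mu},\ \frac{1+\theta}{2+\mu'}\Bigr\},
\qquad
\frac{1+\theta}{2+\mu'} \le \theta \iff \theta \ge \frac{1}{1+\mu'},
\qquad
\frac{1}{1+\mu} \le \theta .
```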
Let us now prove Corollary 3.1 and then Theorem 3.1. Set \(\xi _{-1} = \xi _0\). Multiplying (15) by \(1/\omega ^n\) and summing from \(n=0\) to \(n=N-1\) yields
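Although (15) is not reproduced here, the mechanism is the standard geometric telescoping: assuming (15) has the schematic form \(\Delta _{n+1} + G_n \le \omega \,\Delta _n\) with nonnegative gap terms \(G_n\), one gets

```latex
\sum_{n=0}^{N-1}\frac{1}{\omega^{n}}\bigl(\Delta_{n+1}+G_n\bigr)
\;\le\; \sum_{n=0}^{N-1}\frac{\Delta_n}{\omega^{\,n-1}}
\quad\Longrightarrow\quad
\frac{\Delta_N}{\omega^{\,N-1}} \;+\; \sum_{n=0}^{N-1}\frac{G_n}{\omega^{n}}
\;\le\; \omega\,\Delta_0,
```

after cancelling the common terms \(\Delta _n/\omega ^{n-1}\), \(1\le n\le N-1\), on both sides; in particular \(\Delta _N \le \omega ^N \Delta _0\), which is the linear rate.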
One has, for any \(\beta >0\),
and thus the right-hand side of (71) is bounded from above by
Choosing \(\beta = 1/(L_K\tau )\), which cancels the \(\Vert \xi _{N-1}-\xi _N\Vert ^2\) term, and using that \(\omega L_K^2\tau \sigma \le \theta L_K^2\tau \sigma \le 1\) and \(\mathcal {G}(\xi _n,\xi ^*;y^*,y_n)\ge 0\), we have, for any \(N\ge 1\),
This inequality proves the linear convergence of the iterates (Corollary 3.1). We can now complete the proof of Theorem 3.1: dividing (74) by \(\omega ^NT_N\) and using convexity gives Theorem 3.1. \(\square \)
Appendix B: Equivalence Between Algorithm 4 and Algorithm 2
Let us prove that the iterations of Algorithm 4 are equivalent to (41). Let \(n\ge 0\). By optimality in (37), one has for any \(x\in X\)
For any \(\xi \in Y\), we define the marginal function \(g_A(\xi ) := \inf \,\{\,g(x) : x\in X,\ Ax=\xi \,\}\), with the convention \(\inf \emptyset = +\infty \).
Let \(\xi \in Y\). Suppose that \(\{x\in X : Ax=\xi \}\ne \emptyset \). Then, taking the infimum over \(\{x\in X : Ax=\xi \}\), we have
Let us define \(\xi _{n+1}:=Ax_{n+1}\). Since \(g_A(\xi _{n+1})\le g(x_{n+1})\), one has
for any \(\xi \in Y\), including those such that \(\{x\in X : Ax=\xi \}\) is empty. Thus
In particular, one can check that \(\xi _{n+1} = \mathrm {prox}_{\tau g_A}(z_n-\tau \,y_n)\). Let us write the y-update (39). We have
Let us define \(\bar{y}_{n+1}\) as in (30). Equation (80) implies that, for any \(n\ge 1\), one has \(z_n-\tau \,y_n = \xi _n - \tau '\,(y_n-y_{n-1})-\tau \,y_n = \xi _n-\tau \,(y_n + (\tau '/\tau )\,(y_n-y_{n-1}))\). Injecting the latter equality in the expression of \(\xi _{n+1}\) yields (28). The z-update (38) can be rewritten as a proximal step
Using Moreau’s decomposition [18], we eventually obtain (29).
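Moreau's decomposition [18], \(v = \operatorname{prox}_{\tau f}(v) + \tau \,\operatorname{prox}_{f^*/\tau }(v/\tau )\), is easy to check numerically. The sketch below verifies it for \(f=|\cdot |\), whose proximal map is soft-thresholding and whose conjugate is the indicator of \([-1,1]\); the helper names are ours, purely for illustration:

```python
# Numerical check of Moreau's decomposition for f = |.| in one dimension.
# Helper names (prox_abs, prox_conjugate, moreau_identity) are illustrative.

def prox_abs(v, t):
    # prox of t*|.| is soft-thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_conjugate(v):
    # f = |.| has conjugate f* = indicator of [-1, 1];
    # prox of (1/t)*f* is still the projection onto [-1, 1]
    return min(max(v, -1.0), 1.0)

def moreau_identity(v, t):
    # should return v exactly: v = prox_{t f}(v) + t * prox_{f*/t}(v / t)
    return prox_abs(v, t) + t * prox_conjugate(v / t)

checks = [moreau_identity(v, t)
          for v in (-3.0, -0.2, 0.0, 0.7, 5.0)
          for t in (0.5, 1.0, 2.0)]
```

Each entry of `checks` reproduces the corresponding input `v`, which is exactly the identity used to rewrite the z-update as a proximal step.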
Tan, P. Linear Convergence Rates for Variants of the Alternating Direction Method of Multipliers in Smooth Cases. J Optim Theory Appl 176, 377–398 (2018). https://doi.org/10.1007/s10957-017-1211-3
Keywords
- Alternating direction method of multipliers
- Primal–dual algorithm
- Strong convexity
- Linear convergence rate