Abstract
We revisit the classical Douglas–Rachford (DR) method for finding a zero of the sum of two maximal monotone operators. Since the practical performance of the DR method crucially depends on the step-sizes, we aim at developing an adaptive step-size rule. To that end, we take a closer look at a linear case of the problem and use our findings to develop a step-size strategy that eliminates the need for step-size tuning. We analyze a general non-stationary DR scheme and prove its convergence for a convergent sequence of step-sizes with summable increments in the case of maximally monotone operators. This, in turn, proves the convergence of the method with the new adaptive step-size rule. We also derive the related non-stationary alternating direction method of multipliers. We illustrate the efficiency of the proposed methods on several numerical examples.
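To give a concrete picture of the scheme discussed in the abstract, the following is a minimal, hypothetical NumPy sketch of Douglas–Rachford splitting applied to a small lasso-type problem, \(\min_x \tfrac{1}{2}\Vert x-b\Vert^2 + \lambda\Vert x\Vert_1\), using a convergent step-size sequence with summable increments (as the convergence result requires). The problem instance, the illustrative step-size rule \(t_n = 1 + 1/n^2\), and all function names are ours, not taken from the paper.

```python
import numpy as np

def prox_quad(y, t, b):
    # prox of f(x) = 0.5*||x - b||^2 with step t:  (y + t*b) / (1 + t)
    return (y + t * b) / (1.0 + t)

def soft(v, tau):
    # prox of tau*||.||_1 (componentwise soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dr_lasso(b, lam, n_iter=500):
    # Douglas-Rachford iteration for min_x 0.5*||x - b||^2 + lam*||x||_1
    y = np.zeros_like(b)
    x = y
    for n in range(1, n_iter + 1):
        t = 1.0 + 1.0 / n**2            # convergent step-sizes, summable increments
        x = prox_quad(y, t, b)          # resolvent of the first operator
        z = soft(2.0 * x - y, t * lam)  # resolvent of the second, at the reflection
        y = y + z - x                   # update of the governing sequence
    return x

b = np.array([3.0, -0.5, 1.2, 0.1])
x = dr_lasso(b, lam=1.0)  # minimizer is the soft-thresholding of b at level 1
```

Here the non-stationarity is only caricatured by the decaying perturbation of \(t_n\); the paper's adaptive rule is derived in the appendix below.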
Notes
The exact construction of A and B is \(A = C^{T}C\) and \(B = D^{T}D\), where \(C\in \mathbb {R}^{(0.5m + 10)\times m}\) and \(D\in \mathbb {R}^{0.5m\times m}\) have entries drawn from the standard Gaussian distribution in MATLAB.
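An analogous construction in NumPy (assuming an even \(m\); the seed and dimension below are illustrative) could read:

```python
import numpy as np

m = 100                                    # even, so 0.5*m is an integer
rng = np.random.default_rng(0)             # fixed seed for reproducibility
C = rng.standard_normal((m // 2 + 10, m))  # (0.5m + 10) x m Gaussian matrix
D = rng.standard_normal((m // 2, m))       # 0.5m x m Gaussian matrix
A = C.T @ C                                # symmetric positive semidefinite
B = D.T @ D                                # symmetric PSD, rank at most 0.5m
```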
The paper [43] provides a convergence guarantee for an adaptive relaxed method; however, that result does not apply to the methods used in this comparison, and the method itself is not included here since it also involves relaxation.
References
Bauschke, H.H.: A note on the paper by Eckstein and Svaiter on “general projective splitting methods for sums of maximal monotone operators”. SIAM J. Control Optim. 48(4), 2513–2515 (2009)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Bauschke, H.H., Moffat, S.M., Wang, X.: Firmly nonexpansive mappings and maximally monotone operators: correspondence and duality. Set-Valued Var. Anal. 20(1), 131–153 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)
Becker, S., Combettes, P.L.: An algorithm for splitting parallel sums of linearly composed monotone operators, with applications to signal recovery (2013). arXiv:1305.5828
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Bredies, K., Sun, H.P.: Preconditioned Douglas–Rachford algorithms for TV- and TGV-regularized variational imaging problems. J. Math. Imaging Vis. 52(3), 317–344 (2015)
Bredies, K., Sun, H.: Preconditioned Douglas–Rachford splitting methods for convex–concave saddle-point problems. SIAM J. Numer. Anal. 53(1), 421–444 (2015)
Bredies, K., Sun, H.: Accelerated Douglas–Rachford methods for the solution of convex–concave saddle-point problems (2016). arXiv:1604.06282
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
Combettes, P.L., Pesquet, J.-C.: A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Sel. Top. Signal Process. 1(4), 564–574 (2007)
Dao, M.N., Phan, H.M.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators (2018). arXiv:1809.00761
Davis, D.: Convergence rate analysis of the forward-Douglas–Rachford splitting scheme. SIAM J. Optim. 25(3), 1760–1786 (2015)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 115–163. Springer, Berlin (2016)
Douglas Jr., J., Rachford Jr., H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)
Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
Giselsson, P.: Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19(4), 2241–2270 (2017)
Giselsson, P., Boyd, S.: Diagonal scaling in Douglas–Rachford splitting and ADMM. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 5033–5039. IEEE (2014)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y., Neittaanmäki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology, pp. 59–82. Springer, Dordrecht (2014)
Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013)
He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B.S., Yang, H., Wang, S.L.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106(2), 337–356 (2000)
Li, X., Yuan, X.: A proximal strictly contractive Peaceman–Rachford splitting method for convex programming with applications to imaging. SIAM J. Imaging Sci. 8(2), 1332–1365 (2015)
Liang, J., Fadili, J., Peyré, G.: Local convergence properties of Douglas–Rachford and alternating direction method of multipliers. J. Optim. Theory Appl. 172(3), 874–913 (2017)
Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, pp. 612–620 (2011)
Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Moursi, W.M., Vandenberghe, L.: Douglas–Rachford splitting for a Lipschitz continuous and a strongly monotone operator (2018). arXiv:1805.09396
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM (2015). arXiv:1502.02009
O’Connor, D., Vandenberghe, L.: Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)
Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 4234–4239. IEEE (2014)
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1762–1769. IEEE (2011)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)
Song, C., Yoon, S., Pavlovic, V.: Fast ADMM algorithm for distributed optimization with adaptive penalty. In: AAAI, pp. 753–759 (2016)
Svaiter, B.F.: A simplified proof of weak convergence in Douglas–Rachford method to a solution of the underlying inclusion problem (2018). arXiv:1809.00967
Svaiter, B.F.: On weak convergence of the Douglas–Rachford method. SIAM J. Control Optim. 49(1), 280–287 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Xu, Z., Figueiredo, M.A.T., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection (2016). arXiv:1605.07246
Xu, Z., Figueiredo, M.A.T., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7234–7243. IEEE (2017)
Acknowledgements
We thank Zheng Xu and Tom Goldstein for sharing the ADMM code.
This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The work of Q. Tran-Dinh was partially supported by the NSF Grant, No. DMS-1619884 (2016–2019).
Appendix: The proof of Theorem 5.1
Let us assume that we apply (12) to solve the optimality condition (25) of the dual problem (24). From (12), i.e.,
we define \(w^{n+1} := J_{t_{n-1}B}y^{n}\) and \(z^{n+1} := J_{t_{n}A}( (1 + \kappa _n)w^{n+1} - \kappa _ny^n)\) to obtain
Shifting up this scheme by one index and changing the order, we obtain
Let \((1 + \kappa _{n-1})w^{n} - \kappa _{n-1}y^{n-1} = x^n + w^n\). This gives \(x^n = \kappa _{n-1}(w^n - y^{n-1})\) and hence, \(z^n + \kappa _{n-1}(y^{n-1} - w^n) = z^n - x^n\) and \(x^{n+1} = \kappa _n(w^{n+1} - y^{n}) = \kappa _n(w^{n+1} - z^n + x^n)\). Substituting these into the above expression of the DR scheme, we obtain
where \(x^n = \kappa _{n-1}(w^{n} - y^{n-1})\).
From \(z^{n} = J_{t_{n-1}A}(w^{n} + x^n)\), we have \(z^n = (I + t_{n-1}A)^{-1}(w^n + x^n)\) or
Let \(u^{n+1} \in \nabla {\varphi ^{*}}(D^Tz^n)\), which implies \(D^Tz^n \in \partial {\varphi }(u^{n+1})\). Hence, \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\), and therefore \(D^Tz^n = D^T(w^n + x^n - t_{n-1}(Du^{n+1} - c)) \in \partial {\varphi }(u^{n+1})\). This condition leads to
This is the optimality condition of
Similarly, from \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n)\), if we define \(v^{n+1} \in \nabla {\psi ^{*}}(E^Tw^{n+1})\), then we can also derive that
From the line \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\) above, we can write \(x^n - z^n = t_{n-1}(Du^{n+1} - c) - w^n\). Substituting this expression into the above step, we obtain
This is the second line of (26).
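In the linear case, resolvent steps such as \(z^{n} = J_{t_{n-1}A}(w^{n} + x^n)\) and \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n)\) above reduce to linear solves, since \(J_{tA}(v) = (I + tA)^{-1}v\). A small hypothetical NumPy check of this defining relation:

```python
import numpy as np

def resolvent_linear(A, t, v):
    # J_{tA}(v) = (I + t*A)^{-1} v for a matrix A (monotone when A + A^T >= 0)
    return np.linalg.solve(np.eye(A.shape[0]) + t * A, v)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M.T @ M                      # symmetric PSD, hence maximally monotone
v = rng.standard_normal(5)
z = resolvent_linear(A, 0.5, v)  # plays the role of z^n = J_{tA}(w^n + x^n)
# defining relation of the resolvent: z + t*A z = v
```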
Next, from \(w^{n+1} - z^n + x^n + t_{n-1}Ev^{n+1} = 0\), we have \(w^{n} = z^{n-1} - x^{n-1} - t_{n-2}Ev^{n}\). This implies \(Ev^n = -t_{n-2}^{-1}(x^{n-1} + w^n - z^{n-1})\). From the last line of (28), we have \(x^n = \kappa _{n-1}(x^{n-1} + w^n - z^{n-1})\). Combining these two lines, we obtain \(Ev^n = -\tfrac{1}{\kappa _{n-1}t_{n-2}}x^n = -\frac{1}{t_{n-1}}x^n\) due to the update rule (9): \(t_{n-1} = \kappa _{n-1}t_{n-2}\). Substituting \(Ev^n = -\frac{1}{t_{n-1}}x^n\) into the u-subproblem, we obtain
This is the first line of (26).
Now, since \(z^n = w^n - t_{n-1}(Du^{n+1} - c) + x^n\), and \(w^{n+1} = z^n - x^n - t_{n-1}Ev^{n+1}\), combining these expressions, we obtain \(w^{n+1} = w^n - t_{n-1}(Du^{n+1} + Ev^{n+1} - c)\). This is the last line of (26).
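The three lines of (26) assembled above form an ADMM loop for a problem of the form \(\min \varphi (u) + \psi (v)\) subject to \(Du + Ev = c\). The following hypothetical NumPy sketch instantiates it (with a fixed step-size, for simplicity) for the quadratic choices \(\varphi (u) = \tfrac{1}{2}\Vert u\Vert ^2\) and \(\psi (v) = \tfrac{1}{2}\Vert v\Vert ^2\), for which both subproblems are linear solves; the dimensions, data, and step-size are illustrative, and the multiplier update matches \(w^{n+1} = w^n - t_{n-1}(Du^{n+1} + Ev^{n+1} - c)\).

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r = 6, 5, 4                  # u in R^p, v in R^q, constraint in R^r
D = rng.standard_normal((r, p))
E = rng.standard_normal((r, q))
c = rng.standard_normal(r)

t = 1.0                            # fixed step-size, for simplicity
u, v, w = np.zeros(p), np.zeros(q), np.zeros(r)
for n in range(2000):
    # u-step: argmin_u 0.5||u||^2 - <w, Du + Ev - c> + (t/2)||Du + Ev - c||^2
    u = np.linalg.solve(np.eye(p) + t * D.T @ D, D.T @ (w + t * (c - E @ v)))
    # v-step, using the fresh u
    v = np.linalg.solve(np.eye(q) + t * E.T @ E, E.T @ (w + t * (c - D @ u)))
    # multiplier update, matching w^{n+1} = w^n - t*(Du^{n+1} + Ev^{n+1} - c)
    w = w - t * (D @ u + E @ v - c)
# at a solution: Du + Ev = c and, by optimality, u = D^T w, v = E^T w
```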
Finally, we derive the update rule for \(t_n\). Indeed, note that \(y^n = z^n - x^n\), and \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\). These relations show that \(y^n = w^n - t_{n-1}(Du^{n+1} - c)\). Moreover, we also have \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n) = J_{t_{n-1}B}(y^n)\). In this case, we have \(J_{t_{n-1}B}(y^n) - y^n = w^{n+1} - w^n + t_{n-1}(Du^{n+1} - c) = -t_{n-1}(Du^{n+1} + Ev^{n+1} - c) + t_{n-1}(Du^{n+1} - c) = -t_{n-1}Ev^{n+1}\). Hence, we can compute \(\kappa _n\) as
Using the fact that \(t_n := \kappa _nt_{n-1}\), we obtain \(t_n := \frac{\Vert w^{n+1}\Vert }{\Vert Ev^{n+1}\Vert }\), which is the last line of (26) after projecting and weighting as in Sect. 4. Since \(\left\{ w^n\right\} \) is equivalent to the sequence \(\left\{ u^n\right\} \) in the DR scheme (2) [or, equivalently, (12)] applied to the optimality condition (25) of the dual problem (24), the last conclusion is a direct consequence of Theorem 3.2. \(\square \)
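The raw adaptive rule derived in this proof, \(t_n = \Vert w^{n+1}\Vert /\Vert Ev^{n+1}\Vert \), can be sketched as follows. The clipping bounds are a hypothetical stand-in for the projection and weighting of Sect. 4, which are not reproduced here, and the function name is ours.

```python
import numpy as np

def adaptive_step(w_next, Ev_next, t_prev, t_min=1e-3, t_max=1e3):
    # raw rule from the derivation: t_n = ||w^{n+1}|| / ||E v^{n+1}||;
    # clipping stands in for the projection/weighting of Sect. 4
    denom = np.linalg.norm(Ev_next)
    if denom == 0.0:
        return t_prev              # keep the previous step-size
    return float(np.clip(np.linalg.norm(w_next) / denom, t_min, t_max))
```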
Cite this article
Lorenz, D.A., Tran-Dinh, Q. Non-stationary Douglas–Rachford and alternating direction method of multipliers: adaptive step-sizes and convergence. Comput Optim Appl 74, 67–92 (2019). https://doi.org/10.1007/s10589-019-00106-9
Keywords
- Douglas–Rachford method
- Alternating direction methods of multipliers
- Maximal monotone inclusions
- Adaptive step-size
- Non-stationary iteration