Non-stationary Douglas–Rachford and alternating direction method of multipliers: adaptive step-sizes and convergence

Abstract

We revisit the classical Douglas–Rachford (DR) method for finding a zero of the sum of two maximal monotone operators. Since the practical performance of the DR method crucially depends on the step-sizes, we aim at developing an adaptive step-size rule. To that end, we take a closer look at a linear case of the problem and use our findings to develop a step-size strategy that eliminates the need for step-size tuning. We analyze a general non-stationary DR scheme and prove its convergence for a convergent sequence of step-sizes with summable increments in the case of maximally monotone operators. This, in turn, proves the convergence of the method with the new adaptive step-size rule. We also derive the related non-stationary alternating direction method of multipliers. We illustrate the efficiency of the proposed methods on several numerical examples.


Notes

  1. The exact construction of A and B is \(A = C^{T}C\) and \(B = D^{T}D\), where \(C\in \mathbb {R}^{(0.5m + 10)\times m}\) and \(D\in \mathbb {R}^{0.5m\times m}\) are drawn from the standard Gaussian distribution in Matlab; a short sketch of this construction is given after these notes.

  2. The paper [43] provides a convergence guarantee for an adaptive relaxed ADMM; however, that result does not carry over to the methods used in this comparison, and the method of [43] is not included here since it also involves relaxation.
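
As a concrete illustration of footnote 1, the construction of the test operators could look as follows in NumPy. This is only a sketch, not the authors' Matlab script; the dimension \(m\) and the random seed are illustrative choices, and \(m\) is assumed to be even so that \(0.5m\) is an integer.

```python
import numpy as np

def make_test_operators(m, seed=0):
    """Sketch of the construction in footnote 1: A = C^T C and B = D^T D,
    where C is (0.5*m + 10) x m and D is 0.5*m x m with standard Gaussian
    entries. The seed is only for reproducibility of this illustration."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((m // 2 + 10, m))
    D = rng.standard_normal((m // 2, m))
    A = C.T @ C   # symmetric positive semidefinite, m x m
    B = D.T @ D   # symmetric positive semidefinite, rank at most m/2
    return A, B

A, B = make_test_operators(200)
```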

References

  1. Bauschke, H.H.: A note on the paper by Eckstein and Svaiter on “general projective splitting methods for sums of maximal monotone operators”. SIAM J. Control Optim. 48(4), 2513–2515 (2009)

  2. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  3. Bauschke, H.H., Moffat, S.M., Wang, X.: Firmly nonexpansive mappings and maximally monotone operators: correspondence and duality. Set-Valued Var. Anal. 20(1), 131–153 (2012)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  5. Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)

  6. Becker, S., Combettes, P.L.: An algorithm for splitting parallel sums of linearly composed monotone operators, with applications to signal recovery (2013). arXiv:1305.5828

  7. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  8. Bredies, K., Sun, H.P.: Preconditioned Douglas–Rachford algorithms for TV- and TGV-regularized variational imaging problems. J. Math. Imaging Vis. 52(3), 317–344 (2015)

  9. Bredies, K., Sun, H.: Preconditioned Douglas–Rachford splitting methods for convex–concave saddle-point problems. SIAM J. Numer. Anal. 53(1), 421–444 (2015)

  10. Bredies, K., Sun, H.: Accelerated Douglas–Rachford methods for the solution of convex–concave saddle-point problems (2016). arXiv:1604.06282

  11. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)

  12. Combettes, P.L., Pesquet, J.-C.: A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Sel. Top. Signal Process. 1(4), 564–574 (2007)

  13. Dao, N.M., Phan, M.H.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators (2018). arXiv:1809.00761

  14. Davis, D.: Convergence rate analysis of the forward-Douglas–Rachford splitting scheme. SIAM J. Optim. 25(3), 1760–1786 (2015)

  15. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 115–163. Springer, Berlin (2016)

  16. Douglas Jr., J., Rachford Jr., H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)

  17. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)

  18. Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)

  19. Giselsson, P.: Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19(4), 2241–2270 (2017)

  20. Giselsson, P., Boyd, S.: Diagonal scaling in Douglas–Rachford splitting and ADMM. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 5033–5039. IEEE (2014)

  21. Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)

  22. Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y., Neittaanmäki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology, pp. 59–82. Springer, Dordrecht (2014)

  23. Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013)

  24. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012)

  25. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  26. He, B.S., Yang, H., Wang, S.L.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106(2), 337–356 (2000)

  27. Li, X., Yuan, X.: A proximal strictly contractive Peaceman–Rachford splitting method for convex programming with applications to imaging. SIAM J. Imaging Sci. 8(2), 1332–1365 (2015)

  28. Liang, J., Fadili, J., Peyré, G.: Local convergence properties of Douglas–Rachford and alternating direction method of multipliers. J. Optim. Theory Appl. 172(3), 874–913 (2017)

  29. Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, pp. 612–620 (2011)

  30. Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  31. Moursi, W.M., Vandenberghe, L.: Douglas–Rachford splitting for a Lipschitz continuous and a strongly monotone operator (2018). arXiv:1805.09396

  32. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM (2015). arXiv:1502.02009

  33. O’Connor, D., Vandenberghe, L.: Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)

  34. Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 4234–4239. IEEE (2014)

  35. Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1762–1769. IEEE (2011)

  36. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  37. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)

  38. Song, C., Yoon, S., Pavlovic, V.: Fast ADMM algorithm for distributed optimization with adaptive penalty. In: AAAI, pp. 753–759 (2016)

  39. Svaiter, B.F.: A simplified proof of weak convergence in Douglas–Rachford method to a solution of the underlying inclusion problem (2018). arXiv:1809.00967

  40. Svaiter, B.F.: On weak convergence of the Douglas–Rachford method. SIAM J. Control Optim. 49(1), 280–287 (2011)

  41. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)

  42. Xu, Z., Figueiredo, M.A.T., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection (2016). arXiv:1605.07246

  43. Xu, Z., Figueiredo, M.A.T., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7234–7243. IEEE (2017)

Acknowledgements

We thank Zheng Xu and Tom Goldstein for sharing the ADMM code.

Author information

Corresponding author

Correspondence to Dirk A. Lorenz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The work of Q. Tran-Dinh was partially supported by NSF Grant No. DMS-1619884 (2016–2019).

Appendix: The proof of Theorem 5.1

Assume that we apply (12) to solve the optimality condition (25) of the dual problem (24). Writing (12) as

$$\begin{aligned} y^{n+1} = J_{t_{n}A}\left( (1+\kappa _n)J_{t_{n-1}B}y^{n} - \kappa _n y^{n}\right) + \kappa _n\left( y^{n} -J_{t_{n-1}B}y^{n}\right) , \end{aligned}$$

we define \(w^{n+1} := J_{t_{n-1}B}y^{n}\) and \(z^{n+1} := J_{t_{n}A}( (1 + \kappa _n)w^{n+1} - \kappa _ny^n)\) to obtain

$$\begin{aligned} {\left\{ \begin{array}{ll} w^{n+1} := J_{t_{n-1}B}y^{n}\\ z^{n+1} := J_{t_{n}A}\left( (1 + \kappa _n)w^{n+1} - \kappa _n y^n\right) \\ y^{n+1} = z^{n+1} + \kappa _n\left( y^n- w^{n+1}\right) . \end{array}\right. } \end{aligned}$$

Shifting this scheme by one index and reordering the steps, we obtain

$$\begin{aligned} {\left\{ \begin{array}{ll} z^{n} = J_{t_{n-1}A}\left( \left( 1 + \kappa _{n-1}\right) w^{n} - \kappa _{n-1}y^{n-1}\right) \\ y^{n} = z^{n} + \kappa _{n-1}\left( y^{n-1} - w^{n}\right) \\ w^{n+1} = J_{t_{n-1}B}y^n = J_{t_{n-1}B}\left( z^n + \kappa _{n-1}\left( y^{n-1} - w^n\right) \right) . \end{array}\right. } \end{aligned}$$

Set \((1 + \kappa _{n-1})w^{n} - \kappa _{n-1}y^{n-1} = x^n + w^n\), i.e., \(x^n = \kappa _{n-1}(w^n - y^{n-1})\). Hence \(z^n + \kappa _{n-1}(y^{n-1} - w^n) = z^n - x^n\) and \(x^{n+1} = \kappa _n(w^{n+1} - y^{n}) = \kappa _n(w^{n+1} - z^n + x^n)\). Substituting these relations into the scheme above, we obtain

$$\begin{aligned} \left\{ \begin{array}{ll} z^{n} = J_{t_{n-1}A}(x^{n} + w^n)\\ w^{n+1} = J_{t_{n-1}B}(z^{n} - x^n)\\ x^{n+1} = \kappa _n(x^n + w^{n+1} - z^{n}), \end{array}\right. \end{aligned}$$
(28)

where \(x^n = \kappa _{n-1}(w^{n} - y^{n-1})\).
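
To see the change of variables at work, the following sketch iterates both the non-stationary DR form (12) and the reformulated scheme (28) for a toy linear instance in which \(A\) and \(B\) are symmetric positive semidefinite matrices, so that the resolvents reduce to linear solves. The matrices, the step-size sequence \(t_n = 1 + 2^{-n}\) (convergent with summable increments), and the starting point are illustrative assumptions; the check merely confirms numerically that both forms generate the same sequence \(y^n = z^n - x^n\).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
# Toy linear operators: A, B symmetric positive semidefinite (illustrative).
CA = rng.standard_normal((m, m)); A = CA.T @ CA
CB = rng.standard_normal((m, m)); B = CB.T @ CB
I = np.eye(m)

def JA(t, v):  # resolvent J_{tA}(v) = (I + t A)^{-1} v
    return np.linalg.solve(I + t * A, v)

def JB(t, v):  # resolvent J_{tB}(v) = (I + t B)^{-1} v
    return np.linalg.solve(I + t * B, v)

N = 30
def t_(n):      # step-size t_n = 1 + 2^{-n}, defined for n >= -1
    return 1.0 + 2.0 ** (-n)
def kappa(n):   # kappa_n = t_n / t_{n-1}
    return t_(n) / t_(n - 1)

y0 = rng.standard_normal(m)

# Scheme (12): iterate in the variable y (via the substeps w, z).
y, ys12 = y0.copy(), []
for n in range(N):
    w = JB(t_(n - 1), y)
    z = JA(t_(n), (1 + kappa(n)) * w - kappa(n) * y)
    y = z + kappa(n) * (y - w)
    ys12.append(y.copy())                    # y^{n+1}

# Scheme (28): iterate in (w, x) and recover y^n = z^n - x^n.
w = JB(t_(-1), y0)                           # w^1
x = kappa(0) * (w - y0)                      # x^1 = kappa_0 (w^1 - y^0)
ys28 = []
for n in range(1, N + 1):
    z = JA(t_(n - 1), x + w)                 # z^n
    ys28.append(z - x)                       # y^n = z^n - x^n
    w_new = JB(t_(n - 1), z - x)             # w^{n+1}
    x = kappa(n) * (x + w_new - z)           # x^{n+1}
    w = w_new

# Both sequences agree up to rounding error.
print(max(np.linalg.norm(a - b) for a, b in zip(ys12, ys28)))
```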

From \(z^{n} = J_{t_{n-1}A}(w^{n} + x^n)\), we have \(z^n = (I + t_{n-1}A)^{-1}(w^n + x^n)\) or

$$\begin{aligned} 0 \in z^n - w^n - x^n + t_{n-1}\left( D\nabla {\varphi ^{*}}\left( D^Tz^n\right) - c\right) . \end{aligned}$$

Let \(u^{n+1} \in \nabla {\varphi ^{*}}(D^Tz^n)\), which implies \(D^Tz^n \in \partial {\varphi }(u^{n+1})\). Hence \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\), and therefore \(D^Tz^n = D^T(w^n + x^n - t_{n-1}(Du^{n+1} - c)) \in \partial {\varphi }(u^{n+1})\). This condition leads to

$$\begin{aligned} 0 \in D^T\left( t_{n-1}\left( Du^{n+1} - c\right) - x^n - w^n\right) + \partial {\varphi }\left( u^{n+1}\right) . \end{aligned}$$

This is the optimality condition of

$$\begin{aligned} u^{n+1} = \mathop {{{\,\mathrm{argmin}\,}}}\limits _u\left\{ \varphi (u) + \frac{t_{n-1}}{2}\Vert Du - c - t_{n-1}^{-1}(x^n + w^n)\Vert ^2\right\} . \end{aligned}$$
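
For instance, if \(\varphi \) is a simple quadratic, say \(\varphi (u) = \tfrac{\rho }{2}\Vert u\Vert ^2\) (an illustrative assumption, not the setting of the paper), this u-subproblem is an unconstrained least-squares problem and reduces to a single linear solve:

```python
import numpy as np

def u_update_quadratic(D, c, x, w, t, rho):
    """u-subproblem for phi(u) = (rho/2)*||u||^2 (illustrative choice):
         min_u  (rho/2)*||u||^2 + (t/2)*||D u - c - (x + w)/t||^2.
    Setting the gradient to zero gives the linear system
         (rho*I + t*D^T D) u = D^T (t*c + x + w)."""
    p = D.shape[1]
    lhs = rho * np.eye(p) + t * (D.T @ D)
    rhs = D.T @ (t * c + x + w)
    return np.linalg.solve(lhs, rhs)
```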

Similarly, from \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n)\), if we define \(v^{n+1} \in \nabla {\psi ^{*}}(E^Tw^{n+1})\), then we can also derive that

$$\begin{aligned} v^{n+1} = \mathop {{{\,\mathrm{argmin}\,}}}\limits _v\left\{ \psi (v) + \frac{t_{n-1}}{2}\Vert Ev + t_{n-1}^{-1}(x^n - z^n)\Vert ^2\right\} . \end{aligned}$$

From the relation \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\) above, we can write \(x^n - z^n = t_{n-1}(Du^{n+1} - c) - w^n\). Substituting this expression into the v-subproblem above, we obtain

$$\begin{aligned} v^{n+1} = \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _v\left\{ \psi (v) - \langle w^n, Ev\rangle + \tfrac{t_{n-1}}{2}\Vert Ev + Du^{n+1} - c\Vert ^2\right\} . \end{aligned}$$

This is the second line of (26).
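
Analogously, when \(E = I\) and \(\psi (v) = \lambda \Vert v\Vert _1\) (again an illustrative choice), completing the square shows that this v-subproblem is the proximal step \(v^{n+1} = \mathrm{prox}_{\psi /t_{n-1}}(c - Du^{n+1} + w^n/t_{n-1})\), i.e., componentwise soft-thresholding:

```python
import numpy as np

def v_update_l1(D, c, u_next, w, t, lam):
    """v-subproblem for E = I and psi(v) = lam*||v||_1 (illustrative choice):
         min_v  lam*||v||_1 - <w, v> + (t/2)*||v + D u_next - c||^2.
    Completing the square gives v = prox_{(lam/t)*||.||_1}(q) with
         q = c - D u_next + w / t,
    i.e., soft-thresholding of q at level lam/t."""
    q = c - D @ u_next + w / t
    return np.sign(q) * np.maximum(np.abs(q) - lam / t, 0.0)
```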

Next, from \(w^{n+1} - z^n + x^n + t_{n-1}Ev^{n+1} = 0\), we have \(w^{n} = z^{n-1} - x^{n-1} - t_{n-2}Ev^{n}\), which implies \(Ev^n = -t_{n-2}^{-1}(x^{n-1} + w^n - z^{n-1})\). From the last line of (28), we have \(x^n = \kappa _{n-1}(x^{n-1} + w^n - z^{n-1})\). Combining these two relations and using the update rule (9), \(t_{n-1} = \kappa _{n-1}t_{n-2}\), we get \(Ev^n = -\tfrac{1}{\kappa _{n-1}t_{n-2}}x^n = -\frac{1}{t_{n-1}}x^n\). Substituting \(Ev^n = -\frac{1}{t_{n-1}}x^n\) into the u-subproblem, we obtain

$$\begin{aligned} u^{n+1} = \mathop {{{\,\mathrm{argmin}\,}}}\limits _u\left\{ \varphi (u) - \langle w^n, Du\rangle + \frac{t_{n-1}}{2}\Vert Du + Ev^n - c\Vert ^2\right\} . \end{aligned}$$

This is the first line of (26).

Now, combining \(z^n = w^n - t_{n-1}(Du^{n+1} - c) + x^n\) with \(w^{n+1} = z^n - x^n - t_{n-1}Ev^{n+1}\), we obtain \(w^{n+1} = w^n - t_{n-1}(Du^{n+1} + Ev^{n+1} - c)\). This is the multiplier update in (26).

Finally, we derive the update rule for \(t_n\). Note that \(y^n = z^n - x^n\) and \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\), which together show that \(y^n = w^n - t_{n-1}(Du^{n+1} - c)\). Moreover, \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n) = J_{t_{n-1}B}(y^n)\). Consequently, \(J_{t_{n-1}B}(y^n) - y^n = w^{n+1} - w^n + t_{n-1}(Du^{n+1} - c) = -t_{n-1}(Du^{n+1} + Ev^{n+1} - c) + t_{n-1}(Du^{n+1} - c) = -t_{n-1}Ev^{n+1}\). Hence, we can compute \(\kappa _n\) as

$$\begin{aligned} \kappa _n := \frac{\Vert J_{t_{n-1}B}(y^n)\Vert }{\Vert y^n - J_{t_{n-1}B}(y^n)\Vert } = \frac{\Vert w^{n+1}\Vert }{t_{n-1}\Vert Ev^{n+1}\Vert }. \end{aligned}$$

Using the fact that \(t_n := \kappa _nt_{n-1}\), we obtain \(t_n = \frac{\Vert w^{n+1}\Vert }{\Vert Ev^{n+1}\Vert }\), which is the last line of (26) after projecting and weighting as in Sect. 4. Since \(\left\{ w^n\right\} \) is equivalent to the sequence \(\left\{ u^n\right\} \) in the DR scheme (2) [or, equivalently, (12)] applied to the optimality condition (25) of the dual problem (24), the last conclusion is a direct consequence of Theorem 3.2. \(\square \)
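
Assembled, (26) is a non-stationary ADMM whose penalty parameter is driven by the ratio \(\Vert w^{n+1}\Vert /\Vert Ev^{n+1}\Vert \). The sketch below instantiates it for a small quadratic model problem \(\min _{u,v}\{\tfrac{\rho }{2}\Vert u\Vert ^2 + \tfrac{\sigma }{2}\Vert v\Vert ^2 : Du + Ev = c\}\). The data, the quadratic choices of \(\varphi \) and \(\psi \), and the crude clipping of \(t_n\) (a stand-in for the projection and weighting of Sect. 4) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, q = 15, 10, 10
D = rng.standard_normal((m, p))
E = rng.standard_normal((m, q))
c = rng.standard_normal(m)
rho, sigma = 1.0, 1.0              # phi(u) = rho/2*||u||^2, psi(v) = sigma/2*||v||^2

u, v, w = np.zeros(p), np.zeros(q), np.zeros(m)
t = 1.0                            # t_{n-1}, initial step-size (illustrative)

for n in range(200):
    # u-step: minimize phi(u) - <w, D u> + (t/2)*||D u + E v - c||^2
    u = np.linalg.solve(rho * np.eye(p) + t * (D.T @ D),
                        D.T @ (w + t * (c - E @ v)))
    # v-step: minimize psi(v) - <w, E v> + (t/2)*||E v + D u - c||^2
    v = np.linalg.solve(sigma * np.eye(q) + t * (E.T @ E),
                        E.T @ (w + t * (c - D @ u)))
    # multiplier update
    w = w - t * (D @ u + E @ v - c)
    # adaptive step-size: raw ratio ||w^{n+1}|| / ||E v^{n+1}||, crudely
    # clipped to [1e-2, 1e2] as a stand-in for the safeguard of Sect. 4
    Ev = E @ v
    if np.linalg.norm(Ev) > 1e-12:
        t = np.clip(np.linalg.norm(w) / np.linalg.norm(Ev), 1e-2, 1e2)

print("primal residual:", np.linalg.norm(D @ u + E @ v - c))
```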

Cite this article

Lorenz, D.A., Tran-Dinh, Q. Non-stationary Douglas–Rachford and alternating direction method of multipliers: adaptive step-sizes and convergence. Comput Optim Appl 74, 67–92 (2019). https://doi.org/10.1007/s10589-019-00106-9
