Abstract
We revisit the classical Douglas–Rachford (DR) method for finding a zero of the sum of two maximal monotone operators. Since the practical performance of the DR method crucially depends on the step-sizes, we aim at developing an adaptive step-size rule. To that end, we take a closer look at a linear case of the problem and use our findings to develop a step-size strategy that eliminates the need for step-size tuning. We analyze a general non-stationary DR scheme and prove its convergence for a convergent sequence of step-sizes with summable increments in the case of maximally monotone operators. This, in turn, proves the convergence of the method with the new adaptive step-size rule. We also derive the related non-stationary alternating direction method of multipliers. We illustrate the efficiency of the proposed methods on several numerical examples.
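To give a concrete picture of the scheme discussed in the abstract, the following is a minimal, hypothetical NumPy sketch of Douglas–Rachford splitting applied to a small lasso-type problem, \(\min_x \tfrac{1}{2}\Vert x-b\Vert^2 + \lambda\Vert x\Vert_1\), using a convergent step-size sequence with summable increments (as the convergence result requires). The problem instance, the illustrative step-size rule \(t_n = 1 + 1/n^2\), and all function names are ours, not taken from the paper.

```python
import numpy as np

def prox_quad(y, t, b):
    # prox of f(x) = 0.5*||x - b||^2 with step t:  (y + t*b) / (1 + t)
    return (y + t * b) / (1.0 + t)

def soft(v, tau):
    # prox of tau*||.||_1 (componentwise soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dr_lasso(b, lam, n_iter=500):
    # Douglas-Rachford iteration for min_x 0.5*||x - b||^2 + lam*||x||_1
    y = np.zeros_like(b)
    x = y
    for n in range(1, n_iter + 1):
        t = 1.0 + 1.0 / n**2            # convergent step-sizes, summable increments
        x = prox_quad(y, t, b)          # resolvent of the first operator
        z = soft(2.0 * x - y, t * lam)  # resolvent of the second, at the reflection
        y = y + z - x                   # update of the governing sequence
    return x

b = np.array([3.0, -0.5, 1.2, 0.1])
x = dr_lasso(b, lam=1.0)  # minimizer is the soft-thresholding of b at level 1
```

Here the non-stationarity is only caricatured by the decaying perturbation of \(t_n\); the paper's adaptive rule is derived in the appendix below.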
Notes
The exact construction of A and B is \(A = C^{T}C\) and \(B = D^{T}D\), where \(C\in \mathbb {R}^{(0.5m + 10)\times m}\) and \(D\in \mathbb {R}^{0.5m\times m}\) have entries drawn from the standard Gaussian distribution in MATLAB.
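An analogous construction in NumPy (assuming an even \(m\); the seed and dimension below are illustrative) could read:

```python
import numpy as np

m = 100                                    # even, so 0.5*m is an integer
rng = np.random.default_rng(0)             # fixed seed for reproducibility
C = rng.standard_normal((m // 2 + 10, m))  # (0.5m + 10) x m Gaussian matrix
D = rng.standard_normal((m // 2, m))       # 0.5m x m Gaussian matrix
A = C.T @ C                                # symmetric positive semidefinite
B = D.T @ D                                # symmetric PSD, rank at most 0.5m
```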
The paper [43] provides a convergence guarantee for an adaptive relaxed method; however, that result does not apply to the methods used in this comparison, and the method itself is not included here since it also involves relaxation.
References
Bauschke, H.H.: A note on the paper by Eckstein and Svaiter on “general projective splitting methods for sums of maximal monotone operators”. SIAM J. Control Optim. 48(4), 2513–2515 (2009)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Bauschke, H.H., Moffat, S.M., Wang, X.: Firmly nonexpansive mappings and maximally monotone operators: correspondence and duality. Set-Valued Var. Anal. 20(1), 131–153 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)
Becker, S., Combettes, P.L.: An algorithm for splitting parallel sums of linearly composed monotone operators, with applications to signal recovery (2013). arXiv:1305.5828
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Bredies, K., Sun, H.P.: Preconditioned Douglas–Rachford algorithms for TV- and TGV-regularized variational imaging problems. J. Math. Imaging Vis. 52(3), 317–344 (2015)
Bredies, K., Sun, H.: Preconditioned Douglas–Rachford splitting methods for convex–concave saddle-point problems. SIAM J. Numer. Anal. 53(1), 421–444 (2015)
Bredies, K., Sun, H.: Accelerated Douglas–Rachford methods for the solution of convex–concave saddle-point problems (2016). arXiv:1604.06282
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
Combettes, P.L., Pesquet, J.-C.: A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Sel. Top. Signal Process. 1(4), 564–574 (2007)
Dao, M.N., Phan, H.M.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators (2018). arXiv:1809.00761
Davis, D.: Convergence rate analysis of the forward-Douglas–Rachford splitting scheme. SIAM J. Optim. 25(3), 1760–1786 (2015)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 115–163. Springer, Berlin (2016)
Douglas Jr., J., Rachford Jr., H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)
Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
Giselsson, P.: Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19(4), 2241–2270 (2017)
Giselsson, P., Boyd, S.: Diagonal scaling in Douglas–Rachford splitting and ADMM. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 5033–5039. IEEE (2014)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y., Neittaanmäki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology, pp. 59–82. Springer, Dordrecht (2014)
Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013)
He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B.S., Yang, H., Wang, S.L.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106(2), 337–356 (2000)
Li, X., Yuan, X.: A proximal strictly contractive Peaceman–Rachford splitting method for convex programming with applications to imaging. SIAM J. Imaging Sci. 8(2), 1332–1365 (2015)
Liang, J., Fadili, J., Peyré, G.: Local convergence properties of Douglas–Rachford and alternating direction method of multipliers. J. Optim. Theory Appl. 172(3), 874–913 (2017)
Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in Neural Information Processing Systems, pp. 612–620 (2011)
Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Moursi, W.M., Vandenberghe, L.: Douglas–Rachford splitting for a Lipschitz continuous and a strongly monotone operator (2018). arXiv:1805.09396
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM (2015). arXiv:1502.02009
O’Connor, D., Vandenberghe, L.: Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)
Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 4234–4239. IEEE (2014)
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1762–1769. IEEE (2011)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)
Song, C., Yoon, S., Pavlovic, V.: Fast ADMM algorithm for distributed optimization with adaptive penalty. In: AAAI, pp. 753–759 (2016)
Svaiter, B.F.: A simplified proof of weak convergence in Douglas–Rachford method to a solution of the underlying inclusion problem (2018). arXiv:1809.00967
Svaiter, B.F.: On weak convergence of the Douglas–Rachford method. SIAM J. Control Optim. 49(1), 280–287 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Xu, Z., Figueiredo, M.A.T., Goldstein, T.: Adaptive ADMM with spectral penalty parameter selection (2016). arXiv:1605.07246
Xu, Z., Figueiredo, M.A.T., Yuan, X., Studer, C., Goldstein, T.: Adaptive relaxed ADMM: convergence theory and practical implementation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7234–7243. IEEE (2017)
Acknowledgements
We thank Zheng Xu and Tom Goldstein for sharing the ADMM code.
This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The work of Q. Tran-Dinh was partially supported by the NSF Grant, No. DMS-1619884 (2016–2019).
Appendix: The proof of Theorem 5.1
Let us assume that we apply (12) to solve the optimality condition (25) of the dual problem (24). From (12), i.e.,
we define \(w^{n+1} := J_{t_{n-1}B}y^{n}\) and \(z^{n+1} := J_{t_{n}A}( (1 + \kappa _n)w^{n+1} - \kappa _ny^n)\) to obtain
Shifting up this scheme by one index and changing the order, we obtain
Let \((1 + \kappa _{n-1})w^{n} - \kappa _{n-1}y^{n-1} = x^n + w^n\). This gives \(x^n = \kappa _{n-1}(w^n - y^{n-1})\) and hence, \(z^n + \kappa _{n-1}(y^{n-1} - w^n) = z^n - x^n\) and \(x^{n+1} = \kappa _n(w^{n+1} - y^{n}) = \kappa _n(w^{n+1} - z^n + x^n)\). Substituting these into the above expression of the DR scheme, we obtain
where \(x^n = \kappa _{n-1}(w^{n} - y^{n-1})\).
From \(z^{n} = J_{t_{n-1}A}(w^{n} + x^n)\), we have \(z^n = (I + t_{n-1}A)^{-1}(w^n + x^n)\) or
Let \(u^{n+1} \in \nabla {\varphi ^{*}}(D^Tz^n)\), which implies \(D^Tz^n \in \partial {\varphi }(u^{n+1})\). Hence, \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\), and therefore \(D^Tz^n = D^T(w^n + x^n - t_{n-1}(Du^{n+1} - c)) \in \partial {\varphi }(u^{n+1})\). This condition leads to
This is the optimality condition of
Similarly, from \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n)\), if we define \(v^{n+1} \in \nabla {\psi ^{*}}(E^Tw^{n+1})\), then we can also derive that
From the line \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\) above, we can write \(x^n - z^n = t_{n-1}(Du^{n+1} - c) - w^n\). Substituting this expression into the above step, we obtain
This is the second line of (26).
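In the linear case, resolvent steps such as \(z^{n} = J_{t_{n-1}A}(w^{n} + x^n)\) and \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n)\) above reduce to linear solves, since \(J_{tA}(v) = (I + tA)^{-1}v\). A small hypothetical NumPy check of this defining relation:

```python
import numpy as np

def resolvent_linear(A, t, v):
    # J_{tA}(v) = (I + t*A)^{-1} v for a matrix A (monotone when A + A^T >= 0)
    return np.linalg.solve(np.eye(A.shape[0]) + t * A, v)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M.T @ M                      # symmetric PSD, hence maximally monotone
v = rng.standard_normal(5)
z = resolvent_linear(A, 0.5, v)  # plays the role of z^n = J_{tA}(w^n + x^n)
# defining relation of the resolvent: z + t*A z = v
```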
Next, from \(w^{n+1} - z^n + x^n + t_{n-1}Ev^{n+1} = 0\), we have \(w^{n} = z^{n-1} - x^{n-1} - t_{n-2}Ev^{n}\). This implies \(Ev^n = -t_{n-2}^{-1}(x^{n-1} + w^n - z^{n-1})\). From the last line of (28), we have \(x^n = \kappa _{n-1}(x^{n-1} + w^n - z^{n-1})\). Combining these two lines, we obtain \(Ev^n = -\tfrac{1}{\kappa _{n-1}t_{n-2}}x^n = -\frac{1}{t_{n-1}}x^n\) due to the update rule (9): \(t_{n-1} = \kappa _{n-1}t_{n-2}\). Substituting \(Ev^n = -\frac{1}{t_{n-1}}x^n\) into the u-subproblem, we obtain
This is the first line of (26).
Now, since \(z^n = w^n - t_{n-1}(Du^{n+1} - c) + x^n\), and \(w^{n+1} = z^n - x^n - t_{n-1}Ev^{n+1}\), combining these expressions, we obtain \(w^{n+1} = w^n - t_{n-1}(Du^{n+1} + Ev^{n+1} - c)\). This is the last line of (26).
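The three lines of (26) assembled above form an ADMM loop for a problem of the form \(\min \varphi (u) + \psi (v)\) subject to \(Du + Ev = c\). The following hypothetical NumPy sketch instantiates it (with a fixed step-size, for simplicity) for the quadratic choices \(\varphi (u) = \tfrac{1}{2}\Vert u\Vert ^2\) and \(\psi (v) = \tfrac{1}{2}\Vert v\Vert ^2\), for which both subproblems are linear solves; the dimensions, data, and step-size are illustrative, and the multiplier update matches \(w^{n+1} = w^n - t_{n-1}(Du^{n+1} + Ev^{n+1} - c)\).

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r = 6, 5, 4                  # u in R^p, v in R^q, constraint in R^r
D = rng.standard_normal((r, p))
E = rng.standard_normal((r, q))
c = rng.standard_normal(r)

t = 1.0                            # fixed step-size, for simplicity
u, v, w = np.zeros(p), np.zeros(q), np.zeros(r)
for n in range(2000):
    # u-step: argmin_u 0.5||u||^2 - <w, Du + Ev - c> + (t/2)||Du + Ev - c||^2
    u = np.linalg.solve(np.eye(p) + t * D.T @ D, D.T @ (w + t * (c - E @ v)))
    # v-step, using the fresh u
    v = np.linalg.solve(np.eye(q) + t * E.T @ E, E.T @ (w + t * (c - D @ u)))
    # multiplier update, matching w^{n+1} = w^n - t*(Du^{n+1} + Ev^{n+1} - c)
    w = w - t * (D @ u + E @ v - c)
# at a solution: Du + Ev = c and, by optimality, u = D^T w, v = E^T w
```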
Finally, we derive the update rule for \(t_n\). Indeed, note that \(y^n = z^n - x^n\), and \(z^n - w^n - x^n + t_{n-1}(Du^{n+1} - c) = 0\). These relations show that \(y^n = w^n - t_{n-1}(Du^{n+1} - c)\). Moreover, we also have \(w^{n+1} = J_{t_{n-1}B}(z^n - x^n) = J_{t_{n-1}B}(y^n)\). In this case, we have \(J_{t_{n-1}B}(y^n) - y^n = w^{n+1} - w^n + t_{n-1}(Du^{n+1} - c) = -t_{n-1}(Du^{n+1} + Ev^{n+1} - c) + t_{n-1}(Du^{n+1} - c) = -t_{n-1}Ev^{n+1}\). Hence, we can compute \(\kappa _n\) as
Using the fact that \(t_n := \kappa _nt_{n-1}\), we obtain \(t_n := \frac{\Vert w^{n+1}\Vert }{\Vert Ev^{n+1}\Vert }\), which is the last line of (26) after projecting and weighting as in Sect. 4. Since \(\left\{ w^n\right\} \) is equivalent to the sequence \(\left\{ u^n\right\} \) in the DR scheme (2) [or, equivalently, (12)] applied to the optimality condition (25) of the dual problem (24), the last conclusion is a direct consequence of Theorem 3.2. \(\square \)
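The raw adaptive rule derived in this proof, \(t_n = \Vert w^{n+1}\Vert /\Vert Ev^{n+1}\Vert \), can be sketched as follows. The clipping bounds are a hypothetical stand-in for the projection and weighting of Sect. 4, which are not reproduced here, and the function name is ours.

```python
import numpy as np

def adaptive_step(w_next, Ev_next, t_prev, t_min=1e-3, t_max=1e3):
    # raw rule from the derivation: t_n = ||w^{n+1}|| / ||E v^{n+1}||;
    # clipping stands in for the projection/weighting of Sect. 4
    denom = np.linalg.norm(Ev_next)
    if denom == 0.0:
        return t_prev              # keep the previous step-size
    return float(np.clip(np.linalg.norm(w_next) / denom, t_min, t_max))
```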
Cite this article
Lorenz, D.A., Tran-Dinh, Q. Non-stationary Douglas–Rachford and alternating direction method of multipliers: adaptive step-sizes and convergence. Comput Optim Appl 74, 67–92 (2019). https://doi.org/10.1007/s10589-019-00106-9
Keywords
- Douglas–Rachford method
- Alternating direction methods of multipliers
- Maximal monotone inclusions
- Adaptive step-size
- Non-stationary iteration