Abstract
Although the performance of popular optimization algorithms such as the Douglas–Rachford splitting (DRS) and the ADMM is satisfactory in convex and well-scaled problems, ill conditioning and nonconvexity pose a severe obstacle to their reliable employment. Expanding on recent convergence results for DRS and ADMM applied to nonconvex problems, we propose two linesearch algorithms to enhance and robustify these methods by means of quasi-Newton directions. The proposed algorithms are suited for nonconvex problems, require the same black-box oracle of DRS and ADMM, and maintain their (subsequential) convergence properties. Numerical evidence shows that the employment of L-BFGS in the proposed framework greatly improves convergence of DRS and ADMM, making them robust to ill conditioning. Under regularity and nondegeneracy assumptions at the limit point, superlinear convergence is shown when quasi-Newton Broyden directions are adopted.
Data Availability
The Julia implementations of all the algorithms are available online as part of the ProximalAlgorithms.jl package (https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl). In addition to synthetic data, the following publicly available datasets are used in the numerical experiments:
- the full 20newsgroup dataset (http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html) and a condensed version (https://cs.nyu.edu/~roweis/data.html)
- nips_conference_papers (https://archive.ics.uci.edu/ml/datasets/NIPS+Conference+Papers+1987-2015) from [39] (belonging to the dataset collection [14]).
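As a pointer for reproduction, a minimal usage sketch on a LASSO-type instance follows; it assumes ProximalOperators.jl for the proximal oracles and the package's keyword-based solver interface, and the solver name `DRLS` and keywords (`x0`, `f`, `g`) are assumptions that may differ across package versions:

```julia
# minimal sketch, assuming ProximalOperators.jl supplies the prox oracles;
# the solver name DRLS and the keyword interface are assumptions
using ProximalOperators, ProximalAlgorithms

A, b, λ = randn(30, 50), randn(30), 0.1
f = LeastSquares(A, b)       # smooth term φ1
g = NormL1(λ)                # nonsmooth term φ2

algo = ProximalAlgorithms.DRLS(tol=1e-6, maxit=1000)   # DRS with linesearch
solution, iterations = algo(x0=zeros(50), f=f, g=g)
```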
Notes
Specifically, (2.7) is the dual of \(\operatorname*{minimize}_{x,z\in\mathbb{R}^n}\ \varphi_{1}(x)+\varphi_{2}(z)\) subject to \(x-z=0\).
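For completeness, the underlying Lagrangian computation reads as follows; the sign convention for the dual variable \(y\) is an assumption here, since (2.7) itself is not reproduced in this excerpt:
\[ \inf_{x,z\in\mathbb{R}^n}\bigl\{\varphi_{1}(x)+\varphi_{2}(z)+\langle y,\,x-z\rangle\bigr\} = -\varphi_{1}^*(-y)-\varphi_{2}^*(y), \]
so that maximizing the dual function over \(y\) amounts to minimizing \(\psi(y)=\varphi_{1}^*(-y)+\varphi_{2}^*(y)\).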
Owing to strong convexity, the additional requirement of nonemptiness of \(\operatorname{arg\,min}(\varphi_{1}+\varphi_{2})\) in the cited reference would be trivially satisfied.
In the limiting case \(A=0\), one has that \((Af)=f(0)+\delta_{\{0\}}\) is lsc and \(\infty\)-strongly convex for any \(f\), and properness amounts to the condition \(0\in\operatorname{dom}f\).
References
Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. ACM 12(4), 547–560 (1965). https://doi.org/10.1145/321296.321305
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9
Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer Monographs in Mathematics. Springer, New York (2002)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2017). https://doi.org/10.1007/978-3-319-48311-5
Bauschke, H.H., Noll, D.: On the local convergence of the Douglas–Rachford algorithm. Arch. Math. 102(6), 589–600 (2014). https://doi.org/10.1007/s00013-014-0652-2
Bauschke, H.H., Phan, H.M., Wang, X.: The method of alternating relaxed projections for two nonconvex sets. Vietnam J. Math. 42(4), 421–450 (2014). https://doi.org/10.1007/s10013-013-0049-8
Bemporad, A., Casavola, A., Mosca, E.: Nonlinear control of constrained linear systems via predictive reference management. IEEE Trans. Autom. Control 42(3), 340–349 (1997). https://doi.org/10.1109/9.557577
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bolte, J., Sabach, S., Teboulle, M.: Proximal Alternating Linearized Minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9
Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Math. Comput. 19(92), 577–593 (1965)
Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970). https://doi.org/10.1093/imamat/6.1.76
d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. In: Advances in Neural Information Processing Systems, pp. 41–48 (2005)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010). https://doi.org/10.1002/cpa.20303
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992). https://doi.org/10.1007/BF01581204
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. II. Springer, Berlin (2003)
Fang, H.-r., Saad, Y.: Two classes of multisecant methods for nonlinear acceleration. Numer. Linear Algebra Appl. 16(3), 197–221 (2009). https://doi.org/10.1002/nla.617
Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970). https://doi.org/10.1093/comjnl/13.3.317
García, C.E., Prett, D.M., Morari, M.: Model predictive control: theory and practice—a survey. Automatica 25(3), 335–348 (1989). https://doi.org/10.1016/0005-1098(89)90002-2
Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)
Goncalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems. Pac. J. Optim. 15, 378–398 (2019)
Guo, K., Han, D., Wu, T.T.: Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints. Int. J. Comput. Math. 94(8), 1653–1669 (2017). https://doi.org/10.1080/00207160.2016.1227432
Hesse, R., Luke, R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013). https://doi.org/10.1137/120902653
Hesse, R., Luke, R., Neumann, P.: Alternating projections and Douglas–Rachford for sparse affine feasibility. IEEE Trans. Signal Process. 62(18), 4868–4881 (2014). https://doi.org/10.1109/TSP.2014.2339801
Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016). https://doi.org/10.1137/140990309
Ip, C.M., Kyparisis, J.: Local convergence of quasi-Newton methods for B-differentiable equations. Math. Program. 56(1–3), 71–89 (1992)
Izmailov, A.F., Solodov, M.V.: Newton-Type Methods for Optimization and Variational Problems. Springer, Berlin (2014)
Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72(1), 115–157 (2019)
Li, G., Liu, T., Pong, T.K.: Peaceman–Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. (2017). https://doi.org/10.1007/s10589-017-9915-8
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015). https://doi.org/10.1137/140998135
Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159(1), 371–401 (2016). https://doi.org/10.1007/s10107-015-0963-5
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 379–387. Curran Associates Inc, New York (2015)
Maratos, N.: Exact penalty function algorithms for finite dimensional and control optimization problems. Ph.D. thesis. Imperial College London (University of London) (1978)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Soviet Math. Doklady 27, 372–376 (1983)
Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: 52nd IEEE Conference on Decision and Control, pp. 2358–2363 (2013). https://doi.org/10.1109/CDC.2013.6760233
Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: 53rd IEEE Conference on Decision and Control, pp. 4234–4239 (2014). https://doi.org/10.1109/CDC.2014.7040049
Pejcic, I., Jones, C.: Accelerated ADMM based on accelerated Douglas–Rachford splitting. In: 2016 European Control Conference (ECC), pp. 1952–1957 (2016). https://doi.org/10.1109/ECC.2016.7810577
Perrone, V., Jenkins, P.A., Spano, D., Teh, Y.W.: Poisson random fields for dynamic feature models. J. Mach. Learn. Res. 18(1), 4626–4670 (2017)
Poliquin, R.A., Rockafellar, R.T.: Generalized Hessian properties of regularized nonsmooth functions. SIAM J. Optim. 6(4), 1121–1137 (1996)
Powell, M.J.D.: A hybrid method for nonlinear equations. In: Numerical Methods for Nonlinear Algebraic Equations, pp. 87–144. Gordon and Breach (1970)
Powell, M.J.: A fast algorithm for nonlinearly constrained optimization calculations. In: Watson, G.A. (ed.) Numerical Analysis, pp. 144–157. Springer, Berlin (1978)
Rey, F., Frick, D., Domahidi, A., Jerez, J., Morari, M., Lygeros, J.: ADMM prescaling for model predictive control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 3662–3667. IEEE, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CDC.2016.7798820
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.: Variational Analysis, vol. 317. Springer, Berlin (2011)
Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)
Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67(3), 443–487 (2017). https://doi.org/10.1007/s10589-017-9912-y
Stella, L., Themelis, A., Patrinos, P.: Newton-type alternating minimization algorithm for convex optimization. IEEE Trans. Autom. Control 64(2), 697–711 (2019). https://doi.org/10.1109/TAC.2018.2872203
Stella, L., Themelis, A., Sopasakis, P., Patrinos, P.: A simple and efficient algorithm for nonlinear model predictive control. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1939–1944 (2017). https://doi.org/10.1109/CDC.2017.8263933
Themelis, A.: Proximal algorithms for structured nonconvex optimization. Ph.D. thesis, KU Leuven (2018)
Themelis, A., Ahookhosh, M., Patrinos, P.: On the acceleration of forward-backward splitting via an inexact Newton method. In: Bauschke, H.H., Burachik, R.S., Luke, D.R. (eds) Splitting Algorithms, Modern Operator Theory, and Applications, pp. 363–412. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25939-6_15
Themelis, A., Patrinos, P.: SuperMann: a superlinearly convergent algorithm for finding fixed points of nonexpansive operators. IEEE Trans. Autom. Control 64(12), 4875–4890 (2019). https://doi.org/10.1109/TAC.2019.2906393
Themelis, A., Patrinos, P.: Douglas–Rachford splitting and ADMM for nonconvex optimization: tight convergence results. SIAM J. Optim. 30(1), 149–181 (2020). https://doi.org/10.1137/18M1163993
Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28(3), 2274–2303 (2018). https://doi.org/10.1137/16M1080240
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)
Xu, Z., Chang, X., Xu, F., Zhang, H.: \(L_{1/2}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012). https://doi.org/10.1109/TNNLS.2012.2197412
Yan, M., Yin, W.: Self Equivalence of the Alternating Direction Method of Multipliers, pp. 165–194. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41589-5_5
Acknowledgements
The authors acknowledge the thorough proofreading and constructive comments of the anonymous reviewers, which significantly helped improve the paper's quality.
Additional information
A. Themelis is supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI grant JP21K17710.
L. Stella’s work was done prior to joining Amazon.
P. Patrinos is supported by the Research Foundation Flanders (FWO) research projects G0A0920N, G086518N, G086318N, and G081222N; Research Council KU Leuven C1 Project No. C14/18/068; Fonds de la Recherche Scientifique—FNRS and the Fonds Wetenschappelijk Onderzoek—Vlaanderen under EOS project 30468160 (SeLMA); European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 953348.
Appendix A: Auxiliary results
This appendix contains some auxiliary results needed for the convergence analysis of Sect. 4. As shown in [53, Eq. (3.4)], the DRE can be expressed in terms of the forward-backward envelope \(\varphi_{\gamma}^{\textsc{fb}}\) [36, 47, 54] as
\[ \varphi_{\gamma}^{\textsc{dr}}(s) = \varphi_{\gamma}^{\textsc{fb}}(u) = \min_{w\in\mathbb{R}^p}\left\{ \varphi_{1}(u) + \langle\nabla\varphi_{1}(u),\,w-u\rangle + \varphi_{2}(w) + \tfrac{1}{2\gamma}\Vert w-u\Vert^2 \right\}, \tag{A.1a} \]
where \(u=\operatorname{prox}_{\gamma\varphi_{1}}(s)\) and the minimum is attained at any \(v\in\operatorname{prox}_{\gamma\varphi_{2}}(2u-s)\). Equivalently,
\[ \varphi_{\gamma}^{\textsc{dr}}(s) = \varphi_{1}(u) + \langle\nabla\varphi_{1}(u),\,v-u\rangle + \varphi_{2}(v) + \tfrac{1}{2\gamma}\Vert v-u\Vert^2. \tag{A.1b} \]
Fact A.1
([53, Prop. 3.3]). Suppose that Assumption I holds and let \(\gamma<\nicefrac{1}{L_{\varphi_{1}}}\) be fixed. Then, for all \(s\in\mathbb{R}^p\) and \((u,v)\in\text{DRS}_{\gamma}(s)\) it holds that
\[ \varphi(v) + \tfrac{1-\gamma L_{\varphi_{1}}}{2\gamma}\Vert u-v\Vert^2 \ \le\ \varphi_{\gamma}^{\textsc{dr}}(s) \ \le\ \varphi(u). \]
\(\square\)
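To make these relations concrete, here is a small self-contained Julia check on the toy instance \(\varphi_{1}(x)=\tfrac12\Vert x-a\Vert^2\) (so \(L_{\varphi_{1}}=1\)) and \(\varphi_{2}=\lambda\Vert\cdot\Vert_1\); the instance and all names are ours, for illustration only:

```julia
using LinearAlgebra, Random

# toy check of (A.1) and of the sandwich bound in Fact A.1, with
# φ1(x) = ½‖x − a‖² (so Lφ1 = 1) and φ2 = λ‖·‖₁
Random.seed!(0)
n, λ, γ = 5, 0.3, 0.5                      # γ < 1/Lφ1 = 1
a, s = randn(n), randn(n)

soft(x, t) = sign.(x) .* max.(abs.(x) .- t, 0)   # prox of t‖·‖₁
φ1(x) = 0.5 * norm(x - a)^2
φ2(x) = λ * norm(x, 1)

u = (s + γ*a) / (1 + γ)                    # prox_{γφ1}(s), closed form here
∇φ1u = u - a                               # equals (s − u)/γ, cf. (1.10)
v = soft(2u - s, γ*λ)                      # prox_{γφ2}(2u − s)

fbe(w) = φ1(u) + dot(∇φ1u, w - u) + φ2(w) + norm(w - u)^2 / (2γ)
dre = fbe(v)                               # value of (A.1b)
# v attains the minimum in (A.1a): random competitors cannot do better
@assert all(fbe(v + 0.1*randn(n)) >= dre - 1e-12 for _ in 1:100)
# sandwich bound: φ(v) + (1 − γLφ1)/(2γ)‖u − v‖² ≤ φγDR(s) ≤ φ(u)
@assert φ1(v) + φ2(v) + (1 - γ)/(2γ)*norm(u - v)^2 - 1e-12 <= dre <= φ1(u) + φ2(u) + 1e-12
```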
Lemma A.2
Suppose that Assumption I holds. Then, for all \(s,\bar u\in\mathbb{R}^p\) and \(\gamma<\nicefrac{1}{L_{\varphi_{1}}}\)
\[ \varphi_{\gamma}^{\textsc{dr}}(s) \ \le\ \varphi(\bar u) + \tfrac{1+\gamma L_{\varphi_{1}}}{2\gamma}\Vert \bar u-\operatorname{prox}_{\gamma\varphi_{1}}(s)\Vert^2. \]
Proof
Let \(u\,{:}{=}\,\operatorname{prox}_{\gamma\varphi_{1}}(s)\) for brevity. By plugging \(w={\bar{u}}\) into (A.1a) we obtain
\[ \varphi_{\gamma}^{\textsc{dr}}(s) \ \le\ \varphi_{1}(u) + \langle\nabla\varphi_{1}(u),\,\bar u-u\rangle + \varphi_{2}(\bar u) + \tfrac{1}{2\gamma}\Vert\bar u-u\Vert^2 \ \le\ \varphi(\bar u) + \tfrac{1+\gamma L_{\varphi_{1}}}{2\gamma}\Vert\bar u-u\Vert^2, \]
where the second inequality uses the known quadratic upper bound [8, Prop. A.24] for functions with Lipschitz-continuous gradient. \(\square\)
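Under the same toy instance as above (again purely illustrative), the bound of Lemma A.2 can be sanity-checked for random \(s\) and \(\bar u\):

```julia
using LinearAlgebra, Random

# toy check of Lemma A.2 with φ1(x) = ½‖x − a‖² (Lφ1 = 1) and φ2 = λ‖·‖₁
Random.seed!(0)
n, λ, γ = 5, 0.3, 0.5                      # γ < 1/Lφ1
a = randn(n)
soft(x, t) = sign.(x) .* max.(abs.(x) .- t, 0)
φ(x) = 0.5*norm(x - a)^2 + λ*norm(x, 1)

for _ in 1:100
    s, ū = randn(n), randn(n)
    u = (s + γ*a)/(1 + γ)                  # prox_{γφ1}(s)
    v = soft(2u - s, γ*λ)                  # prox_{γφ2}(2u − s)
    # φγDR(s) via (A.1b), using ∇φ1(u) = u − a
    dre = 0.5*norm(u - a)^2 + dot(u - a, v - u) + λ*norm(v, 1) + norm(v - u)^2/(2γ)
    @assert dre <= φ(ū) + (1 + γ)/(2γ)*norm(ū - u)^2 + 1e-12
end
```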
Lemma A.3
Suppose that Assumption I* holds and let \(\gamma>\nicefrac{1}{\mu_{\varphi_{1}}}\) be fixed. Then,
\[ \sup\varphi_{\gamma}^{\textsc{dr}} = -\inf\psi_{\gamma_{*}}^{\textsc{dr}} = -\inf\psi = \min\varphi, \tag{A.2} \]
where \(\gamma_*=\nicefrac{1}{\gamma}\). Moreover, for any \(s\in\mathbb{R}^p\) it holds that
\[ \psi_{\gamma_{*}}^{\textsc{dr}}(s_*) - \inf\psi_{\gamma_{*}}^{\textsc{dr}} \ \ge\ \tfrac{1}{2\gamma}\Vert v-x_\star\Vert^2 + \tfrac{\mu_{\varphi_{1}}-\gamma_*}{2}\Vert u-x_\star\Vert^2, \tag{A.3} \]
where \(x_\star\) is the unique minimizer of \(\varphi\), \(s_*=-\nicefrac{s}{\gamma}\), and \((u,v)=\text{DRS}_{\gamma}(s)\).
Proof
Due to strong convexity, the set of primal solutions \(\operatorname{arg\,min}\varphi\) is a singleton, ensuring strong duality \(\inf\varphi=-\inf\psi\) through [3, Thm. 5.2.1(b)-(c)]. Since \(\psi_{1}\) is \(\nicefrac{1}{\mu_{\varphi_{1}}}\)-smooth, it follows from Fact 2.1(i) that \(\inf\psi_{\gamma_{*}}^{\textsc{dr}}=\inf\psi\) for every \(\gamma_*<\mu_{\varphi_{1}}\); combined with the identity \(\varphi_{\gamma}^{\textsc{dr}}(s)=-\psi_{\gamma_{*}}^{\textsc{dr}}(-\nicefrac{s}{\gamma})\) holding for \(\gamma_*=\nicefrac{1}{\gamma}\) (cf. Theorem 2.3), (A.2) is obtained. Let now \(s\in\mathbb{R}^p\) be fixed and consider \((u,v)=\text{DRS}_{\gamma}(s)\). From the inclusion \(\tfrac{s-u}{\gamma}\in\partial\varphi_{1}(u)\) (cf. (1.10)) and strong convexity of \(\varphi_{1}\) one has
\[ \varphi_{1}(x_\star) \ \ge\ \varphi_{1}(u) + \tfrac{1}{\gamma}\langle s-u,\,x_\star-u\rangle + \tfrac{\mu_{\varphi_{1}}}{2}\Vert x_\star-u\Vert^2. \]
Similarly, since \(\tfrac{2u-s-v}{\gamma}\in\partial\varphi_{2}(v)\) and \(\varphi_{2}\) is convex, one has
\[ \varphi_{2}(x_\star) \ \ge\ \varphi_{2}(v) + \tfrac{1}{\gamma}\langle 2u-s-v,\,x_\star-v\rangle. \]
Summing the two inequalities, rearranging with the identity \(\tfrac{1}{\gamma}\langle u-v,\,x_\star-v\rangle - \tfrac{1}{2\gamma}\Vert u-v\Vert^2 = \tfrac{1}{2\gamma}\bigl(\Vert x_\star-v\Vert^2 - \Vert x_\star-u\Vert^2\bigr)\), and recognizing \(\varphi_{\gamma}^{\textsc{dr}}(s)\) through (A.1b) (recall that \(\nabla\varphi_{1}(u)=\tfrac{s-u}{\gamma}\)) yields
\[ \varphi(x_\star) \ \ge\ \varphi_{\gamma}^{\textsc{dr}}(s) + \tfrac{1}{2\gamma}\Vert x_\star-v\Vert^2 + \tfrac{\mu_{\varphi_{1}}-\gamma_*}{2}\Vert x_\star-u\Vert^2. \]
The claim now follows from the identity \(\psi_{\gamma_{*}}^{\textsc{dr}}(s_*)=-\varphi_{\gamma}^{\textsc{dr}}(s)\) shown in Theorem 2.3. \(\square\)
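Again on the illustrative instance above, the reconstructed inequality (A.3) can be checked on the primal side, eliminating the dual quantities through \(\psi_{\gamma_{*}}^{\textsc{dr}}(s_*)=-\varphi_{\gamma}^{\textsc{dr}}(s)\) and (A.2):

```julia
using LinearAlgebra, Random

# toy check of (A.3): φ1(x) = ½‖x − a‖² (μφ1 = 1), φ2 = λ‖·‖₁, for which
# the unique minimizer is x⋆ = soft(a, λ); by (A.2) and Theorem 2.3,
# ψγ*DR(s*) − inf ψγ*DR = φ(x⋆) − φγDR(s)
Random.seed!(1)
n, λ, γ = 5, 0.7, 2.0                      # γ > 1/μφ1 = 1
a, s = randn(n), randn(n)
γstar = 1/γ

soft(x, t) = sign.(x) .* max.(abs.(x) .- t, 0)
φ(x) = 0.5*norm(x - a)^2 + λ*norm(x, 1)

xstar = soft(a, λ)                         # unique minimizer of φ
u = (s + γ*a)/(1 + γ)                      # prox_{γφ1}(s)
v = soft(2u - s, γ*λ)                      # prox_{γφ2}(2u − s)
dre = 0.5*norm(u - a)^2 + dot(u - a, v - u) + λ*norm(v, 1) + norm(v - u)^2/(2γ)

lhs = φ(xstar) - dre                       # = ψγ*DR(s*) − inf ψγ*DR
rhs = norm(v - xstar)^2/(2γ) + (1 - γstar)/2*norm(u - xstar)^2
@assert lhs >= rhs - 1e-12
```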
About this article
Cite this article
Themelis, A., Stella, L. & Patrinos, P. Douglas–Rachford splitting and ADMM for nonconvex optimization: accelerated and Newton-type linesearch algorithms. Comput Optim Appl 82, 395–440 (2022). https://doi.org/10.1007/s10589-022-00366-y