Abstract
In this paper, we investigate attractive properties of the proximal gradient algorithm with inertia. Notably, we show that using alternated inertia yields monotonically decreasing functional values, which contrasts with usual accelerated proximal gradient methods. We also provide convergence rates for the algorithm with alternated inertia, based on local geometric properties of the objective function. The results are put into perspective by discussions on several extensions (strongly convex case, non-convex case, and alternated extrapolation) and illustrations on common regularized optimization problems.
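To make the scheme concrete, here is a minimal self-contained sketch (our own toy instance and parameter choices, not the paper's experiments) of the proximal gradient method with alternated inertia on a one-dimensional lasso problem; the inertia parameter \(\beta = 0.5\) and the step size are illustrative:

```python
# Illustrative sketch (not the paper's code): proximal gradient with
# alternated inertia on the 1-D lasso problem
#   minimize F(w) = 0.5*(a*w - b)**2 + lam*|w|.
# Inertia (extrapolation) is applied only at even iterations.

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_grad_alternated_inertia(a, b, lam, beta=0.5, n_iter=50):
    L = a * a                      # Lipschitz constant of grad f
    gamma = 0.5 / L                # step size in (0, 1/L]
    F = lambda w: 0.5 * (a * w - b) ** 2 + lam * abs(w)
    w_prev = w = 0.0
    values = []
    for k in range(n_iter):
        # extrapolate only every other iteration (alternated inertia)
        y = w + beta * (w - w_prev) if k % 2 == 0 else w
        grad = a * (a * y - b)     # gradient of f at y
        w_prev, w = w, soft_threshold(y - gamma * grad, gamma * lam)
        values.append(F(w))
    return values

vals = prox_grad_alternated_inertia(a=2.0, b=3.0, lam=0.5)
# In this run the functional values decrease monotonically,
# illustrating the behavior highlighted in the abstract.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
assert abs(vals[-1] - 0.71875) < 1e-9   # optimal value of this instance
```

The only change with respect to the plain proximal gradient method is the extrapolation step, which is switched on every other iteration.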
Notes
For a non-smooth (possibly non-convex) function \(\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}\), we denote by \(\partial \varPhi (x)\) the limiting (Fréchet) subdifferential at x [26]. If \(\varPhi \) is convex, this subdifferential coincides with the standard convex subdifferential.
\(a_k = \varOmega (b_k) \) if there exist \(a>0\) and \(K\) such that \(a_k\ge a\, b_k\) for all \(k\ge K\).
References
Chambolle, A., De Vore, R.A., Lee, N.Y., Lucier, B.J.: Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7(3), 319–335 (1998)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _{1}\)-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14(3), 773–782 (2004)
Lorenz, D., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imag. Vis. 51(2), 311–325 (2014)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Aujol, J.F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Maingé, P.E.: Convergence theorems for inertial km-type algorithms. J. Comput. Appl. Math. 219(1), 223–236 (2008)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)
Malitsky, Y., Pock, T.: A first-order primal-dual algorithm with linesearch. arXiv preprint arXiv:1608.08883 (2016)
Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62(2), 261–275 (1993)
Bioucas-Dias, J.M., Figueiredo, M.A.: A new twist: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16(12), 2992–3004 (2007)
Fuentes, M., Malick, J., Lemaréchal, C.: Descentwise inexact proximal algorithms for smooth optimization. Comput. Optim. Appl. 53(3), 755–769 (2012)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
Mu, Z., Peng, Y.: A note on the inertial proximal point method. Stat. Optim. Inf. Comput. 3(3), 241–248 (2015)
Iutzeler, F., Hendrickx, J.M.: A generic linear rate acceleration of optimization algorithms via relaxation and inertia. arXiv preprint arXiv:1603.05398 (2016)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms, vol. 2. Springer, Heidelberg (1993)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Nguyen, T.P.: Kurdyka–Łojasiewicz and convexity: algorithms and applications. Ph.D. thesis, Toulouse University (2017)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811 (2016)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imag. Sci. 7(2), 1388–1419 (2014)
Liang, J., Fadili, J., Peyré, G.: A multi-step inertial forward-backward splitting method for non-convex optimization. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 4035–4043. Curran Associates, Inc. (2016)
Chartrand, R., Yin, W.: Nonconvex sparse regularization and splitting algorithms. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 237–249. Springer (2016)
Acknowledgements
The work of the authors is partly supported by the PGMO Grant Advanced Non-smooth Optimization Methods for Stochastic Programming.
Communicated by Marc Teboulle.
Appendix
For the sake of completeness, we provide short and direct proofs of known lemmas recalled in Sect. 2.2.
Proof of Lemma 2.2
Let \(x\in \mathbb {R}^n\); by definition, \(\mathsf {T}_\gamma (x)\) is the minimizer of the surrogate function \(s_x(y) := f(x) + \langle \nabla f(x), y-x \rangle + g(y) + \frac{1}{2\gamma }\Vert y-x\Vert ^2\). As \(s_x\) is \(\frac{1}{\gamma }\)-strongly convex, we have for any \(y\in \mathbb {R}^n\) that \(s_x ( \mathsf {T}_\gamma (x)) + \frac{1}{2\gamma } \Vert \mathsf {T}_\gamma (x) - y\Vert ^2 \le s_x (y)\).
Now we use (i) the descent lemma on the \(L\)-smooth function \(f\) (see [23, Th. 18.15]), which gives \(f(\mathsf {T}_\gamma (x)) \le f(x) + \langle \nabla f(x), \mathsf {T}_\gamma (x)-x \rangle + \frac{L}{2}\Vert \mathsf {T}_\gamma (x)-x\Vert ^2\), and (ii) the convexity of \(f\), which gives \(f(x) + \langle \nabla f(x), y-x \rangle \le f(y)\). Using (i) on the left-hand side of the above inequality and (ii) on its right-hand side, we get the claimed estimate.
\(\square \)
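As a sanity check, the inequality of Lemma 2.2 (in the standard form \(F(\mathsf {T}_\gamma (x)) + (\frac{1}{2\gamma }-\frac{L}{2})\Vert \mathsf {T}_\gamma (x)-x\Vert ^2 + \frac{1}{2\gamma }\Vert \mathsf {T}_\gamma (x)-y\Vert ^2 \le F(y) + \frac{1}{2\gamma }\Vert x-y\Vert ^2\), as reconstructed from the proof) can be verified numerically on a toy one-dimensional lasso instance of our own choosing:

```python
# Numerical sanity check of the inequality of Lemma 2.2 on the toy
# 1-D lasso instance f(w) = 0.5*(w - 3)**2 (so L = 1), g(w) = lam*|w|.

def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

L, lam, gamma = 1.0, 0.5, 0.8          # gamma in (0, 1/L]

def F(w):
    return 0.5 * (w - 3.0) ** 2 + lam * abs(w)

def T(x):                               # proximal gradient step
    return soft_threshold(x - gamma * (x - 3.0), gamma * lam)

x = -1.2
Tx = T(x)
for y in [i / 10.0 for i in range(-50, 51)]:   # grid of test points y
    lhs = F(Tx) + (0.5 / gamma - L / 2.0) * (Tx - x) ** 2 \
        + (Tx - y) ** 2 / (2.0 * gamma)
    rhs = F(y) + (x - y) ** 2 / (2.0 * gamma)
    assert lhs <= rhs + 1e-10
```

Taking \(y=x\) in the inequality recovers, in particular, the sufficient decrease of the proximal gradient step.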
Proof of Lemma 2.3
Let \(x\in \mathbb {R}^n\), and let \(y = \mathsf {T}_\gamma (x) \in {{\mathrm{argmin}}}_w \left( \gamma g(w) + \frac{1}{2} \left\| w- \left( x - \gamma \nabla f(x) \right) \right\| ^2 \right) \); the first-order optimality condition of this minimization reads \(0 \in \gamma \partial g(y) + y - (x - \gamma \nabla f(x))\), i.e., \(\frac{1}{\gamma }(x-y) - \nabla f(x) \in \partial g(y)\), so \( \nabla f(y) - \nabla f(x) + \frac{1}{\gamma }(x-y) \in \partial F(y)\). Thus, using the \(L\)-Lipschitz continuity of \(\nabla f\), we have \( {{\mathrm{dist}}}(0,\partial F(y)) \le \Vert \nabla f(y) - \nabla f(x) + \frac{1}{\gamma }(x-y) \Vert \le \left( L + \frac{1}{\gamma }\right) \Vert x-y\Vert \). \(\square \)
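The resulting bound \({{\mathrm{dist}}}(0,\partial F(\mathsf {T}_\gamma (x))) \le (L+\frac{1}{\gamma })\Vert x-\mathsf {T}_\gamma (x)\Vert \) can likewise be checked numerically; the instance below (a one-dimensional lasso, our own illustrative choice) has an explicitly computable subdifferential:

```python
# Numerical check of dist(0, dF(T(x))) <= (L + 1/gamma)*|x - T(x)|
# on the toy problem f(w) = 0.5*(w - 3)**2 (L = 1), g(w) = lam*|w|.

def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

L, lam, gamma = 1.0, 0.5, 0.8

def fprime(w): return w - 3.0          # gradient of f

def T(x):                              # proximal gradient step
    return soft_threshold(x - gamma * fprime(x), gamma * lam)

def dist_subdiff(y):
    """Distance from 0 to dF(y) = f'(y) + lam*d|y| (1-D lasso)."""
    if y != 0.0:
        return abs(fprime(y) + lam * (1.0 if y > 0 else -1.0))
    return max(abs(fprime(0.0)) - lam, 0.0)   # d|0| = [-lam, lam]

for x in [-3.0, -0.4, 0.0, 0.1, 2.0, 5.0]:
    y = T(x)
    assert dist_subdiff(y) <= (L + 1.0 / gamma) * abs(x - y) + 1e-10
```

This is the standard way of converting the distance between consecutive iterates into a measure of approximate criticality.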
Lemma
(Non-convex version of Lemma 2.2) Let Assumption 1 hold but with g possibly non-convex, and take \(\gamma >0\). Then, for any \(x,y\in \mathbb {R}^n\), \( F( \mathsf {T}_\gamma (x)) + \left( \frac{1}{2\gamma } - \frac{L}{2}\right) \Vert \mathsf {T}_\gamma (x) - x \Vert ^2 \le F(y) + \frac{1}{2\gamma } \Vert x - y \Vert ^2 .\)
Proof
Let \(x\in \mathbb {R}^n\); as in the proof of Lemma 2.2, \(\mathsf {T}_\gamma (x)\) minimizes the surrogate function \(s_x(y) := f(x) + \langle \nabla f(x), y-x \rangle + g(y) + \frac{1}{2\gamma }\Vert y-x\Vert ^2\). As \(s_x\) is now not necessarily convex, its minimality only gives, for any \(y\in \mathbb {R}^n\), \(s_x ( \mathsf {T}_\gamma (x)) \le s_x (y)\) (which differs from the strongly convex inequality used in the proof of Lemma 2.2).
The proof then follows the same lines as that of Lemma 2.2.\(\square \)
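A quick numerical check of this non-convex variant can be done with the (non-convex) \(\ell _0\) penalty, whose proximal operator is hard-thresholding; the instance and parameters below are our own illustrative choices:

```python
# Check of the non-convex version of the descent inequality with the
# l0 penalty g(w) = lam if w != 0 else 0, and f(w) = 0.5*(w - 3)**2 (L = 1):
# F(T(x)) + (1/(2*gamma) - L/2)*|T(x)-x|^2 <= F(y) + 1/(2*gamma)*|x-y|^2.

import math

L, lam, gamma = 1.0, 0.5, 0.8

def F(w):
    return 0.5 * (w - 3.0) ** 2 + (lam if w != 0.0 else 0.0)

def hard_threshold(v, t):
    """Prox of the scaled l0 penalty t*1[.!=0]: keep v iff |v| > sqrt(2t)."""
    return v if abs(v) > math.sqrt(2.0 * t) else 0.0

def T(x):                               # proximal gradient step
    return hard_threshold(x - gamma * (x - 3.0), gamma * lam)

x = 0.2
Tx = T(x)
for y in [i / 10.0 for i in range(-50, 51)]:   # grid of test points y
    lhs = F(Tx) + (0.5 / gamma - L / 2.0) * (Tx - x) ** 2
    rhs = F(y) + (x - y) ** 2 / (2.0 * gamma)
    assert lhs <= rhs + 1e-10
```

Compared with the convex case, the term \(\frac{1}{2\gamma }\Vert \mathsf {T}_\gamma (x)-y\Vert ^2\) is absent from the left-hand side, reflecting the loss of strong convexity of the surrogate.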
Cite this article
Iutzeler, F., Malick, J. On the Proximal Gradient Algorithm with Alternated Inertia. J Optim Theory Appl 176, 688–710 (2018). https://doi.org/10.1007/s10957-018-1226-4