Abstract
In this paper, we investigate attractive properties of the proximal gradient algorithm with inertia. Notably, we show that using alternated inertia yields monotonically decreasing functional values, which contrasts with usual accelerated proximal gradient methods. We also provide convergence rates for the algorithm with alternated inertia, based on local geometric properties of the objective function. The results are put into perspective by discussions on several extensions (strongly convex case, non-convex case, and alternated extrapolation) and illustrations on common regularized optimization problems.
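To make the scheme concrete, here is a minimal self-contained sketch (our own toy instance and parameter choices, not the paper's experiments) of the proximal gradient method with alternated inertia on a one-dimensional lasso problem; the inertia parameter \(\beta = 0.5\) and the step size are illustrative:

```python
# Illustrative sketch (not the paper's code): proximal gradient with
# alternated inertia on the 1-D lasso problem
#   minimize F(w) = 0.5*(a*w - b)**2 + lam*|w|.
# Inertia (extrapolation) is applied only at even iterations.

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_grad_alternated_inertia(a, b, lam, beta=0.5, n_iter=50):
    L = a * a                      # Lipschitz constant of grad f
    gamma = 0.5 / L                # step size in (0, 1/L]
    F = lambda w: 0.5 * (a * w - b) ** 2 + lam * abs(w)
    w_prev = w = 0.0
    values = []
    for k in range(n_iter):
        # extrapolate only every other iteration (alternated inertia)
        y = w + beta * (w - w_prev) if k % 2 == 0 else w
        grad = a * (a * y - b)     # gradient of f at y
        w_prev, w = w, soft_threshold(y - gamma * grad, gamma * lam)
        values.append(F(w))
    return values

vals = prox_grad_alternated_inertia(a=2.0, b=3.0, lam=0.5)
# In this run the functional values decrease monotonically,
# illustrating the behavior highlighted in the abstract.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
assert abs(vals[-1] - 0.71875) < 1e-9   # optimal value of this instance
```

The only change with respect to the plain proximal gradient method is the extrapolation step, which is switched on every other iteration.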
Notes
For a non-smooth (possibly non-convex) function \(\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}\), we denote by \(\partial \varPhi (x)\) the limiting (Fréchet) subdifferential at x [26]. If \(\varPhi \) is convex, this subdifferential coincides with the standard convex subdifferential.
\(a_k = \varOmega (b_k) \) if there exist \(a>0\) and \(K\) such that \(a_k\ge a\, b_k\) for all \(k\ge K\).
References
Chambolle, A., De Vore, R.A., Lee, N.Y., Lucier, B.J.: Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7(3), 319–335 (1998)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _{1}\)-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14(3), 773–782 (2004)
Lorenz, D., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imag. Vis. 51(2), 311–325 (2014)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Aujol, J.F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Maingé, P.E.: Convergence theorems for inertial km-type algorithms. J. Comput. Appl. Math. 219(1), 223–236 (2008)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)
Malitsky, Y., Pock, T.: A first-order primal-dual algorithm with linesearch. arXiv preprint arXiv:1608.08883 (2016)
Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62(2), 261–275 (1993)
Bioucas-Dias, J.M., Figueiredo, M.A.: A new twist: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16(12), 2992–3004 (2007)
Fuentes, M., Malick, J., Lemaréchal, C.: Descentwise inexact proximal algorithms for smooth optimization. Comput. Optim. Appl. 53(3), 755–769 (2012)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
Mu, Z., Peng, Y.: A note on the inertial proximal point method. Stat. Optim. Inf. Comput. 3(3), 241–248 (2015)
Iutzeler, F., Hendrickx, J.M.: A generic linear rate acceleration of optimization algorithms via relaxation and inertia. arXiv preprint arXiv:1603.05398 (2016)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms, vol. 2. Springer, Heidelberg (1993)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Nguyen, T.P.: Kurdyka–Łojasiewicz and convexity: algorithms and applications. Ph.D. thesis, Toulouse University (2017)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811 (2016)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imag. Sci. 7(2), 1388–1419 (2014)
Liang, J., Fadili, J., Peyré, G.: A multi-step inertial forward-backward splitting method for non-convex optimization. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 4035–4043. Curran Associates, Inc. (2016)
Chartrand, R., Yin, W.: Nonconvex sparse regularization and splitting algorithms. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 237–249. Springer (2016)
Acknowledgements
The work of the authors is partly supported by the PGMO Grant Advanced Non-smooth Optimization Methods for Stochastic Programming.
Communicated by Marc Teboulle.
Appendix
For the sake of completeness, we provide short and direct proofs of known lemmas recalled in Sect. 2.2.
Proof of Lemma 2.2
Let \(x\in \mathbb {R}^n\); by definition, \(\mathsf {T}_\gamma (x)\) is the minimizer of the surrogate function \(s_x(y) := f(x) + \langle \nabla f(x), y-x \rangle + g(y) + \frac{1}{2\gamma }\Vert y-x\Vert ^2\). As \(s_x\) is \(\frac{1}{\gamma }\)-strongly convex, we have for any \(y\in \mathbb {R}^n\) that \(s_x ( \mathsf {T}_\gamma (x)) + \frac{1}{2\gamma } \Vert \mathsf {T}_\gamma (x) - y\Vert ^2 \le s_x (y)\).
Now we use (i) the descent lemma on the \(L\)-smooth function \(f\) (see [23, Th. 18.15]), which gives \(f(\mathsf {T}_\gamma (x)) \le f(x) + \langle \nabla f(x), \mathsf {T}_\gamma (x)-x \rangle + \frac{L}{2}\Vert \mathsf {T}_\gamma (x)-x\Vert ^2\), and (ii) the convexity of \(f\), which gives \(f(x) + \langle \nabla f(x), y-x \rangle \le f(y)\). Using (i) on the left-hand side of the above inequality and (ii) on its right-hand side, we get the claimed estimate.
\(\square \)
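As a sanity check, the inequality of Lemma 2.2 (in the standard form \(F(\mathsf {T}_\gamma (x)) + (\frac{1}{2\gamma }-\frac{L}{2})\Vert \mathsf {T}_\gamma (x)-x\Vert ^2 + \frac{1}{2\gamma }\Vert \mathsf {T}_\gamma (x)-y\Vert ^2 \le F(y) + \frac{1}{2\gamma }\Vert x-y\Vert ^2\), as reconstructed from the proof) can be verified numerically on a toy one-dimensional lasso instance of our own choosing:

```python
# Numerical sanity check of the inequality of Lemma 2.2 on the toy
# 1-D lasso instance f(w) = 0.5*(w - 3)**2 (so L = 1), g(w) = lam*|w|.

def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

L, lam, gamma = 1.0, 0.5, 0.8          # gamma in (0, 1/L]

def F(w):
    return 0.5 * (w - 3.0) ** 2 + lam * abs(w)

def T(x):                               # proximal gradient step
    return soft_threshold(x - gamma * (x - 3.0), gamma * lam)

x = -1.2
Tx = T(x)
for y in [i / 10.0 for i in range(-50, 51)]:   # grid of test points y
    lhs = F(Tx) + (0.5 / gamma - L / 2.0) * (Tx - x) ** 2 \
        + (Tx - y) ** 2 / (2.0 * gamma)
    rhs = F(y) + (x - y) ** 2 / (2.0 * gamma)
    assert lhs <= rhs + 1e-10
```

Taking \(y=x\) in the inequality recovers, in particular, the sufficient decrease of the proximal gradient step.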
Proof of Lemma 2.3
Let \(x\in \mathbb {R}^n\), and let \(y = \mathsf {T}_\gamma (x) \in {{\mathrm{argmin}}}_w \left( \gamma g(w) + \frac{1}{2} \left\| w- \left( x - \gamma \nabla f(x) \right) \right\| ^2 \right) \); the first-order optimality condition of this minimization reads \(0 \in \gamma \partial g(y) + y - (x - \gamma \nabla f(x))\), i.e., \(\frac{1}{\gamma }(x-y) - \nabla f(x) \in \partial g(y)\), so \( \nabla f(y) - \nabla f(x) + \frac{1}{\gamma }(x-y) \in \partial F(y)\). Thus, using the \(L\)-Lipschitz continuity of \(\nabla f\), we have \( {{\mathrm{dist}}}(0,\partial F(y)) \le \Vert \nabla f(y) - \nabla f(x) + \frac{1}{\gamma }(x-y) \Vert \le \left( L + \frac{1}{\gamma }\right) \Vert x-y\Vert \). \(\square \)
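The resulting bound \({{\mathrm{dist}}}(0,\partial F(\mathsf {T}_\gamma (x))) \le (L+\frac{1}{\gamma })\Vert x-\mathsf {T}_\gamma (x)\Vert \) can likewise be checked numerically; the instance below (a one-dimensional lasso, our own illustrative choice) has an explicitly computable subdifferential:

```python
# Numerical check of dist(0, dF(T(x))) <= (L + 1/gamma)*|x - T(x)|
# on the toy problem f(w) = 0.5*(w - 3)**2 (L = 1), g(w) = lam*|w|.

def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

L, lam, gamma = 1.0, 0.5, 0.8

def fprime(w): return w - 3.0          # gradient of f

def T(x):                              # proximal gradient step
    return soft_threshold(x - gamma * fprime(x), gamma * lam)

def dist_subdiff(y):
    """Distance from 0 to dF(y) = f'(y) + lam*d|y| (1-D lasso)."""
    if y != 0.0:
        return abs(fprime(y) + lam * (1.0 if y > 0 else -1.0))
    return max(abs(fprime(0.0)) - lam, 0.0)   # d|0| = [-lam, lam]

for x in [-3.0, -0.4, 0.0, 0.1, 2.0, 5.0]:
    y = T(x)
    assert dist_subdiff(y) <= (L + 1.0 / gamma) * abs(x - y) + 1e-10
```

This is the standard way of converting the distance between consecutive iterates into a measure of approximate criticality.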
Lemma
(Non-convex version of Lemma 2.2) Let Assumption 1 hold but with g possibly non-convex, and take \(\gamma >0\). Then, for any \(x,y\in \mathbb {R}^n\), \( F( \mathsf {T}_\gamma (x)) + \left( \frac{1}{2\gamma } - \frac{L}{2}\right) \Vert \mathsf {T}_\gamma (x) - x \Vert ^2 \le F(y) + \frac{1}{2\gamma } \Vert x - y \Vert ^2 .\)
Proof
Let \(x\in \mathbb {R}^n\); as in the proof of Lemma 2.2, \(\mathsf {T}_\gamma (x)\) minimizes the surrogate function \(s_x(y) := f(x) + \langle \nabla f(x), y-x \rangle + g(y) + \frac{1}{2\gamma }\Vert y-x\Vert ^2\). As \(s_x\) is now not necessarily convex, its minimality only gives, for any \(y\in \mathbb {R}^n\), \(s_x ( \mathsf {T}_\gamma (x)) \le s_x (y)\) (which differs from the strongly convex inequality used in the proof of Lemma 2.2).
The proof then follows the same lines as that of Lemma 2.2.\(\square \)
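A quick numerical check of this non-convex variant can be done with the (non-convex) \(\ell _0\) penalty, whose proximal operator is hard-thresholding; the instance and parameters below are our own illustrative choices:

```python
# Check of the non-convex version of the descent inequality with the
# l0 penalty g(w) = lam if w != 0 else 0, and f(w) = 0.5*(w - 3)**2 (L = 1):
# F(T(x)) + (1/(2*gamma) - L/2)*|T(x)-x|^2 <= F(y) + 1/(2*gamma)*|x-y|^2.

import math

L, lam, gamma = 1.0, 0.5, 0.8

def F(w):
    return 0.5 * (w - 3.0) ** 2 + (lam if w != 0.0 else 0.0)

def hard_threshold(v, t):
    """Prox of the scaled l0 penalty t*1[.!=0]: keep v iff |v| > sqrt(2t)."""
    return v if abs(v) > math.sqrt(2.0 * t) else 0.0

def T(x):                               # proximal gradient step
    return hard_threshold(x - gamma * (x - 3.0), gamma * lam)

x = 0.2
Tx = T(x)
for y in [i / 10.0 for i in range(-50, 51)]:   # grid of test points y
    lhs = F(Tx) + (0.5 / gamma - L / 2.0) * (Tx - x) ** 2
    rhs = F(y) + (x - y) ** 2 / (2.0 * gamma)
    assert lhs <= rhs + 1e-10
```

Compared with the convex case, the term \(\frac{1}{2\gamma }\Vert \mathsf {T}_\gamma (x)-y\Vert ^2\) is absent from the left-hand side, reflecting the loss of strong convexity of the surrogate.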
Cite this article
Iutzeler, F., Malick, J. On the Proximal Gradient Algorithm with Alternated Inertia. J Optim Theory Appl 176, 688–710 (2018). https://doi.org/10.1007/s10957-018-1226-4