FISTA is an automatic geometrically optimized algorithm for strongly convex functions

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

In this work, we are interested in the famous FISTA algorithm. We show that FISTA is an automatic geometrically optimized algorithm for functions satisfying a quadratic growth assumption. This explains why FISTA works better than the standard Forward-Backward algorithm (FB) in such a case, although FISTA is known to have a polynomial asymptotic convergence rate while FB has an exponential one. We provide a simple rule to tune the \(\alpha \) parameter within the FISTA algorithm to reach an \(\varepsilon \)-solution with an optimal number of iterations. These new results highlight the efficiency of FISTA algorithms, and they rely on new non-asymptotic bounds for FISTA.

References

  1. Alamo, T., Krupa, P., Limon, D.: Gradient based restart FISTA. In: 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 3936–3941. IEEE (2019)

  2. Alamo, T., Limon, D., Krupa, P.: Restart FISTA with global linear convergence. In: 2019 18th European Control Conference (ECC), pp. 1969–1974. IEEE (2019)

  3. Apidopoulos, V., Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions. Math. Program. (2020)

  4. Attouch, H., Chbani, Z.: Fast inertial dynamics and FISTA algorithms in convex optimization. Perturbation aspects. arXiv preprint arXiv:1507.01367 (2015)

  5. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168(1–2), 123–175 (2018)

  6. Aujol, J.F., Dossal, C., Rondepierre, A.: Optimal convergence rates for Nesterov acceleration. SIAM J. Optim. 29(4), 3131–3153 (2019)

  7. Aujol, J.F., Dossal, C., Rondepierre, A.: FISTA is an automatic geometrically optimized algorithm for strongly convex functions. HAL preprint (2021)

  8. Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of the heavy-ball method for quasi-strongly convex optimization. SIAM J. Optim. 32, 1817–1842 (2022)

  9. Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of the Heavy–Ball method under the Łojasiewicz property. Math. Program. (2022)

  10. Aujol, J.F., Dossal, C.H., Labarrière, H., Rondepierre, A.: FISTA restart using an automatic estimation of the growth parameter (2021). https://hal.archives-ouvertes.fr/hal-03153525

  11. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  12. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  13. Bolte, J., Nguyen, T., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)

  14. Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm.” J. Optim. Theory Appl. 166(3), 968–982 (2015)

  15. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  16. Fercoq, O., Qu, Z.: Adaptive restart of accelerated gradient methods under local quadratic growth condition. IMA J. Numer. Anal. 39(4), 2069–2095 (2019)

  17. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward-backward algorithm: beyond the worst case with the help of geometry. Math. Program. (2022)

  18. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles (Paris, 1962), pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)

  19. Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. l’Inst. Four. Univer. Grenoble 43(5), 1575–1595 (1993)

  20. Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175(1–2), 69–107 (2019)

  21. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). In: Soviet Mathematics Doklady, vol. 27, no. 2, pp. 372–376 (1983)

  22. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  23. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2013)

  24. Ochs, P., Brox, T., Pock, T.: iPiasco: inertial proximal algorithm for strongly convex optimization. J. Math. Imaging Vis. 53(2), 171–181 (2015)

  25. O’Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)

  26. Park, C., Park, J., Ryu, E.K.: Factor-\(\sqrt{2}\) acceleration of accelerated gradient methods. arXiv preprint arXiv:2102.07366 (2021)

  27. Siegel, J.: Accelerated first-order methods: differential equations and Lyapunov functions. arXiv preprint arXiv:1903.05671 (2019)

  28. Su, W., Boyd, S., Candes, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)

  29. Taylor, A., Drori, Y.: An optimal gradient method for smooth strongly convex minimization. Math. Program. (2022)

Acknowledgements

J.-F. Aujol acknowledges the support of the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 777826. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-PRC-CE23 MaSDOL, the support of FMJH Program PGMO 2019-0024, and the support to this program from EDF-Thales-Orange.

Author information

Corresponding author

Correspondence to A. Rondepierre.

Appendices

A The continuous case: Proof of Theorem 5

Our analysis is based on the following Lyapunov energy:

$$\begin{aligned} {\mathcal {E}}(t)=t^2(F(x(t))-F^*)+\frac{1}{2}\left\| {\lambda (x(t)-x^*)+t{\dot{x}}(t)}\right\| ^2,\quad \lambda =\frac{2\alpha }{\gamma +2} \end{aligned}$$
(78)

where the parameter \(\lambda \) is chosen according to Aujol et al. [6]. Recall that the expected asymptotic convergence rate is polynomial, in \(\mathcal O\left( t^{-\frac{2\alpha \gamma }{\gamma +2}}\right) \) [6], with an exponent equal to \(\lambda \gamma \). Differentiating the Lyapunov energy \({\mathcal {E}}\), we easily prove that:

$$\begin{aligned} {\mathcal {E}}'(t)+\frac{\gamma \lambda -2}{t}{\mathcal {E}}(t)= & {} \lambda \gamma t\left( F(x(t))-F^* -\frac{1}{\gamma }\langle \nabla F(x(t)),x(t)-x^*\rangle \right) \\{} & {} +\,\frac{\lambda ^2(\gamma \lambda -2)}{2t}\Vert x(t)-x^*\Vert ^2\\{} & {} + (\lambda ^2(\gamma +1) -\lambda -\alpha \lambda )\langle x(t)-x^*,{\dot{x}}(t)\rangle \\{} & {} +\, t\left( \lambda +1-\alpha + \frac{\gamma \lambda -2}{2}\right) \Vert \dot{x}(t)\Vert ^2. \end{aligned}$$

Using the flatness assumption and replacing \(\lambda =\frac{2\alpha }{\gamma +2}\), we finally get:

$$\begin{aligned} {\mathcal {E}}'(t)+\frac{\gamma \lambda -2}{t}{\mathcal {E}}(t) \leqslant K(\alpha ) \left( \frac{2\alpha }{(\gamma +2)t}\Vert x(t)-x^*\Vert ^2 + \langle x(t)-x^*,{\dot{x}}(t)\rangle \right) \nonumber \\ \end{aligned}$$
(79)

where: \(K(\alpha )=\frac{2\alpha \gamma }{(\gamma +2)^2}(\alpha -1-\frac{2}{\gamma })\). We now need to control the scalar product whose sign is unknown. Combining the following two inequalities:

$$\begin{aligned} |\langle x(t)-x^*,{\dot{x}}(t)\rangle |\leqslant \frac{\sqrt{\mu }}{2}\left\| {x(t)-x^*}\right\| ^2+\frac{1}{2\sqrt{\mu }}\left\| {\dot{x}(t)}\right\| ^2 \end{aligned}$$
(80)

where the weight \(\sqrt{\mu }\) used to bound the scalar product is chosen to get the tightest control on the energy, and

$$\begin{aligned} t^2\left\| {\dot{x}(t)}\right\| ^2\leqslant & {} \left( 1+\theta \frac{\alpha }{t\sqrt{\mu }}\right) \left\| {\lambda (x(t)-x^*)+t\dot{x}(t)}\right\| ^2+\lambda ^2\left( 1+\frac{t\sqrt{\mu }}{\theta \alpha }\right) \left\| {x(t)-x^*}\right\| ^2\nonumber \\ \end{aligned}$$
(81)

for any \(\theta >0\), we get:

$$\begin{aligned} {\mathcal {E}}'(t)+\frac{\gamma \lambda -2}{t}{\mathcal {E}}(t)\leqslant & {} K(\alpha ) \left[ \frac{\sqrt{\mu }}{2} + \frac{2\alpha }{(\gamma +2)t} \left( 1+\frac{1}{(\gamma +2)\theta }\right) \right. \nonumber \\{} & {} \left. +\,\frac{2\alpha ^2}{(\gamma +2)^2\sqrt{\mu }t^2} \right] \Vert x(t)-x^*\Vert ^2\nonumber \\{} & {} +\, \frac{K(\alpha )}{2\sqrt{\mu }t^2}\left( 1+\theta \frac{\alpha }{t\sqrt{\mu }}\right) \left\| {\lambda (x(t)-x^*)+t\dot{x}(t)}\right\| ^2 \end{aligned}$$
(82)
$$\begin{aligned}\leqslant & {} \frac{2}{\mu }K(\alpha ) \left[ \frac{\sqrt{\mu }}{2} + \frac{2\alpha }{(\gamma +2)t} \left( 1+\frac{1}{(\gamma +2)\theta }\right) \right. \nonumber \\{} & {} \left. +\frac{2\alpha ^2}{(\gamma +2)^2\sqrt{\mu }t^2} \right] (F(x(t))-F^*)\nonumber \\{} & {} +\,\frac{K(\alpha )}{2\sqrt{\mu }t^2}\left( 1+\theta \frac{\alpha }{t\sqrt{\mu }}\right) \left\| {\lambda (x(t)-x^*)+t\dot{x}(t)}\right\| ^2\nonumber \\ \end{aligned}$$
(83)

using the growth condition \({\mathcal {G}}^2_\mu \). We then choose the parameter \(\theta \) to make equal the coefficients before \(\frac{1}{t^3}\) in \(t^2(F(x(t))-F^*)\) and \(\frac{1}{2}\left\| {\lambda (x(t)-x^*)+t\dot{x}(t)}\right\| ^2\), i.e. such that:

$$\begin{aligned} \frac{2}{\mu }\frac{2\alpha }{(\gamma +2)} \left( 1+\frac{1}{(\gamma +2)\theta }\right) = \frac{\theta \alpha }{\mu } \end{aligned}$$
(84)

or equivalently:

$$\begin{aligned} (\gamma +2)^2\theta ^2-4(\gamma +2)\theta -4=0. \end{aligned}$$
(85)

A straightforward computation shows that this last equation has exactly one positive root:

$$\begin{aligned} \theta = \frac{2}{\gamma +2} (1+\sqrt{2}). \end{aligned}$$
(86)
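As a quick sanity check of (84) to (86), the positive root can be recomputed numerically. The sketch below is illustrative only: the function name `theta_star` and the sample value `gamma = 1.0` are assumptions made for the example, not part of the original analysis.

```python
import math


def theta_star(gamma: float) -> float:
    """Positive root of (gamma+2)^2 * theta^2 - 4*(gamma+2)*theta - 4 = 0, i.e. (85)."""
    a = (gamma + 2.0) ** 2
    b = -4.0 * (gamma + 2.0)
    c = -4.0
    disc = b * b - 4.0 * a * c  # equals 32 * (gamma+2)^2, always positive
    return (-b + math.sqrt(disc)) / (2.0 * a)


gamma = 1.0  # hypothetical sample value
theta = theta_star(gamma)

# Closed form (86).
closed_form = 2.0 * (1.0 + math.sqrt(2.0)) / (gamma + 2.0)

# Residual of (84) after multiplying both sides by mu / alpha.
residual_84 = 4.0 / (gamma + 2.0) * (1.0 + 1.0 / ((gamma + 2.0) * theta)) - theta

print(theta, closed_form, residual_84)
```

The other root of (85) is negative because the product of the roots, \(-4/(\gamma +2)^2\), is negative, which is why only one root is admissible for \(\theta >0\).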

For this choice of parameters, we have:

$$\begin{aligned} {\mathcal {E}}'(t)+\frac{\gamma \lambda -2}{t}{\mathcal {E}}(t)\leqslant \frac{K(\alpha )}{\mu t^2}\left( \sqrt{\mu }+\frac{2\alpha }{(\gamma +2)t}(1+\sqrt{2}) +\frac{4\alpha ^2}{(\gamma +2)^2\sqrt{\mu }t^2}\right) {\mathcal {E}}(t).\nonumber \\ \end{aligned}$$
(87)

Let us now define:

$$\begin{aligned} \varphi (t):=\ \frac{K(\alpha )}{\mu t^2}\left( \sqrt{\mu }+\frac{2\alpha }{(\gamma +2)t}(1+\sqrt{2}) +\frac{4\alpha ^2}{(\gamma +2)^2\sqrt{\mu }t^2}\right) \end{aligned}$$
(88)

and: \(\Phi (t)=\int _{t}^{+\infty }\varphi (x)dx\). We thus have:

$$\begin{aligned} \forall t\geqslant t_0,\quad {\mathcal {E}}'(t)+\frac{\gamma \lambda -2}{t}{\mathcal {E}}(t) \leqslant \varphi (t){\mathcal {E}}(t). \end{aligned}$$

Consequently the function \(t\mapsto {\mathcal {E}}(t)t^{\lambda \gamma -2}e^{\Phi (t)}\) is non-increasing, and for any \(t_1\geqslant t_0\), we get:

$$\begin{aligned} \forall t\geqslant t_1,\quad {\mathcal {E}}(t)\leqslant {\mathcal {E}}(t_1)\left( \frac{t_1}{t}\right) ^{\lambda \gamma -2}e^{\Phi (t_1)-\Phi (t)}. \end{aligned}$$
(89)

A good choice of \(t_1\) is one ensuring a control as tight as possible on the energy \({\mathcal {E}}\). To this end, \(t_1\) is chosen as a minimizer of the function \(u\mapsto u^{\lambda \gamma -2}e^{\Phi (u)}\), i.e. as a solution of the equation:

$$\begin{aligned} \frac{\lambda \gamma -2}{u}-\varphi (u)=0 \end{aligned}$$
(90)

Noticing that \(\lambda \gamma -2=\frac{\gamma +2}{\alpha } K(\alpha )\) and dividing the equation by \(K(\alpha )\), the equation can be rewritten as:

$$\begin{aligned} \frac{\gamma +2}{\alpha u}=\frac{1}{\mu u^2} \left( \sqrt{\mu }+\frac{2\alpha }{(\gamma +2)u}(1+\sqrt{2}) +\frac{4\alpha ^2}{(\gamma +2)^2\sqrt{\mu }u^2}\right) . \end{aligned}$$
(91)

Introducing \(r=(\gamma +2)\frac{\sqrt{\mu }}{\alpha }u\), we finally have to solve:

$$\begin{aligned} r^3-r^2-2(1+\sqrt{2})r-4=0. \end{aligned}$$
(92)

A straightforward computation shows that the polynomial \(r\mapsto r^3-r^2-2(1+\sqrt{2})r-4\) has only one real root, \(r^*\simeq 3\) (a closed-form expression of this root can be obtained with Python).
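The root of (92) can also be located numerically. The following is a minimal sketch using plain bisection from the standard library; the helper names `p` and `bisect` and the bracketing interval `[3, 4]` are choices made for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def p(r: float) -> float:
    """Cubic polynomial of equation (92)."""
    return r**3 - r**2 - 2.0 * (1.0 + SQRT2) * r - 4.0


def bisect(f, lo: float, hi: float, tol: float = 1e-13) -> float:
    """Bisection on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    assert f(lo) < 0.0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r_star = bisect(p, 3.0, 4.0)

# Uniqueness of the real root: p'(r) = 3r^2 - 2r - 2(1+sqrt(2)) vanishes at two
# points; the smaller one is the local maximum of p. If p is negative there,
# the cubic crosses zero only once.
r_local_max = (2.0 - math.sqrt(4.0 + 24.0 * (1.0 + SQRT2))) / 6.0
p_at_local_max = p(r_local_max)

print(r_star, p_at_local_max)
```

The computed root is slightly above 3 (close to 3.03), which is consistent with the approximation \(r^*\simeq 3\).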

Defining \(t_1=\frac{\alpha }{(\gamma +2)\sqrt{\mu }}r^*\), the control on the energy is given by:

$$\begin{aligned} \forall t\geqslant t_1,\quad {\mathcal {E}}(t)\leqslant {\mathcal {E}}(\frac{\alpha }{(\gamma +2)\sqrt{\mu }}r^*)\left( \frac{\alpha r^*}{t(\gamma +2)\sqrt{\mu }}\right) ^{\gamma \lambda -2}e^{\Phi (t_1)-\Phi (t)}. \end{aligned}$$
(93)

Observe now that the term \({\mathcal {E}}(\frac{\alpha }{(\gamma +2)\sqrt{\mu }}r^*)\) can be bounded by the mechanical energy of the system:

$$\begin{aligned} E_m(t)=F(x(t))-F^* + \frac{1}{2}\Vert \dot{x}(t)\Vert ^2 \end{aligned}$$
(94)

Note that this energy is non-increasing since: \(E_m'(t) = \langle \nabla F(x(t))+ \ddot{x}(t),\dot{x}(t)\rangle = -\frac{\alpha }{t}\Vert \dot{x}(t)\Vert ^2 \leqslant 0\), hence \(E_m\) is uniformly bounded on \([t_0,+\infty [\). We then have:

$$\begin{aligned} {\mathcal {E}}(t_1)&=t_1^2(F(x(t_1))-F^*)+\frac{1}{2}\left\| {\frac{2\alpha }{\gamma +2}(x(t_1)-x^*)+t_1\dot{x}(t_1)}\right\| ^2\\&= t_1^2(F(x(t_1))-F^*+\frac{1}{2}\left\| {{\dot{x}}(t_1)}\right\| ^2)\\&\quad +\frac{2\alpha ^2}{(\gamma +2)^2}\left\| {x(t_1)-x^*}\right\| ^2+\frac{2\alpha }{\gamma +2}t_1\langle x(t_1)-x^*,\dot{x}(t_1)\rangle \\&=t_1^2E_m(t_1)+\frac{2\alpha ^2}{(\gamma +2)^2}\left\| {x(t_1)-x^*}\right\| ^2+\frac{2\alpha }{\gamma +2}t_1\langle x(t_1)-x^*,\dot{x}(t_1)\rangle \end{aligned}$$

Using again (80) to control the scalar product combined with the quadratic growth condition \({\mathcal {G}}_\mu ^2\), we can prove that:

$$\begin{aligned} 2\langle x(t_1)-x^*,\dot{x}(t_1)\rangle\leqslant & {} \sqrt{\mu } \Vert x(t_1)-x^*\Vert ^2 + \frac{1}{\sqrt{\mu }}\Vert \dot{x}(t_1)\Vert ^2\\\leqslant & {} \frac{2}{\sqrt{\mu }}(F(x(t_1))-F^*)+ \frac{1}{\sqrt{\mu }}\Vert \dot{x}(t_1)\Vert ^2 = \frac{2}{\sqrt{\mu }}E_m(t_1) \end{aligned}$$

Noticing that the quadratic growth condition also implies:

$$\begin{aligned} \left\| {x(t_1)-x^*}\right\| ^2\leqslant \frac{2}{\mu }(F(x(t_1))-F^*) \leqslant \frac{2}{\mu }E_m(t_1) \end{aligned}$$

and remembering that \(t_1=\frac{\alpha }{(\gamma +2)\sqrt{\mu }}r^*\), we finally get:

$$\begin{aligned} {\mathcal {E}}(t_1)&\leqslant t_1^2E_m(t_1) + \frac{2\alpha ^2}{(\gamma +2)^2}\left\| {x(t_1)-x^*}\right\| ^2 + \frac{2\alpha }{(\gamma +2)\sqrt{\mu }}t_1E_m(t_1) \end{aligned}$$
(95)
$$\begin{aligned}&\leqslant \left[ t_1^2 + \frac{4\alpha ^2}{(\gamma +2)^2\mu } + \frac{2\alpha }{(\gamma +2)\sqrt{\mu }}t_1\right] E_m(t_1) =\left( 1+\frac{2}{r^{*}}+\frac{4}{r^{*2}}\right) t_1^2E_m(t_1) \end{aligned}$$
(96)
$$\begin{aligned}&\leqslant \left( 1+\frac{2}{r^{*}}+\frac{4}{r^{*2}}\right) t_1^2E_m(t_0) \end{aligned}$$
(97)

Observe that the function \(\Phi (t)=\int _{t}^{+\infty }\varphi (x)dx\) has a simple analytic expression showing that \(\Phi \) is non-negative and:

$$\begin{aligned} \Phi (t_1)=(\gamma +2)\frac{K(\alpha )}{\alpha }\left( \frac{1}{r^*} + \frac{1+\sqrt{2}}{r^{*2}}+\frac{4}{3r^{*3}}\right) \end{aligned}$$
(98)

We finally obtain the following control on the values:

$$\begin{aligned} F(x(t))-F^*\leqslant C_1E_m(t_0)\left( \frac{\alpha r^*}{t(\gamma +2)\sqrt{\mu }}\right) ^{\frac{2\alpha \gamma }{\gamma +2}}e^{\frac{2\gamma }{\gamma +2}C_2(\alpha -1-\frac{2}{\gamma })} \end{aligned}$$
(99)

where \( C_1 = 1+\frac{2}{r^{*}}+\frac{4}{r^{*2}},~C_2 =\frac{1}{r^*} + \frac{1+\sqrt{2}}{r^{*2}}+\frac{4}{3r^{*3}}. \)
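To make the constants in (96), (98) and (99) concrete, the sketch below evaluates them for sample parameters. The values `alpha = 8`, `gamma = 1`, `mu = 0.1` and the helper names are assumptions chosen purely for illustration; the closed form used in `Phi` is obtained by integrating (88) term by term.

```python
import math

SQRT2 = math.sqrt(2.0)


def K(alpha: float, gamma: float) -> float:
    """K(alpha) as defined after (79)."""
    return 2.0 * alpha * gamma / (gamma + 2.0) ** 2 * (alpha - 1.0 - 2.0 / gamma)


def Phi(t: float, alpha: float, gamma: float, mu: float) -> float:
    """Integral of phi (88) over [t, +inf), computed term by term."""
    g2 = gamma + 2.0
    return K(alpha, gamma) / mu * (
        math.sqrt(mu) / t
        + alpha * (1.0 + SQRT2) / (g2 * t**2)
        + 4.0 * alpha**2 / (3.0 * g2**2 * math.sqrt(mu) * t**3)
    )


def newton_r_star(r: float = 3.0, steps: int = 50) -> float:
    """Newton iterations on the cubic (92), started at r = 3."""
    for _ in range(steps):
        f = r**3 - r**2 - 2.0 * (1.0 + SQRT2) * r - 4.0
        df = 3.0 * r**2 - 2.0 * r - 2.0 * (1.0 + SQRT2)
        r -= f / df
    return r


alpha, gamma, mu = 8.0, 1.0, 0.1  # hypothetical sample values
r_star = newton_r_star()
t1 = alpha * r_star / ((gamma + 2.0) * math.sqrt(mu))

C1 = 1.0 + 2.0 / r_star + 4.0 / r_star**2
C2 = 1.0 / r_star + (1.0 + SQRT2) / r_star**2 + 4.0 / (3.0 * r_star**3)

# Right-hand side of (98), to be compared with Phi(t1).
phi_t1_closed = (gamma + 2.0) * K(alpha, gamma) / alpha * C2

# Bracket of (96), to be compared with C1 * t1^2.
bracket_96 = (t1**2 + 4.0 * alpha**2 / ((gamma + 2.0) ** 2 * mu)
              + 2.0 * alpha * t1 / ((gamma + 2.0) * math.sqrt(mu)))


def value_bound(t: float, Em0: float) -> float:
    """Right-hand side of (99) for t >= t1, given the initial energy E_m(t_0)."""
    rate = 2.0 * alpha * gamma / (gamma + 2.0)
    expo = 2.0 * gamma / (gamma + 2.0) * C2 * (alpha - 1.0 - 2.0 / gamma)
    return C1 * Em0 * (t1 / t) ** rate * math.exp(expo)


print(t1, C1, C2, value_bound(2.0 * t1, 1.0))
```

Since \(K(\alpha )>0\) for these sample values, `Phi` is positive, in line with the exponential factor appearing in (99).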

B Technical Lemmas for Theorem 6

The proof of Theorem 6 is based on the following Lyapunov energy:

$$\begin{aligned} E_n = 2sn^2(F(x_n)-F^*) + \Vert \lambda (x_{n-1}-x^*)+n(x_n-x_{n-1})\Vert ^2 \end{aligned}$$
(100)

which can be rewritten as:

$$\begin{aligned} E_n = n^2w_n + \left( \lambda ^2-\lambda n\right) h_{n-1} + \left( n^2-\lambda n\right) \delta _n +\lambda n h_n \end{aligned}$$
(101)

using the reduced notations (65).
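The equivalence between (100) and (101) can be checked numerically on random data. The sketch below assumes that the reduced notations (65), which are not reproduced in this appendix, read \(w_n=2s(F(x_n)-F^*)\), \(h_n=\Vert x_n-x^*\Vert ^2\) and \(\delta _n=\Vert x_n-x_{n-1}\Vert ^2\); all numerical values are arbitrary.

```python
import random


def sqnorm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(c * c for c in v)


random.seed(0)
d = 5
x_star = [random.gauss(0.0, 1.0) for _ in range(d)]
x_prev = [random.gauss(0.0, 1.0) for _ in range(d)]  # x_{n-1}
x_cur = [random.gauss(0.0, 1.0) for _ in range(d)]   # x_n

lam, n = 4.0, 7  # arbitrary lambda and iteration index
w_n = 0.3        # stands for 2 s (F(x_n) - F^*), assumed notation

h_prev = sqnorm([p - s for p, s in zip(x_prev, x_star)])
h_cur = sqnorm([c - s for c, s in zip(x_cur, x_star)])
delta = sqnorm([c - p for c, p in zip(x_cur, x_prev)])

# Energy written as in (100).
energy_100 = n**2 * w_n + sqnorm(
    [lam * (p - s) + n * (c - p) for p, c, s in zip(x_prev, x_cur, x_star)]
)

# Energy written as in (101).
energy_101 = (n**2 * w_n + (lam**2 - lam * n) * h_prev
              + (n**2 - lam * n) * delta + lam * n * h_cur)

print(energy_100, energy_101)
```

The identity follows from expanding the squared norm and using \(2\langle x_{n-1}-x^*,x_n-x_{n-1}\rangle =h_n-h_{n-1}-\delta _n\).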

B.1 Proof of Lemma 1

Step 1: using the reduced notations (65), we prove that:

$$\begin{aligned} E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n\leqslant & {} \frac{4\alpha K(\alpha )}{3}\frac{h_n}{n} + A_1(n,\alpha ) \delta _n + B_1(n,\alpha ) (h_{n-1}-h_n)\nonumber \\{} & {} + B_3(n,\alpha )(h_{n+1}-h_n-\delta _{n+1}) \end{aligned}$$
(102)

with:

$$\begin{aligned} A_1(n,\alpha )= & {} \frac{17\alpha ^2}{9}-\frac{8\alpha }{3}+2-\alpha \frac{(10\alpha ^2-18\alpha +9)n+7\alpha ^3-12\alpha ^2+6\alpha }{3(n+\alpha )^2},\\ B_1(n,\alpha )= & {} -\frac{2}{9}\alpha ^2+\frac{4}{3}\alpha -1+\frac{1}{3}\frac{3\alpha -2\alpha ^3}{n+\alpha }+\frac{1}{27}\frac{8\alpha ^3-24\alpha ^2}{n},\quad B_3(n,\alpha )=\frac{2}{3}\alpha -1. \end{aligned}$$

Indeed:

$$\begin{aligned} E_{n+1} -\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) E_n{} & {} = (n+1)^2w_{n+1}-\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) n^2w_n \nonumber \\{} & {} \quad + \,\left( (n+1)^2-\lambda (n+1)\right) \delta _{n+1}- \left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) \left( n^2-\lambda n\right) \delta _n\nonumber \\{} & {} \quad +\,\left( \lambda ^2-\lambda (n+1)-\lambda n \left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) \right) h_n+\lambda (n+1)h_{n+1}\nonumber \\{} & {} \quad -\, (\lambda ^2-\lambda n)\left( 1 -\frac{\frac{2\alpha }{3}-2}{n}\right) h_{n-1} \end{aligned}$$
(103)

Observe now that:

$$\begin{aligned}{} & {} (n+1)^2w_{n+1}-\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) n^2w_n =\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) n^2(w_{n+1}-w_n)\\{} & {} \qquad +\,\left( (n+1)^2-n^2\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) \right) w_{n+1}\\{} & {} \quad = n\left( n -\left( \frac{2\alpha }{3}-2\right) \right) (w_{n+1}-w_n)+\left( \frac{2\alpha }{3} n +1\right) w_{n+1} \end{aligned}$$

Combining the two following inequalities

$$\begin{aligned} w_{n+1}-w_n\leqslant \alpha _n^2\delta _n-\delta _{n+1} \end{aligned}$$
(104)

from Chambolle and Dossal [14] and:

$$\begin{aligned} w_{n+1}\leqslant \Vert x_n+\alpha _n(x_n-x_{n-1})-x^*\Vert ^2 -\Vert x_{n+1}-x^*\Vert ^2 \end{aligned}$$

from Apidopoulos et al. [3], or equivalently with our notations:

$$\begin{aligned} w_{n+1}\leqslant (1+\alpha _n)h_n-\alpha _nh_{n-1}-h_{n+1}+(\alpha _n+\alpha _n^2)\delta _n \end{aligned}$$
(105)

we then deduce:

$$\begin{aligned}{} & {} (n+1)^2w_{n+1}-\left( 1 - \frac{\frac{2\alpha }{3}-2}{n}\right) n^2w_n \\{} & {} \quad \leqslant n\left( n - \frac{2\alpha }{3}+2\right) (\alpha _n^2\delta _n-\delta _{n+1})+\left( \frac{2\alpha }{3} n+1\right) \\{} & {} \qquad \left( (1+\alpha _n)h_n-\alpha _n h_{n-1}-h_{n+1}+(\alpha _n+\alpha _n^2)\delta _n\right) \end{aligned}$$
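The passage from the inequality of Apidopoulos et al. [3] to its reduced form (105) rests on a norm expansion that can be verified on random vectors. This is only a sketch: it assumes \(h_n=\Vert x_n-x^*\Vert ^2\) and \(\delta _n=\Vert x_n-x_{n-1}\Vert ^2\) for the reduced notations, and the value of `a_n` (standing for \(\alpha _n\)) is arbitrary.

```python
import random


def sqnorm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(c * c for c in v)


random.seed(1)
d = 4
x_star = [random.gauss(0.0, 1.0) for _ in range(d)]
x_prev = [random.gauss(0.0, 1.0) for _ in range(d)]  # x_{n-1}
x_cur = [random.gauss(0.0, 1.0) for _ in range(d)]   # x_n
a_n = 0.7  # arbitrary inertial coefficient alpha_n

h_cur = sqnorm([c - s for c, s in zip(x_cur, x_star)])
h_prev = sqnorm([p - s for p, s in zip(x_prev, x_star)])
delta = sqnorm([c - p for c, p in zip(x_cur, x_prev)])

# ||x_n + alpha_n (x_n - x_{n-1}) - x^*||^2, as in the inequality from [3].
extrapolated = sqnorm(
    [c + a_n * (c - p) - s for c, p, s in zip(x_cur, x_prev, x_star)]
)

# Same quantity in reduced notation, as it appears in (105).
expanded = (1.0 + a_n) * h_cur - a_n * h_prev + (a_n + a_n**2) * delta

print(extrapolated, expanded)
```

The term \(-h_{n+1}\) is identical on both sides and is therefore omitted from the comparison.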

It follows:

$$\begin{aligned} E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n\leqslant & {} A_1(n,\alpha )\delta _n+A_2(n,\alpha )\delta _{n+1}+B_1(n,\alpha )h_{n-1}\nonumber \\{} & {} +B_2(n,\alpha )h_n+B_3(n,\alpha )h_{n+1} \end{aligned}$$
(106)

where:

$$\begin{aligned} A_1(n,\alpha )= & {} \frac{17\alpha ^2}{9}-\frac{8\alpha }{3}+2{-}\alpha \frac{(10\alpha ^2-18\alpha +9)n+7\alpha ^3-12\alpha ^2+6\alpha }{3(n+\alpha )^2},\\ A_2(n,\alpha )= & {} 1-\frac{2\alpha }{3}\\ B_1(n,\alpha )= & {} -\frac{2}{9}\alpha ^2+\frac{4}{3}\alpha -1+\frac{1}{3}\frac{3\alpha -2\alpha ^3}{n+\alpha }+\frac{1}{27}\frac{8\alpha ^3-24\alpha ^2}{n}, \end{aligned}$$

and

$$\begin{aligned} B_2(n,\alpha )=\frac{2}{9}\alpha ^2-2\alpha +2-\frac{1}{3}\frac{3\alpha -2\alpha ^3}{n+\alpha },\quad B_3(n,\alpha )=\frac{2}{3}\alpha -1. \end{aligned}$$
(107)

Observe now that: \(A_2(n,\alpha )=-B_3(n,\alpha )\) and:

$$\begin{aligned} B_1(n,\alpha )+B_2(n,\alpha )+B_3(n,\alpha )=\frac{8\alpha ^2}{27}\frac{\alpha -3}{n}=\frac{4\alpha K(\alpha )}{3n}. \end{aligned}$$

so that (106) becomes:

$$\begin{aligned} E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n\leqslant & {} \frac{4\alpha K(\alpha )}{3}\frac{h_n}{n} + A_1(n,\alpha ) \delta _n + B_1(n,\alpha ) (h_{n-1}-h_n)\nonumber \\{} & {} +\, B_3(n,\alpha )(h_{n+1}-h_n-\delta _{n+1}) \end{aligned}$$
(108)
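The two coefficient identities used to pass from (106) to (108) can be confirmed in exact rational arithmetic. The sketch below takes \(K(\alpha )=\frac{2\alpha }{9}(\alpha -3)\), which is the value consistent with the identity stated above; the sample values of \(\alpha \) and \(n\) are arbitrary.

```python
from fractions import Fraction as Fr


def K(a):
    """K(alpha) consistent with 8 a^2 (a - 3) / 27 = 4 a K(a) / 3."""
    return Fr(2, 9) * a * (a - 3)


def B1(n, a):
    return (-Fr(2, 9) * a**2 + Fr(4, 3) * a - 1
            + Fr(1, 3) * (3 * a - 2 * a**3) / (n + a)
            + Fr(1, 27) * (8 * a**3 - 24 * a**2) / n)


def B2(n, a):
    return Fr(2, 9) * a**2 - 2 * a + 2 - Fr(1, 3) * (3 * a - 2 * a**3) / (n + a)


def B3(n, a):
    return Fr(2, 3) * a - 1


def A2(n, a):
    return 1 - Fr(2, 3) * a


samples = [(n, a) for n in (1, 5, 40) for a in (Fr(4), Fr(11, 2), Fr(9))]

sum_identity_ok = all(
    B1(n, a) + B2(n, a) + B3(n, a) == 4 * a * K(a) / (3 * n) for n, a in samples
)
sign_identity_ok = all(A2(n, a) == -B3(n, a) for n, a in samples)

print(sum_identity_ok, sign_identity_ok)
```

Using `Fraction` avoids any rounding, so the equalities are tested exactly rather than up to a tolerance.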

Step 2: First observe that combining the growth condition \({\mathcal {G}}_\mu ^2\) with the control of the values by the energy (namely: \(E_n \geqslant n^2w_n\) for all n), we have:

$$\begin{aligned} \forall n\in {\mathbb {N}}^*, \quad \frac{h_n}{n} \leqslant \frac{w_n}{\kappa n}\leqslant \frac{E_n}{\kappa n^3}\leqslant \frac{E_n}{\kappa n(n-\frac{2\alpha }{3})^2}, \end{aligned}$$

so that applying the following Lemma whose proof is detailed in Appendix B.4:

Lemma 4

For all \(n\geqslant 1\) and any \((A,B)\in {\mathbb {R}}^2\)

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant \left( 2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right) \left( 1+\frac{4\alpha ^2}{9s\mu n^2}\right) \frac{E_n}{\left( n-\frac{2\alpha }{3}\right) ^2}. \end{aligned}$$

we can prove that:

$$\begin{aligned} \frac{4\alpha K(\alpha )}{3}\frac{h_n}{n} + A_1(n,\alpha ) \delta _n + B_1(n,\alpha ) (h_{n-1}-h_n) \leqslant \frac{\widetilde{C}_1(n,\alpha ,\kappa )E_n}{\left( n-\frac{2\alpha }{3}\right) ^2} \end{aligned}$$
(109)

and:

$$\begin{aligned} B_3(n,\alpha )(h_{n+1}-h_n-\delta _{n+1}) \leqslant \frac{{\widetilde{C}}_2(n,\alpha ,\kappa )E_{n+1}}{(n+1-\frac{2\alpha }{3})^2} \end{aligned}$$
(110)

where:

$$\begin{aligned} \widetilde{C}_1(n,\alpha ,\kappa )= & {} 2\left| \frac{5}{3}\alpha ^2-\frac{4\alpha }{3}+1 +R(n,\alpha )\right| \nonumber \\{} & {} +\,\sqrt{2}\left( \frac{|-\frac{2\alpha ^2}{9}+\frac{4\alpha }{3}-1+Q(n,\alpha )|}{\sqrt{\kappa }}\right) \left( 1+\frac{4\alpha ^2}{9\kappa n^2}\right) +\frac{4\alpha K(\alpha )}{3\kappa n}\nonumber \\ \end{aligned}$$
(111)

with:

$$\begin{aligned} |R(n,\alpha )|= & {} \left| A_1(n,\alpha )+B_1(n,\alpha )-\left( \frac{5}{3}\alpha ^2-\frac{4\alpha }{3}+1\right) \right| \leqslant \frac{8\alpha ^3}{n}\\ |Q(n,\alpha )|= & {} \frac{\alpha ^3}{3n}\left| n\frac{3-2\alpha ^2}{\alpha ^2(n+\alpha )}+8\frac{\alpha -3}{9\alpha }\right| \leqslant \frac{\alpha ^3}{n}, \end{aligned}$$

and:

$$\begin{aligned} {\widetilde{C}}_2(n,\alpha ,\kappa )=\left( \frac{2\alpha }{3}-1\right) \left( 4 +\frac{\sqrt{2}}{\sqrt{\kappa }}\right) \left( 1+\frac{4\alpha ^2}{9\kappa (n+1)^2}\right) \end{aligned}$$
(112)

Finally observe that since \(\kappa \in (0,1]\), for all \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\), we have:

$$\begin{aligned} \frac{1}{n-\frac{2 \alpha }{3}} \leqslant \frac{1}{n} \left( 1 + \sqrt{\kappa }\right) \quad \text{ and } \quad \frac{1}{n+1-\frac{2 \alpha }{3}} \leqslant \frac{1}{n+1} \left( 1 + \sqrt{\kappa }\right) \end{aligned}$$
(113)

hence:

$$\begin{aligned} \forall n\geqslant & {} \frac{4\alpha }{3\sqrt{\kappa }},\quad E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n \nonumber \\\leqslant & {} (1+\sqrt{\kappa })^2\left( \widetilde{C}_1(n,\alpha ,\kappa )\frac{E_n}{n^2}+\widetilde{C}_2(n,\alpha ,\kappa )\frac{E_{n+1}}{(n+1)^2}\right) . \end{aligned}$$
(114)
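The threshold \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\) in (113) can be explored numerically. The following sketch checks both inequalities of (113) over a range of indices; the parameter pairs in `cases` and the relative tolerance `rtol` (which absorbs rounding in the equality case) are assumptions made for the example.

```python
import math


def holds_113(n: int, alpha: float, kappa: float, rtol: float = 1e-12) -> bool:
    """Check both inequalities of (113) at index n, up to rounding."""
    c = 1.0 + math.sqrt(kappa)
    first = 1.0 / (n - 2.0 * alpha / 3.0) <= c / n * (1.0 + rtol)
    second = 1.0 / (n + 1 - 2.0 * alpha / 3.0) <= c / (n + 1) * (1.0 + rtol)
    return first and second


def n_min(alpha: float, kappa: float) -> int:
    """Smallest integer n with n >= 4 alpha / (3 sqrt(kappa))."""
    return math.ceil(4.0 * alpha / (3.0 * math.sqrt(kappa)))


cases = [(3.0, 1.0), (6.0, 0.25), (8.0, 0.01)]  # hypothetical (alpha, kappa) pairs
all_ok = all(
    holds_113(n, a, k)
    for a, k in cases
    for n in range(n_min(a, k), n_min(a, k) + 200)
)

# Below the threshold the inequality may fail, e.g. n = 5 with alpha = 6, kappa = 0.25.
fails_below = not holds_113(5, 6.0, 0.25)

print(all_ok, fails_below)
```

The first inequality is equivalent to \((1+\sqrt{\kappa })\frac{2\alpha }{3}\leqslant \sqrt{\kappa }\,n\), which holds under the stated threshold because \(1+\sqrt{\kappa }\leqslant 2\).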

Step 3: The last step is to uniformly bound the coefficients \({\widetilde{C}}_1(n,\alpha ,\kappa )\) and \({\widetilde{C}}_2(n,\alpha ,\kappa )\) with respect to n. For any \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\) and \(\alpha \geqslant 3\), we have:

$$\begin{aligned} {\widetilde{C}}_2(n,\alpha ,\kappa )= & {} \left( \frac{2\alpha }{3}-1\right) \left( 4 +\frac{\sqrt{2}}{\sqrt{\kappa }}\right) \left( 1+\frac{4\alpha ^2}{9\kappa (n+1)^2}\right) \\\leqslant & {} \frac{5}{4}\sqrt{\frac{2}{\kappa }} \left( \frac{2\alpha }{3}-1 \right) \left( 1+2 \sqrt{2\kappa }\right) \end{aligned}$$
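The uniform bound on \({\widetilde{C}}_2(n,\alpha ,\kappa )\) can be illustrated as follows. This is a sketch under assumed sample values of \(\alpha \) and \(\kappa \); the function names are hypothetical.

```python
import math


def C2_tilde(n: int, alpha: float, kappa: float) -> float:
    """Coefficient defined in (112)."""
    return ((2.0 * alpha / 3.0 - 1.0)
            * (4.0 + math.sqrt(2.0) / math.sqrt(kappa))
            * (1.0 + 4.0 * alpha**2 / (9.0 * kappa * (n + 1) ** 2)))


def C2_uniform(alpha: float, kappa: float) -> float:
    """Bound that no longer depends on n, valid for n >= 4 alpha / (3 sqrt(kappa))."""
    return (1.25 * math.sqrt(2.0 / kappa)
            * (2.0 * alpha / 3.0 - 1.0)
            * (1.0 + 2.0 * math.sqrt(2.0 * kappa)))


cases = [(3.0, 1.0), (6.0, 0.25), (8.0, 0.01)]  # hypothetical (alpha, kappa) pairs
bound_ok = True
for a, k in cases:
    n0 = math.ceil(4.0 * a / (3.0 * math.sqrt(k)))
    for n in range(n0, n0 + 100):
        bound_ok = bound_ok and C2_tilde(n, a, k) <= C2_uniform(a, k)

print(bound_ok)
```

Two facts drive the bound: \(4+\sqrt{2/\kappa }=\sqrt{2/\kappa }\,(1+2\sqrt{2\kappa })\), and \(\frac{4\alpha ^2}{9\kappa (n+1)^2}\leqslant \frac{1}{4}\) above the threshold, which gives the factor \(\frac{5}{4}\).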

The calculations to bound the coefficient \(\widetilde{C}_1(n,\alpha ,\kappa )\) are similar but slightly more involved. For all \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\), we have:

$$\begin{aligned} 4\alpha \frac{K(\alpha )}{3\kappa n} \leqslant \frac{2\alpha (\alpha -3)}{9\sqrt{\kappa }}\quad \text{ and } \quad \frac{4\alpha ^2}{9\kappa n^2} \leqslant \frac{1}{4} \end{aligned}$$

so that for all \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\):

$$\begin{aligned} {\widetilde{C}}_1(n,\alpha ,\kappa )= & {} 2\left| \frac{5}{3}\alpha ^2-\frac{4\alpha }{3}+1+R(n,\alpha )\right| +\sqrt{\frac{2}{\kappa }}\left| -\frac{2\alpha ^2}{9} +\frac{4\alpha }{3}-1\right. \\{} & {} \left. \,+Q(n,\alpha )\right| \left( 1+\frac{4\alpha ^2}{9\kappa n^2}\right) +4\alpha \frac{K(\alpha )}{3\kappa n}\\\leqslant & {} \frac{5}{4} \sqrt{\frac{2}{\kappa }}\left[ \left| -\frac{2\alpha ^2}{9}+\frac{4\alpha }{3}-1+Q(n,\alpha )\right| +\frac{4\sqrt{2}\alpha (\alpha -3)}{45}+\frac{4}{5}\left| \frac{5}{3}\alpha ^2\right. \right. \\{} & {} \left. \left. -\,\frac{4\alpha }{3}+1+R(n,\alpha )\right| \sqrt{2\kappa }\right] \end{aligned}$$

Assuming now that \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\), we have: \(\left| -\frac{2\alpha ^2}{9}+\frac{4\alpha }{3}-1\right| = \frac{2}{9}(\alpha -3)^2-1\), and:

$$\begin{aligned} {\widetilde{C}}_1(n,\alpha ,\kappa )\leqslant & {} \frac{5}{4} \sqrt{\frac{2}{\kappa }}\left[ \frac{2}{9}(\alpha -3)^2-1 +\frac{6\alpha (\alpha -3)}{45}+|Q(n,\alpha )|\right. \\{} & {} \left. +\, \frac{4}{5}\left| \frac{5}{3}\alpha ^2-\frac{4\alpha }{3}+1+R(n,\alpha )\right| \sqrt{2\kappa }\right] \end{aligned}$$

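Let \(P(\alpha )=\frac{2}{9}(\alpha -3)^2-1+\frac{6\alpha (\alpha -3)}{45}=\frac{2}{9}(\alpha -3)\left( \frac{8}{5}\alpha -3\right) -1\). The bound on the coefficient \({\widetilde{C}}_1(n,\alpha ,\kappa )\) can be rewritten as: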

$$\begin{aligned} \forall n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}, \quad \widetilde{C}_1(n,\alpha ,\kappa )\leqslant & {} \frac{5}{4} \sqrt{\frac{2}{\kappa }}P(\alpha ) \left[ 1+ \left| \frac{Q(n,\alpha )}{P(\alpha )}\right| \right. \\{} & {} \left. +\, \left( \frac{5\alpha ^2-4\alpha +3}{3P(\alpha )}+\left| \frac{R(n,\alpha )}{P(\alpha )}\right| \right) \sqrt{2\kappa }\right] . \end{aligned}$$

Studying the variations of the functions \(\alpha \mapsto \frac{\alpha ^2}{P(\alpha )}\) and \(\alpha \mapsto \frac{5\alpha ^2-4\alpha +3}{P(\alpha )}\), we easily prove that they are uniformly bounded for any real \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\), so that there exists a real constant \(B>0\) such that:

$$\begin{aligned} \forall n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}, \quad \left| \frac{Q(n,\alpha )}{P(\alpha )}\right| \leqslant \frac{\alpha ^3}{nP(\alpha )}\leqslant B\sqrt{\kappa }. \end{aligned}$$

Likewise:

$$\begin{aligned} \forall n\geqslant \frac{4\alpha }{3\sqrt{\kappa }},\quad \left| \frac{R(n,\alpha )}{P(\alpha )}\right| \leqslant 8\frac{\alpha ^3}{nP(\alpha )}\leqslant B\sqrt{\kappa }. \end{aligned}$$

Finally, there exist real constants \({{\tilde{c}}}_1\) and \(\tilde{c}_2\) such that for any \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\) and any \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\),

$$\begin{aligned} {\widetilde{C}}_1(n,\alpha ,\kappa ) \leqslant \frac{5}{4} \sqrt{\frac{2}{\kappa }} P(\alpha ) \left( 1+ {\tilde{c}}_1 \sqrt{\kappa }{+ {\tilde{c}}_2 \kappa } \right) . \end{aligned}$$
(115)

Combining (113) and (115), the inequality (67) holds as expected for any \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\) and without any condition on \(\kappa \):

$$\begin{aligned} {\forall n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}},\quad E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n \leqslant \frac{C_1(\alpha ,\kappa )E_n}{n^2}+\frac{C_2(\alpha ,\kappa )E_{n+1}}{(n+1)^2} \end{aligned}$$

with:

$$\begin{aligned} C_1(\alpha ,\kappa )= & {} \frac{5}{4} \sqrt{\frac{2}{\kappa }}\left[ \frac{2}{9}(\alpha -3)\left( \frac{8}{5}\alpha -3\right) -1\right] (1+\sqrt{\kappa })^2 \left( 1+ {\tilde{c}}_1 \sqrt{\kappa }{+ {\tilde{c}}_2 \kappa } \right) \end{aligned}$$
(116)
$$\begin{aligned} C_2(\alpha ,\kappa )= & {} \frac{5}{4}\sqrt{\frac{2}{\kappa }} \left( \frac{2\alpha }{3}-1\right) (1+\sqrt{\kappa })^2(1+2 \sqrt{2\kappa }). \end{aligned}$$
(117)

\(\square \)

B.2 Proof of Lemma 2

Assume that the energy \(E_n\) satisfies:

$$\begin{aligned} E_{n+1}-\left( 1-\frac{\frac{2\alpha }{3}-2}{n}\right) E_n \leqslant \frac{C_1(\alpha ,\kappa )E_n}{n^2}+\frac{C_2(\alpha ,\kappa )E_{n+1}}{(n+1)^2} \end{aligned}$$

i.e.:

$$\begin{aligned} \left( 1-\frac{C_2(\alpha ,\kappa )}{(n+1)^2}\right) E_{n+1}-\left( 1 -\frac{\frac{2\alpha }{3}-2}{n} +\frac{C_1(\alpha ,\kappa )}{n^2}\right) E_n\leqslant 0. \end{aligned}$$
(118)

Let \(n_0\geqslant {\frac{4\alpha }{3\sqrt{\kappa }}}\). We then deduce:

$$\begin{aligned} \forall n\geqslant n_0,\quad \log (E_{n+1})-\log (E_{n_0})\leqslant \sum _{k=n_0}^n\log \left( \frac{1-\frac{\frac{2\alpha }{3}-2}{k}+\frac{C_1(\alpha ,\kappa )}{k^2} }{1-\frac{C_2(\alpha ,\kappa )}{(k+1)^2}}\right) . \end{aligned}$$
(119)

Using now the following classical inequalities:

$$\begin{aligned} \forall x>-1,\quad \frac{x}{x+1} \leqslant \log (1+x) \leqslant x, \end{aligned}$$
(120)

we get:

$$\begin{aligned} \log \left( 1 -\frac{\frac{2\alpha }{3}-2}{k} +\frac{C_1(\alpha ,\kappa )}{k^2}\right) \leqslant -\frac{\frac{2\alpha }{3}-2}{k} +\frac{C_1(\alpha ,\kappa )}{k^2} \end{aligned}$$
(121)

and

$$\begin{aligned} -\log \left( 1-\frac{C_2(\alpha ,\kappa )}{(k+1)^2}\right) \leqslant \frac{C_2(\alpha ,\kappa )}{(k+1)^2-C_2(\alpha ,\kappa )} \end{aligned}$$
(122)

We therefore get:

$$\begin{aligned} \log \left( \frac{1 -\frac{\frac{2\alpha }{3}-2}{k} +\frac{C_1(\alpha ,\kappa )}{k^2} }{1-\frac{C_2(\alpha ,\kappa )}{(k+1)^2}}\right) \leqslant -\frac{\frac{2\alpha }{3}-2}{k} +\frac{C_1(\alpha ,\kappa )}{k^2} +\frac{C_2(\alpha ,\kappa )}{(k+1)^2-C_2(\alpha ,\kappa )}\nonumber \\ \end{aligned}$$
(123)

Hence:

$$\begin{aligned} \log (E_{n+1})-\log (E_{n_0})\leqslant \sum _{k=n_0}^n \left( -\frac{\frac{2\alpha }{3}-2}{k} +\frac{C_1(\alpha ,\kappa )}{k^2} +\frac{C_2(\alpha ,\kappa )}{(k+1)^2-C_2(\alpha ,\kappa )} \right) \nonumber \\ \end{aligned}$$
(124)

We now use the fact that the functions \(x \mapsto \frac{1}{x}\), \(x \mapsto \frac{1}{x^2}\) and \(x \mapsto \frac{C_2(\alpha ,\kappa )}{x^2-C_2(\alpha ,\kappa )}\) are decreasing on \((C_2,+ \infty )\). Observe that all coefficients in the very last inequality are actually non-negative since \(\alpha \geqslant \alpha _0>3\). We then have:

$$\begin{aligned} \int _{k}^{k+1} \frac{dx}{x} \leqslant \frac{1}{k},\quad \frac{1}{k^2} \leqslant \int _{k-1}^{k} \frac{dx}{x^2} \end{aligned}$$
(125)

and:

$$\begin{aligned} \frac{C_2(\alpha ,\kappa )}{(k+1)^2-C_2(\alpha ,\kappa )} \leqslant \int _{k}^{k+1} \frac{C_2(\alpha ,\kappa )}{x^2-C_2(\alpha ,\kappa )} \, dx \end{aligned}$$
(126)

so that:

$$\begin{aligned} \log (E_{n+1})-\log (E_{n_0})\leqslant & {} -\left( \frac{2 \alpha }{3} -2 \right) \int _{n_0}^{n+1} \frac{dx}{x} +C_1(\alpha ,\kappa ) \int _{n_0-1}^{n} \frac{dx}{x^2}\\{} & {} +\,C_2(\alpha ,\kappa ) \int _{n_0}^{n+1} \frac{dx}{x^2-C_2(\alpha ,\kappa )} \end{aligned}$$

Noticing that:

$$\begin{aligned} \frac{1}{x^2-C_2(\alpha ,\kappa )} = \frac{1}{2 \sqrt{C_2(\alpha ,\kappa )}} \left( \frac{1}{x - \sqrt{C_2(\alpha ,\kappa )}} - \frac{1}{x + \sqrt{C_2(\alpha ,\kappa )}} \right) , \end{aligned}$$

we eventually get:

$$\begin{aligned} \log (E_{n+1})-\log (E_{n_0})\leqslant & {} -\left( \frac{2 \alpha }{3} -2 \right) \log \left( \frac{n+1}{n_0} \right) \nonumber \\{} & {} +C_1(\alpha ,\kappa ) \left( \frac{1}{n_0-1} -\frac{1}{n} \right) + \frac{\sqrt{C_2(\alpha ,\kappa )}}{ 2}\nonumber \\{} & {} \log \left( \frac{(n+1-\sqrt{C_2(\alpha ,\kappa )}) (n_0+ \sqrt{C_2(\alpha ,\kappa )})}{(n +1+ \sqrt{C_2(\alpha ,\kappa )}) (n_0 - \sqrt{C_2(\alpha ,\kappa )})} \right) \nonumber \\ \end{aligned}$$
(127)

i.e.:

$$\begin{aligned} \log (E_{n+1})-\log (E_{n_0})\leqslant & {} -\left( \frac{2 \alpha }{3} -2 \right) \log \left( \frac{n+1}{n_0} \right) +C_1(\alpha ,\kappa ) \left( \frac{1}{n_0-1} -\frac{1}{n} \right) \nonumber \\{} & {} + \,\frac{\sqrt{C_2(\alpha ,\kappa )}}{ 2} \left( \log \left( \frac{n+1-\sqrt{C_2(\alpha ,\kappa )}}{n +1+ \sqrt{C_2(\alpha ,\kappa )} } \right) \right. \nonumber \\{} & {} \left. +\, \log \left( \frac{ n_0 + \sqrt{C_2(\alpha ,\kappa )}}{n_0 - \sqrt{C_2(\alpha ,\kappa )}}\right) \right) \end{aligned}$$
(128)
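As a numerical sanity check of the sum/integral comparison, the following Python sketch evaluates the sum on the right-hand side of (124) and the closed form (127). The values of \(\alpha \), \(C_1\), \(C_2\) and \(n_0\) are arbitrary placeholders, not the constants of Lemma 1; they are only assumed to satisfy \(\alpha >3\), \(C_1,C_2\geqslant 0\) and \(n_0>\max (1,\sqrt{C_2})\).

```python
import math


def sum_bound(n0, n, alpha, c1, c2):
    """Right-hand side of (124): the sum over k = n0, ..., n."""
    p = 2.0 * alpha / 3.0 - 2.0
    return sum(
        -p / k + c1 / k**2 + c2 / ((k + 1) ** 2 - c2)
        for k in range(n0, n + 1)
    )


def closed_form_bound(n0, n, alpha, c1, c2):
    """Right-hand side of (127), obtained by sum/integral comparison."""
    p = 2.0 * alpha / 3.0 - 2.0
    r = math.sqrt(c2)
    log_term = math.log(((n + 1 - r) * (n0 + r)) / ((n + 1 + r) * (n0 - r)))
    return (
        -p * math.log((n + 1) / n0)
        + c1 * (1.0 / (n0 - 1) - 1.0 / n)
        + 0.5 * r * log_term
    )


# Placeholder values chosen only for illustration (assumptions, not from Lemma 1).
ALPHA, C1, C2, N0 = 6.0, 50.0, 30.0, 10

for n in (10, 50, 500):
    lhs = sum_bound(N0, n, ALPHA, C1, C2)
    rhs = closed_form_bound(N0, n, ALPHA, C1, C2)
    print(f"n={n:4d}  sum={lhs:+.6f}  closed form={rhs:+.6f}")
```

In each printed row the sum should not exceed the closed form, as (125) and (126) predict term by term.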

Taking the exponential, we get:

$$\begin{aligned} E_{n+1} \leqslant E_{n_0} \left( \frac{n+1}{n_0} \right) ^{-\left( \frac{2\alpha }{3}-2\right) } \exp ({\tilde{\Phi }}(n_0)- {\tilde{\Phi }}(n+1)) \end{aligned}$$
(129)

with:

$$\begin{aligned} {\tilde{\Phi }}(n) = \frac{C_1(\alpha ,\kappa ) }{n-1} +\frac{\sqrt{C_2(\alpha ,\kappa )}}{2} \log \left( \frac{ n + \sqrt{C_2(\alpha ,\kappa )}}{n- \sqrt{C_2(\alpha ,\kappa )}}\right) . \end{aligned}$$

Let us finally compute a more tractable bound on the function \({\tilde{\Phi }}(n)\): using the inequality \(\log (1+x) \leqslant x\) for \(x > -1\), we have:

$$\begin{aligned} 0\leqslant \log \left( \frac{ n + \sqrt{C_2(\alpha ,\kappa )}}{n - \sqrt{C_2(\alpha ,\kappa )}} \right) =\log \left( 1+ \frac{ 2 \sqrt{C_2(\alpha ,\kappa )}}{n - \sqrt{C_2(\alpha ,\kappa )}} \right) \leqslant \frac{2 \sqrt{C_2(\alpha ,\kappa )}}{n- \sqrt{C_2(\alpha ,\kappa )}}\nonumber \\ \end{aligned}$$
(130)

Hence we deduce that:

$$\begin{aligned} 0\leqslant \frac{\sqrt{C_2(\alpha ,\kappa )}}{2} \log \left( \frac{ n + \sqrt{C_2(\alpha ,\kappa )}}{n - \sqrt{C_2(\alpha ,\kappa )}} \right) \leqslant \frac{C_2(\alpha ,\kappa )}{n- \sqrt{C_2(\alpha ,\kappa )}} \end{aligned}$$
(131)
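The bound (131) is easy to check numerically. The sketch below compares \({\tilde{\Phi }}(n)\) with the upper bound \(\frac{C_1}{n-1}+\frac{C_2}{n-\sqrt{C_2}}\) that (131) implies; the constants are illustrative placeholders (assumptions), not the values of \(C_1(\alpha ,\kappa )\) and \(C_2(\alpha ,\kappa )\) from Lemma 1.

```python
import math


def phi_tilde(n, c1, c2):
    """Phi-tilde(n) as defined after (129); requires n > max(1, sqrt(c2))."""
    r = math.sqrt(c2)
    return c1 / (n - 1) + 0.5 * r * math.log((n + r) / (n - r))


def phi_tilde_upper(n, c1, c2):
    """Upper bound C1/(n-1) + C2/(n - sqrt(C2)) that follows from (131)."""
    return c1 / (n - 1) + c2 / (n - math.sqrt(c2))


# Placeholder constants for illustration only (not the values from Lemma 1).
C1, C2 = 50.0, 30.0
for n in (10, 100, 1000):
    print(n, phi_tilde(n, C1, C2), phi_tilde_upper(n, C1, C2))
```

Both quantities decay like \(1/n\), which is what makes the factor \(e^{\Phi (n_0)}\) harmless once \(n_0\) is of order \(1/\sqrt{\kappa }\).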

Now, using the definition of the coefficients \(C_1(\alpha ,\kappa )\) and \(C_2(\alpha ,\kappa )\) given in Lemma 1, we get:

$$\begin{aligned}{} & {} 0 \leqslant {\tilde{\Phi }}(n) \leqslant \frac{C_1(\alpha ,\kappa )}{n-1} +\frac{C_2(\alpha ,\kappa )}{n - \sqrt{C_2(\alpha ,\kappa )}} \leqslant {\frac{2C_1(\alpha ,\kappa )}{n}} +\frac{C_2(\alpha ,\kappa )}{n - \sqrt{C_2(\alpha ,\kappa )}} \nonumber \\{} & {} \quad \leqslant \frac{5}{4n}\sqrt{\frac{2}{\kappa }}(1 + \sqrt{\kappa })^2 \left[ 2P(\alpha ) \left( 1+ {\tilde{c}}_1 \sqrt{\kappa }+ {\tilde{c}}_2 \kappa \right) + \left( \frac{2\alpha }{3}-1\right) \frac{1+2\sqrt{2\kappa }}{1-\frac{\sqrt{C_2(\alpha ,\kappa )}}{n}}\right] \nonumber \\ \end{aligned}$$
(132)

where: \(P(\alpha )=\frac{2}{9}(\alpha -3)\left( (1+\frac{2\sqrt{2}}{5})\alpha -3\right) -1\). Observe then that for \(\kappa \) small enough and \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\),

$$\begin{aligned} \frac{1}{1-\frac{\sqrt{C_2(\alpha ,\kappa )}}{n}} \leqslant \frac{1}{1-\frac{3\sqrt{C_2(\alpha ,\kappa )}}{4\alpha }\sqrt{\kappa }}\leqslant 1+2\frac{\sqrt{C_2(\alpha ,\kappa )}}{\alpha }\sqrt{\kappa } \end{aligned}$$

so that there exists a real constant \({{\tilde{c}}}_3\) such that for \(\kappa \) small enough and \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\) we have:

$$\begin{aligned} \frac{1}{1-\frac{\sqrt{C_2(\alpha ,\kappa )}}{n}} \leqslant 1+{{\tilde{c}}}_3 \kappa ^{1/4}. \end{aligned}$$

Therefore we finally get for any \(n\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\):

$$\begin{aligned} {\tilde{\Phi }}(n)\leqslant & {} \frac{5}{4n}\sqrt{\frac{2}{\kappa }}\left( 1 +\sqrt{\kappa }\right) ^2\nonumber \\{} & {} \left( 2P(\alpha ) \left( 1+ {\tilde{c}}_1 \sqrt{\kappa }+ {\tilde{c}}_2 \kappa \right) +\left( \frac{2\alpha }{3}-1\right) (1+2\sqrt{2\kappa }) \left( 1 + {{\tilde{c}}}_3 \kappa ^{1/4} \right) \right) .\nonumber \\ \end{aligned}$$
(133)

We then deduce that there exists \(C_3>0\) (independent of \(\alpha \)) such that

$$\begin{aligned} \forall n\geqslant \frac{4\alpha }{3\sqrt{\kappa }},\quad {\tilde{\Phi }}(n)\leqslant & {} \frac{5}{4n} \sqrt{\frac{2}{\kappa }} \left( 2P(\alpha )+\frac{2\alpha }{3}-1\right) \left( 1 + C_3 \kappa ^{1/4} \right) \\\leqslant & {} \frac{5}{4n} \sqrt{\frac{2}{\kappa }} \left( 2P(\alpha )+\frac{2\alpha }{3}\right) \left( 1 + C_3 \kappa ^{1/4} \right) \end{aligned}$$

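where \(2P(\alpha )+\frac{2\alpha }{3} = \frac{2}{3}(\alpha -3)\left( \frac{10+4\sqrt{2}}{15}\alpha -1\right) \leqslant \frac{2}{3}(\alpha -3)\left( \frac{16}{15}\alpha -1\right) \), since \(10+4\sqrt{2}\leqslant 16\) and \(\alpha \geqslant 3\). Let us introduce: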

$$\begin{aligned} \Phi (n)= \frac{5}{6n} \sqrt{\frac{2}{\kappa }} (\alpha -3)\left( \frac{16}{15}\alpha -1\right) \left( 1 + C_3 \kappa ^{1/4} \right) . \end{aligned}$$

Let \(\alpha \geqslant 3+\frac{3}{\sqrt{2}}\) and \(n_0\geqslant \frac{4\alpha }{3\sqrt{\kappa }}\). As expected we finally get:

$$\begin{aligned} \forall n\geqslant n_0,\quad E_{n+1} \leqslant E_{n_0} \left( \frac{n+1}{n_0} \right) ^{-\left( \frac{2\alpha }{3}-2\right) } e^{\Phi (n_0)} \end{aligned}$$
(134)

\(\square \)

B.3 Proof of Lemma 3

Let \(M_n\) denote the mechanical energy:

$$\begin{aligned} M_n=F(x_n) - F^* + \frac{1}{2s} \Vert x_n - x_{n-1}\Vert ^2. \end{aligned}$$

Let us prove that for any \(n\in {\mathbb {N}}\), we have:

$$\begin{aligned} \frac{E_n}{2sn^2}\leqslant & {} \left( 1 + \frac{4\alpha ^2}{9\kappa n^2} + \frac{4\alpha }{3\sqrt{\kappa }n} \right) M_n = \left( 1+ \frac{2\alpha }{3\sqrt{\kappa }n}\right) ^2M_n \end{aligned}$$
(135)

First remark that:

$$\begin{aligned} b_n= & {} \left\| \frac{2 \alpha }{3} (x_{n-1}-x^*)+n(x_n-x_{n-1}) \right\| ^2 = \left\| \frac{2 \alpha }{3} (x_{n}-x^*)+\left( n-\frac{2 \alpha }{3} \right) (x_n-x_{n-1}) \right\| ^2\\= & {} \frac{4\alpha ^2}{9}\Vert x_n-x^*\Vert ^2 + \left( n-\frac{2 \alpha }{3}\right) ^2\Vert x_n-x_{n-1}\Vert ^2 + \frac{4\alpha }{3}\left( n-\frac{2 \alpha }{3}\right) \langle x_n-x^*,x_n-x_{n-1} \rangle \\\leqslant & {} \frac{4\alpha ^2}{9}\Vert x_n-x^*\Vert ^2 + n^2\Vert x_n-x_{n-1}\Vert ^2 + \frac{4\alpha }{3}\left( n-\frac{2 \alpha }{3}\right) \langle x_n-x^*,x_n-x_{n-1} \rangle \end{aligned}$$

Using a discrete version of the inequality (80), we have:

$$\begin{aligned} |\langle x_n-x^*,x_n-x_{n-1} \rangle | \leqslant \frac{\sqrt{\kappa }}{2}\Vert x_n-x^*\Vert ^2 + \frac{1}{2\sqrt{\kappa }}\Vert x_n-x_{n-1}\Vert ^2 \end{aligned}$$
(136)

so that:

$$\begin{aligned} b_n\leqslant & {} \frac{4 \alpha ^2}{9} \left\| x_{n}-x^* \right\| ^2 +n^2 \left\| x_n-x_{n-1} \right\| ^2 \nonumber \\{} & {} + \frac{2\alpha n}{3}\left( \sqrt{\kappa }\Vert x_n-x^*\Vert ^2 + \frac{1}{\sqrt{\kappa }}\Vert x_n-x_{n-1}\Vert ^2\right) \end{aligned}$$
(137)

Hence:

$$\begin{aligned} \frac{E_n}{2sn^2}= & {} F(x_n)-F^* + \frac{1}{2sn^2}b_n\\\leqslant & {} M_n + \frac{2 \alpha ^2}{9sn^2} \left\| x_{n}-x^*\right\| ^2 + \frac{\alpha }{3sn}\left( \sqrt{\kappa }\Vert x_n-x^*\Vert ^2 + \frac{1}{\sqrt{\kappa }}\Vert x_n-x_{n-1}\Vert ^2\right) \end{aligned}$$

Using now the quadratic growth condition \({\mathcal {G}}^2_\mu \) and remembering that: \(s\mu = \kappa \), we get:

$$\begin{aligned} \frac{E_n}{2sn^2}\leqslant & {} \left( 1 + \frac{4\alpha ^2}{9\kappa n^2} + \frac{4\alpha }{3\sqrt{\kappa }n} \right) M_n = \left( 1+ \frac{2\alpha }{3\sqrt{\kappa }n}\right) ^2M_n \end{aligned}$$
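The last step can be spelled out as follows, assuming that the growth condition \({\mathcal {G}}^2_\mu \) is used in the form \(\Vert x_n-x^*\Vert ^2\leqslant \frac{2}{\mu }(F(x_n)-F^*)\) (this normalization is an assumption here; it is the one consistent with the constants in (135)). With \(s\mu =\kappa \), the two terms involving \(\Vert x_n-x^*\Vert ^2\) satisfy

$$\begin{aligned} \frac{2\alpha ^2}{9sn^2}\Vert x_n-x^*\Vert ^2\leqslant \frac{4\alpha ^2}{9\kappa n^2}\left( F(x_n)-F^*\right) ,\quad \frac{\alpha \sqrt{\kappa }}{3sn}\Vert x_n-x^*\Vert ^2\leqslant \frac{2\alpha }{3\sqrt{\kappa }n}\left( F(x_n)-F^*\right) , \end{aligned}$$

while the remaining term can be rewritten as

$$\begin{aligned} \frac{\alpha }{3sn\sqrt{\kappa }}\Vert x_n-x_{n-1}\Vert ^2 = \frac{2\alpha }{3\sqrt{\kappa }n}\cdot \frac{1}{2s}\Vert x_n-x_{n-1}\Vert ^2. \end{aligned}$$

Since \(F(x_n)-F^*\) and \(\frac{1}{2s}\Vert x_n-x_{n-1}\Vert ^2\) are both nonnegative and bounded by \(M_n\), adding these three estimates to \(M_n\) gives the coefficient \(1+\frac{4\alpha ^2}{9\kappa n^2}+\frac{4\alpha }{3\sqrt{\kappa }n}\).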

B.4 Proof of Lemma 4

Let us prove that for all \(n\geqslant 1\) and any \((A,B)\in {\mathbb {R}}^2\)

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant \left( 2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right) \left( 1+\frac{4\alpha ^2}{9s\mu n^2}\right) \frac{E_n}{(n-\frac{2\alpha }{3})^2}.\nonumber \\ \end{aligned}$$
(138)

Firstly notice that

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)= (A+B)\delta _n+B(h_{n-1}-h_{n}-\delta _n) \end{aligned}$$
(139)

and for any \(\theta >0\)

$$\begin{aligned} |h_{n-1}-h_{n}-\delta _n|=2|\langle x_n-x_{n-1},x_{n}-x^*\rangle |\leqslant \frac{h_n}{\theta }+\theta \delta _n. \end{aligned}$$
(140)

Combining the last two inequalities, it follows that for any \(\theta >0\):

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant (A+B+\theta |B|)\delta _n+\frac{|B|}{\theta }h_n \end{aligned}$$
(141)
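Inequality (141) only relies on the identity \(h_{n-1}-h_n-\delta _n=-2\langle x_n-x_{n-1},x_n-x^*\rangle \) and on Young's inequality, so it can be tested on arbitrary points. The sketch below does so on random data; the function name `check_141` and the sampling ranges are illustrative choices, not part of the original proof.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def check_141(x_prev, x_cur, x_star, a, b, theta):
    """Return (lhs, rhs) of (141) for the given points and parameters."""
    step = [c - p for c, p in zip(x_cur, x_prev)]        # x_n - x_{n-1}
    err = [c - s for c, s in zip(x_cur, x_star)]         # x_n - x^*
    err_prev = [p - s for p, s in zip(x_prev, x_star)]   # x_{n-1} - x^*
    delta_n = dot(step, step)
    h_n = dot(err, err)
    h_prev = dot(err_prev, err_prev)
    # Identity behind (140): h_{n-1} - h_n - delta_n = -2 <x_n - x_{n-1}, x_n - x^*>
    gap = (h_prev - h_n - delta_n) + 2.0 * dot(step, err)
    assert abs(gap) <= 1e-9 * (1.0 + h_prev + h_n + delta_n)
    lhs = a * delta_n + b * (h_prev - h_n)
    rhs = (a + b + theta * abs(b)) * delta_n + abs(b) / theta * h_n
    return lhs, rhs


random.seed(0)
for _ in range(1000):
    pts = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    theta = random.uniform(0.1, 5.0)
    lhs, rhs = check_141(pts[0], pts[1], pts[2], a, b, theta)
    assert lhs <= rhs + 1e-9
print("inequality (141) held on all random samples")
```

The free parameter \(\theta \) trades the weight put on \(\delta _n\) against the weight put on \(h_n\); the proof later fixes it to balance the two contributions.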

To bound the coefficient of \(\delta _n\) we use a specific expression of \(b_n\):

$$\begin{aligned} b_n=\left\| {\frac{2\alpha }{3}(x_{n}-x^*)+\left( n-\frac{2\alpha }{3}\right) (x_n-x_{n-1})}\right\| ^2 \end{aligned}$$
(142)

Applying the inequality \(\left\| {u}\right\| ^2\leqslant 2\left\| {u+v}\right\| ^2+2\left\| {v}\right\| ^2\) to \(u=\left( n-\frac{2\alpha }{3}\right) (x_n-x_{n-1})\) and \(v=\frac{2\alpha }{3}(x_{n}-x^*)\), we get:

$$\begin{aligned} \left( n-\frac{2\alpha }{3}\right) ^2\delta _n\leqslant 2b_n+\frac{8\alpha ^2}{9}h_n. \end{aligned}$$
(143)

It follows that

$$\begin{aligned} \delta _n\leqslant \frac{2}{\left( n-\frac{2\alpha }{3}\right) ^2}b_n+\frac{8\alpha ^2}{9\left( n-\frac{2\alpha }{3}\right) ^2}h_n. \end{aligned}$$
(144)

and thus

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant & {} (|A+B|+\theta |B|)\frac{2}{\left( n-\frac{2\alpha }{3}\right) ^2}b_n\nonumber \\{} & {} +\left( \frac{|B|}{\theta }+ (|A+B|+\theta |B|)\frac{8\alpha ^2}{9\left( n-\frac{2\alpha }{3}\right) ^2}\right) h_n \end{aligned}$$
(145)

Using now the growth condition \(h_n\leqslant \frac{1}{s\mu }w_n\) for all \(n\in {\mathbb {N}}\), we get:

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant & {} (|A+B|+\theta |B|)\frac{2}{\left( n-\frac{2\alpha }{3}\right) ^2}b_n\nonumber \\{} & {} +\left( \frac{|B|}{s\mu \theta }+ (|A+B|+\theta |B|)\frac{8\alpha ^2}{9s\mu \left( n-\frac{2\alpha }{3}\right) ^2}\right) w_n \end{aligned}$$
(146)

Choosing \(\theta =\frac{1}{\sqrt{2 s\mu }}\) we finally deduce:

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant & {} \left( 2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right) \frac{b_n}{\left( n-\frac{2\alpha }{3}\right) ^2} +\left( \frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right. \nonumber \\{} & {} \left. +\left( 2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right) \frac{4\alpha ^2}{9s\mu \left( n-\frac{2\alpha }{3}\right) ^2}\right) w_n \end{aligned}$$
(147)

and

$$\begin{aligned} A\delta _n+B(h_{n-1}-h_n)\leqslant \left( 2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\right) \left( 1+\frac{4\alpha ^2}{9s\mu n^2}\right) \frac{E_n}{\left( n-\frac{2\alpha }{3}\right) ^2},\nonumber \\ \end{aligned}$$
(148)

which concludes the proof of the lemma.
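For completeness, the passage from (147) to (148) can be detailed as follows. This is a sketch under the assumption (the notation is not restated in this appendix) that \(E_n=n^2w_n+b_n\) with \(w_n\geqslant 0\) and \(b_n\geqslant 0\), and that \(n\geqslant \frac{\alpha }{3}\), so that \(\left( n-\frac{2\alpha }{3}\right) ^2\leqslant n^2\). Writing \(K=2|A+B|+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\) and using \(\frac{\sqrt{2}|B|}{\sqrt{s\mu }}\leqslant K\), we have

$$\begin{aligned} K\frac{b_n}{\left( n-\frac{2\alpha }{3}\right) ^2}+\frac{\sqrt{2}|B|}{\sqrt{s\mu }}w_n\leqslant & {} K\frac{b_n+n^2w_n}{\left( n-\frac{2\alpha }{3}\right) ^2} = K\frac{E_n}{\left( n-\frac{2\alpha }{3}\right) ^2},\\ K\frac{4\alpha ^2}{9s\mu \left( n-\frac{2\alpha }{3}\right) ^2}w_n\leqslant & {} K\frac{4\alpha ^2}{9s\mu n^2}\frac{E_n}{\left( n-\frac{2\alpha }{3}\right) ^2}, \end{aligned}$$

where the second line uses \(n^2w_n\leqslant E_n\). Adding the two lines gives the right-hand side of (148).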

C Sketch of the proof of Theorem 7

The proof of Theorem 7 follows the same lines as the proof of Theorem 6, and is based on the following Lyapunov energy:

$$\begin{aligned} E_n=2s\,n^2(F(x_n)-F^*)+\left\| {\frac{\alpha }{2}(x_{n-1}-x^*)+\left( n-\frac{\alpha }{4}\right) (x_n-x_{n-1})}\right\| ^2. \end{aligned}$$
(149)

As in the proof of Theorem 6, the first step of this proof consists in establishing some discrete version of the differential inequality (87):

Lemma 5

Let \(\alpha >4+2\sqrt{2}\) and \(\kappa =\frac{\mu }{L}\). There exists \(\kappa _0>0\) such that for any \(\kappa \leqslant \kappa _0\), there exist real constants \({\tilde{c}}_1\) and \({\tilde{c}}_2\) such that:

$$\begin{aligned} \forall n\geqslant \frac{3\alpha }{2\sqrt{\kappa }},\quad E_{n+1}-\left( 1-\frac{\alpha -2}{n}\right) E_n \leqslant C_1(\alpha ,\kappa )\frac{E_n}{n^2}+C_2(\alpha ,\kappa )\frac{E_{n+1}}{(n+1)^2}\nonumber \\ \end{aligned}$$
(150)

where:

$$\begin{aligned} C_1(\alpha ,\kappa )= & {} \frac{19}{36\sqrt{2\kappa }}(\alpha -2)(2\alpha -1)\left[ 1+{\widetilde{c}}_1 \sqrt{\kappa } + {\widetilde{c}}_2 \kappa \right] (1+\sqrt{\kappa })^2 \qquad \quad \end{aligned}$$
(151)
$$\begin{aligned} C_2(\alpha ,\kappa )= & {} \frac{19(\alpha -2)^2}{72\sqrt{2\kappa }}\left( 1+11\sqrt{\kappa }\right) (1+\sqrt{\kappa })^2. \end{aligned}$$
(152)

As in the proof of Theorem 6, the next step consists in integrating the inequality (150):

Lemma 6

Let \(\alpha \geqslant 4+2\sqrt{2}\) and \(n_0\geqslant \frac{3\alpha }{2\sqrt{\kappa }}\). If \(E_n\) satisfies (150) then there exists a real constant \(C_3>0\) such that:

$$\begin{aligned} \forall n\geqslant n_0, \quad E_{n} \leqslant E_{n_0} \, \left( \frac{n_0}{n}\right) ^{\alpha -2} e^{\Phi (n_0)} \end{aligned}$$
(153)

with

$$\begin{aligned} \Phi (n)= \frac{19(\alpha -2)(3\alpha -2)}{24n \sqrt{2\kappa }}\left( 1 + C_3\kappa ^{1/4}\right) . \end{aligned}$$
(154)

The proofs of Lemmas 5 and 6 are very similar to those of Lemmas 1 and 2 and are omitted here. They can be found in [7].

A good choice for \(n_0\) is one ensuring a control as tight as possible on the values \(F(x_n)-F^*\). For that \(n_0\) is chosen such that it minimizes the function \(f:x\mapsto x^{\alpha -2}e^{\Phi (x)}\). A straightforward computation gives:

$$\begin{aligned} n_0:= \frac{19(3\alpha -2)}{24\sqrt{2 \kappa }} \left( 1+C_3 \kappa ^{1/4} \right) . \end{aligned}$$
(155)
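The computation behind (155) is short. Writing \(\Phi (x)=\frac{K}{x}\) with \(K=\frac{19(\alpha -2)(3\alpha -2)}{24\sqrt{2\kappa }}\left( 1+C_3\kappa ^{1/4}\right) \), we have for \(x>0\):

$$\begin{aligned} \frac{d}{dx}\log f(x) = \frac{\alpha -2}{x}-\frac{K}{x^2} = \frac{(\alpha -2)x-K}{x^2}, \end{aligned}$$

which is negative for \(x<\frac{K}{\alpha -2}\) and positive for \(x>\frac{K}{\alpha -2}\). Hence \(f\) attains its minimum on \((0,+\infty )\) at \(n_0=\frac{K}{\alpha -2}\), and at this point \(\Phi (n_0)=\frac{K}{n_0}=\alpha -2\).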

Observe that \(f(n_0)=\left( e\,n_0\right) ^{ \alpha -2}\). Moreover, for any \(\alpha \geqslant 4+2\sqrt{2}\), the optimized value of \(n_0\) satisfies \(n_0>\frac{3\alpha }{2\sqrt{\kappa }}\) without any condition on \(\kappa \) and, reducing \(\kappa _0\) if needed, we get:

$$\begin{aligned} \forall \alpha \geqslant 4+2\sqrt{2}, \quad \frac{3\alpha }{2\sqrt{\kappa }} \leqslant \frac{19(3\alpha -2)}{24\sqrt{2 \kappa }}\left( 1+C_3 \kappa ^{1/4}\right) \leqslant \frac{5\alpha }{\sqrt{2\kappa }}. \end{aligned}$$
(156)

Hence:

$$\begin{aligned} \forall n\geqslant \frac{5\alpha }{\sqrt{2\kappa }},\quad F(x_n)-F(x^*) \leqslant \frac{E_{n}}{2s n^2} \leqslant \frac{E_{n_0}}{2s n_0^2} \left( \frac{n_0}{n}\right) ^{\alpha } e^{\alpha -2} \end{aligned}$$
(157)

i.e.:

$$\begin{aligned} \forall n\geqslant \frac{5\alpha }{\sqrt{2\kappa }},\quad F(x_n)-F(x^*)\leqslant & {} \frac{E_{n_0}}{2s e^2 n_0^2} \left( e \, \frac{5\alpha }{2n\sqrt{\kappa }} \right) ^{\alpha } \end{aligned}$$
(158)

Uniformly bounding the energy \(E_{n_0}\) and noticing that: \(\frac{\alpha }{2n_0\sqrt{\kappa }}\leqslant \frac{1}{3}\), we have:

$$\begin{aligned} \frac{E_{n_0}}{2s n_0^2} \leqslant \left( 1+ \frac{\alpha }{2n_0\sqrt{\kappa }}\right) ^2 M_{n_0}\leqslant \frac{16}{9}M_{n_0} \end{aligned}$$

where \(M_n\) denotes the mechanical energy: \(M_n=F(x_n) - F^* + \frac{1}{2s} \Vert x_n - x_{n-1}\Vert ^2\). Since the mechanical energy associated with the Nesterov scheme is non-increasing (see [14, Corollary 2]) and \(x_{-1}=x_0\), we then get:

$$\begin{aligned} \forall n\geqslant \frac{5\alpha }{\sqrt{2\kappa }},\quad F(x_n)-F(x^*) \leqslant \frac{16}{9} \left( e \, \frac{5\alpha }{2n\sqrt{\kappa }} \right) ^{\alpha } e^{-2}M_0. \end{aligned}$$
(159)
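To illustrate how (159) translates into an iteration count, the following Python sketch evaluates the right-hand side of (159) divided by \(M_0\) and searches for an admissible \(n\) at which this factor drops below a target \(\varepsilon \). The function names and the numerical values of \(\alpha \) and \(\kappa \) are illustrative assumptions; in particular the sketch does not check that \(\kappa \leqslant \kappa _0\), since \(\kappa _0\) is not explicit.

```python
import math


def bound_factor(n, alpha, kappa):
    """Right-hand side of (159) divided by M_0."""
    ratio = math.e * 5.0 * alpha / (2.0 * n * math.sqrt(kappa))
    return (16.0 / 9.0) * math.exp(-2.0) * ratio**alpha


def iterations_for(eps, alpha, kappa):
    """An admissible n with bound_factor(n) <= eps.

    'Admissible' means n >= 5*alpha/sqrt(2*kappa), the range stated in (159).
    Up to floating-point rounding this is the smallest such n.
    """
    n_min = math.ceil(5.0 * alpha / math.sqrt(2.0 * kappa))
    # Closed-form solution of bound_factor(n) = eps, rounded up.
    scale = math.e * 5.0 * alpha / (2.0 * math.sqrt(kappa))
    n_eps = math.ceil(scale * (16.0 / (9.0 * math.e**2 * eps)) ** (1.0 / alpha))
    n = max(n_min, n_eps)
    while bound_factor(n, alpha, kappa) > eps:  # guard against rounding
        n += 1
    return n


# Illustrative values (assumptions): alpha >= 4 + 2*sqrt(2), kappa "small".
alpha, kappa = 8.0, 1e-4
for eps in (1e-3, 1e-6, 1e-9):
    print(eps, iterations_for(eps, alpha, kappa))
```

The returned \(n\) scales like \(\frac{\alpha }{\sqrt{\kappa }}\,\varepsilon ^{-1/\alpha }\), which makes the trade-off in the choice of \(\alpha \) visible: a larger \(\alpha \) weakens the dependence on \(\varepsilon \) but increases the prefactor.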


Cite this article

Aujol, JF., Dossal, C. & Rondepierre, A. FISTA is an automatic geometrically optimized algorithm for strongly convex functions. Math. Program. 204, 449–491 (2024). https://doi.org/10.1007/s10107-023-01960-6
