
Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization

  • Original Paper
  • Published in: Numerical Algorithms

Abstract

In a real Hilbert space, we study a new class of forward-backward algorithms for structured non-smooth minimization problems. As a special case of the parameters, we recover the method AFB (Accelerated Forward-Backward), which was recently discussed as an enhanced variant of FISTA (Fast Iterative Soft Thresholding Algorithm). Our algorithms enjoy the well-known properties of AFB: they generate convergent sequences \((x_n)\) that minimize the function values at the rate \(o(n^{-2})\). Another important feature of our processes is that they can be regarded as discrete models suggested by first-order formulations of Newton-like dynamical systems. This permits us to extend to the non-smooth setting a property of fast convergence to zero of the gradients, established so far only for discrete Newton-like dynamics with smooth potentials. In particular, as a new result, we show that the latter property also applies to AFB. To prove this stability phenomenon, we develop a technical analysis that can also be useful for many other related developments. Numerical experiments are also performed to illustrate the properties of the considered algorithms in comparison with other existing ones.



Author information

Corresponding author

Correspondence to Paul-Emile Maingé.


Appendix 1. Proof of Proposition 3.2


Let \(\{\kappa, e, \nu_n\}\) be positive parameters, and set \(\varrho = 1 - \kappa\), \(\tau_n = e + \nu_{n+1}\) and \(u_n = y_n - x_n\). It is easily seen that (3.8a) can be rewritten as (for \(n \ge p\))

$$ \begin{array}{@{}rcl@{}} && \theta_n = \frac{1}{\tau_{n} } \left (\nu_{n} - {\varrho} \nu_{n+1} \right ), \end{array} $$
(A.1a)
$$ \begin{array}{@{}rcl@{}} & & \dot{x}_{n+1} + \chi^{*}_{n} + \theta_n u_n =0, \end{array} $$
(A.1b)
$$ \begin{array}{@{}rcl@{}} & & \dot{y}_{n+1} + \kappa u_{n} =0. \end{array} $$
(A.1c)
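To make the recursion (A.1) concrete, here is a minimal Python sketch of a single step. It assumes that the term \(\chi^{*}_{n}\) is supplied by the caller (how it is obtained belongs to (3.8a) and is not reproduced here), and that \(\dot{z}_{n+1}\) denotes the forward difference \(z_{n+1}-z_{n}\), which is the convention consistent with (A.4)–(A.9). The function name `step_A1` and its argument names are illustrative only.

```python
def step_A1(x, y, chi, nu_n, nu_next, kappa, e):
    """One step of (A.1a)-(A.1c) on plain Python lists.

    chi stands in for chi*_n and is an input here (an assumption of this
    sketch); nu_n and nu_next stand for nu_n and nu_{n+1}.
    """
    varrho = 1.0 - kappa                       # varrho = 1 - kappa
    tau = e + nu_next                          # tau_n = e + nu_{n+1}
    theta = (nu_n - varrho * nu_next) / tau    # (A.1a)
    u = [yi - xi for xi, yi in zip(x, y)]      # u_n = y_n - x_n
    # (A.1b): x_{n+1} - x_n + chi*_n + theta_n * u_n = 0
    x_next = [xi - ci - theta * ui for xi, ci, ui in zip(x, chi, u)]
    # (A.1c): y_{n+1} - y_n + kappa * u_n = 0
    y_next = [yi - kappa * ui for yi, ui in zip(y, u)]
    return x_next, y_next, theta


if __name__ == "__main__":
    # Toy one-dimensional data, chosen only for illustration.
    print(step_A1([1.0], [2.0], [0.5], nu_n=1.0, nu_next=2.0, kappa=1.0, e=1.0))
```

Note that with \(\kappa = 1\) the update (A.1c) gives \(y_{n+1} = x_{n}\), since \(y_{n+1} = y_n - (y_n - x_n)\).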

The remainder of the proof is divided into the following parts (r1)–(r4):

(r1) An estimate from the inertial part of the method. Given \((s,q) \in [0,\infty ) \times {\mathcal H} \), we begin by proving that the discrete derivative \( \dot {G}_{n+1}(s,q)\) satisfies

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + s \tau_n \langle \chi^{*}_{n}, x_{n+1} -q \rangle =\\ {\kern48pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} , u_n\rangle \\ {\kern48pt}- \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2}- \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$
(A.2)

In order to get this result, we readily notice that \(\dot {G}_{n+1}(s,q)\) can be formulated as

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) = s (\dot{\nu}_{n+1} a_{n}+ \nu_{n+1}\dot{a}_{n+1} ) + s e \dot{b}_{n+1} + \nu^{2}_{n+1}\dot{c}_{n+1}+ {c}_{n} (\nu_{n+1}^{2} - {\nu_{n}^{2}}), \end{array} $$
(A.3)

where \(a_n := \langle q - x_n, u_n \rangle\), \(b_n := (1/2)\|x_n - q\|^{2}\) and \(c_n := (1/2)\|u_n\|^{2}\). For the sake of clarity, and so as to estimate the right side of (A.3), we set

$$ \begin{array}{@{}rcl@{}} && P_{n}= \langle q-x_{n+1} , \dot{x}_{n+1} \rangle , R_{n}= \langle q-x_{n+1} , \dot{y}_{n+1} \rangle~ \text{and} ~W_{n}=\langle \chi^{*}_{n}, x_{n+1} -q \rangle . \end{array} $$
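For reference, formulation (A.3) rests on a discrete product rule. As a brief sketch, assuming (consistently with (A.4)–(A.9)) that \(\dot{z}_{n+1}\) denotes the forward difference \(z_{n+1}-z_{n}\), any real sequences \((\alpha_n)\) and \((\beta_n)\) (generic symbols introduced here only for illustration) satisfy

$$ \alpha_{n+1}\beta_{n+1} - \alpha_{n}\beta_{n} = \dot{\alpha}_{n+1} \beta_{n} + \alpha_{n+1} \dot{\beta}_{n+1}. $$

Taking \((\alpha_n,\beta_n) = (\nu_n, a_n)\) gives the term \(\dot{\nu}_{n+1} a_{n}+ \nu_{n+1}\dot{a}_{n+1}\) of (A.3), while \((\alpha_n,\beta_n) = (c_n, \nu_n^{2})\) gives \(\nu^{2}_{n+1}\dot{c}_{n+1}+ c_{n} (\nu_{n+1}^{2} - \nu_{n}^{2})\).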

Clearly, by \(a_n = \langle q - x_n, u_n \rangle\) and \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1} \) (from (A.1c)), we get

$$ \begin{array}{l} a_n= \langle q-x_{n+1} ,u_n\rangle + \langle \dot{x}_{n+1} , u_n\rangle = - \frac{1 }{\kappa} R_n + \langle \dot{x}_{n+1} , u_n\rangle . \end{array} $$
(A.4)

Again from \(a_n = \langle q - x_n, u_n \rangle\), and noticing \(\dot {u}_{n+1} = \dot {y}_{n+1} - \dot {x}_{n+1} \) (as \(u_n = y_n - x_n\)), we readily have

$$ \dot{a}_{n+1} = \langle -\dot{x}_{n+1} , u_n \rangle + \langle q -x_{n+1} , \dot{u}_{n+1} \rangle = - \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n. $$
(A.5)

Taking the scalar product of each side of (A.1b) with \(q-x_{n+1}\), along with \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1}\) (from (A.1c)), amounts to \(P_n - W_n = \kappa^{-1} \theta_n R_n\), which, by \( \theta _{n} = \tau _{n}^{-1} (\nu _{n} - {\varrho } \nu _{n+1})\) (from (A.1a)), is equivalent to

$$ (\nu_{n} - {\varrho} \nu_{n+1}) R_n= \kappa \tau_n(P_n - W_n). $$
(A.6)

Therefore, by (A.4), (A.5) and (A.6), and recalling that ϱ = 1 − κ, we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{\nu}_{n+1} a_n+ \nu_{n+1}\dot{a}_{n+1}\\ &=&\dot{\nu}_{n+1} \left ( \langle \dot{x}_{n+1} , u_n\rangle - \frac{1 }{\kappa} R_n\right ) + \nu_{n+1}\left (- \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n \right )\\ &=& (\dot{\nu}_{n+1}-\nu_{n+1} ) \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \left (\nu_{n+1} - \frac{1 }{\kappa}\dot{\nu}_{n+1} \right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \frac{1}{\kappa} \left (\nu_{n} - {\varrho} \nu_{n+1}\right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \tau_n \left ( P_n - W_n \right ). \end{array} $$
(A.7)

From \(b_{n+1} = (1/2)\|x_{n+1} - q\|^{2}\), we readily get

$$ \begin{array}{l} \dot{b}_{n+1} =\frac{1}{2}\langle \dot{x}_{n+1} , x_{n+1} -q \rangle + \frac{1}{2}\langle x_n -q , \dot{x}_{n+1} \rangle =-P_n - \frac{1}{2} \| \dot{x}_{n+1} \|^{2}. \end{array} $$
(A.8)

In addition, by \(c_{n+1} = (1/2)\|u_{n+1}\|^{2}\), and \(\dot {u}_{n+1}= - \kappa u_{n} - \dot {x}_{n+1} \) (from \(u_n = y_n - x_n\) and \(\dot {y}_{n+1} =-\kappa u_{n}\)), we immediately have

$$ \begin{array}{@{}rcl@{}} \dot{c}_{n+1} &=& \frac{1}{2} \langle \dot{u}_{n+1}, {u}_{n+1} + u_n \rangle\\ &=&\langle - \dot{u}_{n+1}, -\frac{1}{2}\dot{u}_{n+1} -u_n\rangle\\ &=&\langle \kappa u_n + \dot{x}_{n+1} , \left (\frac{\kappa}{2} - 1 \right ) u_n + \frac{1}{2}\dot{x}_{n+1} \rangle\\ &=&\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_n\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_n \rangle . \end{array} $$
(A.9)

In light of (A.3) together with (A.7), (A.8) and (A.9), we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{G}_{n+1}(s,q)\\ &&= s \left (- \nu_{n} \langle \dot{x}_{n+1} , u_{n} \rangle - \nu_{n+1} P_{n} + \tau_{n} \left ( P_{n} - W_{n} \right ) \right )\\ &&\quad+ se \left (-P_{n} - \frac{1}{2} \| \dot{x}_{n+1} \|^{2} \right ) \\ &&\quad+ \nu_{n+1}^{2} \left (\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_{n}\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_{n} \rangle \right ) \\ &&\quad+ \frac{1}{2} (\nu_{n+1}^{2} - {\nu_{n}^{2}})\|u_{n}\|^{2}\\ \\ &&= -\left ( s\nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} ,u_{n} \rangle - \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} -\bar{\eta}_{n} \|u_{n}\|^{2} - s \tau_{n} W_{n}, \end{array} $$

where the quantity \( \bar {\eta }_{n}\) is given by

$$ \begin{array}{l} \bar{\eta}_{n}= \kappa \left (1- \frac{ \kappa}{2} \right )\nu_{n+1}^{2} - \frac{ 1 }{2}(\nu_{n+1}^{2} - {\nu_{n}^{2}}) \\ {\kern12.5pt}= \frac{1}{2} \left ({\nu_{n}^{2}}- \nu_{n+1}^{2} (1-\kappa)^{2}\right ) (\text{since}~ \kappa \left (1- \frac{ \kappa}{2} \right ) = \frac{1}{2} -\frac{1}{2}(1-\kappa)^{2} ). \end{array} $$

This leads to the desired result.
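As a sanity check, identity (A.2) can be tested numerically on random data. The sketch below is illustrative only. It assumes that \(G_n(s,q) = s \nu_n a_n + s e\, b_n + \nu_n^{2} c_n\), a form inferred from (A.3) because the definition of \(G_n\) is not restated in this appendix, that \(\dot{z}_{n+1} = z_{n+1}-z_{n}\), and that \(\chi^{*}_{n}\) may be any vector, since the derivation above uses only (A.1a)–(A.1c). All function names are hypothetical.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def G(s, q, x, y, nu, e):
    # Assumed form of G_n(s, q), inferred from (A.3): s*nu*a + s*e*b + nu^2*c.
    u = sub(y, x)
    a = dot(sub(q, x), u)
    b = 0.5 * dot(sub(x, q), sub(x, q))
    c = 0.5 * dot(u, u)
    return s * nu * a + s * e * b + nu ** 2 * c


def residual_A2(seed, dim=4):
    """Return (left side of (A.2)) minus (right side of (A.2)) for random data."""
    rng = random.Random(seed)
    vec = lambda: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    x, y, q, chi = vec(), vec(), vec(), vec()
    kappa, e, s, nu_n, nu_next = (rng.uniform(0.2, 2.0) for _ in range(5))
    varrho = 1.0 - kappa
    tau = e + nu_next
    theta = (nu_n - varrho * nu_next) / tau                           # (A.1a)
    u = sub(y, x)
    x_next = [xi - ci - theta * ui for xi, ci, ui in zip(x, chi, u)]  # (A.1b)
    y_next = [yi - kappa * ui for yi, ui in zip(y, u)]                # (A.1c)
    dx = sub(x_next, x)
    lhs = (G(s, q, x_next, y_next, nu_next, e) - G(s, q, x, y, nu_n, e)
           + s * tau * dot(chi, sub(x_next, q)))
    rhs = (-(s * nu_n + varrho * nu_next ** 2) * dot(dx, u)
           - 0.5 * (nu_n ** 2 - varrho ** 2 * nu_next ** 2) * dot(u, u)
           - 0.5 * (s * e - nu_next ** 2) * dot(dx, dx))
    return lhs - rhs


if __name__ == "__main__":
    print(max(abs(residual_A2(seed)) for seed in range(20)))
```

The printed residual should be at the level of floating-point rounding error.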

(r2) An estimate from the proximal part of the method. Let us establish that, for any \(\xi_n \neq 1\), it holds that

$$ \begin{array}{l} \xi_n \langle \chi_n^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \| \dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}= \theta_n (1-\xi_n) \langle u_n,\dot{x}_{n+1} \rangle + \frac{1}{2} \theta_n^{2} \|u_n\|^{2} - \left (\xi_n- \frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$
(A.10)

Indeed, we have \(\dot {x}_{n+1} =-\theta _n u_{n} -\chi _{n}^{*}\) (from (A.1b)), which, for any \(\xi_n \neq 1\), can be rewritten as

$$ \begin{array}{l} \xi_n \dot{x}_{n+1} = -(1-\xi_n) \left (\dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \right ) - \chi_n^{*} = -(1-\xi_n) H_n - \chi_n^{*} , \end{array} $$
(A.11)

where \(H_{n}= \dot {x}_{n+1} + (1-\xi _n)^{-1}\theta _n u_{n} \). Furthermore, by \(-\chi _{n}^{*}= \dot {x}_{n+1}+\theta _{n} u_{n}\) (again using (A.1b)) and denoting \(Q_{n}=\langle \dot {x}_{n+1}, u_{n}\rangle \), we simply obtain

$$ \begin{array}{l} \langle (-\chi_n^{*}), H_n \rangle = \langle \dot{x}_{n+1} + \theta_n u_n, \dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \rangle \\ {}= \|\dot{x}_{n+1} \|^{2} + (1-\xi_n)^{-1}\theta_n^{2} \|u_n\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)}\theta_n Q_{n}.\ \end{array} $$
(A.12)

Therefore, by adding \((1/2) \| \chi _{n}^{*}\|^{2}\) to the scalar product of the left side of equality (A.11) with \(\chi _{n}^{*}\), and using (A.12) and \( \| \chi _{n}^{*}\|^{2}= \|\dot {x}_{n+1}+ \theta _{n} u_{n}\|^{2}\), we get

$$ \begin{array}{l} \xi_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \| \chi_{n}^{*}\|^{2} = (1-\xi_n) \langle (-\chi_{n}^{*}), H_{n} \rangle - \frac{1}{2} \| \chi_{n}^{*}\|^{2}\\ {\kern99pt}= (1-\xi_n) \left (\|\dot{x}_{n+1} \|^{2} + \frac{ \theta_n^{2} }{(1-\xi_n)} \|u_{n}\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)} \theta_n Q_{n} \right ) \\ {\kern110pt}- \frac{1}{2}\left (\|\dot{x}_{n+1} \|^{2} + \theta_n^{2} \|u_{n}\|^{2} + 2 \theta_n Q_{n}\right ) \\ {\kern99pt}= (1-\xi_{n}) \theta_n Q_{n} + \frac{1}{2}\theta_n^{2} \|u_{n}\|^{2} + \left (\frac{1}{2} -\xi_{n} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

This yields (A.10).
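Identity (A.10) can also be checked numerically. In the minimal sketch below, \(\dot{x}_{n+1}\), \(u_n\), \(\theta_n\) and \(\xi_n \neq 1\) are drawn at random, and \(\chi^{*}_{n}\) is then defined through (A.1b). The function name `residual_A10` and the sampling ranges are arbitrary choices made for illustration.

```python
import random


def residual_A10(seed, dim=4):
    """Return (left side of (A.10)) minus (right side of (A.10)) for random data."""
    rng = random.Random(seed)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    dx = [rng.uniform(-1.0, 1.0) for _ in range(dim)]   # plays the role of x_{n+1} - x_n
    u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]    # plays the role of u_n
    theta = rng.uniform(0.1, 2.0)
    xi = rng.uniform(-2.0, 0.9)                          # any value different from 1
    shifted = [d + theta * w for d, w in zip(dx, u)]     # dx + theta * u
    chi = [-v for v in shifted]                          # (A.1b): chi* = -(dx + theta * u)
    lhs = xi * dot(chi, dx) + 0.5 * dot(shifted, shifted)
    rhs = (theta * (1.0 - xi) * dot(u, dx)
           + 0.5 * theta ** 2 * dot(u, u)
           - (xi - 0.5) * dot(dx, dx))
    return lhs - rhs


if __name__ == "__main__":
    print(max(abs(residual_A10(seed)) for seed in range(20)))
```

Expanding the left side directly shows that the identity holds for every real \(\xi_n\); the restriction \(\xi_n \neq 1\) is only needed for the quantity \(H_n\) used in the proof.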

(r3) Combining proximal and inertial effects. Let (3.10) hold and set \(\rho _{n}= 1- (1-\kappa ) \frac {\nu _{n+1}}{\nu _{n}}\). It is worthwhile noticing that the term \(\theta_n\) involved in (A.1) can be simply expressed as

$$ \theta_n =\frac{\nu_{n} \rho_{n} }{ \tau_{n}} ~\text{where}~ \tau_{n}=e+\nu_{n+1}. $$
(A.13)

So, by (A.13), in light of \(\rho_n > 0\) (from condition (3.3a)), we deduce that \((\theta_n)\) is a positive sequence.

Now, we introduce the real sequence \((\bar {\gamma }_{n})\) defined by

$$ \bar{\gamma}_{n}=1- \frac{s \rho_{n}}{\tau_{n}} ~(\text{with}~ s > 0). $$
(A.14)

By (A.14), along with \(\tau_n > 0\) (as \(\tau_n := e + \nu_{n+1}\)) and \(\rho_n > 0\), we have

$$ \bar{\gamma}_{n} < 1 ~(\text{for any}~ s >0). $$
(A.15)

Next, given s > 0, we show that the iterates produced by (3.8a) (or, equivalently, by (A.1)) verify

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + \frac{1}{2} \rho_n^{-1} \tau_n^{2} \|\dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}+ (s \tau_n)\langle \chi_n^{*}, x_{n+1} -q \rangle + \bar{\gamma}_{n} \rho_n^{-1}\tau_n^{2} \langle \chi_n^{*},\dot{x}_{n+1} \rangle = -T_n(u_n,\dot{x}_{n+1} ) , \end{array} $$
(A.16)

where Tn(u, x) is defined for any \((u,x) \in {\mathcal H}^{2}\) by

$$ T_{n}(u,x) = w_{n} \langle u, x \rangle +\eta_{n} \| u \|^{2}+ \sigma_{n} \|x\|^{2}, $$
(A.17)

together with the parameters

$$ \begin{array}{@{}rcl@{}} && {w}_{n} = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), {\eta}_{n}= \frac{1}{2} {\varrho} \rho_{n} \nu_{n} \nu_{n+1} , \end{array} $$
(A.18a)
$$ \begin{array}{@{}rcl@{}} && {\sigma}_{n} =\frac{1}{2} \left (se - \nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ). \end{array} $$
(A.18b)

Indeed, by (A.2) and setting \(Q_{n}=\langle \dot {x}_{n+1} , u_{n} \rangle \), we know that

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_n)\langle \chi^{*}_{n}, x_{n+1} -q \rangle = \\ {\kern12pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) Q_n - \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2} - \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$
(A.19)

Moreover, in light of \(\bar {\gamma }_{n} \neq 1\) (from (A.15)), by using (A.10) (with the special value \(\xi _{n}= \bar{\gamma}_{n}:=1- \frac {s \rho _{n}}{\tau _{n}}\)) and recalling that \(\theta _n = \frac {\nu _{n} \rho _{n}}{\tau _{n}}\), we obtain

$$ \begin{array}{l} \bar{\gamma}_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= s \frac{\nu_{n} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} Q_{n}+ \frac{1}{2} \frac{{\nu_{n}^{2}} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} \|u_{n}\|^{2} - \left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$
(A.20)

Then, multiplying equality (A.20) by \(\rho _{n}^{-1} {\tau _{n}^{2}}\), and adding the resulting equality to (A.19), we get

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_{n})\langle \chi^{*}_{n}, x_{n+1} -q \rangle \\ {\kern12pt}+ \bar{\gamma}_{n} \rho_{n}^{-1} {\tau_{n}^{2}} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \rho_{n}^{-1} {\tau_{n}^{2}} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= \left (- \left (s \nu_{n}+ {\varrho} \nu_{n+1}^{2} \right ) + s \nu_{n} \rho_{n} \right ) Q_{n} \\ {\kern12pt}+ \left (- \frac{1}{2} \left ({\nu_{n}^{2}} - {\varrho}^{2} \nu_{n+1}^{2}\right ) + \frac{1}{2} {\nu_{n}^{2}} \rho_{n} \right ) \| u_{n} \|^{2} \\ {\kern12pt}- \left ( \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}}\left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

Hence, noticing that \(\nu_n \rho_n = \nu_n - \varrho \nu_{n+1}\), we infer that (A.16)–(A.17) is actually satisfied together with the parameters

$$ \begin{array}{l} {w}_{n} = \left (s \nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) - s (\nu_{n} -{\varrho} \nu_{n+1}) = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), \\ {\eta}_{n}= \frac{1}{2} \left ({\nu_{n}^{2}} - \nu_{n+1}^{2} {\varrho}^{2} \right ) - \frac{1}{2}\left ({\nu_{n}^{2}}- {\varrho} \nu_{n} \nu_{n+1}\right ) = \frac{1}{2} {\varrho} \nu_{n+1} \left ( \nu_{n} - \nu_{n+1} {\varrho} \right) = \frac{1}{2} {\varrho} \nu_{n+1} \nu_{n} \rho_{n}, \\ {\sigma}_{n} = \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}} \left (\bar{\gamma}_{n} - \frac{1}{2} \right ) = \frac{1}{2} \left (se -\nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ) . \end{array} $$

This leads to the desired result.
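The combined identity (A.16), with \(T_n\) given by (A.17)–(A.18), can be tested in the same spirit. The sketch below is a numerical check under stated assumptions, not part of the proof. It again assumes \(G_n(s,q) = s \nu_n a_n + s e\, b_n + \nu_n^{2} c_n\) (inferred from (A.3)), treats \(\chi^{*}_{n}\) as an arbitrary vector, and samples \(\nu_{n+1}\) so that \(\rho_n > 0\). All names are hypothetical.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def G(s, q, x, y, nu, e):
    # Assumed form of G_n(s, q), inferred from (A.3).
    u = sub(y, x)
    return (s * nu * dot(sub(q, x), u)
            + 0.5 * s * e * dot(sub(x, q), sub(x, q))
            + 0.5 * nu ** 2 * dot(u, u))


def residual_A16(seed, dim=4):
    """Return (left side of (A.16)) + T_n(u_n, dx), which should vanish."""
    rng = random.Random(seed)
    vec = lambda: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    x, y, q, chi = vec(), vec(), vec(), vec()
    kappa = rng.uniform(0.2, 1.8)
    e, s = rng.uniform(0.2, 2.0), rng.uniform(0.2, 2.0)
    nu_n = rng.uniform(0.5, 2.0)
    nu_next = nu_n * (1.0 + 0.5 * kappa * rng.random())   # keeps rho_n > 0
    varrho = 1.0 - kappa
    rho_n = 1.0 - varrho * nu_next / nu_n
    tau = e + nu_next
    theta = nu_n * rho_n / tau                             # (A.13)
    gamma_bar = 1.0 - s * rho_n / tau                      # (A.14)
    u = sub(y, x)
    x_next = [xi - ci - theta * ui for xi, ci, ui in zip(x, chi, u)]  # (A.1b)
    y_next = [yi - kappa * ui for yi, ui in zip(y, u)]                # (A.1c)
    dx = sub(x_next, x)
    shifted = [d + theta * ui for d, ui in zip(dx, u)]
    scale = tau ** 2 / rho_n                               # rho_n^{-1} * tau_n^2
    w = varrho * (nu_next ** 2 + s * nu_next)              # (A.18a)
    eta = 0.5 * varrho * rho_n * nu_n * nu_next            # (A.18a)
    sigma = 0.5 * (s * e - nu_next ** 2 + scale * (2.0 * gamma_bar - 1.0))  # (A.18b)
    T = w * dot(u, dx) + eta * dot(u, u) + sigma * dot(dx, dx)        # (A.17)
    lhs = (G(s, q, x_next, y_next, nu_next, e) - G(s, q, x, y, nu_n, e)
           + 0.5 * scale * dot(shifted, shifted)
           + s * tau * dot(chi, sub(x_next, q))
           + gamma_bar * scale * dot(chi, dx))
    return lhs + T


if __name__ == "__main__":
    print(max(abs(residual_A16(seed)) for seed in range(20)))
```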

(r4) Finally, we give an alternative formulation of the quantity \(T_n(u,x)\) given by (A.17)–(A.18). For this purpose, we begin by reformulating \(\sigma_n\). By the definitions \(\tau_n := e + \nu_{n+1}\) and \(\bar {\gamma }_{n}:=1-s \frac {\rho _{n}}{\tau _{n}}\), an easy computation gives

$$ \begin{array}{l} \rho_{n}^{-1}{\tau_{n}^{2}} (2 \bar{\gamma}_{n} -1) = \frac{(e+ \nu_{n+1})^{2}}{ \rho_{n}} \left (1- 2s \frac{\rho_{n}}{(e+ \nu_{n+1})} \right ) \\ {\kern69pt}= \frac{1 }{\rho_{n}} \left (e^{2}+ 2 e \nu_{n+1} + (\nu_{n+1})^{2} \right )- 2s \left (e+ \nu_{n+1} \right ) \\ {\kern69pt}= e \left (\frac{e }{\rho_{n} } -s \right ) - s e + 2 \nu_{n+1} \left (\frac{e }{\rho_{n} } -s \right ) + \frac{(\nu_{n+1} )^{2} }{\rho_{n} } \\ {\kern69pt}= \left (e + 2 \nu_{n+1}\right ) \left (\frac{e }{\rho_{n} } -s \right ) - s e + \frac{(\nu_{n+1})^{2} }{\rho_{n} } \\ {\kern69pt}= \tau_{n,e} \left (e \rho_{n}^{-1} -s \right ) - s e + \rho_{n}^{-1} (\nu_{n+1} )^{2}, \end{array} $$

where \(\tau_{n,t} = t + 2\nu_{n+1}\) (for \(t \ge 0\)). As a consequence, by the previous definition of \(\sigma_n\) (in (A.18)), we obtain

$$ \begin{array}{l} 2 {\sigma}_{n}= (\rho_n^{-1}-1) (\nu_{n+1} )^{2} + \tau_{n,e} \left (e \rho_n^{-1}-s \right )\\ {\kern18pt}= \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_n^{-1} -1 \right ) + \tau_{n,e} (e -s). \end{array} $$
(A.21)

Then we consider the following two situations relative to the constant κ:

- In the special case when κ = 1 (hence ρn = 1 and \( \rho _{n}^{-1}=1\)), we obviously have wn = 0 and ηn = 0. Then, for \((u,x) \in {\mathcal H}^{2}\), by definition of Tn (in (A.17)) along with \( {\sigma }_{n}= \frac {\left (e -s \right )}{2} \tau _{n,e} \) (from (A.21)) we obtain

$$ T_{n}(u,x)= \frac{\left (e -s \right )}{2} \tau_{n,e} \|x\|^{2}. $$
(A.22)

- For \(\kappa \in (0,1) \cup (1,\infty )\) (hence \(\eta_n \neq 0\)), also setting \({\varsigma }_{n}:=\frac {w_{n}}{2 \eta _{n}}\), and \(\psi _{n}:= 4 {\sigma }_{n} {\eta }_{n}- {w}_{n}^{2}\), by definition of \(T_n\) (in (A.17)) we classically have

$$ T_{n}(u,x)= \eta_{n} \|u+{\varsigma}_{n}x \|^{2}+ \frac{\psi_{n}}{4 \eta_{n} } \|x\|^{2}. $$
(A.23)

On the one hand, by \({w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right )\) (from (A.18)) and remembering that \(\tau_{n,s} = s + 2\nu_{n+1}\), we simply have \({w}_{n}^{2} = ({\varrho } \nu _{n+1})^{2} \left ((\nu _{n+1} )^{2} + s \tau _{n,s} \right )\). Hence, by (A.21) while using the definition of \(\psi_n\), and setting \(S_n := \varrho \rho_n \nu_n \nu_{n+1}\) (so that \(S_n = 2\eta_n\) and \(\psi _{n}= 2 {\sigma }_{n} S_{n}- {w}_{n}^{2}\)), we obtain

$$ \psi_{n}= S_{n} \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_{n}^{-1} -1 \right ) + S_{n} \tau_{n,e} (e -s) - ({\varrho} \nu_{n+1})^{2} \left ( (\nu_{n+1} )^{2} + s \tau_{n,s} \right).$$

It is also easily checked that \(S_{n} \left (\rho _{n}^{-1} -1 \right ) = ({\varrho } \nu _{n+1})^{2} \), which by the previous equality yields

$$ \begin{array}{l} \psi_{n}= S_{n} \tau_{n,e} \left (e-s \right ) + ({\varrho} \nu_{n+1})^{2} \left (e \tau_{n,e} - s \tau_{n,s} \right ). \end{array} $$
(A.24)

Then, noticing that \(e \tau_{n,e} - s \tau_{n,s} = (e-s)\tau_{n,e+s}\) (as \(\tau_{n,t} := t + 2\nu_{n+1}\), for \(t \ge 0\)), we infer that \( \psi _{n} = \left (e-s \right ) \left (S_{n} \tau _{n,e} + ({\varrho } \nu _{n+1})^{2} \tau _{n,e+s} \right )\), which by (A.23) entails that

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \|u+{\varsigma}_{n} x \|^{2}+ \frac{\left (e-s \right )}{2 S_{n}} \left ( S_{n} \tau_{n,e} + ({\varrho} \nu_{n+1})^{2} \tau_{n,e+s} \right ) \|x\|^{2}. $$

On the other hand, we clearly have \({\varsigma }_{n}=\frac {w_{n}}{S_{n}}\) (since \(S_n = 2\eta_n\)), together with \( {w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right ) \), \(S_n = \varrho \rho_n \nu_n \nu_{n+1}\) and \( \frac {1}{\theta _{n}}=\frac { e+ \nu _{n+1}}{ \rho _{n} \nu _{n}}\) (from (A.13)), which gives us

$$ \begin{array}{l} {\varsigma}_{n}= \frac{ \nu_{n+1} + s }{ \rho_{n} \nu_{n} } = \frac{ \left (e+ \nu_{n+1} \right ) -(e- s) }{ \rho_{n} \nu_{n} } = \frac{1}{\theta_n } - \frac{ (e-s) }{ \nu_{n} \rho_{n} }. \end{array} $$

Combining the last two results amounts to

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \left \|u+ \left (\frac{1}{\theta_n } - \frac{ e-s }{ \nu_{n} \rho_{n} } \right ) x \right \|^{2}+ \frac{\left (e-s \right )}{2} \left ( \tau_{n,e} + \frac{({\varrho} \nu_{n+1})^{2}}{S_{n}} \tau_{n,e+s} \right ) \|x\|^{2}. $$

This completes the proof.
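The two expressions of \(T_n(u,x)\) obtained in (r4) can be compared numerically with the definition (A.17)–(A.18). The following sketch is illustrative only: the function name `T_forms` and the sample values are arbitrary. It returns the value given by (A.17)–(A.18) together with the value given by (A.22) when \(\kappa = 1\), or by the closing formula of (r4) otherwise.

```python
def T_forms(u, x, kappa, e, s, nu_n, nu_next):
    """Return T_n(u, x) computed from (A.17)-(A.18) and from the (r4) formulas."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    varrho = 1.0 - kappa
    rho_n = 1.0 - varrho * nu_next / nu_n
    tau = e + nu_next
    theta = nu_n * rho_n / tau                     # (A.13)
    gamma_bar = 1.0 - s * rho_n / tau              # (A.14)
    w = varrho * (nu_next ** 2 + s * nu_next)      # (A.18a)
    eta = 0.5 * varrho * rho_n * nu_n * nu_next    # (A.18a)
    sigma = 0.5 * (s * e - nu_next ** 2
                   + tau ** 2 / rho_n * (2.0 * gamma_bar - 1.0))  # (A.18b)
    direct = w * dot(u, x) + eta * dot(u, u) + sigma * dot(x, x)  # (A.17)
    tau_e = e + 2.0 * nu_next                      # tau_{n,e}
    if varrho == 0.0:                              # kappa = 1: formula (A.22)
        return direct, 0.5 * (e - s) * tau_e * dot(x, x)
    S = varrho * rho_n * nu_n * nu_next            # S_n = 2 * eta_n
    tau_es = e + s + 2.0 * nu_next                 # tau_{n,e+s}
    coef = 1.0 / theta - (e - s) / (nu_n * rho_n)  # equals varsigma_n
    v = [ui + coef * xi for ui, xi in zip(u, x)]
    square = (0.5 * S * dot(v, v)
              + 0.5 * (e - s) * (tau_e + (varrho * nu_next) ** 2 / S * tau_es) * dot(x, x))
    return direct, square


if __name__ == "__main__":
    print(T_forms([1.0], [1.0], kappa=0.5, e=2.0, s=1.0, nu_n=1.0, nu_next=1.0))
    print(T_forms([1.0], [1.0], kappa=1.0, e=2.0, s=1.0, nu_n=1.0, nu_next=1.0))
```

In each printed pair the two values should agree up to rounding. For \(\kappa > 1\) the factor \(S_n\) is negative, so the completed-square form is an algebraic identity rather than a sum of nonnegative terms.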


Cite this article

Maingé, PE., Labarre, F. Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer Algor 90, 99–136 (2022). https://doi.org/10.1007/s11075-021-01181-y


  • DOI: https://doi.org/10.1007/s11075-021-01181-y
