Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization

Maingé, Paul-Emile; Labarre, Florian

doi:10.1007/s11075-021-01181-y

Original Paper
Published: 27 August 2021

Volume 90, pages 99–136, (2022)
Numerical Algorithms

Abstract

In a real Hilbert space, we study a new class of forward-backward algorithms for structured non-smooth minimization problems. For a special choice of the parameters, we recover the method AFB (Accelerated Forward-Backward), which was recently discussed as an enhanced variant of FISTA (Fast Iterative Soft Thresholding Algorithm). Our algorithms enjoy the well-known properties of AFB: they generate convergent sequences (x_n) that minimize the function values at the rate o(n^{−2}). Another important feature of our processes is that they can be regarded as discrete models suggested by first-order formulations of Newton-like dynamical systems. This permits us to extend to the non-smooth setting a property of fast convergence to zero of the gradients, established so far only for discrete Newton-like dynamics with smooth potentials. In particular, as a new result, we show that the latter property also applies to AFB. To prove this stability phenomenon, we develop a technical analysis that may also be useful for many other related developments. Numerical experiments are also performed to illustrate the properties of the considered algorithms in comparison with other existing ones.

Author information

Authors and Affiliations

MEMIAD, Université des Antilles, Campus de Schoelcher, 97233, Martinique, F.W.I., France
Paul-Emile Maingé & Florian Labarre

Corresponding author

Correspondence to Paul-Emile Maingé.

Appendix 1. Proof of Proposition 3.2

Let {κ, e, ν_n} be positive parameters, and set ϱ = 1 − κ, τ_n = e + ν_{n+1} and u_n = y_n − x_n. It is easily seen that (3.8a) can be rewritten as (for n ≥ p)

$$ \begin{array}{@{}rcl@{}} && \theta_n = \frac{1}{\tau_{n} } \left (\nu_{n} - {\varrho} \nu_{n+1} \right ), \end{array} $$

(A.1a)

$$ \begin{array}{@{}rcl@{}} & & \dot{x}_{n+1} + \chi^{*}_{n} + \theta_n u_n =0, \end{array} $$

(A.1b)

$$ \begin{array}{@{}rcl@{}} & & \dot{y}_{n+1} + \kappa u_{n} =0. \end{array} $$

(A.1c)

The remainder of the proof is divided into the following parts (r1)–(r4):

(r1) An estimate from the inertial part of the method. Given $(s,q) \in [0,\infty ) \times {\mathcal H} $, we begin by proving that the discrete derivative $ \dot {G}_{n+1}(s,q)$ satisfies

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + s \tau_n \langle \chi^{*}_{n}, x_{n+1} -q \rangle =\\ {\kern48pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} , u_n\rangle \\ {\kern48pt}- \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2}- \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

(A.2)

In order to get this result, we readily notice that $\dot {G}_{n+1}(s,q)$ can be formulated as

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) = s (\dot{\nu}_{n+1} a_{n}+ \nu_{n+1}\dot{a}_{n+1} ) + s e \dot{b}_{n+1} + \nu^{2}_{n+1}\dot{c}_{n+1}+ {c}_{n} (\nu_{n+1}^{2} - {\nu_{n}^{2}}), \end{array} $$

(A.3)

where a_n := ⟨q − x_n, u_n⟩, b_n := (1/2)∥x_n − q∥² and c_n := (1/2)∥u_n∥². For the sake of clarity, and in order to estimate the right-hand side of (A.3), we set

$$ \begin{array}{@{}rcl@{}} && P_{n}= \langle q-x_{n+1} , \dot{x}_{n+1} \rangle , R_{n}= \langle q-x_{n+1} , \dot{y}_{n+1} \rangle~ \text{and} ~W_{n}=\langle \chi^{*}_{n}, x_{n+1} -q \rangle . \end{array} $$

Clearly, by a_n = ⟨q − x_n, u_n⟩ and $u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1} $ (from (A.1c)), we get

$$ \begin{array}{l} a_n= \langle q-x_{n+1} ,u_n\rangle + \langle \dot{x}_{n+1} , u_n\rangle = - \frac{1 }{\kappa} R_n + \langle \dot{x}_{n+1} , u_n\rangle . \end{array} $$

(A.4)

Again from a_n = ⟨q − x_n, u_n⟩, and noticing $\dot {u}_{n+1} = \dot {y}_{n+1} - \dot {x}_{n+1} $ (as u_n = y_n − x_n), we readily have

$$ \dot{a}_{n+1} = \langle -\dot{x}_{n+1} , u_n \rangle + \langle q -x_{n+1} , \dot{u}_{n+1} \rangle = - \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n. $$

(A.5)

Taking the scalar product of each side of (A.1b) with $q - x_{n+1}$, along with $u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1}$ (from (A.1c)), amounts to $P_n - W_n = \kappa^{-1} \theta_n R_n$, which, by $ \theta _{n} = \tau _{n}^{-1} (\nu _{n} - {\varrho } \nu _{n+1})$ (from (A.1a)), is equivalent to

$$ (\nu_{n} - {\varrho} \nu_{n+1}) R_n= \kappa \tau_n(P_n - W_n). $$

(A.6)

Therefore, by (A.4), (A.5) and (A.6), and recalling that ϱ = 1 − κ, we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{\nu}_{n+1} a_n+ \nu_{n+1}\dot{a}_{n+1}\\ &=&\dot{\nu}_{n+1} \left ( \langle \dot{x}_{n+1} , u_n\rangle - \frac{1 }{\kappa} R_n\right ) + \nu_{n+1}\left (- \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n \right )\\ &=& (\dot{\nu}_{n+1}-\nu_{n+1} ) \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \left (\nu_{n+1} - \frac{1 }{\kappa}\dot{\nu}_{n+1} \right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \frac{1}{\kappa} \left (\nu_{n} - {\varrho} \nu_{n+1}\right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \tau_n \left ( P_n - W_n \right ). \end{array} $$

(A.7)

From b_{n+1} = (1/2)∥x_{n+1} − q∥², we readily get

$$ \begin{array}{l} \dot{b}_{n+1} =\frac{1}{2}\langle \dot{x}_{n+1} , x_{n+1} -q \rangle + \frac{1}{2}\langle x_n -q , \dot{x}_{n+1} \rangle =-P_n - \frac{1}{2} \| \dot{x}_{n+1} \|^{2}. \end{array} $$

(A.8)

In addition, by c_{n+1} = (1/2)∥u_{n+1}∥², and $\dot {u}_{n+1}= - \kappa u_{n} - \dot {x}_{n+1} $ (from u_n = y_n − x_n and $\dot {y}_{n+1} =-\kappa u_{n}$), we immediately have

$$ \begin{array}{@{}rcl@{}} \dot{c}_{n+1} &=& \frac{1}{2} \langle \dot{u}_{n+1}, {u}_{n+1} + u_n \rangle\\ &=&\langle - \dot{u}_{n+1}, -\frac{1}{2}\dot{u}_{n+1} -u_n\rangle\\ &=&\langle \kappa u_n + \dot{x}_{n+1} , \left (\frac{\kappa}{2} - 1 \right ) u_n + \frac{1}{2}\dot{x}_{n+1} \rangle\\ &=&\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_n\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_n \rangle . \end{array} $$

(A.9)

In light of (A.3) together with (A.7), (A.8) and (A.9), we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{G}_{n+1}(s,q)\\ &&= s \left (- \nu_{n} \langle \dot{x}_{n+1} , u_{n} \rangle - \nu_{n+1} P_{n} + \tau_{n} \left ( P_{n} - W_{n} \right ) \right )\\ &&\quad+ se \left (-P_{n} - \frac{1}{2} \| \dot{x}_{n+1} \|^{2} \right ) \\ &&\quad+ \nu_{n+1}^{2} \left (\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_{n}\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_{n} \rangle \right ) \\ &&\quad+ \frac{1}{2} (\nu_{n+1}^{2} - {\nu_{n}^{2}})\|u_{n}\|^{2}\\ \\ &&= -\left ( s\nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} ,u_{n} \rangle - \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} -\bar{\eta}_{n} \|u_{n}\|^{2} - s \tau_{n} W_{n}, \end{array} $$

where the quantity $ \bar {\eta }_{n}$ is given by

$$ \begin{array}{l} \bar{\eta}_{n}= \kappa \left (1- \frac{ \kappa}{2} \right )\nu_{n+1}^{2} - \frac{ 1 }{2}(\nu_{n+1}^{2} - {\nu_{n}^{2}}) \\ {\kern12.5pt}= \frac{1}{2} \left ({\nu_{n}^{2}}- \nu_{n+1}^{2} (1-\kappa)^{2}\right ) (\text{since}~ \kappa \left (1- \frac{ \kappa}{2} \right ) = \frac{1}{2} -\frac{1}{2}(1-\kappa)^{2} ). \end{array} $$

This leads to the desired result.
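Identity (A.2) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it takes G_n(s,q) = sν_n a_n + s e b_n + ν_n² c_n, which is the form implied by (A.3) but is not restated in this appendix, it works in ℝ⁴ instead of a general Hilbert space, and it treats χ*_n as an arbitrary vector (no subgradient property is used in (A.2)). The helper names and the test values of κ, e, s, ν_n, ν_{n+1} are hypothetical. Exact rational arithmetic is used so that both sides can be compared with `==`.

```python
from fractions import Fraction as Fr
import random

def dot(a, b):
    return sum(p * r for p, r in zip(a, b))

def sub(a, b):
    return [p - r for p, r in zip(a, b)]

def energy(s, q, x, y, nu, e):
    # Assumed form G_n(s,q) = s*nu_n*a_n + s*e*b_n + nu_n^2*c_n, inferred from (A.3).
    u = sub(y, x)
    a = dot(sub(q, x), u)
    b = dot(sub(x, q), sub(x, q)) / 2
    c = dot(u, u) / 2
    return s * nu * a + s * e * b + nu**2 * c

def check_A2(seed=0, dim=4):
    rng = random.Random(seed)
    vec = lambda: [Fr(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(dim)]
    # Arbitrary positive test values (hypothetical, chosen only for the check).
    kappa, e, s = Fr(7, 10), Fr(13, 10), Fr(9, 10)
    nu_n, nu_next = Fr(2), Fr(5, 2)
    varrho = 1 - kappa
    tau = e + nu_next
    theta = (nu_n - varrho * nu_next) / tau                      # (A.1a)
    x, y, q, chi = vec(), vec(), vec(), vec()
    u = sub(y, x)
    x_next = [xi - ci - theta * ui for xi, ci, ui in zip(x, chi, u)]  # (A.1b)
    y_next = [yi - kappa * ui for yi, ui in zip(y, u)]                # (A.1c)
    dx = sub(x_next, x)
    lhs = (energy(s, q, x_next, y_next, nu_next, e) - energy(s, q, x, y, nu_n, e)
           + s * tau * dot(chi, sub(x_next, q)))
    rhs = (-(s * nu_n + varrho * nu_next**2) * dot(dx, u)
           - (nu_n**2 - varrho**2 * nu_next**2) * dot(u, u) / 2
           - (s * e - nu_next**2) * dot(dx, dx) / 2)
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = check_A2()
    print("(A.2) holds on this sample:", lhs == rhs)
```

Changing `seed` or the test values gives further samples; agreement on random data is of course only a consistency check, not a proof.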

(r2) An estimate from the proximal part of the method. Let us establish that, for any ξ_n ≠ 1, it holds that

$$ \begin{array}{l} \xi_n \langle \chi_n^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \| \dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}= \theta_n (1-\xi_n) \langle u_n,\dot{x}_{n+1} \rangle + \frac{1}{2} \theta_n^{2} \|u_n\|^{2} - \left (\xi_n- \frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$

(A.10)

Indeed, we have $\dot {x}_{n+1} =-\theta _n u_{n} -\chi _{n}^{*}$ (from (A.1b)), which, for any ξ_n ≠ 1, can be rewritten as

$$ \begin{array}{l} \xi_n \dot{x}_{n+1} = -(1-\xi_n) \left (\dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \right ) - \chi_n^{*} = -(1-\xi_n) H_n - \chi_n^{*} , \end{array} $$

(A.11)

where $H_{n}= \dot {x}_{n+1} + (1-\xi _n)^{-1}\theta _n u_{n} $. Furthermore, by $-\chi _{n}^{*}= \dot {x}_{n+1}+\theta _{n} u_{n}$ (again using (A.1b)) and denoting $Q_{n}=\langle \dot {x}_{n+1}, u_{n}\rangle $, we simply obtain

$$ \begin{array}{l} \langle (-\chi_n^{*}), H_n \rangle = \langle \dot{x}_{n+1} + \theta_n u_n, \dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \rangle \\ {}= \|\dot{x}_{n+1} \|^{2} + (1-\xi_n)^{-1}\theta_n^{2} \|u_n\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)}\theta_n Q_{n}.\ \end{array} $$

(A.12)

Therefore, by taking the scalar product of each side of equality (A.11) with $\chi _{n}^{*}$, adding $(1/2) \| \chi _{n}^{*}\|^{2}$ to both sides, and using (A.12) and $ \| \chi _{n}^{*}\|^{2}= \|\dot {x}_{n+1}+ \theta _{n} u_{n}\|^{2}$, we get

$$ \begin{array}{l} \xi_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \| \chi_{n}^{*}\|^{2} = (1-\xi_n) \langle (-\chi_{n}^{*}), H_{n} \rangle - \frac{1}{2} \| \chi_{n}^{*}\|^{2}\\ {\kern99pt}= (1-\xi_n) \left (\|\dot{x}_{n+1} \|^{2} + \frac{ \theta_n^{2} }{(1-\xi_n)} \|u_{n}\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)} \theta_n Q_{n} \right ) \\ {\kern110pt}- \frac{1}{2}\left (\|\dot{x}_{n+1} \|^{2} + \theta_n^{2} \|u_{n}\|^{2} + 2 \theta_n Q_{n}\right ) \\ {\kern99pt}= (1-\xi_{n}) \theta_n Q_{n} + \frac{1}{2}\theta_n^{2} \|u_{n}\|^{2} + \left (\frac{1}{2} -\xi_{n} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

This yields (A.10).
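As a quick consistency check of (A.10), the following sketch evaluates both sides on random rational vectors. It assumes only the relation χ*_n = −(ẋ_{n+1} + θ_n u_n) from (A.1b); the dimension, the values of θ_n and ξ_n, and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction as Fr
import random

def dot(a, b):
    return sum(p * r for p, r in zip(a, b))

def check_A10(theta, xi, seed=0, dim=4):
    rng = random.Random(seed)
    vec = lambda: [Fr(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(dim)]
    u, dx = vec(), vec()                       # u_n and the increment x_{n+1} - x_n
    shifted = [d + theta * w for d, w in zip(dx, u)]
    chi = [-v for v in shifted]                # chi*_n = -(dx + theta*u), from (A.1b)
    lhs = xi * dot(chi, dx) + dot(shifted, shifted) / 2
    rhs = (theta * (1 - xi) * dot(u, dx)
           + theta**2 * dot(u, u) / 2
           - (xi - Fr(1, 2)) * dot(dx, dx))
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = check_A10(theta=Fr(5, 8), xi=Fr(1, 3))
    print("(A.10) holds on this sample:", lhs == rhs)
```

Because the arithmetic is exact, equality of `lhs` and `rhs` is tested without any floating-point tolerance.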

(r3) Combining proximal and inertial effects. Let (3.10) hold and set $\rho _{n}= 1- (1-\kappa ) \frac {\nu _{n+1}}{\nu _{n}}$. It is worth noticing that the term θ_n involved in (A.1) can be simply expressed as

$$ \theta_n =\frac{\nu_{n} \rho_{n} }{ \tau_{n}} ~\text{where}~ \tau_{n}=e+\nu_{n+1}. $$

(A.13)

So, by (A.13), in light of ρ_n > 0 (from condition (3.3a)), we deduce that (θ_n) is a positive sequence.

Now, we introduce the real sequence $(\bar {\gamma }_{n})$ defined by

$$ \bar{\gamma}_{n}=1- \frac{s \rho_{n}}{\tau_{n}} ~(\text{with}~ s > 0). $$

(A.14)

Clearly, by (A.14), along with τ_n > 0 (as τ_n := e + ν_{n+1}) and ρ_n > 0, we have

$$ \bar{\gamma}_{n} < 1 ~(\text{for any}~ s >0). $$

(A.15)

Next, given s > 0, we show that the iterates produced by (3.8a) (or, equivalently, by (A.1)) verify

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + \frac{1}{2} \rho_n^{-1} \tau_n^{2} \|\dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}+ (s \tau_n)\langle \chi_n^{*}, x_{n+1} -q \rangle + \bar{\gamma}_{n} \rho_n^{-1}\tau_n^{2} \langle \chi_n^{*},\dot{x}_{n+1} \rangle = -T_n(u_n,\dot{x}_{n+1} ) , \end{array} $$

(A.16)

where T_n(u, x) is defined for any $(u,x) \in {\mathcal H}^{2}$ by

$$ T_{n}(u,x) = w_{n} \langle u, x \rangle +\eta_{n} \| u \|^{2}+ \sigma_{n} \|x\|^{2}, $$

(A.17)

together with the parameters

$$ \begin{array}{@{}rcl@{}} && {w}_{n} = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), {\eta}_{n}= \frac{1}{2} {\varrho} \rho_{n} \nu_{n} \nu_{n+1} , \end{array} $$

(A.18a)

$$ \begin{array}{@{}rcl@{}} && {\sigma}_{n} =\frac{1}{2} \left (se - \nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ). \end{array} $$

(A.18b)

Indeed, by (A.2) and setting $Q_{n}=\langle \dot {x}_{n+1} , u_{n} \rangle $, we know that

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_n)\langle \chi^{*}_{n}, x_{n+1} -q \rangle = \\ {\kern12pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) Q_n - \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2} - \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$

(A.19)

Moreover, in light of $\bar {\gamma }_{n} \neq 1$ (from (A.15)), by using (A.10) (with the special value $\xi _{n}= \bar {\gamma }_{n}:=1- \frac {s \rho _{n}}{\tau _{n}}$) and recalling that $\theta _n = \frac {\nu _{n} \rho _{n}}{\tau _{n}}$, we obtain

$$ \begin{array}{l} \bar{\gamma}_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= s \frac{\nu_{n} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} Q_{n}+ \frac{1}{2} \frac{{\nu_{n}^{2}} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} \|u_{n}\|^{2} - \left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

(A.20)

Then, multiplying equality (A.20) by $\rho _{n}^{-1} {\tau _{n}^{2}}$, and adding the resulting equality to (A.19), we get

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_{n})\langle \chi^{*}_{n}, x_{n+1} -q \rangle \\ {\kern12pt}+ \bar{\gamma}_{n} \rho_{n}^{-1} {\tau_{n}^{2}} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \rho_{n}^{-1} {\tau_{n}^{2}} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= \left (- \left (s \nu_{n}+ {\varrho} \nu_{n+1}^{2} \right ) + s \nu_{n} \rho_{n} \right ) Q_{n} \\ {\kern12pt}+ \left (- \frac{1}{2} \left ({\nu_{n}^{2}} - {\varrho}^{2} \nu_{n+1}^{2}\right ) + \frac{1}{2} {\nu_{n}^{2}} \rho_{n} \right ) \| u_{n} \|^{2} \\ {\kern12pt}- \left ( \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}}\left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

Hence, noticing that ν_nρ_n = ν_n − ϱν_{n+1}, we infer that (A.16)–(A.17) is actually satisfied together with the parameters

$$ \begin{array}{l} {w}_{n} = \left (s \nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) - s (\nu_{n} -{\varrho} \nu_{n+1}) = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), \\ {\eta}_{n}= \frac{1}{2} \left ({\nu_{n}^{2}} - \nu_{n+1}^{2} {\varrho}^{2} \right ) - \frac{1}{2}\left ({\nu_{n}^{2}}- {\varrho} \nu_{n} \nu_{n+1}\right ) = \frac{1}{2} {\varrho} \nu_{n+1} \left ( \nu_{n} - \nu_{n+1} {\varrho} \right) = \frac{1}{2} {\varrho} \nu_{n+1} \nu_{n} \rho_{n}, \\ {\sigma}_{n} = \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}} \left (\bar{\gamma}_{n} - \frac{1}{2} \right ) = \frac{1}{2} \left (se -\nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ) . \end{array} $$

This leads to the desired result.
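The combined identity (A.16), with T_n and its coefficients taken from (A.17)–(A.18), can be checked in the same spirit. This is a minimal sketch under the same assumptions as before: G_n(s,q) = sν_n a_n + s e b_n + ν_n² c_n (the form implied by (A.3)), vectors in ℝ⁴, χ*_n arbitrary, and hypothetical test values chosen so that ρ_n > 0.

```python
from fractions import Fraction as Fr
import random

def dot(a, b):
    return sum(p * r for p, r in zip(a, b))

def sub(a, b):
    return [p - r for p, r in zip(a, b)]

def energy(s, q, x, y, nu, e):
    # Assumed form of G_n(s,q), inferred from (A.3).
    u = sub(y, x)
    return (s * nu * dot(sub(q, x), u)
            + s * e * dot(sub(x, q), sub(x, q)) / 2
            + nu**2 * dot(u, u) / 2)

def check_A16(seed=0, dim=4):
    rng = random.Random(seed)
    vec = lambda: [Fr(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(dim)]
    kappa, e, s = Fr(7, 10), Fr(13, 10), Fr(9, 10)   # hypothetical test values
    nu_n, nu_next = Fr(2), Fr(5, 2)
    varrho = 1 - kappa
    tau = e + nu_next
    rho = 1 - varrho * nu_next / nu_n                # rho_n (here 5/8 > 0)
    theta = nu_n * rho / tau                         # (A.13)
    gamma_bar = 1 - s * rho / tau                    # (A.14)
    # Coefficients of T_n from (A.18a)-(A.18b).
    w = varrho * (nu_next**2 + s * nu_next)
    eta = varrho * rho * nu_n * nu_next / 2
    sigma = (s * e - nu_next**2 + tau**2 * (2 * gamma_bar - 1) / rho) / 2
    x, y, q, chi = vec(), vec(), vec(), vec()
    u = sub(y, x)
    x_next = [xi - ci - theta * ui for xi, ci, ui in zip(x, chi, u)]  # (A.1b)
    y_next = [yi - kappa * ui for yi, ui in zip(y, u)]                # (A.1c)
    dx = sub(x_next, x)
    shifted = [d + theta * ui for d, ui in zip(dx, u)]
    d_energy = energy(s, q, x_next, y_next, nu_next, e) - energy(s, q, x, y, nu_n, e)
    lhs = (d_energy
           + tau**2 / (2 * rho) * dot(shifted, shifted)
           + s * tau * dot(chi, sub(x_next, q))
           + gamma_bar * tau**2 / rho * dot(chi, dx))
    rhs = -(w * dot(u, dx) + eta * dot(u, u) + sigma * dot(dx, dx))   # -T_n(u_n, dx)
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = check_A16()
    print("(A.16) holds on this sample:", lhs == rhs)
```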

(r4) Finally, we give an alternative formulation of the quantity T_n(u, x) given by (A.17)–(A.18). For this purpose, we begin by reformulating σ_n. By the definitions τ_n := e + ν_{n+1}, $\bar {\gamma }_{n}:=1-s \frac {\rho _{n}}{\tau _{n}}$, and by an easy computation we have

$$ \begin{array}{l} \rho_{n}^{-1}{\tau_{n}^{2}} (2 \bar{\gamma}_{n} -1) = \frac{(e+ \nu_{n+1})^{2}}{ \rho_{n}} \left (1- 2s \frac{\rho_{n}}{(e+ \nu_{n+1})} \right ) \\ {\kern69pt}= \frac{1 }{\rho_{n}} \left (e^{2}+ 2 e \nu_{n+1} + (\nu_{n+1})^{2} \right )- 2s \left (e+ \nu_{n+1} \right ) \\ {\kern69pt}= e \left (\frac{e }{\rho_{n} } -s \right ) - s e + 2 \nu_{n+1} \left (\frac{e }{\rho_{n} } -s \right ) + \frac{(\nu_{n+1} )^{2} }{\rho_{n} } \\ {\kern69pt}= \left (e + 2 \nu_{n+1}\right ) \left (\frac{e }{\rho_{n} } -s \right ) - s e + \frac{(\nu_{n+1})^{2} }{\rho_{n} } \\ {\kern69pt}= \tau_{n,e} \left (e \rho_{n}^{-1} -s \right ) - s e + \rho_{n}^{-1} (\nu_{n+1} )^{2}, \end{array} $$

where τ_{n, t} = t + 2ν_{n+1} (for t ≥ 0). As a consequence, by the previous definition of σ_n (in (A.18)), we obtain

$$ \begin{array}{l} 2 {\sigma}_{n}= (\rho_n^{-1}-1) (\nu_{n+1} )^{2} + \tau_{n,e} \left (e \rho_n^{-1}-s \right )\\ {\kern18pt}= \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_n^{-1} -1 \right ) + \tau_{n,e} (e -s). \end{array} $$

(A.21)

Then we consider the following two situations relative to the constant κ:

- In the special case when κ = 1 (hence ρ_n = 1 and $ \rho _{n}^{-1}=1$), we obviously have w_n = 0 and η_n = 0. Then, for $(u,x) \in {\mathcal H}^{2}$, by definition of T_n (in (A.17)) along with $ {\sigma }_{n}= \frac {\left (e -s \right )}{2} \tau _{n,e} $ (from (A.21)) we obtain

$$ T_{n}(u,x)= \frac{\left (e -s \right )}{2} \tau_{n,e} \|x\|^{2}. $$

(A.22)

- For $\kappa \in (0,1) \cup (1,\infty )$ (hence η_n ≠ 0), also setting ${\varsigma }_{n}:=\frac {w_{n}}{2 \eta _{n}}$, and $\psi _{n}:= 4 {\sigma }_{n} {\eta }_{n}- {w}_{n}^{2}$, by definition of T_n (in (A.17)) we classically have

$$ T_{n}(u,x)= \eta_{n} \|u+{\varsigma}_{n}x \|^{2}+ \frac{\psi_{n}}{4 \eta_{n} } \|x\|^{2}. $$

(A.23)

On the one hand, by ${w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right )$ (from (A.18)) and remembering that τ_{n, s} = s + 2ν_{n+1}, we simply have ${w}_{n}^{2} = ({\varrho } \nu _{n+1})^{2} \left ((\nu _{n+1} )^{2} + s \tau _{n,s} \right )$. Hence, by (A.21) while using the definition of ψ_n, and setting S_n := ϱρ_nν_nν_{n+1} (so that S_n = 2η_n and $\psi _{n}= 2 {\sigma }_{n} S_{n}- {w}_{n}^{2}$), we obtain

$$ \psi_{n}= S_{n} \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_{n}^{-1} -1 \right ) + S_{n} \tau_{n,e} (e -s) - ({\varrho} \nu_{n+1})^{2} \left ( (\nu_{n+1} )^{2} + s \tau_{n,s} \right).$$

It is also easily checked that $S_{n} \left (\rho _{n}^{-1} -1 \right ) = ({\varrho } \nu _{n+1})^{2} $, which by the previous equality yields

$$ \begin{array}{l} \psi_{n}= S_{n} \tau_{n,e} \left (e-s \right ) + ({\varrho} \nu_{n+1})^{2} \left (e \tau_{n,e} - s \tau_{n,s} \right ). \end{array} $$

(A.24)

Then, noticing that eτ_{n, e} − sτ_{n, s} = (e − s)τ_{n, e+s} (as τ_{n, t} := t + 2ν_{n+1}, for t ≥ 0), we infer that $ \psi _{n} = \left (e-s \right ) \left (S_{n} \tau _{n,e} + ({\varrho } \nu _{n+1})^{2} \tau _{n,e+s} \right )$, which by (A.23) entails that

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \|u+{\varsigma}_{n} x \|^{2}+ \frac{\left (e-s \right )}{2 S_{n}} \left ( S_{n} \tau_{n,e} + ({\varrho} \nu_{n+1})^{2} \tau_{n,e+s} \right ) \|x\|^{2}. $$

On the other hand, we clearly have ${\varsigma }_{n}=\frac {w_{n}}{S_{n}}$ (since S_n = 2η_n), together with $ {w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right ) $, S_n = ϱρ_nν_nν_{n+1} and $ \frac {1}{\theta _{n}}=\frac { e+ \nu _{n+1}}{ \rho _{n} \nu _{n}}$ (from (A.13)), which gives us

$$ \begin{array}{l} {\varsigma}_{n}= \frac{ \nu_{n+1} + s }{ \rho_{n} \nu_{n} } = \frac{ \left (e+ \nu_{n+1} \right ) -(e- s) }{ \rho_{n} \nu_{n} } = \frac{1}{\theta_n } - \frac{ (e-s) }{ \nu_{n} \rho_{n} }. \end{array} $$

Combining the last two results amounts to

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \left \|u+ \left (\frac{1}{\theta_n } - \frac{ e-s }{ \nu_{n} \rho_{n} } \right ) x \right \|^{2}+ \frac{\left (e-s \right )}{2} \left ( \tau_{n,e} + \frac{({\varrho} \nu_{n+1})^{2}}{S_{n}} \tau_{n,e+s} \right ) \|x\|^{2}. $$

This completes the proof.
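The two closed forms obtained in (r4), namely (A.22) for κ = 1 and the final expression of T_n(u, x) for κ ≠ 1, can be compared with the original definition (A.17)–(A.18). The sketch below does so in exact rational arithmetic on vectors of ℝ⁴; the function names and the sample parameter values are hypothetical, and the values are chosen so that ρ_n ≠ 0.

```python
from fractions import Fraction as Fr
import random

def dot(a, b):
    return sum(p * r for p, r in zip(a, b))

def T_direct(u, x, kappa, e, s, nu_n, nu_next):
    # T_n(u, x) as defined in (A.17) with the coefficients (A.18a)-(A.18b).
    varrho = 1 - kappa
    tau = e + nu_next
    rho = 1 - varrho * nu_next / nu_n
    gamma_bar = 1 - s * rho / tau
    w = varrho * (nu_next**2 + s * nu_next)
    eta = varrho * rho * nu_n * nu_next / 2
    sigma = (s * e - nu_next**2 + tau**2 * (2 * gamma_bar - 1) / rho) / 2
    return w * dot(u, x) + eta * dot(u, u) + sigma * dot(x, x)

def T_closed(u, x, kappa, e, s, nu_n, nu_next):
    # Closed forms from (r4): (A.22) when kappa = 1, the final formula otherwise.
    varrho = 1 - kappa
    tau_e = e + 2 * nu_next                    # tau_{n,e}
    if kappa == 1:
        return (e - s) / 2 * tau_e * dot(x, x)
    rho = 1 - varrho * nu_next / nu_n
    theta = nu_n * rho / (e + nu_next)         # (A.13)
    S = varrho * rho * nu_n * nu_next          # S_n = 2 * eta_n
    tau_es = e + s + 2 * nu_next               # tau_{n,e+s}
    coef = 1 / theta - (e - s) / (nu_n * rho)  # varsigma_n
    z = [ui + coef * xi for ui, xi in zip(u, x)]
    return (S * dot(z, z) / 2
            + (e - s) / 2 * (tau_e + (varrho * nu_next)**2 / S * tau_es) * dot(x, x))

def sample_vectors(seed=0, dim=4):
    rng = random.Random(seed)
    vec = lambda: [Fr(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(dim)]
    return vec(), vec()

if __name__ == "__main__":
    u, x = sample_vectors()
    params = dict(e=Fr(13, 10), s=Fr(9, 10), nu_n=Fr(2), nu_next=Fr(5, 2))
    for kappa in (Fr(7, 10), Fr(1), Fr(3, 2)):
        same = T_direct(u, x, kappa, **params) == T_closed(u, x, kappa, **params)
        print("kappa =", kappa, "closed form matches (A.17):", same)
```

The three sample values of κ cover the cases κ ∈ (0,1), κ = 1 and κ > 1 distinguished in the text.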

About this article

Cite this article

Maingé, PE., Labarre, F. Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer Algor 90, 99–136 (2022). https://doi.org/10.1007/s11075-021-01181-y

Received: 04 November 2020
Accepted: 27 July 2021
Published: 27 August 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11075-021-01181-y
