1 Introduction

In convex optimization, first-order methods are iterative algorithms that use function values and (generalized) derivatives to build minimizing sequences. Perhaps the oldest and simplest of them is the gradient method [13], which can be interpreted as a finite-difference discretization of the differential equation

$$\begin{aligned} \dot{x}(t)+\nabla \phi \big (x(t)\big )=0, \end{aligned}$$
(1)

describing the steepest descent dynamics. The gradient method is applicable to smooth functions, but there are more contemporary variations that can deal with nonsmooth ones, and even exploit the functions’ structure to enhance the algorithm’s per-iteration complexity, or its overall performance. A notable example (see [9] for further insight) is the proximal-gradient (or forward–backward) method [18, 28] (see also [22, 30]), which is closely related to a nonsmooth version of (1). In any case, the analysis of related differential equations or inclusions is a valuable source of insight into the dynamic behavior of these iterative algorithms.

In [25, 29], the authors introduced inertial substeps in the iterations of the gradient method in order to accelerate its convergence. This variation improves the worst-case convergence rate from \(\mathcal O(k^{-1})\) to \(\mathcal O(k^{-2})\). In the strongly convex case, the constants describing the linear convergence rate are also improved. This method was extended to the proximal-gradient setting in [10], and to fixed-point iterations [16, 21] (see also [3, 4, 14, 19, 20], among others). Su et al. [31] showed that Nesterov’s inertial gradient algorithm, and analogously its proximal variant, can be interpreted as a discretization of the ordinary differential equation with Asymptotically Vanishing Damping

$$\begin{aligned} \ddot{x}(t) + \dfrac{\alpha }{t}\dot{x}(t) + \nabla \phi (x(t)) =0, \end{aligned}$$
(AVD)

where \(\alpha >0\). The function values converge to the minimum value along the trajectories [6], and they do so at a rate of \(\mathcal {O}(t^{-2})\) for \(\alpha \ge 3\) [31], and \(o(t^{-2})\) for \(\alpha > 3\) [23], in the worst-case scenario.

Despite these faster convergence rate guarantees, trajectories satisfying (AVD), as well as sequences generated by inertial first-order methods, exhibit a somewhat chaotic behavior, especially if the objective function is ill-conditioned. In particular, the function values tend not to decrease monotonically, but rather to oscillate.

Example 1.1

We consider the quadratic function \(\phi :{\mathbb R}^3\rightarrow {\mathbb R}\), defined by

$$\begin{aligned} \phi (x_1,x_2,x_3)=\frac{1}{2}(x_1^2+\rho x_2^2+\rho ^2x_3^2), \end{aligned}$$
(2)

whose condition number is \(\max \{\rho ^2,\rho ^{-2}\}\). Figure 1 shows the behavior of the solution to (AVD), with \(x(1)=(1,1,1)\) and \(\dot{x}(1)=-\nabla \phi \big (x(1)\big )\) (the direction of steepest descent).

Fig. 1 Depiction of the function values according to Example 1.1, on the interval [1, 35], for \(\alpha =3.1\), and \(\rho =10\) (left) and \(\rho =100\) (right)
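To see how such trajectories can be computed in practice, the following is a minimal Python sketch (the solver call, time grid and initial time are illustrative choices, not those used to produce the figure) that integrates (AVD) for the quadratic (2), rewritten as a first-order system:

```python
# Minimal sketch reproducing the setting of Example 1.1: integrate (AVD) for the
# quadratic (2) on [1, 35], with x(1) = (1, 1, 1) and xdot(1) = -grad phi(x(1)).
import numpy as np
from scipy.integrate import odeint

alpha, rho = 3.1, 10.0
grad = lambda x: np.array([x[0], rho * x[1], rho**2 * x[2]])
phi = lambda x: 0.5 * (x[0]**2 + rho * x[1]**2 + rho**2 * x[2]**2)

def avd(state, t):
    # first-order reformulation of (AVD): state = (x, v), with v = xdot
    x, v = state[:3], state[3:]
    return np.concatenate([v, -(alpha / t) * v - grad(x)])

x1 = np.ones(3)
state0 = np.concatenate([x1, -grad(x1)])          # position and velocity at t = 1
ts = np.linspace(1.0, 35.0, 5000)
sol = odeint(avd, state0, ts)
values = np.array([phi(x) for x in sol[:, :3]])   # oscillating values (cf. Fig. 1)
```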

In order to avoid this undesirable behavior, and partly inspired by a continuous version of Newton’s method [2], Attouch et al. [5] proposed a Dynamic Inertial Newton system with Asymptotically Vanishing Damping, given by

$$\begin{aligned} \ddot{x}(t) + \dfrac{\alpha }{t}\dot{x}(t) + \nabla \phi (x(t)) + \beta \nabla ^2 \phi (x(t))\dot{x}(t)=0, \end{aligned}$$
(DIN-AVD)

where \(\alpha ,\beta >0\). In principle, this expression only makes sense when \(\phi \) is twice differentiable, but the authors show that it can be transformed into an equivalent first-order system in time and space, which can be extended to a differential inclusion that is well posed whenever \(\phi \) is closed and convex. The authors presented (DIN-AVD) as a continuous-time model for the design of new algorithms, a line of research already outlined in [5] and continued in [7]. Returning to (DIN-AVD), the function values converge to the minimum along the solutions, with the same rates as for (AVD). Nevertheless, in contrast with the solutions of (AVD), the oscillations are tame.

Example 1.2

In the context of Example 1.1, Fig. 2 shows the behavior of the solution to (AVD) in comparison with that of (DIN-AVD), both with \(x(1)=(1,1,1)\) and \(\dot{x}(1)=-\nabla \phi \big (x(1)\big )\).

Fig. 2 Depiction of the function values according to Example 1.2, on the interval [1, 35], for \(\alpha =3.1\), \(\beta =1\), and \(\rho =10\) (left) and \(\rho =100\) (right)
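The corresponding computation for (DIN-AVD) only requires adding the Hessian-driven term; since (2) is quadratic, its Hessian is the constant matrix \(\textrm{diag}(1,\rho ,\rho ^2)\). A minimal sketch, under the same illustrative choices as before:

```python
# Minimal sketch for Example 1.2: integrate (DIN-AVD) for the quadratic (2).
import numpy as np
from scipy.integrate import odeint

alpha, beta, rho = 3.1, 1.0, 10.0
Hess = np.diag([1.0, rho, rho**2])   # constant Hessian of the quadratic (2)
grad = lambda x: Hess @ x
phi = lambda x: 0.5 * x @ Hess @ x

def din_avd(state, t):
    # first-order reformulation: state = (x, v), with v = xdot
    x, v = state[:3], state[3:]
    return np.concatenate([v, -(alpha / t) * v - grad(x) - beta * Hess @ v])

x1 = np.ones(3)
state0 = np.concatenate([x1, -grad(x1)])
ts = np.linspace(1.0, 35.0, 5000)
sol = odeint(din_avd, state0, ts)
values = np.array([phi(x) for x in sol[:, :3]])   # tamer oscillations (cf. Fig. 2)
```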

An alternative way to avoid, or at least moderate, the oscillations exemplified in Fig. 1 for the solutions of (AVD) is to stop the evolution from time to time and restart it with zero initial velocity. The simplest option is to do so periodically, at fixed intervals. This idea is used in [26] for the accelerated gradient method, where the number of iterations between restarts depends on the strong convexity parameter of the function. See also [1, 8, 24], where the problem of estimating appropriate restart times is addressed. An adaptive policy for restarting Nesterov’s method was proposed by O’Donoghue and Candès [27]: the algorithm is restarted at the first iteration k such that \(\phi (x_{k+1})>\phi (x_{k})\), which prevents the function values from increasing locally. This kind of restarting criterion shows remarkable performance in practice, although convergence rate guarantees have not been established (some partial steps in this direction have been made in [15, 17]). Moreover, the authors of [27] observe that this heuristic displays an erratic behavior when the difference \(\phi (x_{k})-\phi (x_{k+1})\) is small, due to the prevalence of cancellation errors. Therefore, this method must be handled with care if high accuracy is desired. A different restarting scheme, based on the speed of the trajectories, is proposed for (AVD) in [31], where convergence rates are established. The improvement can be remarkable, as shown in Fig. 3.

Fig. 3 Values along the trajectory, with (red) and without (blue) restarting, for (AVD)

In [31], the authors also perform numerical tests using Nesterov’s inertial gradient method, with this restarting scheme as a heuristic, and observe a faster convergence to the optimal value.

The aim of this work is to analyze the impact of the speed restarting scheme on the solutions of (DIN-AVD), in order to lay the theoretical foundations for further accelerating Hessian-driven inertial algorithms, such as the ones in [7], by means of a restarting policy. This approach combines two oscillation-mitigation principles, resulting in fast, monotonic convergence of the function values. We provide linear convergence rates for functions with quadratic growth, and observe a noticeable improvement in the behavior of the trajectories in terms of stability and convergence speed, both in comparison with the non-restarted trajectories and with the restarted solutions of (AVD). As a byproduct, we generalize and improve some of the results in [31]. It is worth noting that our convergence rate result holds for all values of \(\alpha >0\) and \(\beta \ge 0\), in contrast with those in [5,6,7].

The paper is organized as follows: In Sect. 2, we describe the speed restart scheme and state the convergence rate of the corresponding trajectories, which is the main theoretical result of this paper. Section 3 contains the technical auxiliary results, especially some estimates on the restarting time, leading to the proof of our main result, which is carried out in Sect. 4. Finally, we present a few simple numerical examples in Sect. 5, in order to illustrate the improvement, in terms of convergence speed, provided by the restarted trajectories.

2 Restarted Trajectories for (DIN-AVD)

Throughout this paper, \(\phi :{\mathbb R}^n\rightarrow {\mathbb R}\) is a twice continuously differentiable convex function, which attains its minimum value \(\phi ^*\), and whose gradient \(\nabla \phi \) is Lipschitz-continuous with constant \(L>0\). Consider the ordinary differential equation (DIN-AVD), with initial conditions \(x(0)=x_0\), \(\dot{x}(0)=0\), and parameters \(\alpha > 0\) and \(\beta \ge 0\). A solution is a function in \(\mathcal {C}^2\left( (0,+\infty ); {\mathbb R}^n \right) \cap \mathcal {C}^1\left( [0,+\infty ); {\mathbb R}^n \right) \), such that \(x(0)=x_0\), \(\dot{x}(0)=0\) and (DIN-AVD) holds for every \(t>0\). Existence and uniqueness of such a solution is not straightforward due to the singularity at \(t=0\), but can be established by a limiting procedure. As shown in Appendix 1, we have the following:

Theorem 2.1

For every \(x_0 \in {\mathbb R}^n\), the differential equation (DIN-AVD), with initial conditions \(x(0)=x_0\) and \(\dot{x}(0)=0\), has a unique solution.

We are concerned with the design and analysis of a restart scheme to accelerate the convergence of the solutions of (DIN-AVD) to minimizers of \(\phi \), based on the method proposed in [31].

2.1 A Speed Restarting Scheme and the Main Theoretical Result

Since the damping coefficient \(\alpha /t\) goes to 0 as \(t\rightarrow \infty \), its stabilizing effect on the trajectory weakens as t grows. The idea is thus to restart the dynamics at the point where the speed ceases to increase.

Given \(z\in {\mathbb R}^n\), let \(y_z\) be the solution of (DIN-AVD), with initial conditions \(y_z(0)=z\) and \(\dot{y}_z(0)=0\). Set

$$\begin{aligned} T(z)=\inf \left\{ t >0 : \dfrac{d}{dt} \left\| \dot{y}_z(t)\right\| ^2 \le 0 \right\} . \end{aligned}$$
(3)

Remark 2.1

Take \(z\notin {{\,\textrm{argmin}\,}}(\phi )\), and define \(y_z\) as above. For \(t\in \big (0,T(z)\big )\), we have

$$\begin{aligned} \dfrac{d}{dt}\phi (y_{z}(t))&=\langle \nabla \phi (y_{z}(t)), \dot{y}_{z}(t) \rangle \\&= - \langle \ddot{y}_{z}(t), \dot{y}_{z}(t) \rangle - \dfrac{\alpha }{t}\left\| \dot{y}_{z}(t)\right\| ^2 - \beta \langle \nabla ^2 \phi (y_{z}(t))\dot{y}_{z}(t), \dot{y}_{z}(t) \rangle . \end{aligned}$$

But \(\langle \nabla ^2 \phi (y_{z}(t))\dot{y}_{z}(t), \dot{y}_{z}(t) \rangle \ge 0\) by convexity, and \(\langle \ddot{y}_{z}(t), \dot{y}_{z}(t) \rangle \ge 0\) by the definition of T(z). Therefore,

$$\begin{aligned} \dfrac{d}{dt}\phi (y_{z}(t))\le - \dfrac{\alpha }{t}\left\| \dot{y}_{z}(t)\right\| ^2. \end{aligned}$$
(4)

In particular, \(t\mapsto \phi \big (y_z(t)\big )\) decreases in [0, T(z)].

If \(z\notin {{\,\textrm{argmin}\,}}(\phi )\), then T(z) cannot be 0. In fact, we shall prove (see Corollaries 3.2 and 3.3) that

$$\begin{aligned} 0<\inf \big \{T(z):z\notin {{\,\textrm{argmin}\,}}(\phi )\big \}\le \sup \big \{T(z):z\notin {{\,\textrm{argmin}\,}}(\phi )\big \}<\infty . \end{aligned}$$
(5)

Definition 2.1

Given \(x_0\in {\mathbb R}^n\), the restarted trajectory \(\chi _{x_0}:[0,\infty )\rightarrow {\mathbb R}^n\) is defined inductively:

  1. First, compute \(y_{x_0}\), \(T_1=T(x_0)\) and \(S_1=T_1\), and define \(\chi _{x_0}(t)=y_{x_0}(t)\) for \(t\in [0,S_1]\).

  2. For \(i\ge 1\), having defined \(\chi _{x_0}(t)\) for \(t\in [0,S_i]\), set \(x_i=\chi _{x_0}(S_i)\), and compute \(y_{x_i}\). Then, set \(T_{i+1}=T(x_i)\) and \(S_{i+1}=S_{i}+T_{i+1}\), and define \(\chi _{x_0}(t)=y_{x_i}(t-S_{i})\) for \(t\in (S_i,S_{i+1}]\).

In view of (5), \(S_i\) is defined for all \(i\ge 1\), \(\inf _{i\ge 1}(S_{i+1}-S_i)>0\) and \(\lim _{i\rightarrow \infty }S_i=\infty \). Moreover, in view of Remark 2.1, we have

Proposition 2.1

The function \(t\mapsto \phi \big (\chi _{x_0}(t)\big )\) is nonincreasing on \([0,\infty )\).

Our main theoretical result establishes that \(\phi \big (\chi _{x_0}(t)\big )\) converges linearly to \(\phi ^*\), provided there exists \(\mu >0\) such that

$$\begin{aligned} \mu (\phi (z) - \phi ^*) \le \dfrac{1}{2}\left\| \nabla \phi (z)\right\| ^2 \end{aligned}$$
(6)

for all \(z\in {\mathbb R}^n\). The Łojasiewicz inequality (6) is equivalent to quadratic growth and is implied by strong convexity (see [11]). More precisely, we have the following:

Theorem 2.2

Let \(\phi :{\mathbb R}^n\rightarrow {\mathbb R}\) be convex and twice continuously differentiable. Assume that \(\nabla \phi \) is Lipschitz-continuous with constant \(L>0\), that there exists \(\mu >0\) such that (6) holds, and that the minimum value \(\phi ^*\) of \(\phi \) is attained. Given \(\alpha >0\) and \(\beta \ge 0\), let \(\chi _{x_0}\) be the restarted trajectory defined by (DIN-AVD) from an initial point \(x_0 \in {\mathbb R}^n\). Then, there exist constants \(C,K>0\) such that

$$\begin{aligned} \phi \big (\chi _{x_0}(t)\big )-\phi ^*\le Ce^{-Kt}\big (\phi (x_0)-\phi ^*\big )\le \frac{CL}{2}e^{-Kt}{{\,\textrm{dist}\,}}\big (x_0,{{\,\textrm{argmin}\,}}(\phi )\big )^2 \end{aligned}$$

for all \(t>0\).

The rather technical proof is split into several parts, presented in the following sections.

3 Technicalities

Throughout this section, we fix \(z\notin {{\,\textrm{argmin}\,}}(\phi )\) and, in order to simplify the notation, we denote by x (instead of \(y_{z}\)) the solution of (DIN-AVD) with initial condition \(x(0)=z\) and \(\dot{x}(0)=0\).

3.1 A Few Useful Artifacts

We begin by defining some useful auxiliary functions and pointing out the main relationships between them.

To this end, we first rewrite Eq. (DIN-AVD) as

$$\begin{aligned} \dfrac{d}{dt}(t^\alpha \dot{x}(t)) = - t^\alpha \nabla \phi (x(t)) - \beta t^\alpha \nabla ^2 \phi (x(t))\dot{x}(t). \end{aligned}$$
(7)

Integrating (7) over [0, t], we get

$$\begin{aligned} t^\alpha \dot{x}(t)&= - \int _{0}^{t}u^\alpha \nabla \phi (x(u))\, du -\beta \int _{0}^{t}u^\alpha \nabla ^2 \phi (x(u))\dot{x}(u)\, du \\&= -\left[ \int _{0}^{t} u^\alpha (\nabla \phi (x(u))- \nabla \phi (z))\, du\right] - \left[ \beta \int _{0}^{t}u^\alpha \nabla ^2 \phi (x(u))\dot{x}(u)\, du\right] - \dfrac{t^{\alpha +1}}{\alpha +1}\nabla \phi (z). \end{aligned}$$
(8)

In order to obtain an upper bound for the speed \(\dot{x}\), the integrals

$$\begin{aligned} I_z(t)=\int _{0}^{t} u^\alpha (\nabla \phi (x(u))- \nabla \phi (z))\, du \quad \hbox {and}\quad J_z(t)=\beta \int _{0}^{t}u^\alpha \nabla ^2 \phi (x(u))\dot{x}(u)\, du \end{aligned}$$
(10)

will be majorized using the function

$$\begin{aligned} M_z(t)=\sup _{u \in (0,t]} \left[ \dfrac{\left\| \dot{x}(u)\right\| }{u} \right] , \end{aligned}$$
(11)

which is positive, nondecreasing and continuous.

Lemma 3.1

For every \(t>0\), we have

$$\begin{aligned} \left\| I_z(t)\right\| \le \dfrac{LM_z(t)t^{\alpha +3}}{2(\alpha +3)} \quad \hbox {and}\quad \left\| J_z(t)\right\| \le \dfrac{\beta L M_z(t)t^{\alpha +2}}{\alpha +2}. \end{aligned}$$

Proof

For the first estimate, we use the Lipschitz-continuity of \(\nabla \phi \) and the fact that \(M_z\) is nondecreasing, to obtain

$$\begin{aligned} \left\| \nabla \phi (x(u)) - \nabla \phi (z)\right\|&\le L\Vert x(u)-z\Vert \\&\le L \left\| \int _{0}^{u} \dot{x}(s)\, ds\right\| \le L \int _{0}^{u} s \dfrac{\left\| \dot{x}(s)\right\| }{s}\, ds \le LM_z(u) \int _{0}^{u} s \, ds , \end{aligned}$$

which results in

$$\begin{aligned} \left\| \nabla \phi (x(u)) - \nabla \phi (z)\right\| \le \dfrac{Lu^2M_z(u)}{2}. \end{aligned}$$
(12)

Then, from the definition of \(I_z(t)\) we deduce that

$$\begin{aligned} \left\| I_z(t)\right\| \le \int _{0}^{t} u^\alpha \left\| \nabla \phi (x(u)) - \nabla \phi (z)\right\| \, du \le \dfrac{LM_z(t)}{2} \int _{0}^{t} u^{\alpha +2} \, du = \dfrac{LM_z(t)t^{\alpha +3}}{2(\alpha +3)}. \end{aligned}$$

For the second inequality, we proceed analogously to get

$$\begin{aligned} \left\| \nabla ^2 \phi (x(u))\dot{x}(u)\right\|&= \left\| {\lim _{r \rightarrow u}} \dfrac{\nabla \phi (x(r)) - \nabla \phi (x(u))}{r-u}\right\| \\&\le {\lim _{r \rightarrow u}} \dfrac{L}{r-u} \int _{u}^{r}\left\| \dot{x}(s)\right\| \, ds \le {\lim _{r \rightarrow u}} \dfrac{L M_z(r)}{r-u} \int _{u}^{r} s\, ds, \end{aligned}$$

which yields

$$\begin{aligned} \left\| \nabla ^2 \phi (x(u))\dot{x}(u)\right\| \le LuM_z(u). \end{aligned}$$
(13)

Then,

$$\begin{aligned} \left\| J_z(t)\right\|&\le \beta \int _{0}^{t}u^\alpha \left\| \nabla ^2\phi (x(u))\dot{x}(u)\right\| \, du \le \beta \int _{0}^{t} u^{\alpha +1} LM_z(u) \,du \\&\le \dfrac{\beta L M_z(t)t^{\alpha +2}}{\alpha +2}, \end{aligned}$$

as claimed. \(\square \)

The dependence of \(M_z\) on the initial condition z may be greatly simplified. To this end, set

$$\begin{aligned} H(t)=1 -\dfrac{L\beta t}{(\alpha +2)} - \dfrac{Lt^2}{2(\alpha +3)}. \end{aligned}$$
(14)

The function H is concave, quadratic, does not depend on z, and has exactly one positive zero, given by

$$\begin{aligned} \tau _1=- \left( \dfrac{ \alpha +3}{\alpha +2}\right) \beta +\sqrt{\left( \dfrac{ \alpha +3}{\alpha +2}\right) ^2\beta ^2 + \dfrac{2(\alpha +3)}{L}}. \end{aligned}$$
(15)

In particular, H decreases strictly from 1 to 0 on \([0,\tau _1]\).

Lemma 3.2

For every \(t \in (0,\tau _1)\),

$$\begin{aligned} M_z(t) \le \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)H(t)}. \end{aligned}$$
(16)

Proof

If \(0< u\le t\), using (8) and (10), along with Lemma 3.1, we obtain

$$\begin{aligned} \dfrac{\left\| \dot{x}(u)\right\| }{u}&\le \dfrac{\left\| I_z(u)+J_z(u)\right\| }{u^{\alpha +1}} + \dfrac{\left\| \nabla \phi (z)\right\| }{\alpha +1} \le \left[ \dfrac{L u^2}{2(\alpha +3)} + \dfrac{L\beta u}{\alpha +2} \right] M_z(u)\nonumber \\&\quad + \dfrac{\left\| \nabla \phi (z)\right\| }{\alpha +1}. \end{aligned}$$
(17)

Since the right-hand side is nondecreasing in u, we take the supremum over \(u\in (0,t]\) to deduce that

$$\begin{aligned} M_z(t) \le \left[ \dfrac{L t^2}{2(\alpha +3)} + \dfrac{L\beta t}{\alpha +2} \right] M_z(t) + \dfrac{\left\| \nabla \phi (z)\right\| }{\alpha +1}. \end{aligned}$$

Rearranging the terms, and using the definition of H, given in (14), we see that

$$\begin{aligned} H(t)M_z(t) \le \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)}. \end{aligned}$$

We conclude by observing that H is positive on \((0,\tau _1)\). \(\square \)

By combining Lemmas 3.1 and 3.2, and inequalities (12) and (13), we obtain:

Corollary 3.1

For every \(t \in (0,\tau _1)\), we have

$$\begin{aligned}&\left\| I_z(t)+J_z(t)\right\| \le t^{\alpha +1}\,\left[ \dfrac{1-H(t)}{H(t)} \right] \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)}, \\&\quad \left\| \big (\nabla \phi (x(t)) - \nabla \phi (z)\big ) + \beta \nabla ^2\phi (x(t))\dot{x}(t)\right\| \le \left[ \dfrac{Lt^2}{2}+\beta Lt\right] \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)H(t)}. \end{aligned}$$

We highlight the fact that the bound above depends on z only via the factor \(\left\| \nabla \phi (z)\right\| \).

3.2 Estimates for the Restarting Time

We begin by finding a lower bound for the restarting time, depending on the parameters \(\alpha \), \(\beta \) and L, but not on the initial condition z.

Lemma 3.3

Let \(z\notin {{\,\textrm{argmin}\,}}(\phi )\), and let x be the solution of (DIN-AVD) with initial conditions \(x(0)=z\) and \(\dot{x}(0)=0\). For every \(t\in (0,\tau _1)\), we have

$$\begin{aligned} \langle \dot{x}(t), \ddot{x}(t) \rangle \ge \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2H(t)^2}\left( 1-\dfrac{(2\alpha +3)\beta Lt}{(\alpha +2)}-\dfrac{(\alpha +2)Lt^2}{(\alpha +3)}\right) . \end{aligned}$$

Proof

From (8) and (10), we know that

$$\begin{aligned} \dot{x}(t)= -\dfrac{1}{t^\alpha }\big (I_z(t) +J_z(t) \big ) - \dfrac{t}{\alpha +1}\nabla \phi (z). \end{aligned}$$
(18)

On the other hand,

$$\begin{aligned} \dfrac{d}{dt}\left[ \dfrac{1}{t^\alpha }\big (I_z(t) +J_z(t) \big ) \right]&= -\dfrac{\alpha }{t^{\alpha +1}}\big (I_z(t) +J_z(t) \big ) +\left( \nabla \phi \big (x(t)\big ) - \nabla \phi (z)\right) \\&\quad + \beta \nabla ^2 \phi (x(t))\dot{x}(t). \end{aligned}$$

Then,

$$\begin{aligned} \ddot{x}(t)&= \dfrac{\alpha }{t^{\alpha +1}} \big (I_z(t)+J_z(t)\big ) -(\nabla \phi (x(t)) - \nabla \phi (z)) - \beta \nabla ^2\phi (x(t))\dot{x}(t) \\&\quad - \dfrac{1}{\alpha +1}\nabla \phi (z) \\&= A(t)-B(t), \end{aligned}$$

where

$$\begin{aligned} A(t)=\dfrac{\alpha }{t^{\alpha +1}} \big (I_z(t)+J_z(t)\big )- \dfrac{1}{\alpha +1}\nabla \phi (z), \end{aligned}$$

and

$$\begin{aligned} B(t)=\big (\nabla \phi (x(t)) - \nabla \phi (z)\big ) + \beta \nabla ^2\phi (x(t))\dot{x}(t). \end{aligned}$$

With this notation, we have

$$\begin{aligned} \langle \dot{x}(t), \ddot{x}(t) \rangle = \langle \dot{x}(t), A(t) \rangle -\langle \dot{x}(t), B(t) \rangle \ge \langle \dot{x}(t), A(t) \rangle -\Vert \dot{x}(t)\Vert \,\Vert B(t)\Vert . \end{aligned}$$

For the first term, we do as follows:

$$\begin{aligned} \langle \dot{x}(t), A(t) \rangle&= -\left\langle \dfrac{1}{t^\alpha }\big (I_z(t) +J_z(t) \big ) + \dfrac{t}{\alpha +1}\nabla \phi (z), \dfrac{\alpha }{t^{\alpha +1}} \big (I_z(t)+J_z(t)\big )\right. \\&\quad \left. - \dfrac{1}{\alpha +1}\nabla \phi (z) \right\rangle \\&\ge \frac{t}{(\alpha +1)^2}\Vert \nabla \phi (z)\Vert ^2 -\dfrac{\alpha }{t^{2\alpha +1}}\Vert I_z(t)+J_z(t)\Vert ^2 \\&\quad -\dfrac{(\alpha -1)}{t^{\alpha }(\alpha +1)}\Vert \nabla \phi (z)\Vert \,\Vert I_z(t)+J_z(t)\Vert \\&\ge \frac{t}{(\alpha +1)^2}\Vert \nabla \phi (z)\Vert ^2 -\dfrac{\alpha t}{(\alpha +1)^2}\left[ \dfrac{1-H(t)}{H(t)} \right] ^2\left\| \nabla \phi (z)\right\| ^2 \\&\quad -\dfrac{(\alpha -1)t}{(\alpha +1)^2}\left[ \dfrac{1-H(t)}{H(t)} \right] \Vert \nabla \phi (z)\Vert ^2 \\&= \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2}\left( 1-\alpha \left[ \dfrac{1-H(t)}{H(t)} \right] ^2 - (\alpha -1)\left[ \dfrac{1-H(t)}{H(t)} \right] \right) \\&= \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2H(t)^2}\left( H(t)^2-\alpha \big (1-H(t)\big )^2 - (\alpha -1)H(t)\big (1-H(t)\big )\right) \\&= \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2H(t)^2}\big ((\alpha +1)H(t)-\alpha \big ), \end{aligned}$$

where we have used the Cauchy–Schwarz inequality and Corollary 3.1. For the second term, we first use (18) and observe that

$$\begin{aligned} \Vert \dot{x}(t)\Vert \le \dfrac{1}{t^\alpha }\Vert I_z(t)+J_z(t)\Vert +\dfrac{t}{(\alpha +1)}\Vert \nabla \phi (z)\Vert \le \dfrac{t\,\Vert \nabla \phi (z)\Vert }{(\alpha +1)H(t)}, \end{aligned}$$

and

$$\begin{aligned} \Vert B(t)\Vert \le \left[ \dfrac{Lt^2}{2}+\beta Lt\right] \dfrac{\Vert \nabla \phi (z)\Vert }{(\alpha +1)H(t)}, \end{aligned}$$

by Corollary 3.1. We conclude that

$$\begin{aligned} \langle \dot{x}(t), \ddot{x}(t) \rangle&\ge \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2H(t)^2}\left( (\alpha +1)H(t)-\alpha -\dfrac{Lt^2}{2}-\beta Lt\right) \\&= \frac{t\,\Vert \nabla \phi (z)\Vert ^2}{(\alpha +1)^2H(t)^2}\left( 1-\dfrac{(2\alpha +3)\beta Lt}{(\alpha +2)}-\dfrac{(\alpha +2)Lt^2}{(\alpha +3)}\right) , \end{aligned}$$

as stated. \(\square \)

The function G, defined by

$$\begin{aligned} G(t)=1-\dfrac{(2\alpha +3)\beta Lt}{(\alpha +2)}-\dfrac{(\alpha +2)Lt^2}{(\alpha +3)} =(\alpha +1)H(t)-\alpha -\dfrac{Lt^2}{2}-\beta Lt, \nonumber \\ \end{aligned}$$
(19)

does not depend on the initial condition z. Its unique positive zero is

$$\begin{aligned} \tau _3=- \dfrac{(\alpha +3)(2\alpha +3)}{2(\alpha +2)^2}\beta +\sqrt{\dfrac{(\alpha +3)^2(2\alpha +3)^2}{4(\alpha +2)^4}\beta ^2 + \dfrac{(\alpha +3)}{(\alpha +2)L}}. \end{aligned}$$
(20)

In view of the definition of the restarting time, an immediate consequence of Lemma 3.3 is

Corollary 3.2

Let \(T_*=\inf \big \{T(z):z\notin {{\,\textrm{argmin}\,}}(\phi )\big \}\). Then, \(\tau _3\le T_*\).

Remark 3.1

If \(\beta =0\), then

$$\begin{aligned} \tau _3=\sqrt{\dfrac{(\alpha +3)}{(\alpha +2)L}}. \end{aligned}$$

The case \(\alpha =3\) and \(\beta =0\) was studied in [31], where the authors provided \(\frac{4}{5\sqrt{L}}\) as a lower bound for the restarting time. The arguments presented here yield a larger bound, since

$$\begin{aligned} \tau _3=\sqrt{\dfrac{6}{5L}}> \dfrac{1}{\sqrt{L}} > \frac{4}{5\sqrt{L}}. \end{aligned}$$

Recall that the function H given in (14) decreases from 1 to 0 on \([0,\tau _1]\). Therefore, \(H(t)>\frac{1}{2}\) for all \(t\in [0,\tau _2)\), where

$$\begin{aligned} \tau _2=H^{-1}\left( \frac{1}{2}\right) =-\left( \dfrac{ \alpha +3}{\alpha +2}\right) \beta + \sqrt{\left( \dfrac{\alpha +3}{\alpha +2} \right) ^2 \beta ^2 + \dfrac{\alpha +3}{L}}<\tau _1. \end{aligned}$$
(21)

Evaluating the right-hand side of (19) at \(\tau _2\), and using \(H(\tau _2)=\frac{1}{2}\), we see that

$$\begin{aligned} G(\tau _2)=\frac{(1-\alpha )-L\tau _2^2-2\beta L\tau _2}{2}<0, \end{aligned}$$

since \(\frac{1}{2}=\frac{L\beta \tau _2}{\alpha +2}+\frac{L\tau _2^2}{2(\alpha +3)}\le \frac{L\beta \tau _2+L\tau _2^2}{2}\) implies \(L\tau _2^2+2\beta L\tau _2\ge 1>1-\alpha \). Whence

$$\begin{aligned} \tau _1>\tau _2>\tau _3>0. \end{aligned}$$
(22)

These facts will be useful to provide an upper bound for the restarting time.

Proposition 3.1

Let \(z\notin {{\,\textrm{argmin}\,}}(\phi )\), and let x be the solution of (DIN-AVD) with initial conditions \(x(0)=z\) and \(\dot{x}(0)=0\). Let \(\phi \) satisfy (6) with \(\mu >0\). For each \(\tau \in (0,\tau _2)\cap (0,T(z)]\), we have

$$\begin{aligned} T(z) \le \tau \exp \left[ \dfrac{(\alpha +1)^2}{2\alpha \mu \tau ^2\varPsi (\tau )} \right] , \quad \hbox {where}\quad \varPsi (\tau )=\left[ 2-\frac{1}{H(\tau )}\right] ^2. \end{aligned}$$

Proof

In view of (8) and (10), we can use Corollary 3.1 to obtain

$$\begin{aligned} \left\| \dot{x}(\tau ) + \frac{\tau }{\alpha +1}\nabla \phi (z)\right\| = \dfrac{1}{\tau ^\alpha } \left\| I_z(\tau ) + J_z(\tau ) \right\| \le \tau \,\left[ \dfrac{1}{H(\tau )}-1 \right] \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)}. \end{aligned}$$

From the (reverse) triangle inequality and the definition of H, it follows that

$$\begin{aligned} \left\| \dot{x}(\tau )\right\| \ge \dfrac{\tau \left\| \nabla \phi (z)\right\| }{\alpha +1}-\tau \,\left[ \dfrac{1}{H(\tau )}-1 \right] \dfrac{\left\| \nabla \phi (z)\right\| }{(\alpha +1)} =\tau \left[ 2-\frac{1}{H(\tau )}\right] \dfrac{\left\| \nabla \phi (z)\right\| }{\alpha +1}, \end{aligned}$$
(23)

which is positive, because \(\tau \in (0,\tau _2)\). Now, take \(t \in [\tau ,T(z)]\). Since \(\left\| \dot{x}(t)\right\| ^2\) increases in \(\big [0,T(z)\big ]\), Remark 2.1 gives

$$\begin{aligned} \dfrac{d}{dt}\phi \big (x(t)\big ) \le -\dfrac{\alpha }{t}\left\| \dot{x}(t)\right\| ^2 \le -\dfrac{\alpha }{t}\left\| \dot{x}(\tau )\right\| ^2 \le -\dfrac{1}{t}\left[ \dfrac{\alpha \tau ^2\varPsi (\tau )\left\| \nabla \phi (z)\right\| ^2}{(\alpha +1)^2}\right] . \end{aligned}$$

Integrating over \([\tau ,T(z)]\), we get

$$\begin{aligned} \phi \big (x(T(z))\big ) - \phi \big (x(\tau )\big ) \le -\left[ \dfrac{\alpha \tau ^2\varPsi (\tau )\left\| \nabla \phi (z)\right\| ^2}{(\alpha +1)^2}\right] \ln \left[ \dfrac{T(z)}{\tau } \right] . \end{aligned}$$
(24)

It follows that

$$\begin{aligned} \left[ \dfrac{\alpha \tau ^2\varPsi (\tau )\left\| \nabla \phi (z)\right\| ^2}{(\alpha +1)^2}\right] \ln \left[ \dfrac{T(z)}{\tau } \right]&\le \phi \big (x(\tau )\big )- \phi \big (x(T(z))\big ) \le \phi (z)-\phi ^* \\&\le \dfrac{\left\| \nabla \phi (z)\right\| ^2}{2\mu }, \end{aligned}$$

in view of (6). It suffices to rearrange the terms to conclude. \(\square \)

Corollary 3.3

Let \(\phi \) satisfy (6) with \(\mu >0\), and let \(\tau _*\in (0,\tau _2)\cap (0,T_*]\). Then,

$$\begin{aligned} \sup \big \{T(z):z\notin {{\,\textrm{argmin}\,}}(\phi )\big \} \le \tau _* \exp \left[ \dfrac{(\alpha +1)^2}{2\alpha \mu \tau _*^2\varPsi (\tau _*)} \right] . \end{aligned}$$

4 Function Value Decrease and Proof of Theorem 2.2

The next result provides the factor by which the function values are reduced by the time the trajectory is restarted.

Proposition 4.1

Let \(z\notin {{\,\textrm{argmin}\,}}(\phi )\), and let x be the solution of (DIN-AVD) with initial conditions \(x(0)=z\) and \(\dot{x}(0)=0\). Let \(\phi \) satisfy (6) with \(\mu >0\). For each \(\tau \in (0,\tau _2)\cap (0,T(z)]\), we have

$$\begin{aligned} \phi \big (x(t)\big ) - \phi ^* \le \left[ 1-\dfrac{\alpha \mu \tau ^2\varPsi (\tau )}{(\alpha +1)^2}\right] \big (\phi (z) - \phi ^*\big ) \end{aligned}$$

for every \(t\in [\tau ,T(z)]\).

Proof

Take \(s\in (0,\tau )\). By combining Remark 2.1 with (23), we obtain

$$\begin{aligned} \dfrac{d}{ds}\phi (x(s)) \le - \dfrac{\alpha }{s}\left\| \dot{x}(s)\right\| ^2 \le - \dfrac{\alpha s\left\| \nabla \phi (z)\right\| ^2}{(\alpha +1)^2}\left[ 2-\frac{1}{H(s)}\right] ^2 \le - \dfrac{\alpha s\left\| \nabla \phi (z)\right\| ^2}{(\alpha +1)^2}\varPsi (\tau ) \end{aligned}$$

because H decreases in \((0,\tau _1)\), which contains \((0,\tau )\). Integrating on \((0,\tau )\) and using (6), we obtain

$$\begin{aligned} \phi \big (x(\tau )\big )-\phi ^*&\le \phi (z)-\phi ^*-\dfrac{\alpha \tau ^2 \varPsi (\tau ) \left\| \nabla \phi (z)\right\| ^2}{2(\alpha +1)^2}\\&\le \left[ 1-\dfrac{\alpha \mu \tau ^2\varPsi (\tau )}{(\alpha +1)^2}\right] \big (\phi (z)-\phi ^*\big ). \end{aligned}$$

To conclude, it suffices to observe that \(\phi \big (x(t)\big )\le \phi \big (x(\tau )\big )\) in view of Remark 2.1. \(\square \)

Remark 4.1

Since \(\varPsi \) is decreasing in \([0,\tau _2)\), we have \(\varPsi (t)\ge \varPsi (\tau _*)>0\), whenever \(0\le t\le \tau _* < \tau _2\). Moreover, in view of (22) and Corollary 3.2, we can take \(\tau _*=\tau _3\) to obtain a lower bound. If \(\beta =0\), we obtain

$$\begin{aligned} \varPsi (t)\ge \varPsi (\tau _3)&=\left[ 2-\frac{1}{H(\tau _3)}\right] ^2=\left[ 2-\frac{1}{1-\frac{1}{2(\alpha +2)}}\right] ^2=\left[ 2-\frac{2\alpha +4}{2\alpha +3}\right] ^2\\&=\left[ \dfrac{2\alpha +2}{2\alpha +3}\right] ^2, \end{aligned}$$

which is independent of L. As a consequence, the inequality in Proposition 4.1 becomes

$$\begin{aligned} \phi \big (x(t)\big )-\phi ^*\le \left( 1 - \frac{4\alpha (\alpha +3)}{(\alpha +2)(2\alpha +3)^2}\dfrac{\mu }{L}\right) (\phi (x_0) - \phi ^*). \end{aligned}$$

For \(\alpha =3\), this gives

$$\begin{aligned} \phi \big (x(t)\big )-\phi ^*\le \left( 1 - \frac{8}{45}\dfrac{\mu }{L}\right) (\phi (x_0) - \phi ^*). \end{aligned}$$

For this particular case, a similar result is obtained in [31] for strongly convex functions, namely

$$\begin{aligned} \phi \big (x(t)\big )-\phi ^*\le \left( 1 - \frac{3}{25} \left( \frac{67}{71}\right) ^2\dfrac{\mu }{L}\right) (\phi (x_0) - \phi ^*). \end{aligned}$$

Our constant is approximately 66.37% larger than the one from [31] (indeed, \(\frac{8}{45}\big /\big [\frac{3}{25}\big (\frac{67}{71}\big )^2\big ]\approx 1.66\)), which implies a greater reduction in the function values each time the trajectory is restarted. On the other hand, if \(\beta >0\), we can still obtain a slightly smaller lower bound, namely \(\varPsi (\tau _3)>\left( \dfrac{2\alpha +1}{2\alpha +2}\right) ^2\), independent of \(\beta \) and L. The proof is technical and will be omitted.

Proof of Theorem 2.2

Adopt the notation in Definition 2.1, take any \(\tau _*\in (0,\tau _2)\cap (0,T_*]\), and set

$$\begin{aligned} \tau ^*=\tau _* \exp \left[ \dfrac{(\alpha +1)^2}{2\alpha \mu \tau _*^2\varPsi (\tau _*)} \right] , \quad \hbox {where}\quad \varPsi (\tau _*)=\left[ 2-\frac{1}{H(\tau _*)}\right] ^2. \end{aligned}$$

In view of Corollaries 3.2 and 3.3, we have

$$\begin{aligned} \tau _*\le T(x_i)\le \tau ^* \end{aligned}$$

for all \(i\ge 0\) (we assume \(x_i\notin {{\,\textrm{argmin}\,}}(\phi )\), since the result is trivial otherwise). Given \(t>0\), let m be the largest nonnegative integer such that \(m\tau ^*\le t\). By time t, the trajectory will have been restarted at least m times. By Proposition 2.1, we know that

$$\begin{aligned} \phi \big (\chi _{x_0}(t)\big )\le \phi \big (\chi _{x_0}(m\tau ^*)\big ) \le \phi \big (\chi _{x_0}(m\tau _*)\big ). \end{aligned}$$

We may now apply Proposition 4.1 repeatedly to deduce that

$$\begin{aligned} \phi \big (\chi _{x_0}(t)\big )-\phi ^* \le Q^{m}\big (\phi (x_0) - \phi ^*\big )\quad \hbox {where}\quad Q=\left[ 1-\dfrac{\alpha \mu \tau _*^2\varPsi (\tau _*)}{(\alpha +1)^2}\right] <1. \end{aligned}$$

By definition, \((m+1)\tau ^*>t\), which entails \(m>\frac{t}{\tau ^*}-1\). Since \(Q\in (0,1)\), we have

$$\begin{aligned} Q^m \le Q^{\frac{t}{\tau ^*}-1}=\dfrac{1}{Q}\exp \left( \dfrac{\ln (Q)}{\tau ^*}\,t\right) , \end{aligned}$$

and the result is established, with \(C=Q^{-1}\) and \(K=-\frac{1}{\tau ^*}\ln (Q)\). The last inequality in the statement follows from the descent lemma, which gives \(\phi (u)\le \phi ^*+\frac{L}{2}\Vert u-u^*\Vert ^2\) for every \(u^*\in {{\,\textrm{argmin}\,}}(\phi )\).\(\square \)

The convergence rate given in Theorem 2.2 holds for C and K of the form

$$\begin{aligned} C=C(\tau _*)=\left[ 1-\dfrac{\alpha \mu \tau _*^2\varPsi (\tau _*)}{(\alpha +1)^2}\right] ^{-1} \end{aligned}$$

and

$$\begin{aligned} K=K(\tau _*)&=-\frac{1}{\tau _*}\exp \left[ -\dfrac{(\alpha +1)^2}{2\alpha \mu \tau _*^2\varPsi (\tau _*)} \right] \ln \left[ 1-\dfrac{\alpha \mu \tau _*^2\varPsi (\tau _*)}{(\alpha +1)^2}\right] \\&> \dfrac{\alpha \mu \tau _*\varPsi (\tau _*)}{(\alpha +1)^2}\exp \left[ -\dfrac{(\alpha +1)^2}{2\alpha \mu \tau _*^2\varPsi (\tau _*)} \right] , \end{aligned}$$

for any \(\tau _*\in (0,\tau _2)\cap (0,T_*]\). In view of (22) and Corollary 3.2, \(\tau _*=\tau _3\) is a valid choice. On the other hand, the function \(K(\cdot )\) vanishes at \(\tau \in \{0,\tau _2\}\) and is positive on \((0,\tau _2)\). By continuity, it attains its maximum at some \(\hat{\tau }_*\in (0,\tau _2)\cap (0,T_*]\). Therefore, \(K(\hat{\tau }_*)\) yields the fastest convergence rate prediction in this framework.
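To illustrate how these quantities can be evaluated, the following sketch computes \(\tau _1\), \(\tau _2\) and \(\tau _3\) from (15), (21) and (20), and maximizes \(K(\cdot )\) numerically over the guaranteed interval \((0,\tau _3]\). The parameter values are illustrative, and the resulting K is only the worst-case prediction of Theorem 2.2:

```python
# Numerical sketch: evaluate tau_1, tau_2, tau_3 and maximize the rate K(tau)
# of Theorem 2.2. The values of alpha, beta, L, mu below are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

alpha, beta, L, mu = 3.1, 0.1, 1.0, 0.5

a = (alpha + 3) / (alpha + 2)
tau1 = -a * beta + np.sqrt(a**2 * beta**2 + 2 * (alpha + 3) / L)          # eq. (15)
tau2 = -a * beta + np.sqrt(a**2 * beta**2 + (alpha + 3) / L)              # eq. (21)
b = (alpha + 3) * (2 * alpha + 3) / (2 * (alpha + 2) ** 2)
tau3 = -b * beta + np.sqrt(b**2 * beta**2 + (alpha + 3) / ((alpha + 2) * L))  # eq. (20)

def H(t):
    return 1 - L * beta * t / (alpha + 2) - L * t**2 / (2 * (alpha + 3))

def K(t):
    psi = (2.0 - 1.0 / H(t)) ** 2
    q = alpha * mu * t**2 * psi / (alpha + 1) ** 2     # so that Q = 1 - q
    return -np.log(1.0 - q) * np.exp(-1.0 / (2.0 * q)) / t

res = minimize_scalar(lambda t: -K(t), bounds=(1e-6 * tau3, tau3), method="bounded")
tau_hat, K_hat = res.x, -res.fun
print(f"tau1={tau1:.4f}, tau2={tau2:.4f}, tau3={tau3:.4f}")
print(f"best fixed restart time ~ {tau_hat:.4f}, predicted rate K ~ {K_hat:.4e}")
```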

Remark 4.2

It is possible to implement a fixed restart scheme. To this end, we modify Definition 2.1 by setting \(T_i\equiv \tau \), with any \(\tau \in (0,\tau _2)\cap (0,T_*]\), such as \(\hat{\tau }_*\) or \(\tau _3\), for example. In theory, \(\hat{\tau }_*\) gives the same convergence rate as the original restart scheme presented throughout this work. From a practical perspective, though, restarting the dynamics too soon may result in poorer performance. Therefore, finding larger admissible values of \(\hat{\tau }_*\) and \(\tau _3\) is crucial to implement a fixed restart (see Remarks 3.1 and 4.1).

5 Numerical Illustration

In this section, we provide a very simple numerical example to illustrate how the convergence is improved by the restarting scheme. A more thorough numerical analysis will be carried out in a forthcoming paper, where implementable optimization algorithms will be analyzed.

5.1 Example 1.2 Revisited

We consider the quadratic function \(\phi :{\mathbb R}^3\rightarrow {\mathbb R}\), defined in Example 1.1 by (2), with \(\rho = 10\). We set \(\alpha =3.1\) and \(\beta =0.25\), and compute the solutions of (AVD) and (DIN-AVD), starting from \(x(1)=x_1=(1,1,1)\) with zero initial velocity, with and without restarting, using the Python routine odeint from the scipy package. Figures 3 and 4 show the values along the trajectory with and without restarting, for (AVD) and (DIN-AVD), respectively. We observe the monotonic behavior established in Proposition 2.1, as well as a faster linear convergence rate. More quantitative insight is provided below.

Fig. 4 Values along the trajectory, with (red) and without (blue) restarting, for (DIN-AVD)
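The restarted trajectories can be reproduced with a simple loop around an ODE solver, following Definition 2.1: integrate from the current restart point with zero velocity, monitor the speed, and restart as soon as \(\Vert \dot{x}\Vert ^2\) stops increasing. The following minimal sketch (for (DIN-AVD) with the parameters of this subsection; the time grid, the small regularization of the \(\alpha /t\) term at \(t=0\) and the discrete stopping rule are illustrative choices, not the exact experimental code) conveys the idea:

```python
# Sketch of the speed-restarted trajectory of Definition 2.1 for (DIN-AVD),
# with the quadratic (2), rho = 10, alpha = 3.1, beta = 0.25.
import numpy as np
from scipy.integrate import odeint

alpha, beta, rho = 3.1, 0.25, 10.0
Hess = np.diag([1.0, rho, rho**2])
grad = lambda x: Hess @ x
phi = lambda x: 0.5 * x @ Hess @ x

def din_avd(state, t, eps=1e-8):
    # eps regularizes the alpha/t damping at t = 0 (illustrative shortcut)
    x, v = state[:3], state[3:]
    return np.concatenate([v, -(alpha / (t + eps)) * v - grad(x) - beta * Hess @ v])

def restarted_values(x0, t_final=25.0, dt=1e-3):
    x, t, values = x0.copy(), 0.0, []
    while t < t_final:
        # integrate one leg from the current restart point, with zero velocity
        ts = np.arange(0.0, t_final - t + dt, dt)
        sol = odeint(din_avd, np.concatenate([x, np.zeros(3)]), ts)
        speed2 = np.sum(sol[:, 3:] ** 2, axis=1)
        increase = np.diff(speed2) > 0
        # first index where the speed stops increasing (discrete analogue of T(z))
        k = np.argmin(increase) if not increase.all() else len(ts) - 1
        k = max(k, 1)
        values.extend(phi(s) for s in sol[1:k + 1, :3])
        x, t = sol[k, :3].copy(), t + ts[k]
    return np.array(values)

vals = restarted_values(np.ones(3))   # monotone, fast-decreasing values (cf. Fig. 4)
```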

However, one can do better. As mentioned earlier, restarting schemes based on function values are effective from a practical perspective, but behave erratically as the trajectory approaches a minimizer. It seems natural, as a heuristic, to use the first (or n-th) function-value restart point as a warm start, and then apply speed restarts, for which we have obtained convergence rate guarantees. Although the velocity must be set to zero after each restart, there are no constraints on the initial velocity used to compute the warm starting point. The results are shown in Fig. 5, with initial velocities \(\dot{x}(1)=0\) and \(\dot{x}(1)=-\beta \nabla \phi (x_1)\), respectively.

Fig. 5 Top: values along the trajectory, with warm start, for (AVD) (blue) and (DIN-AVD) (red), with \(\dot{x}(1)=0\) (left) and \(\dot{x}(1)=-\beta \nabla \phi (x_1)\) (right). Bottom: the same, including the trajectories without restarting, for reference

A linear regression after the first restart provides estimates of the linear convergence rate of the function values along the corresponding trajectories, when modeled as \(\phi \big (\chi (t)\big )\sim Ae^{-Bt}\), with \(A,B>0\). The results are displayed in Table 1. The exponent B in the linear convergence rate is increased by 34.67% in the case \(\dot{x}(1)=0\), and by 39.86% in the case \(\dot{x}(1)=-\beta \nabla \phi (x_1)\). The minimum values reached by the methods presented in Fig. 5 can also be analyzed. The final (and best) function values on [1, 25] are displayed in Table 2. In all cases, the best value without restart is approximately \(10^4\) times larger than the one obtained with our policy. We also observe similar final values for the restarted trajectories despite the different initial velocities.

Table 1 Coefficients in the linear regression, when approximating \(\phi \big (\chi (t)\big )\sim Ae^{-Bt}\)
Table 2 Values reached for \(\phi \) at \(t=25\)

5.2 A First Exploration of the Algorithmic Consequences

Different discretizations of (DIN-AVD) can be used to design implementable algorithms that generate minimizing sequences for \(\phi \), which hopefully share the stable behavior observed in the solutions of (DIN-AVD). Three such algorithms were first proposed in [7], for which we implemented a speed restart scheme analogous to the one used for the solutions of (DIN-AVD). Since we obtained very similar results, and the numerical analysis of algorithms is not the focus of this paper, we describe only the simplest one in detail and present the numerical results for that one. As in [31], a parameter \(k_{\min }\) is introduced to prevent two consecutive restarts from being too close together.

Algorithm 1 Inertial Gradient Algorithm with Hessian Damping (IGAHD), speed restart version
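The precise statement of Algorithm 1 is the one displayed above. For the reader's convenience, the following sketch shows one possible rendering of a speed-restarted inertial gradient iteration; the IGAHD update written in the comments is our reading of the scheme in [7] (with step size \(h=1/\sqrt{L}\)), and the restart test is the discrete speed condition of [31]. The function name and the coefficients are illustrative and should be checked against the exact statement of Algorithm 1:

```python
# Hedged sketch of a speed-restarted IGAHD iteration. Assumed update (from [7]):
#   y_k     = x_k + (1 - alpha/k)(x_k - x_{k-1})
#             - beta*h*(grad(x_k) - grad(x_{k-1})) - (beta*h/k)*grad(x_{k-1}),
#   x_{k+1} = y_k - h**2 * grad(y_k).
# Restart (as in [31]) when the step length stops increasing, if at least
# k_min iterations have elapsed since the last restart.
import numpy as np

def igahd_speed_restart(grad, x0, L, alpha=3.1, beta=None, k_min=10, n_iter=1000):
    h = 1.0 / np.sqrt(L)
    beta = h if beta is None else beta
    x_prev, x = x0.copy(), x0.copy()
    g_prev = grad(x_prev)
    k, step_prev = 1, 0.0          # k: iterations since the last restart
    history = [x0.copy()]
    for _ in range(n_iter):
        g = grad(x)
        y = (x + (1 - alpha / k) * (x - x_prev)
             - beta * h * (g - g_prev)
             - (beta * h / k) * g_prev)
        x_next = y - h**2 * grad(y)
        step = np.linalg.norm(x_next - x)
        if k >= k_min and step < step_prev:
            # speed restart: forget the momentum and reset the local counter
            k, step = 1, 0.0
            x_prev, g_prev = x_next.copy(), grad(x_next)
        else:
            k += 1
            x_prev, g_prev = x, g
        x, step_prev = x_next, step
        history.append(x.copy())
    return np.array(history)
```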

Example 5.1

We begin by applying Algorithm 1, as well as the variation with the warm start, to the function \(\phi :{\mathbb R}^3 \rightarrow {\mathbb R}\) of Examples 1.1 and 1.2, with the parameters \(k_{\min }=10\), \(\beta =h=1/\sqrt{L}\) and \(\alpha =3.1\). Figure 6 shows the evolution of the function values along the iterations. The coefficients in the approximation \(\phi (x_k) \sim Ae^{-Bt}\), with \(A,B >0\), obtained for each algorithm, are detailed in Table 3. As one would expect, the value of B is similar in both cases, while that of A is significantly lower with the warm start. Also, Table 4 shows the values obtained after 1000 iterations. The best value without restart is \(10^5\) times larger than the one obtained with our policy.

Fig. 6 Function values along iterations of Algorithm 1 without (left) and with (right) warm start

Table 3 Coefficients in the linear regression \(\phi (x_k) \sim Ae^{-Bt}\) for Example 5.1
Table 4 Functions values for Example 5.1

Example 5.2

Given a positive definite symmetric matrix A of size \(n\times n\), and a vector \(b \in {\mathbb R}^n\), define \(\phi :{\mathbb R}^n \rightarrow {\mathbb R}\) by

$$\begin{aligned} \phi (x)=\dfrac{1}{2}x^T A x + b^Tx. \end{aligned}$$

For the experiment, \(n=500\), A is randomly generated with eigenvalues in (0, 1), and b is also chosen at random. We first compute L, and set \(k_{\min }=10\), \(h=1/\sqrt{L}\), \(\alpha =3.1\) and \(\beta =h\). The initial points \(x_0=x_1\) are generated randomly as well. Figure 7 shows the comparison between Algorithm 1 and a variation of it that uses a warm start, as described in the continuous setting: the algorithm is restarted the first time the function value increases, and the speed restart detailed in Algorithm 1 is applied from then on. It can be seen that the restart scheme stabilizes and accelerates the convergence in both cases. The coefficients obtained for each algorithm in the approximation \(\phi (x_k) \sim Ae^{-Bt}\), with \(A,B >0\), are presented in Table 5. Also, Table 6 shows the value gaps obtained after 1800 iterations. The best value without restart is more than \(10^4\) times larger than the one obtained with restart.
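A minimal sketch of this experimental setup (the random seed, the eigenvalue distribution and the way A is generated are illustrative choices, not the exact experimental code):

```python
# Sketch of the setup in Example 5.2: a random symmetric positive definite matrix A
# with eigenvalues in (0, 1), a random b, and the resulting quadratic and gradient.
import numpy as np

rng = np.random.default_rng(0)
n = 500
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
eigs = rng.uniform(0.0, 1.0, size=n)               # eigenvalues in (0, 1)
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(n)

phi = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b
L = eigs.max()                                     # Lipschitz constant of the gradient

x0 = rng.standard_normal(n)
# iterates could then be produced, e.g., with the sketch given after Algorithm 1:
# history = igahd_speed_restart(grad, x0, L, alpha=3.1, k_min=10, n_iter=1800)
```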

Fig. 7 Function values along iterations of Algorithm 1 without (left) and with (right) warm start

Table 5 Coefficients in the linear regression \(\phi (x_k) \sim Ae^{-Bt}\) for Example 5.2
Table 6 Function values for Example 5.2

6 Conclusions

We have proposed and analyzed a speed restarting scheme for the inertial dynamics with Hessian-driven damping (DIN-AVD), introduced in [5]. We have established a linear convergence rate for the function values, which decrease monotonically to the minimum along the restarted trajectories, when the function \(\phi \) has quadratic growth, for every value of \(\alpha >0\) and \(\beta \ge 0\). As a byproduct, we improve and extend the results of Su, Boyd and Candès [31], obtained in the strongly convex case for \(\alpha =3\) and \(\beta =0\).

Our numerical experiments suggest that, for the purpose of approximating the minimum of \(\phi \), the Hessian-driven damping and the restarting scheme improve the performance of the dynamics, and of the related iterative algorithms, more when used together than when either technique is used separately.