1 Introduction

In the Hilbert setting H, where \(\langle \cdot , \cdot \rangle \) denotes the inner product and the norm is defined as usual \( \Vert \cdot \Vert = \sqrt{\langle \cdot , \cdot \rangle } \), we will study the convergence properties of the following second order in time differential equation

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)}(x(t)) + b(t) \nabla \Phi _{\lambda (t)}(x(t)) + \varepsilon (t) x(t) = 0 \text { for } t \ge t_0, \nonumber \\ \end{aligned}$$
(1)

with initial conditions \(x(t_0) = x_0 \in H\), \(\dot{x}(t_0) = \dot{x}_0 \in H\), where \( \alpha , \beta \text { and } t_0 > 0 \), \(\lambda : [t_0, +\infty ) \mapsto \mathbb {R}_+\) and \(b: [t_0, +\infty ) \mapsto \mathbb {R}_+\) are non-negative, non-decreasing and differentiable, \( \Phi : H \mapsto \overline{\mathbb {R}} = \mathbb {R} \cup \{ \pm \infty \} \) is a proper, convex and lower semicontinuous function and \(\Phi _\lambda \) is its Moreau envelope of the index \(\lambda > 0\) and the function \(\varepsilon : [t_0, +\infty ) \mapsto \mathbb {R}_+\) is continuously differentiable and non-increasing with the property \(\lim _{t \rightarrow +\infty } \varepsilon (t) = 0\). In addition, we assume that \(\mathop {\textrm{argmin}}\limits \Phi \), which is the set of global minimizers of \(\Phi \), is not empty and denote by \(\Phi ^*\) the optimal objective value of \(\Phi \). The system (1) has a connection to the minimization problem

$$\begin{aligned} \min _{x \in H} \Phi (x) \end{aligned}$$

of a proper, convex and lower semicontinuous function \(\Phi \). Studying such systems provides better understanding of their discrete counterpart—optimization algorithms, since there is a strong connection between them, and the question of transitioning from one to another attracts a lot of attention in the modern literature.

One of the main goals of this research is to improve (compared to [23]) the fast rates of convergence for the Moreau envelope of the objective function and the objective function itself to \(\Phi ^*\), as well as for the gradient of the Moreau envelope of the objective function in terms of the Moreau parameter function \(\lambda \) and the time scaling function b. Moreover, we also deduce the strong convergence of the trajectory of the dynamics to the minimal norm element of \(\mathop {\textrm{argmin}}\limits \Phi \). We introduce two settings with different assumptions for each result. To conclude we provide multiple numerical results in order to illustrate our theoretical discoveries.

1.1 Nonsmooth Optimization with Time Scaling

In the smooth setting the pioneering research in studying second order dynamical systems was conducted by Su–Boyd–Candes [30] for the sake of obtaining faster asymptotic convergence for convex functions. They managed to deduce the rates of convergence of the function values being of the order \(\frac{1}{t^2}\). Later Attouch–Peypouquet–Redont [20] also established the weak (and in some particular cases the strong) convergence of the trajectories to a minimizer of the objective function. In [19] the same authors continued the development in this direction by adding Hessian-driven damping term in order to obtain the rates for the gradient of the objective function and to eliminate any possible oscillations in the dynamical behaviour of the trajectories.

Concerning the nonsmooth setting we must point out that the Moreau envelope of a proper, convex and lower semicontinuous function \(\Phi : H \rightarrow \overline{\mathbb {R}}\) proved to be of a significant importance in designing continuous-time approaches and numerical algorithms for the minimization of nonsmooth functions. The rigorous definition of this construction is

$$\begin{aligned} \Phi _\lambda : H \rightarrow \mathbb {R}, \quad \Phi _{\lambda } (x) \ = \ \inf _{y \in H} \left\{ \Phi (y) + \frac{1}{2 \lambda } \Vert x - y \Vert ^2 \right\} , \end{aligned}$$

where \(\lambda > 0\) is the parameter of the Moreau envelope (see, for instance, [21]). One of the most important properties of Moreau approximation is that for every \(\lambda > 0\), the functions \(\Phi \) and \(\Phi _{\lambda }\) share the same optimal objective value and also the same set of minimizers. Moreover, \(\Phi _\lambda \) is convex and continuously differentiable with

$$\begin{aligned} \nabla \Phi _{\lambda } (x) = \frac{1}{\lambda } ( x - \mathop {\textrm{prox}}\limits \nolimits _{\lambda \Phi } (x)) \quad \forall x \in H, \end{aligned}$$
(2)

and \(\nabla \Phi _\lambda \) is \(\frac{1}{\lambda }\)-Lipschitz continuous, where

$$\begin{aligned} \mathop {\textrm{prox}}\limits \nolimits _{\lambda \Phi }: H \rightarrow H, \quad \mathop {\textrm{prox}}\limits \nolimits _{\lambda \Phi } (x) = \mathop {\textrm{argmin}}\limits _{y \in H} \left\{ \Phi (y) + \frac{1}{2 \lambda } \Vert x - y \Vert ^2 \right\} , \end{aligned}$$

denotes the proximal operator of \(\Phi \) of parameter \(\lambda \). The last fact we would like to mention is that for every \(x \in H\), the function \(\lambda \in (0, +\infty ) \rightarrow \Phi _\lambda (x)\) is nonincreasing and differentiable (see [14], Lemma A1), namely,

$$\begin{aligned} \frac{d}{d \lambda } \Phi _{\lambda } (x) = -\frac{1}{2} \Vert \nabla \Phi _{\lambda } (x) \Vert ^2 \quad \forall \lambda > 0. \end{aligned}$$
(3)

Our research is a logical continuation of the one conducted in [24], where authors applied the time rescaling technique to a nonsmooth optimization problem (for more information on time scaling see also [5, 10, 11, 13]). They considered the following system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \beta (t) \frac{d}{dt} \nabla \Phi _{\lambda (t)}(x(t)) + b(t) \nabla \Phi _{\lambda (t)}(x(t)) = 0, \end{aligned}$$
(4)

where \(\alpha \ge 1 \), \(t_0 > 0\), and \(\beta : [t_0, +\infty ) \mapsto [0, +\infty )\) and \( b, \lambda : [t_0, +\infty ) \mapsto (0, +\infty ) \) are differentiable functions. On the one hand, the presence of the Hessian damping term is believed to help reducing the oscillations in the dynamical behaviour and provides the rates for the gradient of the objective function \(\Phi \). On the other hand, the time-scaling technique (which is considered to be an artificial way to speed up the convergence of values) affects the convergence rates while bringing more restrictions to the analysis. The following properties were established

$$\begin{aligned} \Phi _{\lambda (t)}(x(t)) - \Phi ^* = o\left( \frac{1}{t^2 b(t)} \right) \text { and } \Vert \dot{x}(t) \Vert = o\left( \frac{1}{t} \right) \text { as } t \rightarrow +\infty , \end{aligned}$$

from where through proximal mapping the convergence rates for the objective function \(\Phi \) itself along the trajectory were obtained

$$\begin{aligned} \Phi \big ( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \big ) - \Phi ^*= & o\left( \frac{1}{t^2 b(t)} \right) \text { and } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \\ = & o\left( \frac{\sqrt{\lambda (t)}}{t \sqrt{b(t)}} \right) \text { as } t \rightarrow +\infty . \end{aligned}$$

Note that by taking \(b(\cdot ) \equiv 1\) we arrive at the well-known convergence rate of the values being of the order \(o\left( \frac{1}{t^2} \right) \). In addition, the following rates for the gradient of the Moreau envelope were deduced

$$\begin{aligned} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ = \ o \left( \frac{1}{t \sqrt{b(t) \lambda (t)}} \right) , \text { as } t \rightarrow +\infty . \end{aligned}$$

Finally, the weak convergence of the trajectories x(t) to a minimizer of \(\Phi \) as \(t \rightarrow +\infty \) was obtained.

In our analysis we borrow some ideas of [24] and develop them further in order to fit the new setting, namely, to adapt to a presence of the whole new term—Tikhonov regularization. The analysis becomes more involved and technical, some fundamental properties of Tikhonov regularization had to be proved for a nonsmooth setting. Its presence affects the set of conditions, which we have to impose on the system parameters: even though some of the conditions are formulated in the same spirit as in [24] (for instance, (11) and (14)), the other ones are completely new due to the presence of the Tikhonov term. Moreover, depending on how fast \(\varepsilon \) decays, two different setting arise providing different fundamental results (Sects. 3 and 4).

1.2 Tikhonov Regularization

It turned out that having additional term with specific properties in a system equation leads to improving the weak convergence of the trajectories to a minimizer of the objective function \(\Phi \) to a strong one to the element of minimal norm of \(\mathop {\textrm{argmin}}\limits \Phi \). Such systems were studied, for instance, in [4, 6, 9, 12, 17, 23, 27]. The main goal of such a research is to show that these systems preserve all the typical properties of the second order in time dynamical system (fast convergence of the values, the rates for the gradient etc.) but moreover there is an improvement to the strong convergence of the trajectories to the minimal norm solution instead of a weak one to an arbitrary minimizer. One of the many examples of such systems is presented below (see [23])

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \beta \nabla ^2 \Phi (x(t)) \dot{x}(t) + \nabla \Phi (x(t)) + \varepsilon (t) x(t) = 0 \text { for } t \ge t_0, \end{aligned}$$

where \( \alpha \ge 3\), \(t_0 > 0\), \(\Phi : H \mapsto \mathbb {R}\) is twice continuously differentiable and convex and for the rest of the section the function \(\varepsilon : [t_0, +\infty ) \mapsto \mathbb {R}_+\) is continuously differentiable and non-increasing with the property \(\lim _{t \rightarrow +\infty } \varepsilon (t) = 0\). In that manuscript they provided two settings: one for the fast convergence of values obtaining

$$\begin{aligned} \Phi (x(t)) - \Phi ^* \ = \ o\left( \frac{1}{t^2} \right) , \text { as } t \rightarrow +\infty \end{aligned}$$

and the weak convergence of the trajectories to a minimizer of \(\Phi \) and another setting for the strong convergence of x to \(x^*\), as \(t \rightarrow +\infty \).

Another fine example is given in [4]:

$$\begin{aligned} \ddot{x}(t) + \alpha \sqrt{\varepsilon (t)} \dot{x}(t) + \nabla \Phi (x(t)) + \varepsilon (t) x(t) = 0 \text { for } t \ge t_0, \end{aligned}$$
(5)

where \( \alpha \), \( t_0 > 0 \) and \(\Phi : H \mapsto \mathbb {R}\) is continuously differentiable and convex. In that paper authors obtained the rates for the function values \( \Phi (x(t)) - \Phi ^* \), as well as for the quantity \( \Vert x(t) - x_{\varepsilon (t)} \Vert \), as \(t \rightarrow +\infty \), where \( x_{\varepsilon (t)} = \mathop {\textrm{argmin}}\limits _H \left( \Phi (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2} \right) \). Thus, they assured the strong convergence of the trajectories to the minimal norm solution \( x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0) \) under the appropriate assumptions and properly chosen energy functional, using the properties of Tikhonov regularization. The most important thing about this approach is that authors were able to establish fast convergence of values and strong convergence of the trajectories in the very same setting.

The next step was done in [6]:

$$\begin{aligned} \ddot{x}(t) + \alpha \sqrt{\varepsilon (t)} \dot{x}(t) + \beta \frac{d}{dt} \Big ( \nabla \varphi _t (x(t)) + (p - 1) \varepsilon (t) x(t) \Big ) + \nabla \varphi _t (x(t)) = 0 \text { for } t \ge t_0, \end{aligned}$$

where \(\varphi _t (x) = \Phi (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2}\), \( \Phi : H \mapsto \mathbb {R}\) is twice continuously differentiable and convex and \(p \in [0, 1]\). This system while preserving all the properties of (5), additionally provides the integral estimate for the norm of the gradient of \(\varphi _t\).

1.3 Our Contribution

In that paper we will develop the ideas presented in [23] to cover the nonsmooth case with time scaling. We will obtain the fast convergence of the function values (as well as for the gradient of the Moreau envelope of the objective fucntion \(\Phi \)) for the family of dynamical systems (1) governed by the Moreau envelope of the nonsmooth function \(\Phi \) and having the Tiknonov term in their formulation:

$$\begin{aligned} \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ = \ o \left( \frac{1}{t^2 b(t)} \right) \text { as } t \rightarrow +\infty ; \end{aligned}$$

in terms of the function itself:

$$\begin{aligned} \Phi (\mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t))) - \Phi ^* \ = \ o \left( \frac{1}{t^2 b(t)} \right) \text { as } t \rightarrow +\infty , \end{aligned}$$

where

$$\begin{aligned} \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ o \left( \frac{\sqrt{\lambda (t)}}{t \sqrt{b(t)}} \right) \text { as } t \rightarrow +\infty \end{aligned}$$

and finally

$$\begin{aligned} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ = \ o \left( \frac{1}{t \sqrt{b(t) \lambda (t)}} \right) \text { as } t \rightarrow +\infty . \end{aligned}$$

We will also deduce (under some appropriate conditions) the following result

$$\begin{aligned} \liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0, \end{aligned}$$

which under some restrictions will be improved to the full strong convergence of the trajectories of (1) to the minimal norm solution.

The paper is organized in the following way. Section 2 is devoted to some preliminary results, which we will need later. We will establish the fast rates of convergence of function values and its Moreau envelope, as well as the gradient of Moreau envelope along the trajectories of the dynamical system (Sect. 3). We will show that under some assumptions the strong convergence of the trajectories to the element of minimal norm from the set of all minimizers of the objective function takes place (Sect. 4). We will provide two settings for the polynomial choice of parameter functions to fulfill the assumptions made through the analysis (Sect. 5) and equip this manuscript with various numerical results (Sect. 6).

2 Preparatory Results

We start with the following lemma (see [21], Proposition 12.22, for the first term of the lemma and [18], Appendix, A1, for the second one).

Lemma 1

Let \(\Phi : H \mapsto \overline{\mathbb {R}}\) be a proper, convex and lower semicontinuous function, \(\lambda , \mu > 0\). Then

  1. 1.

    \((\Phi _\lambda )_\mu = \Phi _{\lambda + \mu }\).

  2. 2.

    \( \mathop {\textrm{prox}}\limits _{\mu \Phi _\lambda } = \frac{\lambda }{\lambda + \mu } \mathop {\textrm{Id}}\limits + \frac{\mu }{\lambda + \mu } \mathop {\textrm{prox}}\limits _{(\lambda + \mu )\Phi } \).

Let us mention two key properties of the Tikhonov regularization, which we will use later in the analysis (see, for instance, [2] or [21] Theorem 23.44 for its classic analogue). First let us introduce the strongly convex function \(\varphi _{\varepsilon (t), \lambda (t)}: H \mapsto \mathbb {R}\) as \(\varphi _{\varepsilon (t), \lambda (t)} (x) = \Phi _{\lambda (t)}(x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2}\) and denote the unique minimizer of \(\varphi _{\varepsilon (t), \lambda (t)}\) as \(x_{\varepsilon (t), \lambda (t)} = \mathop {\textrm{argmin}}\limits _{H} \varphi _{\varepsilon (t), \lambda (t)}\). Thus, the first order optimality condition reads as

$$\begin{aligned} \nabla \Phi _{\lambda (t)} (x_{\varepsilon (t), \lambda (t)}) + \varepsilon (t) x_{\varepsilon (t), \lambda (t)} = 0. \end{aligned}$$
(6)

Now we are ready to formulate the following result:

Lemma 2

Suppose that

$$\begin{aligned} \lim _{t \rightarrow +\infty } \lambda (t) \varepsilon (t) \ = \ 0. \end{aligned}$$
(7)

Then the following properties of the mapping \(t \mapsto x_{\varepsilon (t), \lambda (t)}\) are satisfied:

$$\begin{aligned} \text { for } x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0), \ \Vert x_{\varepsilon (t), \lambda (t)} \Vert \le \Vert x^* \Vert \text { for all } t \ge t_0 \end{aligned}$$
(8)

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert x_{\varepsilon (t), \lambda (t)} - x^* \Vert = 0. \end{aligned}$$
(9)

Proof

By the monotonicity of \(\nabla \Phi _\lambda \) we deduce

$$\begin{aligned} \left\langle \nabla \Phi _{\lambda (t)}(x_{\varepsilon (t), \lambda (t)}) - \nabla \Phi _{\lambda (t)}(x^*), x_{\varepsilon (t), \lambda (t)} - x^* \right\rangle \ \ge \ 0. \end{aligned}$$

By (6) we obtain

$$\begin{aligned} \left\langle - \varepsilon (t) x_{\varepsilon (t), \lambda (t)}, x_{\varepsilon (t), \lambda (t)} - x^* \right\rangle \ = \ \varepsilon (t) \left( - \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 + \left\langle x_{\varepsilon (t), \lambda (t)}, x^* \right\rangle \right) \ \ge \ 0. \end{aligned}$$

Using Cauchy–Schwarz inequality we derive

$$\begin{aligned} \Vert x_{\varepsilon (t), \lambda (t)} \Vert \ \le \ \Vert x^* \Vert . \end{aligned}$$

This proves the first claim. For the second one consider (6) again and note that it is equivalent to

$$\begin{aligned} x_{\varepsilon (t), \lambda (t)} = \mathop {\textrm{prox}}\limits \nolimits _{\frac{1}{\varepsilon (t)}\Phi _{\lambda (t)}} (0) = \frac{\mathop {\textrm{prox}}\limits \nolimits _{\left( \lambda (t) + \frac{1}{\varepsilon (t)} \right) \Phi } (0)}{\lambda (t) \varepsilon (t) + 1} \end{aligned}$$

by the item 2. of Lemma 1. Note that \(\lambda (t) + \frac{1}{\varepsilon (t)} \rightarrow +\infty \), as \(t \rightarrow +\infty \). Thus, the rest of the proof goes in line with Theorem 23.44 of [21]. \(\square \)

Our nearest goal is to deduce the existence and uniqueness of the solutions of the dynamical system (1). Suppose \(\beta > 0\). Let us integrate (1) from \(t_0\) to t to obtain

$$\begin{aligned} & \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) + \int _{t_0}^t \left( \frac{\alpha }{s} \dot{x}(s) + b(s) \nabla \Phi _{\lambda (s)} (x(s)) + \varepsilon (s) x(s) \right) ds \\ & - \left( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x(t_0)) \right) \ = \ 0. \end{aligned}$$

Denoting \(z(t):= \int _{t_0}^t \left( \frac{\alpha }{s} \dot{x}(s) + b(s) \nabla \Phi _{\lambda (s)} (x(s)) + \varepsilon (s) x(s) \right) ds - \big ( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x_0)) \big )\) for every \(t \ge t_0\) and noticing that \(\dot{z}(t) = \frac{\alpha }{t}\dot{x}(t) +b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t)\) we deduce, that (1) is equivalent to

$$\begin{aligned} & \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) + z(t) = 0, \\ & \dot{z}(t) - \frac{\alpha }{t} \dot{x}(t) - b(t) \nabla \Phi _{\lambda (t)}(x(t)) - \varepsilon (t) x(t) = 0, \\ & x(t_0) = x_0, \ z(t_0) = -\left( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x_0) \right) . \end{aligned}$$

Let us multiply the first line by the function b and the second one by the constant \(\beta \) and then sum them up to get rid of the gradient of the Moreau envelope in the second equation

$$\begin{aligned} & \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) + z(t) = 0, \\ & \beta \dot{z}(t) + \left( b(t) - \frac{\alpha \beta }{t} \right) \dot{x}(t) - \beta \varepsilon (t) x(t) + b(t) z(t) = 0,\\ & x(t_0) = x_0, \ z(t_0) = -\left( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x_0) \right) . \end{aligned}$$

We denote now \(y(t) = \beta z(t) + \left( b(t) - \frac{\alpha \beta }{t} \right) x(t)\), and, after simplification, we obtain the following equivalent formulation for the dynamical system

$$\begin{aligned} & \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) + \left( \frac{\alpha }{t} - \frac{b(t)}{\beta } \right) x(t) + \frac{1}{\beta } y(t) = 0, \\ & \dot{y}(t) - \left( \dot{b}(t) + \frac{\alpha \beta }{t^2} + \beta \varepsilon (t) + \frac{b^2(t)}{\beta } - \frac{\alpha b(t)}{t} \right) x(t) + \frac{b(t)}{\beta } y(t) = 0, \\ & x(t_0) = x_0, \ y(t_0) = -\beta \left( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x_0) \right) + \left( b(t_0) - \frac{\alpha \beta }{t_0} \right) x_0. \end{aligned}$$

In case \(\beta = 0\) for every \(t \ge t_0\), (1) can be equivalently written as

$$\begin{aligned} & \dot{x}(t) - y(t) = 0, \\ & \dot{y}(t) + \frac{\alpha }{t} y(t) + b(t) \nabla \Phi _{\lambda (t)}(x(t)) + \varepsilon (t) x(t) = 0, \\ & x(t_0) = x_0, \ y(t_0) = \dot{x}(t_0). \end{aligned}$$

Based on the two reformulations of the dynamical system (1) we formulate the following existence and uniqueness result, which is a consequence of Cauchy-Lipschitz theorem for strong global solutions. The result can be proved in the lines of the proofs of Theorem 1 in [16] or of Theorem 1.1 in [19] with some small adjustments.

Theorem 3

Suppose that there exists \(\lambda _0 > 0\) such that \(\lambda (t) \ge \lambda _0\) for all \(t \ge t_0\). Then for every \((x_0, \dot{x}(t_0)) \in H \cdot H \) there exists a unique strong global solution \(x: [t_0, +\infty ) \mapsto H\) of the continuous dynamics (1) which satisfies the Cauchy initial conditions \(x(t_0) = x_0\) and \(\dot{x}(t_0) = \dot{x}_0\).

3 Fast Convergence Rates of the Function and Moreau Envelope Values

This chapter is devoted to obtaining the rates of convergence for the Moreau envelope values and for the values of function \(\Phi \) itself. We will heavily rely on the tools and techniques provided by the Lyapunov analysis. We introduce a slightly modified energy function from [23]. For \(2 \le q \le \alpha - 1\) we define

$$\begin{aligned} E_q(t) \ = \ & (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \nonumber \\ & \quad + \ \frac{1}{2} \Vert q(x(t) - x^*) + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \Vert ^2 + \frac{q (\alpha - 1 - q)}{2} \Vert x(t) - x^* \Vert ^2.\nonumber \\ \end{aligned}$$
(10)

The key assumptions which are essential to our analysis are the following: for all \(t \ge t_0\)

figure a

Theorem 4

Suppose \(\alpha \ge 3\) and assume that (11), (12), (13), (14) hold for all \(t \ge t_0\). Then

$$\begin{aligned} \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ = \ O\left( \frac{1}{t^2 b(t)} \right) , \text { as } t \rightarrow +\infty , \\ \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ = \ O\left( \frac{1}{t} \right) , \text { as } t \rightarrow +\infty . \end{aligned}$$

Moreover, one has for all \(a \ge 1\)

$$\begin{aligned}&t \varepsilon (t) \Vert x(t) - x^* \Vert ^2, \ t \varepsilon (t) \Vert x(t) \Vert ^2, \ \left( (\alpha - 3) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \right) \\ &\quad \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \text { and } \\&\left( \left( t^2 b(t) - \beta t \right) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \in L^1 \big ( [t_0, +\infty ), \mathbb {R} \big ). \end{aligned}$$

If, in addition, \(\alpha > 3\) and (15) holds, then the trajectory x is bounded and

$$\begin{aligned} \int _{t_0}^{+\infty } t \Vert \dot{x}(t) \Vert ^2 dt \ < \ +\infty \end{aligned}$$

and

$$\begin{aligned} \int _{t_0}^{+\infty } t b(t) \left( \Phi _{\lambda (s)} (x(s)) - \Phi ^* \right) \ < \ +\infty . \end{aligned}$$

Proof

Let us compute the time derivative of the energy function. For every \(t \ge t_0\) using (3) we derive

$$\begin{aligned} \dot{E}_q(t) \ = \ &\left( 2t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + t^2 \varepsilon (t) \langle x(t), \dot{x}(t) \rangle \\&+ \ q (\alpha - 1 - q) \langle \dot{x}(t), x(t) - x^* \rangle \\&+ \ \left( t^2 b(t) - \beta (q + 2 - \alpha )t \right) \left( \langle \nabla \Phi _{\lambda (t)} (x(t)), \dot{x}(t) \rangle - \frac{\dot{\lambda }(t)}{2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \right) \\&+ \ \frac{2t \varepsilon (t) + t^2 \dot{\varepsilon }(t)}{2} \Vert x(t) \Vert ^2 \\&+ \ \Bigg \langle q(x(t) - x^*) + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \!, (q + 1) \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \\&+ \ t \left( \ddot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)} (x(t)) \right) \Bigg \rangle . \end{aligned}$$

Define \(v(t) = q(x(t) - x^*) + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \!.\) Using (1) to replace \(\ddot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)} (x(t))\) we obtain

$$\begin{aligned} \langle v(t), \dot{v}(t) \rangle \ = \ &\Bigg \langle q(x(t) - x^*) + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \!, (q + 1 - \alpha ) \dot{x}(t) \\&+ \ \left( \beta - t b(t) \right) \nabla \Phi _{\lambda (t)} (x(t)) - t \varepsilon (t) x(t) \Bigg \rangle \\ = \ &q (q + 1 - \alpha ) \left\langle x(t) - x^*, \dot{x}(t) \right\rangle + (q + 1 - \alpha ) t \Vert \dot{x}(t) \Vert ^2 \\&+ \ \left( \beta (q + 2 - \alpha )t - t^2 b(t) \right) \left\langle \nabla \Phi _{\lambda (t)} (x(t)), \dot{x}(t) \right\rangle \\ &+ \left( \beta ^2 t - \beta t^2 b(t) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&- \ t^2 \varepsilon (t) \langle x(t), \dot{x}(t) \rangle - \beta t^2 \varepsilon (t) \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \rangle \\&- \ q t \left\langle \left( b(t) - \frac{\beta }{t} \right) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \right\rangle . \end{aligned}$$

By (14) one has \(b(t) - \frac{\beta }{t} > 0\) for all \(t \ge t_0\), and thus for a strongly convex function \(\varphi _t (x) \ = \ \left( b(t) - \frac{\beta }{t} \right) \Phi _{\lambda (t)} (x) + \frac{\varepsilon (t)}{2} \Vert x \Vert ^2\) we have

$$\begin{aligned} \varphi _t(x^*) - \varphi _t(x) \ \ge \ \left\langle \nabla \varphi _t(x), x^* - x \right\rangle + \frac{\varepsilon (t)}{2} \Vert x^* - x \Vert ^2 \end{aligned}$$

or

$$\begin{aligned} \begin{aligned}&- q t \left\langle \left( b(t) - \frac{\beta }{t} \right) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \right\rangle \\ \le \ &- q t \left( b(t) - \frac{\beta }{t} \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&- q t \frac{\varepsilon (t)}{2} \Vert x(t) \Vert ^2 - q t \frac{\varepsilon (t)}{2} \Vert x(t) - x^* \Vert ^2 + q t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2. \end{aligned} \end{aligned}$$

Therefore, for every \(t \ge t_0\)

$$\begin{aligned} \begin{aligned} \dot{E}_q(t) \ \le \ &\left( (2 - q) t b(t) + t^2 \dot{b}(t) - \beta (2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + (q + 1 - \alpha ) t \Vert \dot{x}(t) \Vert ^2 \\&- \ \left( \left( t^2 b(t) - \beta (q + 2 - \alpha )t \right) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 b(t) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&+ \ \frac{(2 - q)t \varepsilon (t) + t^2 \dot{\varepsilon }(t)}{2} \Vert x(t) \Vert ^2 - q t \frac{\varepsilon (t)}{2} \Vert x(t) - x^* \Vert ^2 + q t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2 \\&- \ \beta t^2 \varepsilon (t) \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \rangle . \end{aligned} \end{aligned}$$

Notice that for \(a \ge 1\)

$$\begin{aligned} -\beta t^2 \varepsilon (t) \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \rangle \ \le \ \frac{\beta t^2}{a} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + \frac{a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2, \end{aligned}$$

which leads to

$$\begin{aligned} \begin{aligned} \dot{E}_q(t)&\le \left( (2 - q) t b(t) + t^2 \dot{b}(t) - \beta (2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + (q + 1 - \alpha ) t \Vert \dot{x}(t) \Vert ^2 \\&\quad - \left( \Big ( t^2 b(t) - \beta (q + 2 - \alpha )t \Big ) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&\quad + \frac{2 (2 - q) t \varepsilon (t) + 2 t^2 \dot{\varepsilon }(t) + a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 - q t \frac{\varepsilon (t)}{2} \Vert x(t) - x^* \Vert ^2 + q t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2 \end{aligned} \end{aligned}$$
(16)

for every \(t \ge t_0\). Note that \(b(t) - \frac{1}{a} > 0\) for all \(t \ge t_0\). Then, due to the properties of b, there exists \(t^* \ge t_0\) such that \(t^2 b(t) - \beta (q + 2 - \alpha )t \ \ge \ 0\) for all \(t \ge t^*\) and all \(q \in (2, \alpha - 1]\). Therefore, since \(\dot{\lambda }(t) \ge 0\) for all \(t \ge t_0\), there exists \(t^{**}\), namely, \(t^{**} = \max \left\{ t^*, \frac{\beta }{b(t_0) - \frac{1}{a}} \right\} \), such that

$$\begin{aligned} \left( \Big ( t^2 b(t) - \beta (q + 2 - \alpha )t \Big ) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \ \ge \ 0 \text { for all } t \ge t^{**}. \end{aligned}$$

Consider now two cases with \(t \ge t^{**}\). First, take \(q = \alpha - 1\) to obtain from (16)

$$\begin{aligned} \begin{aligned} \dot{E}_{\alpha - 1}(t) \ \le \ &\left( (3 - \alpha ) t b(t) + t^2 \dot{b}(t) - \beta (2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&- \ \left( \Big ( t^2 b(t) - \beta t \Big ) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&+ \ \frac{2 (3 - \alpha ) t \varepsilon (t) + 2 t^2 \dot{\varepsilon }(t) + a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 - (\alpha - 1) t \frac{\varepsilon (t)}{2} \Vert x(t) - x^* \Vert ^2 \\&+ \ (\alpha - 1) t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2 \end{aligned} \end{aligned}$$
(17)

for every \(t \ge t_0\). Under the assumptions (11) and (12) we conclude starting from \(t^{**}\) that

$$\begin{aligned} \dot{E}_{\alpha - 1}(t) \ \le \ \frac{(\alpha - 1) t \varepsilon (t)}{2} \Vert x^* \Vert ^2. \end{aligned}$$
(18)

Under the assumption () using the fact that \(t \mapsto E_{\alpha - 1}(t)\) is bounded from below we deduce the existence of the limit \(\lim _{t \rightarrow +\infty } E_{\alpha - 1}(t)\) due to the Lemma A.1 and, therefore, \(t \mapsto E_{\alpha - 1}(t)\) is bounded, which leads to

$$\begin{aligned} \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ = \ O\left( \frac{1}{t^2 b(t)} \right) , \text { as } t \rightarrow +\infty . \end{aligned}$$

From the boundedness of \(t \mapsto \Vert (\alpha - 1)(x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2\) we obtain

$$\begin{aligned} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ = \ O\left( \frac{1}{t} \right) , \text { as } t \rightarrow +\infty , \end{aligned}$$

using the following inequality, which is true for every \(t \ge t_0\)

$$\begin{aligned} & t^2 \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \ \le \ 2\Vert (\alpha - 1)(x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\ & + 2 (\alpha - 1)^2 \Vert x(t) - x^* \Vert ^2. \end{aligned}$$

Moreover, integrating (17) one may obtain the integrability of \(t \varepsilon (t) \Vert x(t) - x^* \Vert ^2\) as well as the other terms in (17). Consider now \(q = \alpha - 1 - \delta \), where \(\delta \) is defined by (15). Thus, (16) becomes

$$\begin{aligned} \dot{E}_{\alpha - 1 - \delta }(t) \ \le \ &\left( (3 - \alpha + \delta ) t b(t) + t^2 \dot{b}(t) - \beta (2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) - \delta t \Vert \dot{x}(t) \Vert ^2 \nonumber \\&- \ \left( \Big ( t^2 b(t) - \beta (1 - \delta )t \Big ) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \nonumber \\&+ \ \frac{2 (3 - \alpha + \delta ) t \varepsilon (t) + 2 t^2 \dot{\varepsilon }(t) + a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 - (\alpha - 1 - \delta ) t \frac{\varepsilon (t)}{2} \Vert x(t) - x^* \Vert ^2 \nonumber \\&+ \ (\alpha - 1 - \delta ) t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2. \end{aligned}$$
(19)

Under the assumptions (12) and (15) we deduce \(\dot{E}_{\alpha - 1 - \delta }(t) \ \le \ \frac{(\alpha - 1 - \delta ) t \varepsilon (t)}{2} \Vert x^* \Vert ^2\) starting from \(t^{**}\). Repeating the same argument we derive that \(t \mapsto E_{\alpha - 1 - \delta }(t)\) is bounded. The function \(t \mapsto \Vert x(t) - x^* \Vert \) is also bounded and so is the trajectory x. Integrating (19) one may additionally obtain the integrability of \(t \Vert \dot{x}(t) \Vert ^2\). From the integrability of \(\left( (\alpha - 3) t b(t) - t^2 \dot{b}(t) - \beta (\alpha - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \) and (15) we deduce

$$\begin{aligned} \int _{t_0}^{+\infty } t b(t) \left( \Phi _{\lambda (s)} (x(s)) - \Phi ^* \right) \ < \ +\infty . \end{aligned}$$

\(\square \)

The next theorem shows that we can actually improve the rates of convergence of the function values in case \(\alpha > 3\).

Theorem 5

Assume that \(\alpha > 3\) and (12), (13),(14) and (15) hold. Then

$$\begin{aligned} t \left\langle \left( b(t) - \frac{\beta }{t} \right) \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \in L^1 \big ( [t_0, +\infty ), \mathbb {R} \big ). \end{aligned}$$
(20)

In addition, \(\lim _{t \rightarrow +\infty } \psi (t) = 0\), where for \(2 \le q \le \alpha - 1\)

$$\begin{aligned} \psi (t)&= (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \\&+ \frac{t^2}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2, \end{aligned}$$

which in particular means

$$\begin{aligned} \begin{aligned} \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ &= \ o \left( \frac{1}{t^2 b(t)} \right) \text { as } t \rightarrow +\infty , \\ \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ &= \ o\left( \frac{1}{t} \right) \text { as } t \rightarrow +\infty \end{aligned} \end{aligned}$$
(21)

and moreover,

$$\begin{aligned} \Phi (\mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t))) - \Phi ^* \ = \ o \left( \frac{1}{t^2 b(t)} \right) \text { as } t \rightarrow +\infty , \\ \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ o \left( \frac{\sqrt{\lambda (t)}}{t \sqrt{b(t)}} \right) \text { as } t \rightarrow +\infty \end{aligned}$$

and

$$\begin{aligned} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert \ = \ o \left( \frac{1}{t \sqrt{b(t) \lambda (t)}} \right) \text { as } t \rightarrow +\infty . \end{aligned}$$

Proof

(i) Let us first prove an auxiliary estimate (20), which will allow us to obtain the rest of the desired results. We return to

$$\begin{aligned} \dot{E}_q(t) \ \le \ &\left( 2t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + (q + 1 - \alpha ) t \Vert \dot{x}(t) \Vert ^2 \\&- \ \left( \left( t^2 b(t) - \beta (q + 2 - \alpha )t \right) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&+ \ \frac{4 t \varepsilon (t) + 2 t^2 \dot{\varepsilon }(t) + a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 \\&- \ q t \left\langle \left( b(t) - \frac{\beta }{t} \right) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \right\rangle . \end{aligned}$$

Under condition (12) we deduce starting from \(t^{**}\)

$$\begin{aligned} \dot{E}_q(t) \ \le \ &\left( 2t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&+ \ t \varepsilon (t) \Vert x(t) \Vert ^2 - q t \left\langle \left( b(t) - \frac{\beta }{t} \right) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \right\rangle . \end{aligned}$$

Integrating the last inequality on \([t_0, t]\) we obtain

$$\begin{aligned}&\int _{t_0}^t q s \left\langle \left( b(s) - \frac{\beta }{s} \right) \nabla \Phi _{\lambda (s)} (x(s)), x(s) - x^* \right\rangle ds \ \le \ E_q(t_0) - E_q(t) + \int _{t_0}^t s \varepsilon (s) \Vert x(s) \Vert ^2 ds \nonumber \\&+ \ \int _{t_0}^t \left( 2s b(s) + s^2 \dot{b}(s) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (s)} (x(s)) - \Phi ^* \right) ds - \int _{t_0}^t q s \langle \varepsilon (s) x(s), x(s) - x^* \rangle . \end{aligned}$$
(22)

Since the gradient \( \nabla \Phi _{\lambda } \) is monotone, we know that \(\left\langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \ge 0\). Moreover,

$$\begin{aligned} -q t \langle \varepsilon (t) x(t), x(t) - x^* \rangle \ \le \ \frac{q t \varepsilon (t)}{2} \left( \Vert x(t) \Vert ^2 + \Vert x(t) - x^* \Vert ^2 \right) . \end{aligned}$$
(23)

Notice that by (15) we have

$$\begin{aligned} (\alpha - 3 - \delta ) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \ > \ 0 \end{aligned}$$

or

$$\begin{aligned} (\alpha - 3 - \delta ) t b(t) - t^2 \dot{b}(t) \ > \ \beta (\alpha - 2). \end{aligned}$$

Obviuosly,

$$\begin{aligned} (\alpha - 3 - \delta ) t b(t) - t^2 \dot{b}(t) \> \ \beta (\alpha - 2) \ > \ \beta (\alpha - 2 - q) \text { for every } q \in (2, \alpha - 1). \end{aligned}$$

Introducing \(\delta _1 = \alpha - 1 - \delta > 0\) (by the choice of \(\delta \)) we obtain

$$\begin{aligned} (\delta _1 - 2) t b(t) - t^2 \dot{b}(t) \ > \ -\beta (q + 2 - \alpha ) \end{aligned}$$

or

$$\begin{aligned} 2 t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \ < \ \delta _1 t b(t). \end{aligned}$$

From Theorem 4 we know that \( t b(t) \left( \Phi _{\lambda (s)} (x(s)) - \Phi ^* \right) \) is integrable and therefore so is \(\left( 2 t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (s)} (x(s)) - \Phi ^* \right) \). Since the function \(t \mapsto E_q(t)\) is bounded and the rest of the right hand side of (22) belongs to \(L^1 \big ( [t_0, +\infty ), \mathbb {R} \big )\) by Theorem 4 and (23), we conclude with (20) due to (14).

(ii) In order to derive the convergence rates for the quantities of our interest we require some additional results. Our nearest goal is to establish the existence of the limits

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert \text { and } \lim _{t \rightarrow +\infty } t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle . \end{aligned}$$

Consider (as was done in [23, 24]) for two different \(q_1, q_2 \in (2, \alpha - 1)\) and for every \(t \ge t_0\) the difference

$$\begin{aligned}&E_{q_1}(t) - E_{q_2}(t) \ = \ (t^2 b(t) - \beta (q_1 + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \\&+ \ \frac{1}{2} \Vert q_1 (x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + \frac{q_1 (\alpha - 1 - q_1)}{2} \Vert x(t) - x^* \Vert ^2 \\&- (t^2 b(t) - \beta (q_2 + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) - \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \\&- \ \frac{1}{2} \Vert q_2 (x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 - \frac{q_2 (\alpha - 1 - q_2)}{2} \Vert x(t) - x^* \Vert ^2 \\&= \ (q_1 - q_2) \Bigg ( -\beta t \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \\&+ \ \frac{\alpha - 1}{2} \Vert x(t) - x^* \Vert ^2 \Bigg ). \end{aligned}$$

As we have established earlier in Theorem 4 the limits of \(E_{q_1}(t) - E_{q_2}(t)\) and \(t \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \) exists (the latter is actually zero). Therefore, the limit

$$\begin{aligned} \lim _{t \rightarrow +\infty } \left( t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle + \frac{\alpha - 1}{2} \Vert x(t) - x^* \Vert ^2 \right) \text { also exists}. \end{aligned}$$

Let us introduce for every \(t \ge t_0\) two auxiliary functions

$$\begin{aligned} k(t) \ = \ t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle + \frac{\alpha - 1}{2} \Vert x(t) - x^* \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} r(t) \ = \ \frac{1}{2} \Vert x(t) - x^* \Vert ^2 + \beta \int _{t_0}^t \left\langle \nabla \Phi _{\lambda (s)} (x(s)), x(s) - x^* \right\rangle ds. \end{aligned}$$

Noticing that

$$\begin{aligned} \dot{r}(t) \ = \ \langle x(t) - x^*, \dot{x}(t) \rangle + \beta \left\langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \end{aligned}$$

we may write for every \(t \ge t_0\)

$$\begin{aligned} (\alpha - 1) r(t) + t \dot{r}(t) = k(t) + \beta (\alpha - 1) \int _{t_0}^t \left\langle \nabla \Phi _{\lambda (s)} (x(s)), x(s) - x^* \right\rangle ds. \end{aligned}$$

From the fact that \(\lim _{t \rightarrow +\infty } k(t)\) exists using (20) we obtain that \( \lim _{t \rightarrow +\infty } (\alpha - 1) r(t) + t \dot{r}(t) \) also exists. Applying Lemma A.2 we deduce the existence of the limit \(\lim _{t \rightarrow +\infty } r(t)\). Using (20) again we obtain the existence of the limits \(\lim _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert \) and \(\lim _{t \rightarrow +\infty } t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \).

(iii) Finally, we are in position to prove (21) and the rest of the convergence rates. The key idea is to show that the limit

$$\begin{aligned} & \lim _{t \rightarrow +\infty } \left( (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \right. \\ & \left. + \frac{t^2}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \right) \end{aligned}$$

exists and is actually zero. Let us return to the definition of our energy functional and rewrite it as

$$\begin{aligned}&E_q(t) \ = \ (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \\&+ \ \frac{t^2}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t) \Vert ^2 + q t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \\&+ \frac{q (\alpha - 1)}{2} \Vert x(t) - x^* \Vert ^2. \end{aligned}$$

Since the limits

$$\begin{aligned} \lim _{t \rightarrow +\infty } E_q(t) \text { and } \lim _{t \rightarrow +\infty } \left( q t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t), x(t) - x^* \right\rangle + \frac{q (\alpha - 1)}{2} \Vert x(t) - x^* \Vert ^2 \right) \text { exist,} \end{aligned}$$

it follows that

$$\begin{aligned} & \lim _{t \rightarrow +\infty } \left( (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \right. \\ & \left. + \frac{t^2}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \right) \end{aligned}$$

exists as well. Denote

$$\begin{aligned} & \psi (t) = (t^2 b(t) - \beta (q + 2 - \alpha )t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \\ & + \frac{t^2}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \end{aligned}$$

and consider

$$\begin{aligned} & 0 \ \le \ \frac{\psi (t)}{t} \ \le \ 2t b(t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t \varepsilon (t)}{2} \Vert x(t) \Vert ^2 \nonumber \\ & + \frac{t}{2} \Vert \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2. \end{aligned}$$
(24)

Let us show that the right hand side of (24) is integrable. Indeed, the first term is integrable by Theorem 4. As we have also established in Theorem 4, starting from \(t^{**}\)

$$\begin{aligned} \Big ( t b(t) - \beta \Big ) \frac{\dot{\lambda }(t)}{2} + \beta t \left( b(t) - \frac{1}{a} \right) \ \ge \ \beta ^2, \end{aligned}$$

where \(a \ge 1\). Then, by (14) and \(\dot{\lambda }(t) \ge 0\) for all \(t \ge t_0\), we deduce that there exists \(t_1 \ge t^{**}\) such that for all \(t \ge t_1\)

$$\begin{aligned} \Big ( t b(t) - \beta \Big ) \frac{\dot{\lambda }(t)}{2} + \beta t \left( b(t) - \frac{1}{a} \right) \ \ge \ \frac{3 \beta ^2}{2} \end{aligned}$$

or

$$\begin{aligned} t^2 b(t) \left( \frac{\dot{\lambda }(t)}{2} + \beta \left( 1 - \frac{1}{a b(t)} \right) \right) \ \ge \ \left( 3 \beta + \dot{\lambda }(t) \right) \frac{\beta t}{2} \end{aligned}$$

or

$$\begin{aligned} \left( t^2 b(t) - \beta t \right) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b(t) - \frac{1}{a} \right) \ \ge \ \frac{\beta ^2 t}{2}, \end{aligned}$$

So, by Theorem 4 the right hand side of (24) belongs to \(L^1 \big ( [t_1, +\infty ), \mathbb {R} \big )\). Therefore, \(\frac{\psi (t)}{t}\) also belongs to \(L^1 \big ( [t_1, +\infty ), \mathbb {R} \big )\) and since the limit \(\lim _{t \rightarrow +\infty } \psi (t)\) exists we deduce that it should be actually zero, which gives us (21). To complete the proof notice that by the definition of the proximal mapping, we have

$$\begin{aligned} \Phi _{\lambda (t)}(x(t)) - \Phi ^* = \ \Phi (\mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t))) - \Phi ^* + \frac{1}{2\lambda (t)} \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert ^2 \quad \forall t \ge t_0. \end{aligned}$$

The conclusion follows immediately from (2) and (21). \(\square \)

4 Strong Convergence of the Trajectories

In this chapter we will establish the strong convergence of the trajectories to the minimal norm element of \(\mathop {\textrm{argmin}}\limits \Phi \).

In order to do so, we will need to modify assumption (13) from the previous chapter:

figure b

Before moving to the main point of the section, let us prove an auxiliary result first.

Theorem 6

Suppose that \(\alpha > 3\), the function \(\lambda \) is bounded for all \(t \ge t_0\) and (11), (12), (14) and (25) hold. Then

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \right) - \Phi ^* \ = \ 0. \end{aligned}$$

Proof

Let us return to (18):

$$\begin{aligned} \dot{E}_{\alpha - 1}(t) \ \le \ (\alpha - 1) t \frac{\varepsilon (t)}{2} \Vert x^* \Vert ^2. \end{aligned}$$

Let us integrate the last inequality on [T, t]

$$\begin{aligned} E_{\alpha - 1}(t) \ \le \ E_{\alpha - 1}(T) + \frac{(\alpha - 1) \Vert x^* \Vert ^2}{2} \int _T^t s \varepsilon (s) ds. \end{aligned}$$

On the other hand, for every \(t \ge t_0\)

$$\begin{aligned} E_{\alpha - 1}(t) \ \ge \ (t^2 b(t) - \beta t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) . \end{aligned}$$

Thus,

$$\begin{aligned} \Phi _{\lambda (t)}(x(t)) - \Phi ^* \ \le \ \frac{E_{\alpha - 1}(T)}{t^2 b(t) - \beta t} + \frac{(\alpha - 1) \Vert x^* \Vert ^2}{2(t^2 b(t) - \beta t)} \int _T^t s \varepsilon (s) ds. \end{aligned}$$

We deduce due to (25) and the Lemma A.3 that

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{1}{t^2 b(t)} \int _T^t s^2 b(s) \frac{\varepsilon (s)}{s b(s)} ds \ = \ 0. \end{aligned}$$

Therefore,

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{(\alpha - 1) \Vert x^* \Vert ^2}{2(t^2 b(t) - \beta t)} \int _T^t s \varepsilon (s) ds \ = \ 0 \end{aligned}$$

and clearly

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{E_{\alpha - 1}(T)}{t^2 b(t) - \beta t} \ = \ 0. \end{aligned}$$

Thus, we establish

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ = \ 0. \end{aligned}$$

By the definition of the proximal mapping

$$\begin{aligned} \Phi _{\lambda (t)}(x(t)) - \Phi ^* = \ \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t)) \right) - \Phi ^* + \frac{1}{2\lambda (t)} \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert ^2 \quad \forall t \ge t_0. \end{aligned}$$

Using the fact that \(\lambda \) is bounded for all \(t \ge t_0\) we deduce

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \right) - \Phi ^* \ = \ 0. \end{aligned}$$

\(\square \)

For the remaining part of this section we will use a different energy functional. Inspired by [23] we introduce the following functional, which we will heavily rely on throughout this section

$$\begin{aligned} E_{p, q}(t)= & t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \nonumber \\ & + \frac{\varepsilon (t) t^{p+2}}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) + \frac{t^p}{2} \left\| v(t) \right\| ^2, \end{aligned}$$
(26)

where \(v(t) = q (x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \) and \(p, \ q \ge 0\).

The proof of the following theorem draws inspiration from [10, 15, 23].

Theorem 7

Suppose that \(\lambda \) is bounded for all \(t \ge t_0\), \(\alpha > 3\), \(b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0}\) and (11), (12) and (25) are fulfilled. Suppose additionally that for all \(t \ge t_0\)

$$\begin{aligned} \left( \frac{\alpha }{3} - 1 \right) t b(t) - t^2 \dot{b}(t) + \frac{\alpha \beta }{3} \ \ge \ 0 \end{aligned}$$
(27)

and moreover that for all \(t \ge t_0\)

$$\begin{aligned} 2 \alpha (\alpha - 3) - 9 t^2 \varepsilon (t) + 6 \alpha \beta \ \le \ 0, \end{aligned}$$
(28)
$$\begin{aligned} 18 \beta t + 9 \beta \dot{\lambda }(t) - 9 t b(t) \left( \dot{\lambda }(t) + 2 \beta \right) + 3(\alpha + 3) \beta ^2 + \alpha ^2 \beta \ \le \ 0 \end{aligned}$$
(29)

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{\beta }{t^{\frac{\alpha }{3} + 1} \varepsilon (t)} \int _{t_0}^t s^{\frac{\alpha }{3} + 1} \varepsilon ^2(s) ds \ = \ 0. \end{aligned}$$
(30)

If \(x: [t_0,+\infty ) \mapsto H\) is a solution to (1) and the trajectory x(t) stays either inside or outside the ball \(B(0, \Vert x^* \Vert )\), then x(t) converges to minimal norm solution \(x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0)\), as \(t \rightarrow +\infty \). Otherwise, \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\).

Proof

As in [23] we will consider several cases with respect to the trajectory x staying either inside or outside the ball \(B \left( 0, \Vert x^* \Vert \right) \).

Case I.

Assume that the trajectory x stays in the complement of the ball B for all \(t \ge t_0\). This means nothing but \(\Vert x(t) \Vert \ge \Vert x^* \Vert \) for every \(t \ge t_0\).

(i) Our nearest goal is to obtain the upper bound for the derivative of \(E_{p, q}\). In order to do so, let us evaluate its time derivative for every \(t \ge t_0\) first.

$$\begin{aligned} \begin{aligned}&\frac{d}{dt} E_{p, q}(t) \ = \ t^p \left( (p + 2) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&+ \ t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) \left( \left\langle \nabla \Phi _{\lambda (t)} (x(t)), \dot{x}(t) \right\rangle - \frac{\dot{\lambda }(t)}{2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \right) \\&+ \ \frac{(p+2) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) + t^{p+2} \varepsilon (t) \langle \dot{x}(t), x(t) \rangle \\&+ \ \frac{p t^{p-1}}{2} \Vert q(x(t) - x^*) + t ( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + t^p \langle \dot{v}(t), v(t) \rangle . \end{aligned} \end{aligned}$$
(31)

Consider for every \(t \ge t_0\) the inner product \(\langle \dot{v}(t), v(t) \rangle \):

$$\begin{aligned}&\Bigg \langle (q + 1) \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) + t \left( \ddot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)} (x(t)) \right) , q(x(t) - x^*) + t ( \dot{x}(t) \\&\qquad + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Bigg \rangle \\&\quad = \Bigg \langle (q + 1 - \alpha ) \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) - t \Big ( b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t) \Big ), q(x(t) - x^*) + t ( \dot{x}(t) \\&\qquad + \beta \nabla \Phi _{\lambda (t)} (x(t)) \Bigg \rangle \\&\quad = q (q + 1 - \alpha ) \langle \dot{x}(t), x(t) - x^* \rangle + (q + 1 - \alpha ) t \left( \Vert \dot{x}(t) \Vert ^2 + \left\langle \beta \nabla \Phi _{\lambda (t)} (x(t)),\dot{x}(t) \right\rangle \right) \\&\qquad + \beta q \langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \rangle + \beta t \langle \nabla \Phi _{\lambda (t)} (x(t)), \dot{x}(t) \rangle + \beta ^2 t \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&\qquad - q t \left\langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \right\rangle - t^2 \left\langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), \dot{x}(t) \right\rangle \\&\qquad - \beta t^2 \big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), \nabla \Phi _{\lambda (t)} (x(t)) \big \rangle , \end{aligned}$$

where above we used (1). Consider now for every \(t \ge t_0\),

$$\begin{aligned}&\Vert q(x(t) - x^*) + t ( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)) ) \Vert ^2 \ = \ q^2 \Vert x(t) - x^* \Vert ^2 + 2q t \langle \dot{x}(t), x(t) - x^* \rangle \\&\quad + 2q \beta t \langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \rangle + t^2 \Vert \dot{x}(t) \Vert ^2 + 2 \beta t^2 \langle \nabla \Phi _{\lambda (t)} (x(t)), \dot{x}(t) \rangle \\ &\quad + \beta ^2 t^2 \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2. \end{aligned}$$

The two estimates that we made above lead to (31) becoming

$$\begin{aligned} \frac{d}{dt} E_{p, q}(t) \ = \ &t^p \left( (p + 2) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&+ \ \frac{(p + 2) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) + \frac{p q^2 t^{p-1}}{2} \Vert x(t) - x^* \Vert ^2 \\&+ \ \frac{(p + 2) \beta ^2 t^{p+1}}{2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + \left( q + 1 - \alpha + \frac{p}{2} \right) t^{p+1} \Vert \dot{x}(t) \Vert ^2 \\&+ \ q (q + 1 - \alpha + p) t^p \big \langle \dot{x}(t), x(t) - x^* \big \rangle + q \beta (p + 1) t^p \big \langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \big \rangle \\&- \ q t^{p+1} \Big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \Big \rangle \\&- \ \beta t^{p+2} \Big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), \nabla \Phi _{\lambda (t)} (x(t)) \Big \rangle \\&- \ \frac{\dot{\lambda }(t) t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) }{2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2. \end{aligned}$$

Let us apply the gradient inequality to the strongly convex function \(x \mapsto b(t) \Phi _{\lambda (t)} (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2} \):

$$\begin{aligned}&-\Big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \Big \rangle + \frac{\varepsilon (t) \Vert x(t) - x^* \Vert ^2}{2} \\&\qquad \le \ \left( b(t) \Phi ^* + \frac{\varepsilon (t) \Vert x^* \Vert ^2}{2} \right) - \left( b(t) \Phi _{\lambda (t)} (x(t)) + \frac{\varepsilon (t) \Vert x(t) \Vert ^2}{2} \right) \end{aligned}$$

and thus

$$\begin{aligned}&-q t^{p+1} \Big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), x(t) - x^* \Big \rangle \ \le \ -q t^{p+1} b(t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&- q t^{p+1}\frac{\varepsilon (t)}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) - q t^{p+1} \frac{\varepsilon (t) \Vert x(t)-x^* \Vert ^2}{2} \end{aligned}$$

for every \(t \ge t_0\). So, noticing that

$$\begin{aligned}&-\beta t^{p+2} \Big \langle b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t), \nabla \Phi _{\lambda (t)} (x(t)) \Big \rangle \\ = \ &-\beta t^{p+2} b(t) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 - \beta t^{p+2} \varepsilon (t) \Big \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \Big \rangle \end{aligned}$$

we deduce

$$\begin{aligned} \frac{d}{dt} E_{p, q}(t)&\le t^p \left( (p + 2 - q) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&\quad + \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) \\&\quad + \left( \frac{p q^2 t^{p-1}}{2} - \frac{q t^{p+1} \varepsilon (t)}{2} \right) \Vert x(t) - x^* \Vert ^2 \\&\quad + \frac{(p+2) \beta ^2 t^{p+1} - 2 \beta t^{p+2} b(t) - \dot{\lambda }(t) t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) }{2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&\quad + \left( q + 1 - \alpha + \frac{p}{2} \right) t^{p+1} \Vert \dot{x}(t) \Vert ^2 + q (q + 1 - \alpha + p) t^p \big \langle \dot{x}(t), x(t) - x^* \big \rangle \\&\quad + q \beta (p + 1) t^p \big \langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \big \rangle - \beta t^{p+2} \varepsilon (t) \Big \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \Big \rangle . \end{aligned}$$

In order to proceed further we will need the following estimates:

$$\begin{aligned}&q \beta (p + 1) t^p \big \langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \big \rangle \\ &\quad \le \ \frac{q \beta (p + 1) t^{p+1}}{4 c^2} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + q \beta (p + 1) c^2 t^{p-1} \Vert x(t) - x^* \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} -\beta t^{p+2} \varepsilon (t) \Big \langle x(t), \nabla \Phi _{\lambda (t)} (x(t)) \Big \rangle \ \le \ \frac{\beta t^{p+2}}{a} \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 \end{aligned}$$

for every \(t \ge t_0\), some \(c \ge 1\) and \(a \ge 1\). Thus,

$$\begin{aligned} \frac{d}{dt} E_{p, q}(t)&\le t^p \left( (p + 2 - q) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&\quad + \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \right) \Vert x(t) \Vert ^2 \\&\quad + \left( \frac{p q^2 t^{p-1}}{2} - \frac{q t^{p+1} \varepsilon (t)}{2} + q \beta (p + 1) c^2 t^{p-1} \right) \Vert x(t) - x^* \Vert ^2 \\&\quad + \Bigg ( \frac{(p+2) \beta ^2 t^{p+1} - 2 \beta t^{p+2} b(t) - \dot{\lambda }(t) t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) }{2} + \frac{q \beta (p + 1) t^{p+1}}{4 c^2} \\&\quad + \frac{\beta t^{p+2}}{a} \Bigg ) \cdot \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&\quad + \left( q + 1 - \alpha + \frac{p}{2} \right) t^{p+1} \Vert \dot{x}(t) \Vert ^2 + q (q + 1 - \alpha + p) t^p \big \langle \dot{x}(t), x(t) - x^* \big \rangle \\&\quad - \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \right) \Vert x^* \Vert ^2. \end{aligned}$$

Let us fix

$$\begin{aligned} q = \frac{2 \alpha }{3} \text { and } p = \frac{\alpha - 3}{3}. \end{aligned}$$

First of all, due to this choice

$$\begin{aligned} q + 1 - \alpha + p \ = \ 0 \end{aligned}$$

and thus we get rid of the term \(\big \langle \dot{x}(t), x(t) - x^* \big \rangle \). Secondly,

$$\begin{aligned} q + 1 - \alpha + \frac{p}{2} \ = \ -\frac{p}{2} \ \le \ 0. \end{aligned}$$
(32)

Then

$$\begin{aligned} p + 2 - q \ = \ 1 - \frac{\alpha }{3} \ \le \ 0. \end{aligned}$$
(33)

So,

$$\begin{aligned} \frac{d}{dt} E_{p, q}(t)&\le t^p \left( (p + 2 - q) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \\&\quad + \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \right) \Vert x(t) \Vert ^2 \\&\quad + \left( \frac{p q^2 t^{p-1}}{2} - \frac{q t^{p+1} \varepsilon (t)}{2} + q \beta (p + 1) c^2 t^{p-1} \right) \Vert x(t) - x^* \Vert ^2 \\&\quad + \Bigg ( \frac{(p+2) \beta ^2 t^{p+1} - 2 \beta t^{p+2} b(t) - \dot{\lambda }(t) t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) }{2} + \frac{q \beta (p + 1) t^{p+1}}{4 c^2} \\&\quad + \frac{\beta t^{p+2}}{a} \Bigg ) \cdot \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&\quad + \left( q + 1 - \alpha + \frac{p}{2} \right) t^{p+1} \Vert \dot{x}(t) \Vert ^2 - \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \right) \Vert x^* \Vert ^2. \end{aligned}$$

Obviously, for t large enough, say, \(t \ge t_2 \ge t_0\) the following expression is non-positive due to (27) and \(p + 1 = \frac{\alpha }{3} > 0\) and \(\alpha - p - q - 2 = -1\)

$$\begin{aligned}&(p + 2 - q) t b(t) + t^2 \dot{b}(t) + (p + 1) \beta (\alpha - p - q - 2) \\ &\quad = \left( 1 - \frac{\alpha }{3} \right) t b(t) + t^2 \dot{b}(t) - \frac{\alpha \beta }{3} \ \le \ 0. \end{aligned}$$

Moreover, from (28) it follows that for \(c = 1\)

$$\begin{aligned} \frac{p q^2 t^{p-1}}{2} - \frac{q t^{p+1} \varepsilon (t)}{2} + q \beta (p + 1) c^2 t^{p-1} \ = \ \frac{\alpha t^{\frac{\alpha - 6}{3}}}{27} \left( 2 \alpha (\alpha - 3) - 9 t^2 \varepsilon (t) + 6 \alpha \beta \right) \ \le \ 0 \end{aligned}$$

for all \(t \ge t_0\). Furthermore,

$$\begin{aligned}&\left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \right) \Vert x(t) \Vert ^2\\ &\qquad - \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} \right) \Vert x^* \Vert ^2 \\&\quad = \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \right) \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) \\ &\quad + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \Vert x^* \Vert ^2. \end{aligned}$$

So, under the assumption (12) and the fact that \(\Vert x(t) \Vert \ge \Vert x^* \Vert \) for all \(t \ge t_0\) we deduce due to (33)

$$\begin{aligned} \left( \frac{(p + 2 - q) t^{p+1} \varepsilon (t) + t^{p+2} \dot{\varepsilon }(t)}{2} + \frac{a \beta t^{p+2} \varepsilon ^2(t)}{4} \right) \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) \ \le \ 0. \end{aligned}$$

Thus, under the assumptions (12), (27), (28) and (29) (the latest leads to the non-positivity of the coefficient of \(\Vert \nabla \Phi _{\lambda }(x) \Vert ^2\)) we conclude due to (32) that for every \(t \ge t_2\)

$$\begin{aligned} \frac{d}{dt} E_{p, q}(t) \ \le \ \frac{a \beta t^{\frac{\alpha }{3} + 1} \varepsilon ^2(t)}{4} \Vert x^* \Vert ^2. \end{aligned}$$
(34)

(ii) Let us obtain now the lower bound for \(E_{p,q}\). Notice that for \(p = \frac{\alpha - 3}{3}\) and \(q = \frac{2 \alpha }{3}\) we have \(\alpha - p - q = 1\) and

$$\begin{aligned} E_{p, q}(t) \ &\ge \ t^{p+1} \left( t b(t) + \beta (\alpha - p - q - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \nonumber \\ &\quad + \frac{\varepsilon (t) t^{p+2}}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) \nonumber \\&= \ t^{p+1} \left( t b(t) - \beta \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{\varepsilon (t) t^{p+2}}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) \nonumber \\&\ge \ \frac{t^{p+2}}{2} \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{\varepsilon (t) t^{p+2}}{2} \left( \Vert x(t) \Vert ^2 - \Vert x^* \Vert ^2 \right) ,\nonumber \\ \end{aligned}$$
(35)

since \(t b(t) - \beta \ge \frac{t}{2}\) for every \(t \ge t_0\) by \(b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0}\) and b being non-decreasing. On the other hand, applying the gradient inequality to the strongly convex function \(\varphi _{\varepsilon (t), \lambda (t)}(x) \ = \ \frac{\Phi _{\lambda (t)}(x)}{2} + \frac{\varepsilon (t)}{2} \Vert x \Vert ^2\) we deduce for \(x_{\varepsilon (t), \lambda (t)} = \mathop {\textrm{argmin}}\limits _{H} \varphi _{\varepsilon (t), \lambda (t)}(x)\)

$$\begin{aligned} \varphi _{\varepsilon (t), \lambda (t)} (x) - \varphi _{\varepsilon (t), \lambda (t)} (x_{\varepsilon (t), \lambda (t)}) \ \ge \ \frac{\varepsilon (t)}{2} \Vert x - x_{\varepsilon (t), \lambda (t)} \Vert ^2 \text { for every } x \in H. \end{aligned}$$

By the definition of \(\varphi _{\varepsilon (t), \lambda (t)} (x)\) we deduce

$$\begin{aligned}&\varphi _{\varepsilon (t), \lambda (t)} (x_{\varepsilon (t), \lambda (t)}) - \varphi _{\varepsilon (t), \lambda (t)} (x^*) \\&\quad = \ \frac{1}{2} \left( \Phi _{\lambda (t)} (x_{\varepsilon (t), \lambda (t)}) - \Phi ^* \right) + \frac{\varepsilon (t)}{2} \left( \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 - \Vert x^* \Vert ^2 \right) \\&\quad \ge \ \frac{\varepsilon (t)}{2} \left( \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 - \Vert x^* \Vert ^2 \right) . \end{aligned}$$

We may now add the last two inequalities to obtain

$$\begin{aligned} & \varphi _{\varepsilon (t), \lambda (t)} (x) - \varphi _{\varepsilon (t), \lambda (t)} (x^*) \nonumber \\ & \quad \ge \ \frac{\varepsilon (t)}{2} \left( \Vert x - x_{\varepsilon (t), \lambda (t)} \Vert ^2 + \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 - \Vert x^* \Vert ^2 \right) \text { for every } x \in H. \end{aligned}$$
(36)

Plugging (36) into (35) we conclude that for every \(t \ge t_2\)

$$\begin{aligned} E_{p, q}(t) \ \ge \ \frac{t^{p+2} \varepsilon (t)}{2} \left( \Vert x(t) - x_{\varepsilon (t), \lambda (t)} \Vert ^2 + \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 - \Vert x^* \Vert ^2 \right) . \end{aligned}$$
(37)

(iii) Finally, using the lower and upper bounds for \(E_{p, q}\) we can prove the strong convergence of the trajectories to a minimal norm solution. Integrating (34) on \([t_2, t]\) we obtain

$$\begin{aligned} E_{p, q}(t) \ \le \ E_{p, q}(t_2) + \frac{a \beta \Vert x^* \Vert ^2}{4} \int _{t_2}^t s^{\frac{\alpha }{3} + 1} \varepsilon ^2(s) ds \end{aligned}$$

and using (37) we deduce for every \(t \ge t_2\)

$$\begin{aligned} \Vert x(t) - x_{\varepsilon (t), \lambda (t)} \Vert ^2 \ \le \ \Vert x^* \Vert ^2 - \Vert x_{\varepsilon (t), \lambda (t)} \Vert ^2 + \frac{2 E_{p, q} (t_2)}{t^{\frac{\alpha }{3} + 1} \varepsilon (t)} + \frac{a \beta \Vert x^* \Vert ^2}{2 t^{\frac{\alpha }{3} + 1} \varepsilon (t)} \int _{t_2}^t s^{\frac{\alpha }{3} + 1} \varepsilon ^2(s) ds. \end{aligned}$$

Note that due to (28)

$$\begin{aligned} t^2 \varepsilon (t) \ \ge \ \frac{2 \alpha (\alpha - 3) + 6 \alpha \beta }{9} \ = \ \hat{C} \ \ge \ 0 \end{aligned}$$

and

$$\begin{aligned} t^{\frac{\alpha }{3} + 1} \varepsilon (t) \ = \ t^2 \varepsilon (t) t^{\frac{\alpha }{3} - 1} \ \ge \ \hat{C} t^{\frac{\alpha }{3} - 1}. \end{aligned}$$

Since \(\alpha > 3\) we deduce

$$\begin{aligned} \lim _{t \rightarrow +\infty } t^{\frac{\alpha }{3} + 1} \varepsilon (t) = +\infty \end{aligned}$$

and thus

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{2 E_{p, q} (t_2)}{t^{\frac{\alpha }{3} + 1} \varepsilon (t)} \ = \ 0. \end{aligned}$$

Finally, by (9) and (30) we conclude

$$\begin{aligned} \lim _{t \rightarrow +\infty } x(t) = x^*. \end{aligned}$$

Case II.

Assume now the opposite to the first case, namely, \(\Vert x(t) \Vert < \Vert x^* \Vert \) for every \(t \ge t_0\). According to Theorem 6

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \right) - \Phi ^* \ = \ 0. \end{aligned}$$

Denote \(\xi (t) = \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t))\). Considering a sequence \( \{ t_k \}_{k \in {\mathbb {N}}} \) such that \( \{x(t_k)\}_{k \in {\mathbb {N}}} \) converges weakly to an element \( \hat{x} \in H \) as \( k \rightarrow \infty \), we notice that \(\{\xi (t_k)\}_{k \in {\mathbb {N}}} \) converges weakly to \({\hat{x}}\) as \( k \rightarrow \infty \). Now, the function \( \Phi \) being convex and lower semicontinuous in the weak topology, allows us to write

$$\begin{aligned} \Phi ({\hat{x}}) \ \le \ \liminf _{k \rightarrow \infty } \Phi (\xi (t_k)) \ = \ \lim _{t \rightarrow +\infty } \Phi (\xi (t)) \ = \ \Phi ^* \end{aligned}$$

and hence, \( {\hat{x}} \in \mathop {\textrm{argmin}}\limits \Phi \). The norm is weakly semicontinuous, so

$$\begin{aligned} \Vert {\hat{x}} \Vert \ \le \ \liminf _{k \rightarrow \infty } \Vert \xi (t_k) \Vert \ \le \ \Vert x^* \Vert , \end{aligned}$$

which means that \({\hat{x}} = x^*\) by the uniqueness of the element of the minimum norm in \(\mathop {\textrm{argmin}}\limits \Phi _{\lambda }\). Therefore, the trajectory x converges weakly to \(x^*\) and

$$\begin{aligned} \Vert x^* \Vert \ \le \ \liminf _{t \rightarrow +\infty } \Vert x(t) \Vert \ \le \ \limsup _{t \rightarrow +\infty } \Vert x(t) \Vert \ \le \ \Vert x^* \Vert \end{aligned}$$

and thus

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert x(t) \Vert \ = \ \Vert x^* \Vert . \end{aligned}$$

From this and the weak convergence of the trajectory x follows the strong one: \(\lim _{t \rightarrow +\infty } x(t) = x^*\).

Case III.

Assume that for \(t \ge t_0\) the trajectory x finds itself both inside and outside the ball \(B(0, \Vert x^* \Vert )\). Since x is continuous, there exists a sequence \(\{ t_n \}_{n \in \mathbb {N}} \subseteq [t_0, +\infty )\) such that \(t_n \rightarrow \infty \) as \(n \rightarrow \infty \) and \(\Vert x(t_n) \Vert = \Vert x^* \Vert \) for every \(n \in \mathbb {N}\). Consider again a weak sequential cluster point \({\hat{x}}\) of the sequence \(\{ x(t_n) \}_{n \in \mathbb {N}}\). By repeating the same argument as in the previous case we deduce the weak convergence of \(\{ x(t_n) \}_{n \in \mathbb {N}}\) to \(x^*\), as \(n \rightarrow \infty \). Since \(\Vert x(t_n) \Vert \rightarrow \Vert x^* \Vert \), as \(n \rightarrow \infty \), we obtain that \(\Vert x(t_n) - x^* \Vert \rightarrow 0\), as \(n \rightarrow \infty \), which means \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\). \(\square \)

Remark 1

In this section the condition \(\dot{b}(t) \ge 0\) for all \(t \ge t_0\) is not necessary. Our conjecture is that we can weaken the setting by omitting this condition and thus widen the range for b, including the functions that decay not faster than \(\frac{1}{t^2}\) for the polynomial choice of parameters.

Remark 2

There is no setting which guarantees both fast rates for the values and strong convergence of the trajectories. One of the future goal would be to develop a new approach (based on [6]), which would help us deduce these two results simultaneously.

4.1 Strong Convergence of the Tajectories in Cse \(\alpha = 3\)

Throughout this section we no longer require that b is non-decreasing. In this case the analogue of Theorem 6 looks as follows.

Theorem 8

Suppose that for all \(t \ge t_0\) the function \(\lambda \) is bounded, \(b(t) \equiv b > 0\) is a constant function and (12) and (14) hold. Suppose additionally that (25) holds for constant b, namely

$$\begin{aligned} \int _{t_0}^{+\infty } \frac{\varepsilon (t)}{t} dt \ < \ +\infty . \end{aligned}$$

Then

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \right) - \Phi ^* \ = \ 0. \end{aligned}$$

Proof

In this case the energy functional becomes

$$\begin{aligned} E_2(t)= & (b t^2 - \beta t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + \frac{t^2 \varepsilon (t)}{2} \Vert x(t) + \frac{1}{2} \Vert 2 (x(t) - x^*) \\ & + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \Vert ^2. \end{aligned}$$

Relation (16) thus becomes for all \(t \ge t_0\)

$$\begin{aligned} \dot{E}_2(t) \ \le \ &\beta \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) - \left( \Big ( b t^2 - \beta t \Big ) \frac{\dot{\lambda }(t)}{2} - \beta ^2 t + \beta t^2 \left( b - \frac{1}{a} \right) \right) \Vert \nabla \Phi _{\lambda (t)} (x(t)) \Vert ^2 \\&+ \ \frac{2 t^2 \dot{\varepsilon }(t) + a \beta t^2 \varepsilon ^2(t)}{4} \Vert x(t) \Vert ^2 - t \varepsilon (t) \Vert x(t) - x^* \Vert ^2 + t \varepsilon (t) \Vert x^* \Vert ^2. \end{aligned}$$

Thus, repeating the same arguments as in Theorem 4 we obtain

$$\begin{aligned} \dot{E}_2(t) \ \le \ \beta \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + t \varepsilon (t) \Vert x^* \Vert ^2. \end{aligned}$$

Let us multiply this expression with \(t (b t - \beta )\) to obtain

$$\begin{aligned} t (b t - \beta ) \dot{E}_2(t)\le & \beta t (b t - \beta ) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) + t^2 (b t - \beta ) \varepsilon (t) \Vert x^* \Vert ^2 \\ \le & \beta E_2(t) + t^2 (b t - \beta ) \varepsilon (t) \Vert x^* \Vert ^2. \end{aligned}$$

Now, we will divide by \((b t - \beta )^2\) to conclude

$$\begin{aligned} \frac{t}{(b t - \beta )} \dot{E}_2(t) \ \le \ \frac{\beta }{(b t - \beta )^2} E_2(t) + \frac{t^2}{(b t - \beta )} \varepsilon (t) \Vert x^* \Vert ^2 \end{aligned}$$

or

$$\begin{aligned} \frac{d}{dt} \left( \frac{t}{b t - \beta } E_2(t) \right) \ \le \ \frac{t^2}{(b t - \beta )} \varepsilon (t) \Vert x^* \Vert ^2. \end{aligned}$$

Integrating the last inequality on [Tt], where \(T \ge t_0\), we deduce

$$\begin{aligned} \frac{t}{b t - \beta } E_2(t) \ \le \ \frac{T}{b T - \beta } E_2(T) + \Vert x^* \Vert ^2 \int _T^t \frac{s^2}{(b s - \beta )} \varepsilon (s) ds. \end{aligned}$$

By the definition of \(E_2\) we know

$$\begin{aligned} E_2(t) \ \ge \ (b t^2 - \beta t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) . \end{aligned}$$

Combining these two inequalities, we deduce

$$\begin{aligned} \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ \le \ \frac{T}{t^2 (b T - \beta )} E_2(T) + \frac{\Vert x^* \Vert ^2}{t^2} \int _T^t \frac{s^2}{(b s - \beta )} \varepsilon (s) ds. \end{aligned}$$

Now,

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{T}{t^2 (b T - \beta )} E_2(T) \ = \ 0. \end{aligned}$$

Applying Lemma A.3 we deduce due to (25)

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{b t - \beta }{t^3} \int _T^t \frac{s^3}{(b s - \beta )} \frac{\varepsilon (s)}{s} ds \ = \ 0 \end{aligned}$$

and thus

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{\Vert x^* \Vert ^2}{t^2} \int _T^t \frac{s^2}{(b s - \beta )} \varepsilon (s) ds \ = \ 0. \end{aligned}$$

Therefore, we establish

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi _{\lambda (t)} (x(t)) - \Phi ^* \ = \ 0. \end{aligned}$$

Again, by the definition of the proximal mapping

$$\begin{aligned} \Phi _{\lambda (t)}(x(t)) - \Phi ^* = \ \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t)) \right) - \Phi ^* + \frac{1}{2\lambda (t)} \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert ^2 \quad \forall t \ge t_0. \end{aligned}$$

Using the fact that \(\lambda \) is bounded for all \(t \ge t_0\) we deduce

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Vert \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) - x(t) \Vert \ = \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \Phi \left( \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi } (x(t)) \right) - \Phi ^* \ = \ 0. \end{aligned}$$

\(\square \)

We are in position now to formulate the analogue of Theorem 7.

Theorem 9

Suppose that \(\lambda \) is bounded for all \(t \ge t_0\), \(b(t) \equiv b \ge \frac{1}{2} + \frac{\beta }{t_0}\) and (12) and (25) hold. Assume, in addition, that

$$\begin{aligned} \lim _{t \rightarrow +\infty } t^2 \varepsilon (t) \ = \ +\infty , \end{aligned}$$
(38)
$$\begin{aligned} 2 \beta t + \beta \dot{\lambda }(t) - b t \left( \dot{\lambda }(t) + 2 \beta \right) + 2 \beta ^2 + \beta \ \le \ 0 \text { for all } t \ge t_0 \end{aligned}$$
(39)

and

$$\begin{aligned} \lim _{t \rightarrow +\infty } \frac{\beta }{t^2 \varepsilon (t)} \int _{t_0}^t s^2 \varepsilon ^2(s) ds \ = \ 0. \end{aligned}$$
(40)

If \(x: [t_0,+\infty ) \mapsto H\) is a solution to (1) and the trajectory x(t) stays either inside or outside the ball \(B(0, \Vert x^* \Vert )\), then x(t) converges to minimal norm solution \(x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0)\), as \(t \rightarrow +\infty \). Otherwise, \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\).

Proof

The proof goes in line with the one of Theorem 7 by taking \(\alpha = 3\), \(b(t) \equiv b > 0\), \(q = 2\), \(p = 0\) and referring to Theorem 8 instead of Theorem 6 in the second and third cases. \(\square \)

5 Analysis of the Conditions

Since all the conditions cannot be satisfied simultaneously, let us treat them separately, namely:

  1. 1.

    In order to obtain the fast convergence rates of the function values we require that for all \(t \ge t_0\):

    • \(\alpha \ > \ 3\);

    • the existence of \(a \ge 1\) such that \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t) \),

    • \( b(t_0) \ \ge \ \frac{\beta }{t_0} \text { and } b(t_0) > \frac{1}{a} \);

    • \( \int _{t_0}^{+\infty } t \varepsilon (t) dt \ < \ +\infty \) and

    • the existence of \(0< \delta < \alpha - 3\) such that \( (\alpha - 3) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \ \ge \ \delta t b(t) \).

  2. 2.

    For the strong convergence of the trajectories we require the following for all \(t \ge t_0\):

    • \(\alpha \ > \ 3\);

    • \(\lambda \) is bounded;

    • \( \frac{\alpha - 3}{3} b(t) - t \dot{b}(t) + \frac{\alpha \beta }{3} \ \ge \ 0 \);

    • \( (\alpha - 3) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \ \ge \ 0 \);

    • the existence of \(a \ge 1\) such that \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t) \), \( b(t_0) > \frac{1}{a} \text { and } b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0} \);

    • \( \int _{t_0}^{+\infty } \frac{\varepsilon (t)}{t b(t)} dt \ < \ +\infty \);

    • \( 2 \alpha (\alpha - 3) - 9 t^2 \varepsilon (t) + 6 \alpha \beta \ \le \ 0 \);

    • \( 18 \beta t + 9 \beta \dot{\lambda }(t) - 9 t b(t) \left( \dot{\lambda }(t) + 2 \beta \right) + 3 (\alpha + 3) \beta ^2 + \alpha ^2 \beta \ \le \ 0 \);

    • \( \lim _{t \rightarrow +\infty } \frac{\beta }{t^{\frac{\alpha }{3} + 1} \varepsilon (t)} \int _{t_0}^t s^{\frac{\alpha }{3} + 1} \varepsilon ^2(s) ds \ = \ 0 \).

We will analyse these conditions in details for the polynomial choice of functions b and \(\varepsilon \), namely, \(b(t) = b t^n\) and \(\varepsilon (t) = \frac{\varepsilon }{t^d}\), where b is positive, \(n \ge 0\) and \(\varepsilon , d > 0\).

5.1 Setting for the Fast Convergence Rates of the Function Values

The set of the conditions becomes for all \(t \ge t_0\)

  1. 1.

    \(\alpha \ > \ 3\);

  2. 2.

    there exists \(a \ge 1\) such that \( -\frac{2 d \varepsilon }{t^{d+1}} \ \le \ - \frac{a \beta \varepsilon ^2}{t^{2d}} \),

  3. 3.

    \( b(t_0) \ \ge \ \frac{\beta }{t_0} \text { and } b(t_0) > \frac{1}{a} \);

  4. 4.

    \( \int _{t_0}^{+\infty } \frac{\varepsilon }{t^{d-1}} dt \ < \ +\infty \) and

  5. 5.

    there exists \(0< \delta < \alpha - 3\) such that \( (\alpha - 3) b t^{n+1} - b n t^{n+1} + \beta (2 - \alpha ) \ \ge \ \delta b t^{n+1} \).

After some simple algebraic computations one may discover that in order to satisfy all the conditions at the same time it is enough to assume

$$\begin{aligned} \alpha - 3 \ > \ n \ \ge \ 0 \text { (condition 5) } \end{aligned}$$

and

$$\begin{aligned} d \ > \ 2 \text { and } d \ \ge \ \frac{\beta \varepsilon }{2} \text { (conditions 2 and 4) }, \end{aligned}$$

since all the other inequalities could be fulfilled by taking the appropriate \(t_0\), namely,

$$\begin{aligned} t_0 \ \ge \ \max \left\{ \root n+1 \of {\frac{\beta }{b}}, \root n+1 \of {\frac{\beta (\alpha - 2)}{b (\alpha - 3 - n)}} \right\} \text { and } t_0 \ > \ \frac{1}{\root n \of {b}}. \end{aligned}$$

5.2 Setting for the Strong Convergence of the Trajectories

The set of the conditions becomes for all \(t \ge t_0\)

  1. 1.

    \(\alpha \ > \ 3\);

  2. 2.

    \(\lambda \) is bounded;

  3. 3.

    \( \frac{\alpha - 3}{3} b t^{n+1} - b n t^{n+1} + \frac{\alpha \beta }{3} \ \ge \ 0 \);

  4. 4.

    \( (\alpha - 3) b t^{n+1} - b n t^{n+1} + \beta (2 - \alpha ) \ \ge \ 0 \);

  5. 5.

    there exists \(a \ge 1\) such that \( -\frac{2 d \varepsilon }{t^{d+1}} \ \le \ - \frac{a \beta \varepsilon ^2}{t^{2d}} \), \( b(t_0) > \frac{1}{a} \text { and } b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0} \);

  6. 6.

    \( \int _{t_0}^{+\infty } \frac{\varepsilon }{b t^{n+d+1}} dt \ < \ +\infty \);

  7. 7.

    \( 2 \alpha (\alpha - 3) - \frac{9 \varepsilon }{t^{d-2}} + 6 \alpha \beta \ \le \ 0 \);

  8. 8.

    \( 18 \beta t + 9 \beta \dot{\lambda }(t) - 9 b t^{n+1} \left( \dot{\lambda }(t) + 2 \beta \right) + 3(\alpha + 3) \beta ^2 + \alpha ^2 \beta \ \le \ 0 \);

  9. 9.

    \( \lim _{t \rightarrow +\infty } \frac{\beta }{\varepsilon t^{\frac{\alpha }{3} - d + 1}} \int _{t_0}^t \varepsilon ^2 s^{\frac{\alpha }{3} - 2d + 1} ds \ = \ 0 \).

Again, analysis of the set of conditions leads to the following conclusion:

  • \(\lambda \) is bounded (condition 2) ;

  • \(0 \ \le \ n \ \le \ \frac{\alpha - 3}{3}\) and \( \alpha \ > \ 3\) (condition 3) ;

  • \(\max \left\{ 1, \frac{\beta \varepsilon }{2} \right\} \ \le \ d \ \le \ 2\) (conditions 5, 7, 8, 9) .

As before, \(t_0\) should be chosen appropriately.

5.3 The Case \(\alpha = 3\)

In this case the following has to be assumed: there exists \(a \ge 1\) such that for all \(t \ge t_0\)

  1. 1.

    \( \lambda (t) \) is bounded;

  2. 2.

    \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t), \ b \ > \ \frac{1}{a}\) and \(b \ge \frac{1}{2} + \frac{\beta }{t_0} \);

  3. 3.

    \( \int _{t_0}^{+\infty } \frac{\varepsilon (t)}{t} dt \ < \ +\infty \);

  4. 4.

    \( \lim _{t \rightarrow +\infty } t^2 \varepsilon (t) \ = \ +\infty \);

  5. 5.

    \( 2 \beta t + \beta \dot{\lambda }(t) - b t \left( \dot{\lambda }(t) + 2 \beta \right) + 2 \beta ^2 + \beta \ \le \ 0 \);

  6. 6.

    \( \lim _{t \rightarrow +\infty } \frac{\beta }{t^2 \varepsilon (t)} \int _{t_0}^t s^2 \varepsilon ^2(s) ds \ = \ 0 \).

Essentially, for the polynomial choice of parameters that means \(b \ \ge \ 1\) and

  • \( \lambda (t) \) is bounded (condition 2) ;

  • \( \max \left\{ 1, \frac{\beta \varepsilon }{2} \right\} \ \le \ d \ < \ 2 \) (conditions 4, 5, 6) ,

so with the appropriate choice of \(t_0\) the whole set of conditions is fulfilled.

6 Numerical Examples

6.1 The Rates of Convergence of the Moreau Envelope Values

Consider the objective function \(\Phi : \mathbb {R} \rightarrow \mathbb {R}\), \(\Phi (x) = |x| + \frac{x^2}{2}\) and let us plot the values of its Moreau envelope as well as the gradient of its Moreau envelope for different polynomial functions \(\lambda \), \(\varepsilon \) and b to illustrate the theoretical results with some numerical examples. We take \(\lambda (t) = t^l\), \(\varepsilon (t) = \frac{1}{t^d}\), \(b(t) = t^n\) with \(x(t_0) = x_0 = 10\), \(\dot{x}(t_0) = 0\), \(\alpha = 10\) and \(t_0 = 1.4\).

First, let us take different time scaling parameter b with \(l = 0\) and \(d = 3\) and see how it affects the behaviour of the system (1) (see Fig. 1).

Fig. 1
figure 1

\(l = 0\) and \(d = 3\)

As expected, the faster b grows, the faster the convergence is.

Consider now different Moreau envelope parameter \(\lambda \) with \(d = 3\) and \(n = 0\) (see Fig. 2).

Fig. 2
figure 2

\(d = 3\) and \(n = 0\)

Note that the difference in the starting point comes from the fact that \(t_0 \ne 1\), and for different exponents l the value \(t_0^l\) is also different. As predicted by theory, a faster growing function \(\lambda \) leads to faster convergence of not only the gradient of Moreau envelope of the objective function \(\Phi \), but also of the values of the Moreau envelope themselves.

Varying the Tikhonov function \(\varepsilon \) for \(n = 0\) and \(l = 0\) does not affect the system, which is illustrated by the following plot (see Fig. 3).

Fig. 3
figure 3

\(n = 0\) and \(l = 0\)

6.2 Strong Convergence of the Trajectories

For a different objective function let us investigate the strong convergence of the trajectories of (1):

$$\begin{aligned} \Phi (x) \ = \ {\left\{ \begin{array}{ll} & |x - 1|, \ x > 1 \\ & 0, \ x \in [-1, 1] \\ & |x + 1|, \ x < -1. \end{array}\right. } \end{aligned}$$

The set \(\mathop {\textrm{argmin}}\limits \Phi \) is nothing but the segment \([-1, 1]\) and 0 is its element of minimal norm. Let us fix \(\alpha = 6\) and \(n = 0.7\). First we take constant lambda (\(\lambda (t) = 1\) for all \(t \ge t_0\)) and plot the behaviour of the trajectories of (1) with and without Tikhonov term (see Fig. 4.

Fig. 4
figure 4

The role of the Tikhonov term

As we see in case there is no Tikhonov regularization the trajectories converge to the minimizer 1 of \(\Phi \), but the Tikhonov term actually guarantees the convergence towards the minimal norm solution, which is 0.

Another comparison was made for non-constant lambda: \(\lambda (t) = 1 - \frac{1}{t^l}\) for \(l = 1\) (for different l’s the picture is the same), illustrating similar behaviour (see Fig. 5).

Fig. 5
figure 5

The role of the Tikhonov term

Finally, for the same choice of \(\lambda \) let us take different Tikhonov terms to figure out how changing them affects the trajectories of (1) (see Fig. 6).

Fig. 6
figure 6

\(n = 0.7\) and \(\lambda (t) = 1 - \frac{1}{t}\)

We see, that the faster \(\varepsilon \) decays, the slower trajectories converge.