1 Introduction

In practical applications, many nonlinear phenomena can be represented as finding a zero of a monotone operator. This problem arises in various contexts, such as solving variational inequalities related to monotone operators, minimizing convex functions, finding fixed points of nonexpansive mappings, and more. One of the most widely used methods for solving this problem is the proximal point algorithm, which was originally proposed by Martinet and systematically studied by Rockafellar [39] in the context of Hilbert spaces.

Another important problem is to find a zero of the sum of two maximally monotone operators

$$\begin{aligned} \text {find}\,x_*\in \mathcal {H}\hbox { such that}\quad 0\in A(x_*)+B(x_*). \end{aligned}$$
(1.1)

Problem (1.1) arises in a wide range of applications such as convex optimization, image processing, and signal processing. A crucial special case of Problem (1.1) is the following variational inequality (VI) problem

$$\begin{aligned} \text {find}\,x_*\in \mathcal {H}\hbox { such that}\quad 0\in N_C(x_*) + B(x_*), \end{aligned}$$
(1.2)

where C is a nonempty closed convex subset of \(\mathcal {H}\) and \(N_C(x_*)\) is the normal cone of C at \(x_*.\) When B is single-valued, the VI problem (1.2) is equivalent to finding a point \(x_*\in C\) such that

$$\begin{aligned} \left\langle {B(x_*)}, {y-x_*} \right\rangle \ge 0\quad \forall y\in C. \end{aligned}$$
(1.3)

The Douglas-Rachford splitting algorithm [22], introduced by Lions and Mercier [32], is a fundamental method to solve such problems. Under additional assumptions on the involved operators, linear rates of convergence for the algorithm are possible. Several other splitting methods are derived from the Douglas-Rachford algorithm (such as the primal-dual hybrid gradient method [35], the Alternating Direction Method of Multipliers (ADMM) [23], and Spingarn's method of partial inverses [23]). There are many other methods for solving Problem (1.1), especially when one of the operators is single-valued. A popular method for solving this problem is the forward–backward algorithm, which consists of a forward step with one operator and a backward step with the other. The algorithm generates a sequence of iterates that converges to a solution under suitable assumptions on the operators. The forward–backward algorithm has been widely studied and applied in both finite-dimensional and infinite-dimensional settings [11, 24, 33, 38].
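
To make the scheme concrete, the following minimal Python sketch runs the forward–backward iteration \(x_{k+1}=\mathcal {J}_{\omega A}(x_k-\omega B(x_k))\) on a toy instance (not from the paper) in which the resolvent has a closed form: A is the subdifferential of the \(\ell _1\)-norm, so \(\mathcal {J}_{\omega A}\) is soft-thresholding, and B is the gradient of \(\frac{1}{2}\Vert x-b\Vert ^2\).

import numpy as np

def forward_backward(resolvent_A, B, x0, omega, n_iter=200):
    # Forward step with B, then backward (resolvent) step with A:
    # x_{k+1} = J_{omega A}(x_k - omega * B(x_k)).
    x = x0
    for _ in range(n_iter):
        x = resolvent_A(x - omega * B(x))
    return x

omega = 0.5
b = np.array([2.0, -0.3, 0.7])
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - omega, 0.0)  # J_{omega A}
x = forward_backward(soft, lambda x: x - b, np.zeros(3), omega)
print(x)  # tends to the zero of d||.||_1 + (. - b), namely [1.0, 0.0, 0.0]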

Nowadays, there is a growing interest in connecting and integrating optimization with other fields. This research direction has become increasingly attractive as it can provide new insights into optimization results and lead to interesting findings. Among the emerging research directions, there is a line of works that uses ordinary differential equations (ODEs) to design algorithms for optimization problems [2, 3, 9, 14], variational inequalities [18, 27, 34, 43], monotone inclusions [1, 5, 6], fixed point problems [15, 17] and equilibrium problems [20, 36, 42, 45]. The ODE interpretation not only provides a better understanding of Nesterov's accelerated scheme, but also helps design new schemes with similar convergence rates. Readers can refer to [9, 10, 18, 40] and the references therein for more examples.

1.1 Some Historical Aspects

The Heavy Ball with Friction method is a popular optimization algorithm based on inertial dynamics. The algorithm was proposed by Polyak to accelerate the gradient method in optimization [37]. It introduces an inertial system with a fixed viscous damping coefficient

$$\begin{aligned} x^{(2)}(t) +\gamma x^{(1)}(t) + \nabla f (x(t)) = 0, \end{aligned}$$
(1.4)

for minimizing a convex and differentiable function f. Note that when f has a Lipschitz continuous gradient, \(\nabla f\) is a co-coercive operator (see the definition in Sect. 2). Attouch and Alvarez extended the heavy ball dynamical system (1.4) to constrained optimization as well as to co-coercive operators in [4]. Recently, Boţ and Csetnek [15] studied the second order dynamical system with variable viscous damping coefficient

$$\begin{aligned} x^{(2)}(t) +\gamma (t) x^{(1)}(t) + \lambda (t)B (x(t)) = 0, \end{aligned}$$

for finding a zero of a co-coercive operator B. The results were applied to second order forward–backward dynamical systems for monotone inclusion problems (1.1)

$$\begin{aligned} x^{(2)}(t) +\gamma (t) x^{(1)}(t) + \lambda (t)[x(t) - \mathcal {J}_A(x(t) - \eta B(x(t)))]= 0, \end{aligned}$$
(1.5)

where A is maximally monotone and B is co-coercive. Here \(\mathcal {J}_A\triangleq (I+A)^{-1}\) is the resolvent of an operator A, where I stands for the identity operator. When the operator is merely monotone but not co-coercive, a second order forward–backward–forward dynamical system and its discretization have recently been proposed and investigated in [19]. In particular, when the operator \(A+B\) is strongly monotone, the exponential convergence rate of the second order dynamical system (1.5) was obtained in [16]. Under suitable conditions on the parameters, the authors established the convergence rate of \(O(e^{-t})\) for the trajectories.

Attouch, Chbani and Riahi were the first authors to study third order dynamical systems for minimizing a convex and differentiable function in Hilbert spaces [7, 8]. They proposed and studied the (TOGES) dynamical system [7]

$$\begin{aligned} x^{(3)}(t) +\frac{\alpha }{t} x^{(2)}(t) + \frac{2\alpha -6}{t^2} x^{(1)}(t) + \nabla f(x(t) + t x^{(1)}(t)) = 0. \end{aligned}$$
(1.6)

Using a temporal scaling technique, the third order dynamical system (1.6) was reformulated as a second order dynamical system, and the convergence analysis was carried out using the Lyapunov energy function techniques developed for second order dynamical systems. The authors showed a convergence rate of the values of the order \(\frac{1}{t^3}\), i.e. \( f(x(t) + t x^{(1)}(t)) - \inf _{\mathcal {H}}f \le \frac{C}{t^3}\) for some constant \(C>0\), and obtained the convergence of the trajectories towards optimal solutions of \(\min _{x\in \mathcal {H}} f(x)\). When the objective function f is strongly convex, the authors established an exponential rate of convergence. Proximal-based algorithms obtained by temporal discretization of (TOGES) were also investigated. Nevertheless, the rate of the values of f(x(t)) in (TOGES) is only of order \(\frac{1}{t}\), i.e. \( f(x(t) ) - \inf _{\mathcal {H}}f \le \frac{C}{t}\), which is not completely satisfactory from the point of view of fast optimization. Hence, very recently, an improved version of (TOGES), called (TOGES-V), has been proposed and investigated by the same authors in [8]

$$\begin{aligned} x^{(3)}(t) +\frac{\alpha +7}{t} x^{(2)}(t) + \frac{5(\alpha +1)}{t^2} x^{(1)}(t) + \nabla f\left( x(t) + \frac{1}{4}t x^{(1)}(t)\right) = 0, \end{aligned}$$

where they obtained the rate \(\mathcal {O}\left( \frac{1}{t^3}\right) \) for \(f(x(t) ) - \inf _{\mathcal {H}} f\).

1.2 Our Contributions

In this paper, we propose for the first time a third order dynamical system for the monotone inclusion (1.1) and investigate its convergence properties in both continuous time and discrete time settings. The motivation for considering third (or higher) order dynamical systems comes from the fact that they can potentially provide faster convergence rates, as seen in optimization problems [7, 8]. This is also the case for the monotone inclusion problem studied in this paper. Indeed, we derive the convergence rate of \(O(e^{-\varepsilon t})\) for some \(\varepsilon > 1\) (in particular for \(\varepsilon = 2\)) under suitable choices of parameters, which is significantly faster than the classical results obtained in [16] for second order dynamical systems. In the discrete setting, the third order dynamical system provides a new forward–backward algorithm with double momentum and a larger range of stepsizes.

In contrast to the classical monotone inclusion problem, where each individual operator A and B is required to be (maximally) monotone, we only require A and B to be generalized monotone (see the definitions in Sect. 2). This approach allows us to handle not only the classical monotone inclusion problem but also the problem of finding zeros of the sum of a weakly monotone operator and a strongly monotone operator, as well as pseudo-monotone variational inequalities. Applications of these models include minimizing the sum of a weakly convex function and a strongly convex function [21, 26] or minimizing a pseudo-convex function. The convergence analysis developed in this paper relies purely on Lyapunov energy function techniques, in contrast to the temporal scaling technique used in [7, 8]. In summary, our contributions are as follows:

  • Propose a third order dynamical system for the sum of two generalized monotone operators.

  • Establish the existence and uniqueness of the trajectories generated by the proposed dynamical system.

  • Provide the exponential convergence analysis of the trajectories to the unique solution of the inclusion, and show that it is faster than classical results.

  • Investigate the temporal discretization of the system and prove the linear convergence of the corresponding forward–backward algorithm with double inertial effects.

  • Study the third order dynamical system for strongly pseudo-monotone variational inequalities.

The paper is structured as follows. In Sect. 2, we introduce some terminology and results that are necessary for the analysis presented in the subsequent sections. In Sects. 3 and 4, we focus on solving Problem (1.1) under the assumption of generalized monotonicity of the operators involved. In Sect. 3, we propose a third-order dynamical system and establish its exponential convergence to the zero of Problem (1.1). The explicit discretization of this system leads to a new forward–backward algorithm studied in Sect. 4. In Sect. 5, motivated by the third-order dynamical system, we solve Problem (1.2) under the assumption of strong pseudo-monotonicity of the operator B.

2 Preliminaries

We start the section by listing the notation used. The set of integers is denoted by \(\mathbb Z\) and the set of real numbers is denoted by \(\mathbb R\). Let \(\mathbb Z_{\ge 1}=\{j\in \mathbb Z:j\ge 1\}\) and \(\mathbb R_{\ge 0}=\{t\in \mathbb R:t\ge 0\}\). The symbol \(g^{(k)}\) stands for the k-th derivative of the function g.

Throughout this work \(\mathcal {H}\) is a real Hilbert space with inner product \(\left<\cdot , \cdot \right>\) and induced norm \(\Vert \cdot \Vert \). We use the notation \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) to indicate that A is a set-valued operator defined on \(\mathcal {H}\), and \(A:\mathcal {H}\rightarrow \mathcal {H}\) to indicate that A is a single-valued operator on \(\mathcal {H}\).

Let A be an operator on \(\mathcal {H}\). The graph of A is \(\text {Gra}A = \{(x,u) \in \mathcal {H}\times \mathcal {H}: u \in A(x)\}\). The inverse of A, denoted by \(A^{-1}\), is the operator with graph \(\text {Gra}A^{-1} = \{(u,x) \in \mathcal {H}\times \mathcal {H}: u \in A(x)\}\).

2.1 Generalized Monotone Operators

We first recall some generalized versions of monotone operators, defined and studied in [21, 29].

Definition 2.1

The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is called \(\gamma _A\)-monotone if there exists a scalar \(\gamma _A \in \mathbb {R}\) such that

$$\begin{aligned} \langle u-v,x-y \rangle \ge \gamma _A\Vert x-y\Vert ^2 \quad \forall \, (x,u), (y,v) \in \text {Gra}A. \end{aligned}$$

The constant \(\gamma _A\) is referred to as the monotonicity modulus of A. We also say that A is maximally \(\gamma _A\)-monotone if it is \(\gamma _A\)-monotone and there is no \(\gamma _A\)-monotone operator whose graph strictly contains \(\text {Gra}A\).

Remark 2.2

Note that in the definition of generalized monotonicity, \(\gamma _A\) can be negative. If \(\gamma _A = 0\), then the generalized monotonicity reduces to the classical monotonicity. If \(\gamma _A>0\), then A is strongly monotone. Finally, if \(\gamma _A<0\) then A is called weakly monotone. For a more detailed discussion on (maximally) monotone operators and their connection to optimization problems, we refer the readers to [12, 13, 21].

Definition 2.3

The single-valued operator \(T:\mathcal {H}\rightarrow \mathcal {H}\) is called

  1. 1.

    \(\gamma _T\)-strongly pseudo-monotone if \(\gamma _T>0\) and

    $$\begin{aligned} \left\langle {T(x)}, {y-x} \right\rangle \ge 0\Longrightarrow \left\langle {T(y)}, {y-x} \right\rangle \ge \gamma _T\Vert x-y\Vert ^2 \end{aligned}$$

    for all \(x,y\in \mathcal {H}\).

  2. 2.

    \(\gamma _T\)-cocoercive if \(\gamma _T>0\) and

    $$\begin{aligned} \left\langle {T(x)-T(y)}, {x-y} \right\rangle \ge \gamma _T\Vert T(x)-T(y)\Vert ^2\quad \forall x,y\in \mathcal {H}. \end{aligned}$$
  3. 3.

    \(L_T\)-Lipschitz continuous if \(L_T>0\) and

    $$\begin{aligned} \Vert T(x)-T(y)\Vert \le L_T\Vert x-y\Vert \quad \forall x,y\in \mathcal {H}. \end{aligned}$$

Remark 2.4

It is clear from the Cauchy–Schwarz inequality that if T is \(\gamma _T\)-cocoercive then it is \(1/\gamma _T\)-Lipschitz continuous.

The resolvent of an operator A is denoted as \(\mathcal {J}_A\triangleq (I+A)^{-1}\), where I is the identity operator. We will need the following properties of the resolvent operator.

Lemma 2.5

[21] Let \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) be a \(\gamma _A\)-monotone operator and let \(\omega >0 \) be such that \(1+\omega \gamma _A > 0\). Then the following hold:

  1. 1.

    \(\mathcal {J}_{\omega A}\) is single-valued;

  2. 2.

    \(\mathcal {J}_{\omega A}\) is \((1+\omega \gamma _A)\)-co-coercive;

  3. 3.

    dom \(\mathcal {J}_{\omega A} = \mathcal {H}\) if and only if A is maximally \(\gamma _A\)-monotone.
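
As a quick numerical sanity check of item 2 above, the following Python sketch (an illustration with a hypothetical operator, not part of the lemma) verifies the cocoercivity inequality for the scalar weakly monotone operator \(A(x)=\gamma _Ax\) with \(\gamma _A<0\), whose resolvent is \(\mathcal {J}_{\omega A}(v)=v/(1+\omega \gamma _A)\).

import numpy as np

rng = np.random.default_rng(0)
gamma_A, omega = -0.4, 1.0                 # weakly monotone, with 1 + omega*gamma_A > 0
J = lambda v: v / (1.0 + omega * gamma_A)  # resolvent of A(x) = gamma_A * x

for _ in range(5):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lhs = np.dot(J(x) - J(y), x - y)
    rhs = (1.0 + omega * gamma_A) * np.linalg.norm(J(x) - J(y)) ** 2
    assert lhs >= rhs - 1e-12              # for this linear A the inequality holds with equality
print("cocoercivity of the resolvent verified on random samples")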

2.2 Absolutely Continuous Functions

Definition 2.6

A function \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is called locally absolutely continuous if it is absolutely continuous on every compact interval, which means that for each interval \([t_0,t_1]\) there exists an integrable function \(g:[t_0,t_1]\rightarrow \mathbb R^d\) such that

$$\begin{aligned} h(t)=h(t_0)+\int \limits _{t_0}^t g(s)\,ds\quad \forall t\in [t_0,t_1]. \end{aligned}$$

Remark 2.7

If \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is a locally absolutely continuous function, then it is differentiable almost everywhere and its derivative agrees with its distributional derivative almost everywhere.

Proposition 2.8

For \(s,u\ge 0\) and \(m\in \mathbb Z_{\ge 1}\), it holds

$$\begin{aligned} \int \limits _u^se^{t\varepsilon }g^{(m)}(t)\,dt= & {} e^{s\varepsilon }\left( \sum _{j=0}^{m-1}(-\varepsilon )^{m-1-j}g^{(j)}(s)\right) +(-\varepsilon )^m\int \limits _u^se^{t\varepsilon }g(t)\,dt\\{} & {} -e^{u\varepsilon }\left( \sum _{j=0}^{m-1}(-\varepsilon )^{m-1-j}g^{(j)}(u)\right) . \end{aligned}$$

Proof

The case \(m=1\) follows from integration by parts. Now suppose that the conclusion holds for m; we prove the case \(m+1.\) Indeed, we have

$$\begin{aligned} \int \limits _u^se^{t\varepsilon }g^{(m+1)}(t)\,dt=\int \limits _u^se^{t\varepsilon }\,dg^{(m)}(t)=e^{t\varepsilon }g^{(m)}(t)\bigg |_u^s-\varepsilon \int \limits _u^se^{t\varepsilon }g^{(m)}(t)\,dt, \end{aligned}$$

which together with the induction assumption completes the proof. \(\square \)
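
The formula can also be checked numerically. The following sketch (an illustration, not part of the proof) verifies the case \(m=2\) for \(g=\sin \) by quadrature.

import numpy as np
from scipy.integrate import quad

eps, u, s = 0.7, 0.0, 2.0
g, g1, g2 = np.sin, np.cos, lambda t: -np.sin(t)   # g, g', g''

lhs = quad(lambda t: np.exp(t * eps) * g2(t), u, s)[0]
# boundary term: sum_{j=0}^{1} (-eps)^{1-j} g^{(j)}(t) = -eps*g(t) + g'(t)
boundary = lambda t: np.exp(t * eps) * (-eps * g(t) + g1(t))
rhs = boundary(s) - boundary(u) + eps**2 * quad(lambda t: np.exp(t * eps) * g(t), u, s)[0]
print(abs(lhs - rhs))  # of the order of the quadrature tolerance, ~1e-12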

2.3 A Third Order Dynamical System

In this paper, we propose the following dynamical system for Problem (1.1).

$$\begin{aligned} y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0,\nonumber \\ \end{aligned}$$
(2.1)

where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\) and \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).

The solution of dynamical system (2.1) is understood in the following sense.

Definition 2.9

A function \(y(\cdot )\) is called a strong global solution of Eq. (2.1) if the following hold:

  1. 1.

    For every \(j\in \{0,1,2,3\}\), \(y^{(j)}:[t_0,+\infty )\rightarrow \mathcal {H}\) is locally absolutely continuous; in other words, absolutely continuous on each interval \([\delta ,\eta ]\) for \(\eta>\delta >t_0\).

  2. 2.

    \(y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0\) for almost every \(t\ge t_0\).

  3. 3.

    \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).

Proposition 2.10

(Equivalent form) Equation (2.1) is equivalent to the system \(x^{(1)}(t)=G(x(t))\), where \(G:\mathcal {H}\times \mathcal {H}\times \mathcal {H}\rightarrow \mathcal {H}\times \mathcal {H}\times \mathcal {H}\) is defined by

$$\begin{aligned} G(x_1,x_2,x_3)=\left( x_2,x_3,-\alpha _1x_2-\alpha _2x_3-\alpha _0[x_1-\mathcal {J}_{\omega A}(x_1-\omega B(x_1))]\right) , \end{aligned}$$

where \((x_1,x_2,x_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}.\)

Proof

The conclusion follows from the change of variables

$$\begin{aligned} (x_1(t),x_2(t),x_3(t))=\left( y(t),y^{(1)}(t),y^{(2)}(t)\right) . \end{aligned}$$

\(\square \)
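
The equivalent first-order form is also convenient numerically, since (2.1) can then be integrated by a standard ODE solver. The sketch below is illustrative only: the coefficients are hypothetical, and we take the toy data \(A=0\) (so \(\mathcal {J}_{\omega A}\) is the identity) and \(B(y)=y-b\), whose unique zero is b.

import numpy as np
from scipy.integrate import solve_ivp

a2, a1, a0, w = 3.0, 3.0, 1.0, 0.5
b = np.array([1.0, -2.0])
B = lambda y: y - b
J = lambda v: v                      # resolvent of A = 0

def G(t, x):
    y, y1, y2 = np.split(x, 3)
    resid = y - J(y - w * B(y))      # forward-backward residual appearing in (2.1)
    return np.concatenate([y1, y2, -a1 * y1 - a2 * y2 - a0 * resid])

sol = solve_ivp(G, (0.0, 40.0), np.zeros(6), rtol=1e-8)
print(sol.y[:2, -1])                 # y(t) approaches the zero b = [1, -2]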

Theorem 2.11

(Existence and uniqueness of a solution) Consider dynamical system (2.1), where \(\alpha _0,\alpha _1,\alpha _2,\omega >0\), the operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, and \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous, such that \(1+\omega \gamma _A > 0\). Then for each \(v_0,v_1,v_2\in \mathcal {H}\) there exists a unique strong global solution of (2.1).

Proof

We endow \(\mathcal {H}\times \mathcal {H}\times \mathcal {H}\) with scalar product

$$\begin{aligned} \left\langle {(y_1,y_2,y_3)}, {(z_1,z_2,z_3)} \right\rangle _{\mathcal {H}\times \mathcal {H}\times \mathcal {H}}= \left\langle {y_1}, {z_1} \right\rangle +\left\langle {y_2}, {z_2} \right\rangle +\left\langle {y_3}, {z_3} \right\rangle . \end{aligned}$$

We show that the operator G is Lipschitz. Indeed, let \(y=(y_1,y_2,y_3),z=(z_1,z_2,z_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}\). We have

$$\begin{aligned}&\Vert \mathcal {J}_{\omega A}(y_1-\omega B(y_1))-\mathcal {J}_{\omega A}(z_1-\omega B(z_1))\Vert ^2\\&\quad \le \frac{1}{(1+\omega \gamma _A)^2} \Vert y_1-z_1-\omega (B(y_1)-B(z_1))\Vert ^2\\&\quad \le \frac{(1+\omega L)^2}{(1+\omega \gamma _A)^2}\Vert y_1-z_1\Vert ^2, \end{aligned}$$

and so

$$\begin{aligned}{} & {} \Vert G(y)-G(z)\Vert ^2_{\mathcal {H}\times \mathcal {H}\times \mathcal {H}} \le \Vert y_2-z_2\Vert ^2+\Vert y_3-z_3\Vert ^2\\{} & {} \quad +(\alpha _1^2+\alpha _2^2+2\alpha _0^2)\Big [\Vert y_2-z_2\Vert ^2+\Vert y_3-z_3\Vert ^2+\Vert y_1-z_1\Vert ^2\\{} & {} \quad +\Vert \mathcal {J}_{\omega A}(y_1-\omega B(y_1))-\mathcal {J}_{\omega A}(z_1-\omega B(z_1))\Vert ^2\Big ]\\{} & {} \quad \le \Vert y_2-z_2\Vert ^2+\Vert y_3-z_3\Vert ^2\\{} & {} \qquad +(\alpha _1^2+\alpha _2^2+2\alpha _0^2)\left[ 1+\frac{(1+\omega L)^2}{(1+\omega \gamma _A)^2}\right] \Vert y-z\Vert ^2_{\mathcal {H}\times \mathcal {H}\times \mathcal {H}}\\{} & {} \quad \le \left[ 1+(\alpha _1^2+\alpha _2^2+2\alpha _0^2)\left( 1+\frac{(1+\omega L)^2}{(1+\omega \gamma _A)^2}\right) \right] \cdot \Vert y-z\Vert ^2_{\mathcal {H}\times \mathcal {H}\times \mathcal {H}}. \end{aligned}$$

By using the Cauchy–Picard theorem (see, for example, [28, Proposition 6.2.1]), we get the existence and uniqueness of a strong global solution. \(\square \)

2.4 Difference Operators

In this section, we give the discrete counterpart of the dynamical system (2.1). To this end, we recall the forward difference operation and its properties used in the convergence analysis. For \(z:\mathbb Z\rightarrow \mathcal {H}\) and \(\kappa \in \mathbb Z_{\ge 1}\), we denote

$$\begin{aligned} z^{\Delta ^{(\kappa +1)}}\triangleq (z^{\Delta ^{(\kappa )}})^\Delta , \quad \text {where }z^\Delta (n)\triangleq z(n+1)-z(n). \end{aligned}$$

Remark 2.12

Let \(f,g,h:\mathbb Z\rightarrow \mathcal {H}\) and \(\theta \in \mathbb R\). It can be proven that

$$\begin{aligned} \left\langle {h}, {g} \right\rangle ^\Delta (n)=\left\langle {h^\Delta (n)}, {g^\Delta (n)} \right\rangle +\left\langle {h^\Delta (n)}, {g(n)} \right\rangle +\left\langle {h(n)}, {g^\Delta (n)} \right\rangle , \end{aligned}$$

and consequently

$$\begin{aligned}{} & {} \theta ^{n+1}g^\Delta (n)=(\theta ^n g)^\Delta (n)+(1-\theta )\theta ^ng(n),\\{} & {} (\Vert f\Vert ^2)^\Delta (n)=\Vert f^\Delta (n)\Vert ^2+2\left\langle {f^\Delta (n)}, {f(n)} \right\rangle . \end{aligned}$$
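
These identities are elementary; the following sketch checks the first two on random data (an illustration with arbitrary sequences).

import numpy as np

rng = np.random.default_rng(1)
g, h = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
theta, n = 0.7, 2
D = lambda z, k: z[k + 1] - z[k]      # forward difference z^Delta(k)

lhs = np.dot(h[n + 1], g[n + 1]) - np.dot(h[n], g[n])
rhs = np.dot(D(h, n), D(g, n)) + np.dot(D(h, n), g[n]) + np.dot(h[n], D(g, n))
assert abs(lhs - rhs) < 1e-12

lhs2 = theta ** (n + 1) * D(g, n)
rhs2 = (theta ** (n + 1) * g[n + 1] - theta ** n * g[n]) + (1 - theta) * theta ** n * g[n]
assert np.allclose(lhs2, rhs2)
print("difference identities verified")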

Consider the difference equation, which is the discrete version of (2.1):

$$\begin{aligned} z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)+\alpha _0[z(n)-\mathcal {J}_{\omega A}(z(n)-\omega B(z(n)))]=0,\nonumber \\ \end{aligned}$$
(2.2)

where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).

Proposition 2.13

(Equivalent form) Equation (2.2) has an equivalent form

$$\begin{aligned} z(n+3)&=(3-\alpha _2)z(n+2)+(2\alpha _2-\alpha _1-3)z(n+1)\nonumber \\&\quad +(\alpha _1+1-\alpha _2)z(n)-\alpha _0[z(n)-\mathcal {J}_{\omega A}(z(n)-\omega B(z(n)))]. \end{aligned}$$
(2.3)

Proof

The proof makes use of the facts that

$$\begin{aligned}{} & {} z^{\Delta ^{(2)}}(n)=z(n+2)-2z(n+1)+z(n),\\{} & {} z^{\Delta ^{(3)}}(n)=z(n+3)-3z(n+2)+3z(n+1)-z(n). \end{aligned}$$

\(\square \)

Remark 2.14

The numerical scheme (2.3) can be re-written as

$$\begin{aligned} z(n+3)&=z(n+2)+(2-\alpha _2)(z(n+2)-z(n+1))\nonumber \\&\quad + (\alpha _2-\alpha _1-1)(z(n+1)-z(n))\nonumber \\&\quad -\alpha _0[z(n)-\mathcal {J}_{\omega A}(z(n)-\omega B(z(n)))], \end{aligned}$$
(2.4)

which is a forward–backward algorithm with double momentum.
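
For illustration, here is a minimal Python sketch of scheme (2.4), run with hypothetical parameter values (chosen in the spirit of the admissible region of Corollary 4.3 below) on the toy instance \(A=0\) and \(B(z)=z-b\), whose unique zero is b.

import numpy as np

a2, a1, a0, w = 1.5, 0.6, 0.05, 0.5
b = np.array([1.0, -2.0])
J = lambda v: v                              # resolvent of A = 0
B = lambda z: z - b

z = [np.zeros(2)] * 3                        # initial values z(0) = z(1) = z(2)
for n in range(200):
    resid = z[-3] - J(z[-3] - w * B(z[-3]))  # forward-backward residual at z(n)
    z.append(z[-1] + (2 - a2) * (z[-1] - z[-2])
             + (a2 - a1 - 1) * (z[-2] - z[-3]) - a0 * resid)
print(z[-1])                                 # converges to b = [1, -2]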

3 Continuous Time Dynamical System

In this section, we will establish the exponential convergence of dynamical system (2.1) under the following assumptions and notation.

Assumption 3.1

(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0\).

(ii) The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous such that

$$\begin{aligned} \gamma \triangleq \gamma _A+\gamma _B >0. \end{aligned}$$
(3.1)

(iii) The parameter \(\omega >0\) satisfies

$$\begin{aligned}{} & {} 1+\omega \gamma _A > 0,\end{aligned}$$
(3.2)
$$\begin{aligned}{} & {} \frac{1}{\omega }>\frac{L^2}{4\gamma }+L-\gamma . \end{aligned}$$
(3.3)

Remark 3.1

Condition (3.1) implies that the sum operator \(A+B\) is strongly monotone, but not necessarily the individual operators A and B. A similar condition was studied in [21]. A direct application of this model is to minimize the sum of a weakly convex function and a strongly convex function. Condition (3.2) is imposed to ensure that the resolvent operator \(\mathcal {J}_{\omega A}\) is single-valued. Finally, condition (3.3) means that the stepsize \(\omega \) must be bounded from above. Note that condition (3.3) gives

$$\begin{aligned} 1>\frac{L^2}{4\gamma }\cdot \frac{1}{\frac{1}{\omega }+\gamma -L}. \end{aligned}$$

Hence, we can find \(\theta >0\) such that

$$\begin{aligned} 1>\theta >\frac{L^2}{4\gamma }\cdot \frac{1}{\frac{1}{\omega }+\gamma -L}. \end{aligned}$$
(3.4)

The following notation is used.

$$\begin{aligned} \ell \triangleq \frac{2\omega }{2\omega \gamma +1}\left( \frac{1}{\omega }+\gamma -L-\frac{L^2}{4\theta \gamma }\right) ,\quad \delta \triangleq \frac{2\omega \gamma (1-\theta )}{2\omega \gamma +1}. \end{aligned}$$
(3.5)
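
For concreteness, the following sketch computes an admissible \(\theta \) from (3.4) and the resulting constants \(\ell \) and \(\delta \) in (3.5), for hypothetical operator constants \(\gamma _A,\gamma _B,L,\omega \) satisfying Assumption 3.1.

gamma_A, gamma_B, L, omega = -0.5, 1.5, 2.0, 0.2
gamma = gamma_A + gamma_B                              # (3.1): gamma > 0
assert 1 + omega * gamma_A > 0                         # (3.2)
assert 1.0 / omega > L**2 / (4 * gamma) + L - gamma    # (3.3)

theta_low = (L**2 / (4 * gamma)) / (1.0 / omega + gamma - L)
theta = 0.5 * (theta_low + 1.0)                        # any theta in (theta_low, 1), cf. (3.4)
ell = 2 * omega / (2 * omega * gamma + 1) * (1 / omega + gamma - L - L**2 / (4 * theta * gamma))
delta = 2 * omega * gamma * (1 - theta) / (2 * omega * gamma + 1)
print(theta, ell, delta)                               # here: 0.625, ~0.686, ~0.107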

3.1 Global Exponential Convergence

First, we consider the dynamical system (2.1), whose global convergence involves the following parameters

$$\begin{aligned} {\left\{ \begin{array}{ll} A_2\triangleq \frac{\ell \alpha _1}{\alpha _0},\\ A_1\triangleq \frac{\ell \alpha _2\alpha _1}{\alpha _0}-3,\\ A_0\triangleq \frac{\ell \alpha _1^2}{\alpha _0}-2\alpha _2, \end{array}\right. } {\left\{ \begin{array}{ll} B_1\triangleq \frac{\ell \alpha _2}{\alpha _0},\\ B_0\triangleq \frac{\ell }{\alpha _0}(\alpha _2^2-2\alpha _1), \end{array}\right. } C_0\triangleq \frac{\ell }{\alpha _0}. \end{aligned}$$
(3.6)

We denote the functions

$$\begin{aligned} a(t)\triangleq \Vert y(t)-x_*\Vert ^2,\quad b_k(t)\triangleq \Vert y^{(k)}(t)\Vert ^2. \end{aligned}$$
(3.7)

Theorem 3.2

Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Assume that there exists \(\varepsilon >0\) such that the following conditions hold

$$\begin{aligned}{} & {} -\varepsilon ^3+\alpha _2\varepsilon ^2-\alpha _1\varepsilon +\delta \alpha _0\ge 0, \end{aligned}$$
(3.8)
$$\begin{aligned}{} & {} A_2\varepsilon ^2-A_1\varepsilon +A_0\ge 0,\end{aligned}$$
(3.9)
$$\begin{aligned}{} & {} 3\varepsilon ^2-2\alpha _2\varepsilon +\alpha _1\ge 0,\end{aligned}$$
(3.10)
$$\begin{aligned}{} & {} -2A_2\varepsilon +A_1\ge 0,\end{aligned}$$
(3.11)
$$\begin{aligned}{} & {} -B_1\varepsilon +B_0\ge 0,\end{aligned}$$
(3.12)
$$\begin{aligned}{} & {} \alpha _2>2\varepsilon . \end{aligned}$$
(3.13)

Then the trajectory \(y(\cdot )\) generated by dynamical system (2.1) converges exponentially to \(x_*\), i.e., there exist positive numbers \(\mu , \eta \) such that

$$\begin{aligned} \Vert y(t) -x_* \Vert \le \mu \, \Vert y(t_0)-x_*\Vert \, e^{- \eta t} \quad \forall t \ge t_0. \end{aligned}$$

Proof

In the next arguments, we often use the identities:

$$\begin{aligned}{} & {} b_1^{(1)}(t)=2\left\langle {y^{(2)}(t)}, {y^{(1)}(t)} \right\rangle ,\\{} & {} b_1^{(2)}(t)=2\left\langle {y^{(3)}(t)}, {y^{(1)}(t)} \right\rangle +2\Vert y^{(2)}(t)\Vert ^2=2\left\langle {y^{(3)}(t)}, {y^{(1)}(t)} \right\rangle +2b_2(t). \end{aligned}$$

Since

$$\begin{aligned} a^{(1)}(t)&=2\left\langle {y^{(1)}(t)}, {y(t)-x_*} \right\rangle ,\\ a^{(2)}(t)&=2\left\langle {y^{(2)}(t)}, {y(t)-x_*} \right\rangle +2b_1(t),\\ a^{(3)}(t)&=2\left\langle {y^{(3)}(t)}, {y(t)-x_*} \right\rangle +3b_1^{(1)}(t), \end{aligned}$$

we have

$$\begin{aligned}{} & {} 2\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle \nonumber \\{} & {} \quad =a^{(3)}(t)+\alpha _2a^{(2)}(t)+\alpha _1a^{(1)}(t)-3b_1^{(1)}(t)-2\alpha _2b_1(t). \end{aligned}$$
(3.14)

We observe

$$\begin{aligned}{} & {} \Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2\nonumber \\{} & {} \quad =\alpha _1b_1^{(2)}(t)+\alpha _2\alpha _1b_1^{(1)}(t)+\alpha _1^2b_1(t)+\alpha _2b_2^{(1)}(t)+(\alpha _2^2-2\alpha _1)b_2(t)+b_3(t).\nonumber \\ \end{aligned}$$
(3.15)

Using the definition of the resolvent, equation (2.1) gives the following

$$\begin{aligned}{} & {} B\left( \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)\right) -B(y(t))\\{} & {} \quad -\frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\omega \alpha _0}\\{} & {} \quad \in (A+B)\left( \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)\right) , \end{aligned}$$

which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies

$$\begin{aligned}{} & {} \gamma \left\| \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)-x_*\right\| ^2\\{} & {} \quad \le \Biggl <B\left( \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)\right) \\{} & {} \qquad -B(y(t))-\frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\omega \alpha _0},\\{} & {} \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)-x_*\Biggl >. \end{aligned}$$

Since the operator B is L-Lipschitz, we can estimate the right-hand side of the inequality above to obtain

$$\begin{aligned}{} & {} \gamma \left\| \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)-x_*\right\| ^2\\{} & {} \quad \le \frac{1}{\alpha _0^2}\left( L-\frac{1}{\omega }\right) \Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2\\{} & {} \qquad +\left\langle {B\left( \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)\right) -B(y(t))}, {y(t)-x_*} \right\rangle \\{} & {} \qquad -\frac{1}{\omega \alpha _0}\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle . \end{aligned}$$

Note that by the Cauchy–Schwarz inequality

$$\begin{aligned}{} & {} \left\langle {B\left( \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)\right) -B(y(t))}, {y(t)-x_*} \right\rangle \\{} & {} \quad \le \frac{L}{\alpha _0}\Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert \cdot \Vert y(t)-x_*\Vert \\{} & {} \quad \le \frac{L^2}{4\alpha _0^2\theta \gamma }\Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2+\theta \gamma a(t). \end{aligned}$$

Thus, we get

$$\begin{aligned}{} & {} \gamma \left\| \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)-x_*\right\| ^2\nonumber \\{} & {} \quad \le \frac{1}{\alpha _0^2}\left( L-\frac{1}{\omega }+\frac{L^2}{4\theta \gamma }\right) \Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2\nonumber \\{} & {} \qquad -\frac{1}{\omega \alpha _0}\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle +\theta \gamma a(t). \end{aligned}$$
(3.16)

Note that

$$\begin{aligned}{} & {} \gamma \left\| \frac{y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}{\alpha _0}+y(t)-x_*\right\| ^2\\{} & {} \quad =\frac{\gamma }{\alpha _0^2}\Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2+\gamma a(t)\\{} & {} \qquad +\frac{2\gamma }{\alpha _0}\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle . \end{aligned}$$

Inserting the equality above into (3.16), we obtain

$$\begin{aligned}{} & {} \frac{\ell }{\alpha _0}\Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2+\delta \alpha _0a(t)\nonumber \\{} & {} \quad +2\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle \le 0, \end{aligned}$$
(3.17)

which implies, by (3.14) and (3.15), that

$$\begin{aligned}{} & {} a^{(3)}(t)+\alpha _2a^{(2)}(t)+\alpha _1a^{(1)}(t)+\delta \alpha _0a(t)\\{} & {} \quad +A_2b_1^{(2)}(t)+A_1b_1^{(1)}(t)+A_0b_1(t) +B_1b_2^{(1)}(t)+B_0b_2(t)+C_0b_3(t)\le 0. \end{aligned}$$

By (3.4), we have \(\ell >0\) and hence \(C_0>0\). Thus, dropping the nonnegative term \(C_0b_3(t)\), we can write

$$\begin{aligned}{} & {} a^{(3)}(t)+\alpha _2a^{(2)}(t)+\alpha _1a^{(1)}(t)+\delta \alpha _0a(t)\\{} & {} \quad +A_2b_1^{(2)}(t)+A_1b_1^{(1)}(t)+A_0b_1(t) +B_1b_2^{(1)}(t)+B_0b_2(t)\le 0. \end{aligned}$$

Multiplying both sides by \(e^{\varepsilon (t-t_0)}\), integrating over \([t_0,s]\), and then using Proposition 2.8, we get

$$\begin{aligned}{} & {} e^{\varepsilon (s-t_0)} [(\varepsilon ^2-\alpha _2\varepsilon +\alpha _1)a(s) +(\alpha _2-\varepsilon )a^{(1)}(s)+a^{(2)}(s)]\\{} & {} \quad +(-\varepsilon ^3+\alpha _2\varepsilon ^2-\alpha _1\varepsilon +\delta \alpha _0)\int \limits _{t_0}^se^{\varepsilon (t-t_0)}a(t)dt\\{} & {} \quad +e^{\varepsilon (s-t_0)} [(-A_2\varepsilon +A_1)b_1(s)+A_2b_1^{(1)}(s)]\\{} & {} \quad +(A_2\varepsilon ^2-A_1\varepsilon +A_0)\int \limits _{t_0}^se^{\varepsilon (t-t_0)}b_1(t)dt\\{} & {} \quad +B_1e^{\varepsilon (s-t_0)}b_2(s) +(-B_1\varepsilon +B_0)\int \limits _{t_0}^se^{\varepsilon (t-t_0)}b_2(t)dt\le D_1, \end{aligned}$$

for some constant \(D_1\), which implies, after using (3.8), (3.9), (3.12) and (3.4), that

$$\begin{aligned}{} & {} e^{\varepsilon (s-t_0)} [(\varepsilon ^2-\alpha _2\varepsilon +\alpha _1)a(s) +(\alpha _2-\varepsilon )a^{(1)}(s)+a^{(2)}(s)]\\{} & {} \quad +e^{\varepsilon (s-t_0)} [(-A_2\varepsilon +A_1)b_1(s)+A_2b_1^{(1)}(s)] +B_1e^{\varepsilon (s-t_0)}b_2(s)\le D_1, \end{aligned}$$

Integrating the above inequality with respect to the variable \(s\in [t_0,t]\), we deduce

$$\begin{aligned}{} & {} e^{\varepsilon (t-t_0)}a^{(1)}(t)+(\alpha _2-2\varepsilon )e^{\varepsilon (t-t_0)}a(t) +(3\varepsilon ^2-2\alpha _2\varepsilon +\alpha _1)\int \limits _{t_0}^te^{\varepsilon (s-t_0)}a(s)ds\\{} & {} \quad +A_2e^{\varepsilon (t-t_0)}b_1(t)+(-2A_2\varepsilon +A_1)\int \limits _{t_0}^te^{\varepsilon (s-t_0)}b_1(s)ds \le D_1t+D_2. \end{aligned}$$

for some constant \(D_2\). Using (3.10), (3.11), (3.4), we get

$$\begin{aligned} e^{\varepsilon (t-t_0)}a^{(1)}(t)+(\alpha _2-2\varepsilon )e^{\varepsilon (t-t_0)}a(t) \le D_1t+D_2. \end{aligned}$$
(3.18)

Note that inequality (3.18) yields the following

$$\begin{aligned}{} & {} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}D_3\\{} & {} \quad +e^{-(\alpha _2-2\varepsilon )(t-t_0)}\int \limits _{t_0}^te^{(\alpha _2-3\varepsilon )(s-t_0)}(D_1s+D_2)\,ds. \end{aligned}$$

for some constant \(D_3\).

  • If \(\alpha _2\ge 3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le e^{(\alpha _2-3\varepsilon )(t-t_0)}\), and so

    $$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}D_3 +e^{-\varepsilon (t-t_0)}\int \limits _{t_0}^t(D_1s+D_2)\,ds. \end{aligned}$$
    (3.19)
  • If \(2\varepsilon<\alpha _2<3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le 1\), and so

    $$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}\left( D_3 +\int \limits _{t_0}^t(D_1s+D_2)\,ds\right) . \end{aligned}$$
    (3.20)

The arguments above show that \(y(\cdot )\) converges exponentially to \(x_*\). \(\square \)

Remark 3.3

It follows from (3.19) that a(t) converges to 0 with the rate of \(O((Pt^2 +Qt +R )e^{-\varepsilon t})\) for some constants P, Q, R, while the rate obtained from (3.20) is \(O((Pt^2 +Qt +R )e^{-(\alpha _2 - 2 \varepsilon ) t})\). With a suitable choice of \(\varepsilon \) and \(\alpha _2\), these rates can be controlled so that they are faster than the rate \(O(e^{-t})\) of the second order dynamical systems established in [16].

3.2 Parameter Choices

We now discuss the question “how to find \(\varepsilon \)?”. It can be seen from (3.19) that a larger \(\varepsilon \) yields a faster rate. Finding the maximal value of \(\varepsilon \) is cumbersome as it depends on many other parameters. However, we will discuss how to find a "good enough" \(\varepsilon \) in this section. The following remark offers one approach in terms of the coefficients.

Remark 3.4

If \(A_0,A_1,B_0\) satisfy

$$\begin{aligned} A_0,A_1,B_0>0, \end{aligned}$$
(3.21)

then conditions (3.8)–(3.13) hold for all sufficiently small \(\varepsilon >0.\)

In the following result, we express assumption (3.21) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)

Corollary 3.5

Consider equation (2.1) under Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions

$$\begin{aligned}{} & {} \alpha _1<\frac{\alpha _2^2}{2}, \end{aligned}$$
(3.22)
$$\begin{aligned}{} & {} \alpha _0<\ell \cdot \min \left\{ \frac{\alpha _1\alpha _2}{3},\frac{\alpha _1^2}{2\alpha _2}\right\} . \end{aligned}$$
(3.23)
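
As a concrete illustration, the following sketch checks the region (3.22)–(3.23) for one hypothetical choice of coefficients, reusing a value of \(\ell \) computed as in the sketch after (3.5).

ell = 0.69                                          # hypothetical value of ell from (3.5)
a2 = 2.0
a1 = 1.8                                            # (3.22): a1 < a2**2 / 2 = 2
a0_cap = ell * min(a1 * a2 / 3, a1**2 / (2 * a2))   # (3.23)
assert a1 < a2**2 / 2 and a0_cap > 0
print("admissible alpha_0 in (0, %.4f)" % a0_cap)   # ~ (0, 0.5589)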

Let us first examine Theorem 3.2 when \(\varepsilon =1\), in which case the rate matches that obtained for the second order dynamical system established in [16].

Theorem 3.6

Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and

$$\begin{aligned} \varphi \triangleq \frac{1}{\ell \delta }. \end{aligned}$$
(3.24)

Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy

$$\begin{aligned}{} & {} \alpha _2>\max \{3,3\varphi +2,4\varphi \}, \end{aligned}$$
(3.25)
$$\begin{aligned}{} & {} \underline{\beta }\triangleq \max \left\{ 2\alpha _2-3,\varphi (2\alpha _2-3)\right\}<\alpha _1< \overline{\beta }\triangleq 0.5\alpha _2(\alpha _2-1), \end{aligned}$$
(3.26)
$$\begin{aligned}{} & {} \underline{q}\triangleq \dfrac{\alpha _1-\alpha _2+1}{\delta }<\alpha _0< \overline{p}\triangleq \ell \cdot \min \left\{ \dfrac{\alpha _1(\alpha _2-2)}{3}, \dfrac{\alpha _1(\alpha _1-\alpha _2+1)}{2\alpha _2-3}\right\} .\nonumber \\ \end{aligned}$$
(3.27)

Proof

First, we show that (3.25) ensures the validity of (3.26); that is \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (3.25) that \(\alpha _2>3\) and so \(2\alpha _2-3<\frac{\alpha _2(\alpha _2-1)}{2}\). Also from (3.25), we have \(\alpha _2>4\varphi \) and then

$$\begin{aligned} \dfrac{\alpha _2(\alpha _2-1)}{2}>\dfrac{\alpha _2(2\alpha _2-3)}{4}>\varphi (2\alpha _2-3). \end{aligned}$$

Next, we show that (3.25)–(3.26) ensure the validity of (3.27); that is, \(\underline{q}<\overline{p}\). It follows from (3.25) that \(\alpha _2>3\varphi +2\) and so

$$\begin{aligned} \dfrac{\alpha _1-\alpha _2+1}{\delta }<\frac{\alpha _1}{\delta } <\ell \cdot \dfrac{\alpha _1(\alpha _2-2)}{3}. \end{aligned}$$

Meanwhile, by (3.26), we have \(\alpha _1>\varphi (2\alpha _2-3)\), which gives

$$\begin{aligned} \dfrac{\alpha _1-\alpha _2+1}{\delta }<\ell \cdot \dfrac{\alpha _1(\alpha _1-\alpha _2+1)}{2\alpha _2-3}. \end{aligned}$$

Now we can obtain the exponential convergence of \(y(\cdot )\) by using Theorem 3.2 for \(\varepsilon =1\). \(\square \)

Now let us examine Theorem 3.2 when \(\varepsilon =2\). In this case, we will obtain from (3.19) that the convergence rate of a(t) is

$$\begin{aligned} O((Pt^2+Qt+R) e^{-2t}), \end{aligned}$$

which is faster than the rate \(O(e^{-t})\) obtained in [16] for the second order dynamical system.

Theorem 3.7

Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and (3.24). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy

$$\begin{aligned}{} & {} \alpha _2>\max \{8\varphi ,6,6\varphi +4\}, \end{aligned}$$
(3.28)
$$\begin{aligned}{} & {} \underline{\beta }\triangleq \max \{4(\alpha _2-3),4\varphi (\alpha _2-3)\}<\alpha _1<\overline{\beta }\triangleq \frac{1}{2}\alpha _2(\alpha _2-2),\end{aligned}$$
(3.29)
$$\begin{aligned}{} & {} \underline{q}\triangleq \frac{2}{\delta }(\alpha _1-2\alpha _2+4)<\alpha _0<\overline{p}\triangleq \ell \alpha _1\cdot \min \left\{ \frac{\alpha _1-2\alpha _2+4}{2(\alpha _2-3)},\frac{1}{3}(\alpha _2-4)\right\} .\nonumber \\ \end{aligned}$$
(3.30)

Proof

As in Theorem 3.6, we must check that \(\underline{\beta }<\overline{\beta }\). Indeed, we have

$$\begin{aligned} \frac{1}{2}\alpha _2(\alpha _2-2)>\frac{1}{2}\alpha _2(\alpha _2-3)>4\varphi (\alpha _2-3). \end{aligned}$$

Next, we prove \(\underline{q}<\overline{p}\). It follows from (3.29) that

$$\begin{aligned} \alpha _1>4(\alpha _2-3)>2(\alpha _2-2), \end{aligned}$$

which gives

$$\begin{aligned} \alpha _1-2\alpha _2+4>0. \end{aligned}$$
(3.31)

Again using (3.29), we get \(\alpha _1>4\varphi (\alpha _2-3)\), which gives

$$\begin{aligned} \frac{\ell \alpha _1}{2(\alpha _2-3)}>\frac{2}{\delta }. \end{aligned}$$
(3.32)

From (3.31)–(3.32), we obtain

$$\begin{aligned} \ell \alpha _1\cdot \frac{\alpha _1-2\alpha _2+4}{2(\alpha _2-3)}>\frac{2}{\delta }(\alpha _1-2\alpha _2+4). \end{aligned}$$
(3.33)

We observe that \(\alpha _1(\alpha _2-6\varphi -4)+12\varphi (\alpha _2-2)>0\) (this holds since \(\alpha _2>6\varphi +4\) by (3.28)), which is equivalent to saying that

$$\begin{aligned} \ell \alpha _1\cdot \frac{1}{3}(\alpha _2-4)>\frac{2}{\delta }(\alpha _1-2\alpha _2+4). \end{aligned}$$
(3.34)

Hence, the inequality \(\underline{q}<\overline{p}\) follows from (3.33)–(3.34). We leave it to the reader to check (3.8)–(3.13). Thus, \(y(\cdot )\) converges exponentially to \(x_*\). Moreover, it follows from (3.19) that the convergence rate is

$$\begin{aligned} O((Pt^2+Qt+R) e^{-2t}), \end{aligned}$$

for some constants PQR. \(\square \)

4 Discrete Time Dynamical System

In this section, we establish the linear convergence of the numerical scheme (2.3) for solving (1.1) under the following additional assumption.

Assumption 4.1

The coefficients \(\alpha _0,\alpha _1,\alpha _2\) satisfy

$$\begin{aligned}{} & {} \frac{\ell }{\alpha _0}(1-\alpha _2+\alpha _1)> 1, \end{aligned}$$
(4.1)
$$\begin{aligned}{} & {} \frac{\ell }{\alpha _0}(2\alpha _1-\alpha _2)< 3,\end{aligned}$$
(4.2)
$$\begin{aligned}{} & {} \frac{\ell \alpha _1}{\alpha _0}>3, \end{aligned}$$
(4.3)

where \(\ell \) is defined in (3.5).

We denote the following parameters

$$\begin{aligned}{} & {} {\left\{ \begin{array}{ll} D_2\triangleq \frac{\ell \alpha _1}{\alpha _0}-3,\\ D_1\triangleq \frac{\ell \alpha _2\alpha _1}{\alpha _0}-2\alpha _2-3,\\ D_0\triangleq \frac{\ell \alpha _1^2}{\alpha _0}-2\alpha _2-\alpha _1, \end{array}\right. } {\left\{ \begin{array}{ll} E_1\triangleq \frac{\ell }{\alpha _0}(\alpha _2-2\alpha _1)+3,\\ E_0\triangleq \frac{\ell }{\alpha _0}(\alpha _2^2-2\alpha _1-\alpha _2\alpha _1)+\alpha _2+3, \end{array}\right. }\end{aligned}$$
(4.4)
$$\begin{aligned}{} & {} F_0\triangleq \frac{\ell }{\alpha _0}(1-\alpha _2+\alpha _1)-1 \end{aligned}$$
(4.5)

and

$$\begin{aligned} u(n)\triangleq \Vert z(n)-x_*\Vert ^2,\quad c_k(n)\triangleq \Vert z^{\Delta ^{(k)}}(n)\Vert ^2. \end{aligned}$$
(4.6)

Remark 4.1

Under Assumption 4.1, we have \(F_0,E_1,D_2>0.\) Note also that under Assumption 3.1, the stepsize \(\omega \) must be bounded from above, i.e.

$$\begin{aligned} \omega < \frac{4\gamma }{L^2 + 4L\gamma - 4 \gamma ^2}. \end{aligned}$$

This upper bound of \(\omega \) is larger than that of the classical forward–backward algorithm, which is \(\omega < \frac{2\gamma }{L^2}\) (see e.g. [13, Proposition 25.9]) when A is maximally monotone and B is \(\gamma \)-strongly monotone and L-Lipschitz continuous.
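
A quick numerical comparison (with hypothetical constants \(\gamma \) and L) illustrates the gap; in fact the bound of Remark 4.1 exceeds the classical one whenever \(L\ne 2\gamma \), since the comparison reduces to \((L-2\gamma )^2>0\).

gamma, L = 1.0, 3.0                  # hypothetical constants with L >= gamma
bound_here = 4 * gamma / (L**2 + 4 * L * gamma - 4 * gamma**2)   # Remark 4.1
bound_classical = 2 * gamma / L**2
print(bound_here, bound_classical)   # 4/17 ~ 0.235 versus 2/9 ~ 0.222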

4.1 Global Linear Convergence

Theorem 4.2

Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4), denote the parameters as in (3.5), (4.4), (4.5), and suppose that Assumption 4.1 holds. Assume that there exists \(\xi \in (0,1)\) such that the following conditions hold

$$\begin{aligned}{} & {} -\xi ^3+\alpha _2\xi ^2-\alpha _1\xi +\delta \alpha _0\ge 0, \end{aligned}$$
(4.7)
$$\begin{aligned}{} & {} D_2\xi ^2-D_1\xi +D_0\ge 0,\end{aligned}$$
(4.8)
$$\begin{aligned}{} & {} 3\xi ^2-2\alpha _2\xi +\alpha _1\ge 0,\end{aligned}$$
(4.9)
$$\begin{aligned}{} & {} -2D_2\xi +D_1\ge 0,\end{aligned}$$
(4.10)
$$\begin{aligned}{} & {} -E_1\xi +E_0\ge 0,\end{aligned}$$
(4.11)
$$\begin{aligned}{} & {} \alpha _2>3\xi . \end{aligned}$$
(4.12)

Then z(n) converges linearly to \(x_*\), i.e. there exist \(M>0\) and \(q \in (0,1)\) such that

$$\begin{aligned} \Vert z(n) - x_*\Vert \le M q^n \quad \forall n. \end{aligned}$$

Proof

Since

$$\begin{aligned}{} & {} u^\Delta (n)=2\left\langle {z^\Delta (n)}, {z(n)-x_*} \right\rangle +c_1(n),\\{} & {} u^{\Delta ^{(2)}}(n)=2\left\langle {z^{\Delta ^{(2)}}(n)}, {z(n)-x_*} \right\rangle +2c_1^\Delta (n)+2c_1(n)-c_2(n),\\{} & {} u^{\Delta ^{(3)}}(n)=2\left\langle {z^{\Delta ^{(3)}}(n)}, {z(n)-x_*} \right\rangle \\{} & {} \quad +3c_1^{\Delta ^{(2)}}(n)+3c_1^\Delta (n)-3c_2^\Delta (n)-3c_2(n) +c_3(n), \end{aligned}$$

we have

$$\begin{aligned}{} & {} 2\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)}, {z(n)-x_*} \right\rangle \nonumber \\{} & {} \quad =u^{\Delta ^{(3)}}(n)+\alpha _2 u^{\Delta ^{(2)}}(n)+\alpha _1 u^\Delta (n)\nonumber \\{} & {} \qquad -3c_1^{\Delta ^{(2)}}(n)-(2\alpha _2+3)c_1^\Delta (n)-(2\alpha _2+\alpha _1)c_1(n)\nonumber \\{} & {} \qquad +3c_2^\Delta (n)+(\alpha _2+3)c_2(n)-c_3(n). \end{aligned}$$
(4.13)

We observe

$$\begin{aligned}{} & {} \Vert z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)\Vert ^2\nonumber \\{} & {} \quad =\alpha _1 c_1^{\Delta ^{(2)}}(n)+\alpha _2\alpha _1 c_1^\Delta (n)+\alpha _1^2 c_1(n)\nonumber \\{} & {} \qquad +(\alpha _2-2\alpha _1)c_2^{\Delta }(n)+(\alpha _2^2-2\alpha _1-\alpha _2\alpha _1)c_2(n)+(1-\alpha _2+\alpha _1)c_3(n).\nonumber \\ \end{aligned}$$
(4.14)

Using the definition of the resolvent, equation (2.2) gives

$$\begin{aligned}{} & {} B\left( \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)\right) \\{} & {} \quad -B(z(n))-\frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\omega \alpha _0}\\{} & {} \quad \in (A+B)\left( \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)\right) , \end{aligned}$$

which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies

$$\begin{aligned}{} & {} \gamma \left\| \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)-x_*\right\| ^2\\{} & {} \quad \le \Biggl <B\left( \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)\right) \\{} & {} \qquad -B(z(n))-\frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\omega \alpha _0},\\{} & {} \qquad \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)-x_*\Biggl >. \end{aligned}$$

Since the operator B is L-Lipschitz, we can estimate the right-hand side of the inequality above to obtain

$$\begin{aligned}{} & {} \gamma \left\| \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)-x_*\right\| ^2\\{} & {} \quad \le \left\langle {B\left( \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)\right) -B(z(n))}, {z(n)-x_*} \right\rangle \\{} & {} \qquad +\frac{1}{\alpha _0^2}\left( L-\frac{1}{\omega }\right) \Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert ^2\\{} & {} \qquad -\frac{1}{\omega \alpha _0}\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}, {z(n)-x_*} \right\rangle . \end{aligned}$$

Note that by the Cauchy–Schwarz inequality

$$\begin{aligned}{} & {} \left\langle {B\left( \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)\right) -B(z(n))}, {z(n)-x_*} \right\rangle \\{} & {} \quad \le \frac{L}{\alpha _0}\Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert \cdot \Vert z(n)-x_*\Vert \\{} & {} \quad \le \frac{L^2}{4\theta \gamma \alpha _0^2}\Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert ^2 +\theta \gamma u(n). \end{aligned}$$

Thus, we get

$$\begin{aligned}{} & {} \gamma \left\| \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)-x_*\right\| ^2\nonumber \\{} & {} \quad \le \frac{1}{\alpha _0^2}\left( L-\frac{1}{\omega }+\frac{L^2}{4\theta \gamma }\right) \Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert ^2+\theta \gamma u(n)\nonumber \\{} & {} \qquad -\frac{1}{\omega \alpha _0}\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}, {z(n)-x_*} \right\rangle . \end{aligned}$$
(4.15)

Note that

$$\begin{aligned}{} & {} \gamma \left\| \frac{z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}{\alpha _0}+z(n)-x_*\right\| ^2\\{} & {} \quad =\frac{\gamma }{\alpha _0^2}\Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert ^2+\gamma u(n)\\{} & {} \qquad +\frac{2\gamma }{\alpha _0}\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}, {z(n)-x_*} \right\rangle . \end{aligned}$$

Inserting the equality above into (4.15), we get

$$\begin{aligned}{} & {} \frac{\ell }{\alpha _0}\Vert z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)\Vert ^2+\delta \alpha _0u(n)\\{} & {} \quad +2\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2z^{\Delta ^{(2)}}(n)+\alpha _1z^\Delta (n)}, {z(n)-x_*} \right\rangle \le 0, \end{aligned}$$

which implies, by (4.13) and (4.14), that

$$\begin{aligned}{} & {} u^{\Delta ^{(3)}}(n)+\alpha _2 u^{\Delta ^{(2)}}(n)+\alpha _1 u^\Delta (n)+\delta \alpha _0u(n)\\{} & {} \quad +D_2c_1^{\Delta ^{(2)}}(n)+D_1c_1^\Delta (n)+D_0c_1(n) +E_1c_2^\Delta (n)+E_0c_2(n)+F_0c_3(n)\le 0. \end{aligned}$$

By (4.1), we have \(F_0>0\); dropping the nonnegative term \(F_0c_3(n)\), the inequality above gives

$$\begin{aligned}{} & {} u^{\Delta ^{(3)}}(n)+\alpha _2 u^{\Delta ^{(2)}}(n)+\alpha _1 u^\Delta (n)+\delta \alpha _0u(n)\\{} & {} \quad +D_2c_1^{\Delta ^{(2)}}(n)+D_1c_1^\Delta (n)+D_0c_1(n) +E_1c_2^\Delta (n)+E_0c_2(n)\le 0. \end{aligned}$$

Set

$$\begin{aligned} \varepsilon \triangleq \frac{1}{1-\xi }. \end{aligned}$$

Then \(\varepsilon >1\) and conditions (4.7)–(4.12) can be written as

$$\begin{aligned}{} & {} \delta \alpha _0\varepsilon ^3+\alpha _1\varepsilon ^2(1-\varepsilon )+(\alpha _2\varepsilon +1-\varepsilon )(1-\varepsilon )^2\ge 0, \end{aligned}$$
(4.16)
$$\begin{aligned}{} & {} D_0\varepsilon ^2+D_1(1-\varepsilon )\varepsilon +D_2(1-\varepsilon )^2\ge 0,\end{aligned}$$
(4.17)
$$\begin{aligned}{} & {} \alpha _1\varepsilon ^2+2\alpha _2\varepsilon (1-\varepsilon )+3(1-\varepsilon )^2\ge 0,\end{aligned}$$
(4.18)
$$\begin{aligned}{} & {} D_1\varepsilon +2D_2(1-\varepsilon )\ge 0,\end{aligned}$$
(4.19)
$$\begin{aligned}{} & {} E_0\varepsilon +E_1(1-\varepsilon )\ge 0,\end{aligned}$$
(4.20)
$$\begin{aligned}{} & {} \varepsilon \alpha _2+3(1-\varepsilon )>0. \end{aligned}$$
(4.21)

Multiplying both sides by \(\varepsilon ^{n+3}\) and using Remark 2.12, we obtain

$$\begin{aligned}{} & {} (\varepsilon ^{n+2}u^{\Delta ^{(2)}})^\Delta (n)+(\alpha _2\varepsilon +1-\varepsilon )(\varepsilon ^{n+1}u^\Delta )^\Delta (n)\\{} & {} \quad +[\alpha _1\varepsilon ^2+(\alpha _2\varepsilon +1-\varepsilon )(1-\varepsilon )](\varepsilon ^nu)^\Delta (n)\\{} & {} \quad +\underbrace{[\delta \alpha _0\varepsilon ^3+\alpha _1\varepsilon ^2(1-\varepsilon )+(\alpha _2\varepsilon +1-\varepsilon )(1-\varepsilon )^2]}_{\ge 0\quad \text {(by (4.16))}}\varepsilon ^nu(n)\\{} & {} \quad +D_2(\varepsilon ^{n+2}c_1^\Delta )^\Delta (n)+[D_1\varepsilon -D_2(\varepsilon -1)](\varepsilon ^{n+1}c_1)^\Delta (n)\\{} & {} \quad +\underbrace{[D_0\varepsilon ^2+D_1(1-\varepsilon )\varepsilon +D_2(1-\varepsilon )^2]}_{\ge 0\quad \text {(by (4.17))}}\varepsilon ^{n+1}c_1(n)\\{} & {} \quad +E_1(\varepsilon ^{n+2}c_2)^\Delta (n)+\underbrace{[-E_1(\varepsilon -1)+E_0\varepsilon ]}_{\ge 0\quad \text {(by (4.20))}}\varepsilon ^{n+2}c_2(n)\le 0. \end{aligned}$$

Let \(m\in \mathbb Z_{\ge 1}\). After summing from \(n=0\) to \(n=m-1\),

$$\begin{aligned}{} & {} \varepsilon ^{m+2}u^{\Delta ^{(2)}}(m)+(\alpha _2\varepsilon +1-\varepsilon )\varepsilon ^{m+1}u^\Delta (m) +[\alpha _1\varepsilon ^2+(\alpha _2\varepsilon +1-\varepsilon )(1-\varepsilon )]\varepsilon ^mu(m)\\{} & {} \quad +D_2\varepsilon ^{m+2}c_1^\Delta (m)+[D_1\varepsilon -D_2(\varepsilon -1)]\varepsilon ^{m+1}c_1(m) +\underbrace{E_1\varepsilon ^{m+2}c_2(m)}_{\ge 0\quad \text {(by (4.2))}}\le M_1, \end{aligned}$$

where \(M_1\) is some positive constant. Again using Remark 2.12,

$$\begin{aligned}{} & {} (\varepsilon ^{m+1}u^\Delta )^\Delta (m)+[\alpha _2\varepsilon +2(1-\varepsilon )](\varepsilon ^m u)^\Delta (m)\\{} & {} \qquad +\underbrace{[\alpha _1\varepsilon ^2+2\alpha _2\varepsilon (1-\varepsilon )+3(1-\varepsilon )^2]}_{\ge 0\quad \text {(by (4.18))}}\varepsilon ^mu(m)\\{} & {} \quad +D_2(\varepsilon ^{m+1}c_1)^\Delta (m) +\underbrace{[D_1\varepsilon -2D_2(\varepsilon -1)]}_{\ge 0\quad \text {(by (4.19))}}\varepsilon ^{m+1}c_1(m)\le M_1. \end{aligned}$$

Let \(\kappa \in \mathbb Z_{\ge 2}\). After summing from \(m=1\) to \(m=\kappa -1\),

$$\begin{aligned} \varepsilon ^{\kappa +1}u^\Delta (\kappa )+[\alpha _2\varepsilon +2(1-\varepsilon )]\varepsilon ^\kappa u(\kappa )+\underbrace{D_2\varepsilon ^{\kappa +1}c_1(\kappa )}_{\ge 0\quad \text {(by (4.3))}}\le M_1\kappa +M_2, \end{aligned}$$

where \(M_2\) is some positive constant. Again using Remark 2.12,

$$\begin{aligned} (\varepsilon ^\kappa u)^\Delta (\kappa )+\underbrace{[\alpha _2\varepsilon +3(1-\varepsilon )]\varepsilon ^\kappa u(\kappa )}_{\ge 0\quad \text {(by (4.21))}}\le M_1\kappa +M_2, \end{aligned}$$

which implies, after summing from \(\kappa =2\) to \(\kappa =n-1\), that

$$\begin{aligned} \varepsilon ^n u(n)\le M_1n^2+M_2n+M_3\le M_4n^2. \end{aligned}$$

Here \(n\in \mathbb Z_{\ge 3}\) and \(M_3,M_4\) are some positive constants. Let q be such that \(1<q<\varepsilon \). We have

$$\begin{aligned} u(n)\le \frac{M_4n^2}{\varepsilon ^n}=\left( \frac{q}{\varepsilon }\right) ^n\cdot \frac{M_4n^2}{q^n}\le M_5\left( \frac{q}{\varepsilon }\right) ^n, \end{aligned}$$

where \(M_5\) is some constant. The inequality above means that z(n) converges linearly to \(x_*\). \(\square \)

4.2 Parameter Choices

Let us now discuss how to choose the parameters fulfilling all the assumptions of Theorem 4.2. Note that if \(D_0,D_1,E_0\) satisfy

$$\begin{aligned} D_0,D_1,E_0>0, \end{aligned}$$
(4.22)

then conditions (4.16)–(4.21) hold for all sufficiently small \(\xi >0.\)

The following result expresses assumption (4.22) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2\).

Corollary 4.3

Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5), (4.4), (4.5). Then z(n) converges linearly to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy

$$\begin{aligned}{} & {} \alpha _2<2, \end{aligned}$$
(4.23)
$$\begin{aligned}{} & {} \max \{0,\alpha _2-1\}<\alpha _1<\frac{\alpha _2^2}{\alpha _2+2},\end{aligned}$$
(4.24)
$$\begin{aligned}{} & {} \alpha _0<\ell \cdot \min \left\{ \frac{\alpha _1^2}{\alpha _1+2\alpha _2},1-\alpha _2+\alpha _1\right\} . \end{aligned}$$
(4.25)

Proof

Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(E_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get

$$\begin{aligned} \alpha _1<\frac{\alpha _2^2}{\alpha _2+2}<\frac{\alpha _2}{2}, \end{aligned}$$
(4.26)

which gives (4.2). It follows from (4.26) that \(\alpha _1<\alpha _2\) and so

$$\begin{aligned} \frac{\ell \alpha _1}{3}>\frac{\ell \alpha _1^2}{\alpha _1+2\alpha _2}>\alpha _0. \end{aligned}$$

The last inequality proves (4.3), while (4.1) follows directly from (4.25) since \(\alpha _0<\ell (1-\alpha _2+\alpha _1)\). Thus, Assumption 4.1 holds. Note that

$$\begin{aligned} \alpha _1<\frac{\alpha _2^2}{\alpha _2+2}<\frac{2\alpha _2^2}{\alpha _2+3}, \end{aligned}$$

which gives

$$\begin{aligned} \frac{\ell \alpha _1\alpha _2}{2\alpha _2+3}>\frac{\ell \alpha _1^2}{\alpha _1+2\alpha _2}>\alpha _0 \end{aligned}$$

and then \(D_1>0\). Finally, \(D_0>0\) follows directly from \(\alpha _0<\ell \alpha _1^2/(\alpha _1+2\alpha _2)\) in (4.25). \(\square \)

Remark 4.4

Note that there are common choices of parameters satisfying both Corollary 3.5 (as \(\varepsilon \rightarrow 0\)) and Corollary 4.3 (as \(\xi \rightarrow 0\)). The reader can check the following selection

$$\begin{aligned}{} & {} \alpha _2<1,\\{} & {} \alpha _1<\frac{\alpha _2^2}{\alpha _2+2},\\{} & {} \alpha _0<\ell \cdot \min \left\{ \frac{1}{3}\alpha _1\alpha _2,\,\frac{\alpha _1^2}{\alpha _1+2\alpha _2}\right\} . \end{aligned}$$
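
A concrete instance of this common selection can be generated as follows (hypothetical numbers; \(\ell \) is any value computed from (3.5)).

ell = 0.69                                   # hypothetical value of ell from (3.5)
a2 = 0.9                                     # a2 < 1
a1 = 0.9 * a2**2 / (a2 + 2)                  # a1 < a2**2 / (a2 + 2)
a0 = 0.9 * ell * min(a1 * a2 / 3, a1**2 / (a1 + 2 * a2))
assert a2 < 1 and a1 < a2**2 / (a2 + 2) and a0 > 0
print(a2, round(a1, 4), round(a0, 5))        # e.g. 0.9, 0.2514, ~0.01913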

Remark 4.5

An important application of the monotone inclusion (1.1) is the following optimization problem

$$\begin{aligned} \min _{x \in \mathcal {H}} f(x) + g(x), \end{aligned}$$
(4.27)

where \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function with L-Lipschitz continuous gradient for some \(L>0\) and \(g: \mathcal {H}\rightarrow \mathbb {R} \cup \{+\infty \}\) is a proper and lower semicontinuous function.

Recall that the Fréchet subdifferential of g at x is defined by

$$\begin{aligned} \hat{\partial } g(x):=\left\{ u \in \mathcal {H}: \liminf _{y \rightarrow x}{ \frac{g(y)-g(x) - \langle u, y-x \rangle }{\Vert y-x\Vert } \ge 0} \right\} . \end{aligned}$$

It is well known that if g is differentiable at x, then \(\hat{\partial } g(x) = \{\nabla g(x)\}\). When g is a convex function, the Fréchet subdifferential coincides with the classical convex subdifferential, i.e.

$$\begin{aligned} \hat{\partial } g(x) = \partial g(x)= \{ u \in \mathcal {H}: \, g(y) \ge g(x)+ \left\langle u,y - x \right\rangle \, \forall y \in \mathcal {H}\}. \end{aligned}$$

We notice that if g is proper, \(\gamma _g\)-convex and lower semicontinuous, then \(\hat{\partial } g\) is maximally generalized \(\gamma _g\)-monotone. We assume that f and g are respectively \(\gamma _f\)- and \(\gamma _g\)-convex functions such that \(\gamma = \gamma _f +\gamma _g >0\). Then the set of minimizers of (4.27) coincides with the solution set of the following monotone inclusion problem

$$\begin{aligned} \text { find } x^* \in \mathcal {H}\, \text {such that} \, \, 0 \in \nabla f(x^*) + \hat{\partial } g (x^*), \end{aligned}$$
(4.28)

for which the results obtained in the previous sections can be applied.
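
As an illustration, the sketch below applies scheme (2.4) to an instance of (4.28) with the hypothetical choices \(f(x)=\frac{1}{2}\Vert x-b\Vert ^2\) (so \(B=\nabla f\) is strongly monotone and Lipschitz) and \(g=\lambda \Vert \cdot \Vert _1\) (convex), for which the resolvent of \(\omega \hat{\partial } g\) is soft-thresholding; the parameter values are illustrative picks, not prescriptions of the paper.

import numpy as np

a2, a1, a0, w, lam = 1.5, 0.6, 0.05, 0.5, 0.3
b = np.array([2.0, -0.1, 0.8])
grad_f = lambda x: x - b
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - w * lam, 0.0)  # J_{w dg}

z = [np.zeros(3)] * 3
for n in range(300):
    resid = z[-3] - prox_g(z[-3] - w * grad_f(z[-3]))
    z.append(z[-1] + (2 - a2) * (z[-1] - z[-2])
             + (a2 - a1 - 1) * (z[-2] - z[-3]) - a0 * resid)
print(z[-1])   # approaches the minimizer soft(b, lam) = [1.7, 0, 0.5]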

5 Strongly Pseudo-monotone Variational Inequality

Let C be a nonempty and closed convex subset of \(\mathcal {H}\). The normal cone of C at x is defined as

$$\begin{aligned} N_C(x) = \{ u \in \mathcal {H}: \langle u, y-x \rangle \le 0 \quad \forall y \in C\}, \end{aligned}$$

which is maximally monotone [13]. In this section, we focus on the special case of Problem (1.1) of the form

$$\begin{aligned} \text {find}\,x_*\in \mathcal {H}\hbox { such that}\quad 0\in A(x_*)+N_C(x_*). \end{aligned}$$
(5.1)

Note that, if A is \(\gamma _A\)-monotone, then (5.1) is a special case of (1.1). Indeed, the sum of two monotone operators is still monotone [13]. However, this is not the case if A is non-monotone (e.g. only pseudo-monotone). For example, the operator

$$\begin{aligned} A(x_1,x_2):= (x_1^2 + x_2^2) (-x_2, x_1)^T \end{aligned}$$

is pseudo-monotone but \(A+\epsilon I\) is not (pseudo)-monotone for any \(\epsilon > 0\) (see [41, Counterexample 2.1]).
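A small numerical witness illustrates this failure for \(\epsilon =0.1\). The pair below is our own hand-constructed example, not the one from [41]:

```python
# Witness that A + 0.1*I is not pseudo-monotone, where
# A(x1, x2) = (x1^2 + x2^2) * (-x2, x1). The pair (x, y) below is our own
# hand-constructed example, not the one from [41].
import numpy as np

A = lambda x: (x @ x) * np.array([-x[1], x[0]])
eps = 0.1
F = lambda x: A(x) + eps * x

theta = -np.arcsin(0.05)
x = np.array([1.0, 0.0])
y = 4.0 * np.array([np.cos(theta), np.sin(theta)])
d = y - x
print(F(x) @ d)   # ~ +0.0995 >= 0, so pseudo-monotonicity would require the next value >= 0
print(F(y) @ d)   # ~ -2.0 < 0: the implication fails
```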

In this section, we consider the case when A is \(\gamma \)-strongly pseudo-monotone; hence the results obtained in the previous sections cannot be directly applied. Problem (5.1) is equivalent to the variational inequality VI(A, C): find \(x_*\in C\) such that

$$\begin{aligned} \left\langle {A(x_*)}, {y-x_*} \right\rangle \ge 0\quad \forall y\in C. \end{aligned}$$
(5.2)

For each \(x\in \mathcal {H}\), there exists a unique point in C (see, e.g., [31]), denoted by \(P_{C}(x)\), such that

$$\begin{aligned} \Vert x-P_{C}(x)\Vert \le \Vert x-y\Vert \quad \forall y\in C. \end{aligned}$$

Some well-known properties of the metric projection \(P_{C}: \mathcal {H}\rightarrow C\) are given in the following lemma [25, 31].

Lemma 5.1

Assume that the set C is a closed convex subset of \(\mathcal {H}\). Then we have the following:

  1. (a)

\(P_{C}(\cdot )\) is a nonexpansive operator, i.e., for all \(x,y\in \mathcal {H}\), it holds that

    $$\begin{aligned} \Vert P_{C}(x)-P_{C}(y)\Vert \le \Vert x-y\Vert . \end{aligned}$$
  2. (b)

    For any \(x\in \mathcal {H}\) and \(y\in C\), it holds that

    $$\begin{aligned} \left\langle x-P_{C}(x), y-P_{C}(x)\right\rangle \le 0. \end{aligned}$$
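Both properties are easy to check numerically. Here is a minimal sketch, with C taken to be the Euclidean unit ball (our choice for illustration):

```python
# Numerical check of Lemma 5.1 for the projection onto the unit ball
# (C = {x : ||x|| <= 1} is our illustrative choice).
import numpy as np

def P_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # (a) nonexpansiveness
    assert np.linalg.norm(P_ball(x) - P_ball(y)) <= np.linalg.norm(x - y) + 1e-12
    # (b) variational characterization, tested at a point of C
    y_C = P_ball(y)
    assert (x - P_ball(x)) @ (y_C - P_ball(x)) <= 1e-12
print("Lemma 5.1 (a)-(b) hold on all samples")
```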

Assumption 5.1

(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0.\)

(ii) The operator \(A:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz continuous.

(iii) The parameter \(\omega >0\) satisfies

$$\begin{aligned} \omega <\frac{4\gamma }{L^2}. \end{aligned}$$
(5.3)

Remark 5.2

Under Assumption 5.1 (ii) and (iii), the problem VI(A, C) has a unique solution [30].

We will need the following important estimate and error bounds.

Proposition 5.3

[44] Let \(C\subset \mathcal {H}\) be a nonempty closed convex subset. Let A be an operator that is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz on C. Let \(x_*\) be the unique solution of Problem (5.2). For every \(\omega >0\) and \(x\in \mathcal {H},\) we have

$$\begin{aligned} \left\langle {x-P_C(x-\omega A(x))}, {x-x_*} \right\rangle \ge \left( 1-\frac{\omega L^2}{4\gamma }\right) \Vert x-P_C(x-\omega A(x))\Vert ^2 \end{aligned}$$
(5.4)

and

$$\begin{aligned} \Vert x-x_*\Vert \le \frac{1+\omega \gamma +\omega L}{\omega \gamma }\Vert x-P_C(x-\omega A(x))\Vert . \end{aligned}$$
(5.5)

Throughout this section, we denote

$$\begin{aligned} \mu \triangleq 1-\frac{\omega L^2}{4\gamma },\quad \eta \triangleq \left( \frac{\omega \gamma }{1+\omega \gamma +\omega L}\right) ^2. \end{aligned}$$
(5.6)
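For intuition, both estimates can be verified numerically on a toy instance. The following sketch assumes \(A(x)=2x+b\) on the box \(C=[-1,1]^2\) (so \(\gamma =L=2\)) and \(\omega =1\):

```python
# Numerical sanity check of (5.4)-(5.5) for the toy operator A(x) = 2x + b
# (gamma = L = 2) on the box C = [-1,1]^2, with omega = 1 (our assumptions).
import numpy as np

rng = np.random.default_rng(0)
b = np.array([1.0, 1.0])
A = lambda x: 2.0 * x + b
P_C = lambda x: np.clip(x, -1.0, 1.0)
gamma, L, omega = 2.0, 2.0, 1.0
mu = 1.0 - omega * L**2 / (4.0 * gamma)           # (5.6)
x_star = np.array([-0.5, -0.5])                   # interior zero of A = VI solution

for _ in range(1000):
    x = rng.uniform(-3.0, 3.0, size=2)
    r = x - P_C(x - omega * A(x))                 # natural residual
    assert r @ (x - x_star) >= mu * (r @ r) - 1e-12                         # (5.4)
    bound = (1 + omega * gamma + omega * L) / (omega * gamma)
    assert np.linalg.norm(x - x_star) <= bound * np.linalg.norm(r) + 1e-12  # (5.5)
print("estimates (5.4)-(5.5) hold on all samples")
```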

5.1 Continuous Time

In this case, we consider

$$\begin{aligned} y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-P_C(y(t)-\omega A(y(t)))]=0, \end{aligned}$$
(5.7)

where \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).

Denote

$$\begin{aligned} {\left\{ \begin{array}{ll} G_2\triangleq \frac{\mu \alpha _1}{\alpha _0},\\ G_1\triangleq \frac{\mu \alpha _2\alpha _1}{\alpha _0}-3,\\ G_0\triangleq \frac{\mu \alpha _1^2}{\alpha _0}-2\alpha _2, \end{array}\right. } {\left\{ \begin{array}{ll} H_1\triangleq \frac{\mu \alpha _2}{\alpha _0},\\ H_0\triangleq \frac{\mu }{\alpha _0}(\alpha _2^2-2\alpha _1), \end{array}\right. } K_0\triangleq \frac{\mu }{\alpha _0}. \end{aligned}$$
(5.8)

5.1.1 Global Exponential Convergence

Theorem 5.4

Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be defined in (5.6) and (5.8). Assume that there exists \(\varepsilon >0\) such that the following conditions hold

$$\begin{aligned}{} & {} -\varepsilon ^3+\alpha _2\varepsilon ^2-\alpha _1\varepsilon +\mu \eta \alpha _0\ge 0, \end{aligned}$$
(5.9)
$$\begin{aligned}{} & {} G_2\varepsilon ^2-G_1\varepsilon +G_0\ge 0, \end{aligned}$$
(5.10)
$$\begin{aligned}{} & {} 3\varepsilon ^2-2\alpha _2\varepsilon +\alpha _1\ge 0, \end{aligned}$$
(5.11)
$$\begin{aligned}{} & {} -2G_2\varepsilon +G_1\ge 0, \end{aligned}$$
(5.12)
$$\begin{aligned}{} & {} -H_1\varepsilon +H_0\ge 0, \end{aligned}$$
(5.13)
$$\begin{aligned}{} & {} \alpha _2>2\varepsilon . \end{aligned}$$
(5.14)

Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\).

Proof

Consider the functions in (3.7). Similarly to (3.14), we have

$$\begin{aligned}{} & {} a^{(3)}(t)+\alpha _2a^{(2)}(t)+\alpha _1a^{(1)}(t)-3b_1^{(1)}(t)-2\alpha _2b_1(t)\nonumber \\{} & {} \quad =2\left\langle {y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)}, {y(t)-x_*} \right\rangle \nonumber \\{} & {} \quad =2\alpha _0\left\langle {[-y(t)+P_C(y(t)-\omega A(y(t)))]}, {y(t)-x_*} \right\rangle . \end{aligned}$$
(5.15)

On one hand, by (5.4), we can estimate

$$\begin{aligned}{} & {} \alpha _0\left\langle {[-y(t)+P_C(y(t)-\omega A(y(t)))]}, {y(t)-x_*} \right\rangle \nonumber \\{} & {} \quad \le -\alpha _0\mu \Vert y(t)-P_C(y(t)-\omega A(y(t)))\Vert ^2 =-\frac{\mu }{\alpha _0}\Vert y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)\Vert ^2\nonumber \\{} & {} \quad =-\frac{\mu }{\alpha _0}[\alpha _1b_1^{(2)}(t)+\alpha _2\alpha _1b_1^{(1)}(t)+\alpha _1^2b_1(t)+\alpha _2b_2^{(1)}(t)+(\alpha _2^2-2\alpha _1)b_2(t)+b_3(t)].\nonumber \\ \end{aligned}$$
(5.16)

On the other hand, by (5.4) and (5.5), we get

$$\begin{aligned} \alpha _0\left\langle {[-y(t)+P_C(y(t)-\omega A(y(t)))]}, {y(t)-x_*} \right\rangle \le -\alpha _0\mu \eta a(t). \end{aligned}$$
(5.17)

Thus, using (5.16) and (5.17), we estimate (5.15) as follows

$$\begin{aligned}{} & {} a^{(3)}(t)+\alpha _2a^{(2)}(t)+\alpha _1a^{(1)}(t)+\alpha _0\mu \eta a(t)\nonumber \\{} & {} \quad +G_2b_1^{(2)}(t)+G_1b_1^{(1)}(t)+G_0b_1(t)+H_1b_2^{(1)}(t)+H_0b_2(t)+K_0b_3(t)\le 0.\nonumber \\ \end{aligned}$$
(5.18)

By arguments similar to those used in Theorem 3.2, now applied to (5.18) (that is, integrating three times), we obtain the exponential convergence of \(y(\cdot ).\) \(\square \)

5.1.2 Parameters Choices

Remark 5.5

If \(G_0,G_1,H_0\) satisfy

$$\begin{aligned} G_0,G_1,H_0>0, \end{aligned}$$
(5.19)

then conditions (5.9)–(5.14) are satisfied for all sufficiently small \(\varepsilon >0.\)

In the following result, we simplify assumption (5.19) in terms of upper and lower bounds on the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)

Corollary 5.6

Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be defined in (5.6) and (5.8). Then the trajectory \(y(\cdot )\) generated by the dynamical system (5.7) converges exponentially to \(x_*\), provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions

$$\begin{aligned}{} & {} \alpha _1<\frac{\alpha _2^2}{2},\\{} & {} \alpha _0<\mu \cdot \min \left\{ \frac{\alpha _1\alpha _2}{3},\frac{\alpha _1^2}{2\alpha _2}\right\} . \end{aligned}$$
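To illustrate, here is a minimal numerical sketch of (5.7) under these conditions, assuming the toy instance \(A(x)=2x+b\) on \(C=[-1,1]^2\) (\(\gamma =L=2\), \(\mu =0.5\)) and the selection \(\alpha _2=2\), \(\alpha _1=1\), \(\alpha _0=0.1\), \(\omega =1\):

```python
# Sketch: integrating the third-order system (5.7) on a toy VI. Assumptions
# (ours, not from the paper): A(x) = 2x + b on C = [-1,1]^2, so gamma = L = 2,
# omega = 1 satisfies (5.3), and (a2, a1, a0) satisfy Corollary 5.6.
import numpy as np
from scipy.integrate import solve_ivp

b = np.array([1.0, 1.0])
A = lambda x: 2.0 * x + b
P_C = lambda x: np.clip(x, -1.0, 1.0)
omega = 1.0
a2, a1, a0 = 2.0, 1.0, 0.1        # a1 < a2^2/2 = 2, a0 < 0.5*min{2/3, 1/4} = 0.125

def rhs(t, s):
    # state s = (y, y', y''); (5.7) gives y''' explicitly
    y, dy, d2y = s[:2], s[2:4], s[4:]
    d3y = -a2 * d2y - a1 * dy - a0 * (y - P_C(y - omega * A(y)))
    return np.concatenate([dy, d2y, d3y])

s0 = np.concatenate([np.array([1.0, -1.0]), np.zeros(4)])   # y(0), y'(0), y''(0)
sol = solve_ivp(rhs, (0.0, 60.0), s0, rtol=1e-9, atol=1e-12)
x_star = np.array([-0.5, -0.5])   # interior zero of A, hence the solution of (5.2)
print(np.linalg.norm(sol.y[:2, -1] - x_star))   # distance decays exponentially in t
```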

Now we examine Theorem 5.4 when \(\varepsilon =1.\)

Corollary 5.7

Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be defined in (5.6) and (5.8), and set

$$\begin{aligned} \psi \triangleq \frac{1}{\mu ^2\eta }. \end{aligned}$$
(5.20)

Then the trajectory \(y(\cdot )\) generated by the dynamical system (5.7) converges exponentially to \(x_*\), provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions

$$\begin{aligned}{} & {} \alpha _2>\max \{3,3\psi +2,4\psi \}, \end{aligned}$$
(5.21)
$$\begin{aligned}{} & {} \underline{\beta }\triangleq \max \left\{ 2\alpha _2-3,\psi (2\alpha _2-3)\right\}<\alpha _1< \overline{\beta }\triangleq 0.5\alpha _2(\alpha _2-1), \end{aligned}$$
(5.22)
$$\begin{aligned}{} & {} \underline{q}\triangleq \dfrac{\alpha _1-\alpha _2+1}{\mu \eta }<\alpha _0< \overline{p}\triangleq \mu \cdot \min \left\{ \dfrac{\alpha _1(\alpha _2-2)}{3}, \dfrac{\alpha _1(\alpha _1-\alpha _2+1)}{2\alpha _2-3}\right\} .\nonumber \\ \end{aligned}$$
(5.23)

Proof

First, we show that (5.21) ensures the validity of (5.22), that is, \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (5.21) that \(\alpha _2>3\) and so \(2\alpha _2-3< \frac{\alpha _2(\alpha _2-1)}{2}\). Also from (5.21), we have \(\alpha _2>4\psi \) and then

$$\begin{aligned} \dfrac{\alpha _2(\alpha _2-1)}{2}>\dfrac{\alpha _2(2\alpha _2-3)}{4}>\psi (2\alpha _2-3). \end{aligned}$$

Next, we show that (5.21)–(5.22) ensure the validity of (5.23), that is, \(\underline{q}<\overline{p}\). It follows from (5.21) that \(\alpha _2>3\psi +2\) and so

$$\begin{aligned} \dfrac{\alpha _1-\alpha _2+1}{\mu \eta }<\frac{\alpha _1}{\mu \eta } <\mu \cdot \dfrac{\alpha _1(\alpha _2-2)}{3}. \end{aligned}$$

Meanwhile, by (5.22), we have \(\alpha _1>\psi (2\alpha _2-3)\), which gives

$$\begin{aligned} \dfrac{\alpha _1-\alpha _2+1}{\mu \eta }<\mu \cdot \dfrac{\alpha _1(\alpha _1-\alpha _2+1)}{2\alpha _2-3}. \end{aligned}$$

Now we can prove the exponential convergence of \(y(\cdot )\) by using Theorem 5.4 for \(\varepsilon =1\). \(\square \)
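The windows (5.21)–(5.23) are narrow but nonempty. A sketch computing them for the illustrative configuration \(\gamma =L=2\), \(\omega =1\) (our assumption, not from the paper):

```python
# Sketch: computing the parameter windows of Corollary 5.7 for gamma = L = 2,
# omega = 1 (our illustrative configuration, not from the paper).
gamma, L, omega = 2.0, 2.0, 1.0
mu = 1.0 - omega * L**2 / (4.0 * gamma)                       # (5.6): 0.5
eta = (omega * gamma / (1 + omega * gamma + omega * L))**2    # (5.6): 0.16
psi = 1.0 / (mu**2 * eta)                                     # (5.20): 25

a2 = 101.0                                   # (5.21): a2 > max{3, 77, 100} = 100
lo_beta = max(2 * a2 - 3, psi * (2 * a2 - 3))                 # (5.22): 4975
hi_beta = 0.5 * a2 * (a2 - 1)                                 #         5050
a1 = 5000.0
lo_q = (a1 - a2 + 1) / (mu * eta)                             # (5.23): 61250
hi_p = mu * min(a1 * (a2 - 2) / 3, a1 * (a1 - a2 + 1) / (2 * a2 - 3))  # ~61557.8
print(lo_beta < a1 < hi_beta, lo_q < hi_p)   # True True: both windows nonempty
```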

5.2 Discrete Time

We consider the difference equation

$$\begin{aligned} z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)+\alpha _0[z(n)-P_C(z(n)-\omega A(z(n)))]=0,\nonumber \\ \end{aligned}$$
(5.24)

where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).

Denote

$$\begin{aligned}{} & {} {\left\{ \begin{array}{ll} S_2\triangleq \frac{\mu \alpha _1}{\alpha _0}-3,\\ S_1\triangleq \frac{\mu \alpha _2\alpha _1}{\alpha _0}-2\alpha _2-3,\\ S_0\triangleq \frac{\mu \alpha _1^2}{\alpha _0}-2\alpha _2-\alpha _1, \end{array}\right. } {\left\{ \begin{array}{ll} T_1\triangleq \frac{\mu }{\alpha _0}(\alpha _2-2\alpha _1)+3,\\ T_0\triangleq \frac{\mu }{\alpha _0}(\alpha _2^2-2\alpha _1-\alpha _2\alpha _1)+\alpha _2+3, \end{array}\right. } \end{aligned}$$
(5.25)
$$\begin{aligned}{} & {} R_0\triangleq \frac{\mu }{\alpha _0}(1-\alpha _2+\alpha _1)-1. \end{aligned}$$
(5.26)

5.2.1 Global Exponential Convergence

Assumption 5.2

The coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy

$$\begin{aligned}{} & {} \frac{\mu }{\alpha _0}(1-\alpha _2+\alpha _1)> 1, \end{aligned}$$
(5.27)
$$\begin{aligned}{} & {} \frac{\mu }{\alpha _0}(2\alpha _1-\alpha _2)< 3, \end{aligned}$$
(5.28)
$$\begin{aligned}{} & {} \frac{\mu \alpha _1}{\alpha _0}>3, \end{aligned}$$
(5.29)

where \(\mu \) is defined in (5.6).

Remark 5.8

Under Assumption 5.2, we have \(R_0,T_1,S_2\ge 0.\)

Theorem 5.9

Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be defined in (5.6) and (5.25)–(5.26). Assume that there exists \(\xi >0,\xi \ne 1\) such that the following conditions hold

$$\begin{aligned}{} & {} -\xi ^3+\alpha _2\xi ^2-\alpha _1\xi +\mu \eta \alpha _0\ge 0, \end{aligned}$$
(5.30)
$$\begin{aligned}{} & {} S_2\xi ^2-S_1\xi +S_0\ge 0,\end{aligned}$$
(5.31)
$$\begin{aligned}{} & {} 3\xi ^2-2\alpha _2\xi +\alpha _1\ge 0,\end{aligned}$$
(5.32)
$$\begin{aligned}{} & {} -2S_2\xi +S_1\ge 0,\end{aligned}$$
(5.33)
$$\begin{aligned}{} & {} -T_1\xi +T_0\ge 0,\end{aligned}$$
(5.34)
$$\begin{aligned}{} & {} \alpha _2>3\xi . \end{aligned}$$
(5.35)

Then the sequence \(z(\cdot )\) generated by (5.24) converges linearly to \(x_*\).

Proof

Consider the functions in (4.6). Similarly to (4.13), we have

$$\begin{aligned}{} & {} u^{\Delta ^{(3)}}(n)+\alpha _2 u^{\Delta ^{(2)}}(n)+\alpha _1 u^\Delta (n)\nonumber \\{} & {} \qquad -3c_1^{\Delta ^{(2)}}(n)-(2\alpha _2+3)c_1^\Delta (n)-(2\alpha _2+\alpha _1)c_1(n)\nonumber \\{} & {} \qquad +3c_2^\Delta (n)+(\alpha _2+3)c_2(n)-c_3(n)\nonumber \\{} & {} \quad =2\left\langle {z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)}, {z(n)-x_*} \right\rangle \nonumber \\{} & {} \quad =2\alpha _0\left\langle {-z(n)+P_C(z(n)-\omega A(z(n)))}, {z(n)-x_*} \right\rangle . \end{aligned}$$
(5.36)

On one hand, by (5.4), we can estimate

$$\begin{aligned}{} & {} \nonumber \alpha _0\left\langle {-z(n)+P_C(z(n)-\omega A(z(n)))}, {z(n)-x_*} \right\rangle \\{} & {} \quad \le -\alpha _0\mu \Vert z(n)-P_C(z(n)-\omega A(z(n)))\Vert ^2 \nonumber \\{} & {} \quad =-\frac{\mu }{\alpha _0}\Vert z^{\Delta ^{(3)}}(n)+\alpha _2 z^{\Delta ^{(2)}}(n)+\alpha _1 z^{\Delta }(n)\Vert ^2\nonumber \\{} & {} \quad \nonumber =-\frac{\mu }{\alpha _0}[\alpha _1 c_1^{\Delta ^{(2)}}(n)+\alpha _2\alpha _1 c_1^\Delta (n)+\alpha _1^2 c_1(n)\\{} & {} \qquad +(\alpha _2-2\alpha _1)c_2^{\Delta }(n)+(\alpha _2^2-2\alpha _1-\alpha _2\alpha _1)c_2(n)+(1-\alpha _2+\alpha _1)c_3(n)].\nonumber \\ \end{aligned}$$
(5.37)

On the other hand, by (5.4) and (5.5), we get

$$\begin{aligned} \alpha _0\left\langle {-z(n)+P_C(z(n)-\omega A(z(n)))}, {z(n)-x_*} \right\rangle \le -\alpha _0\mu \eta u(n). \end{aligned}$$
(5.38)

Thus, using (5.37) and (5.38), we estimate (5.36) as follows

$$\begin{aligned}{} & {} \nonumber u^{\Delta ^{(3)}}(n)+\alpha _2 u^{\Delta ^{(2)}}(n)+\alpha _1 u^\Delta (n)+\alpha _0\mu \eta u(n)\\{} & {} \quad +S_2 c_1^{\Delta ^{(2)}}(n)+S_1 c_1^\Delta (n)+S_0c_1(n) +T_1 c_2^\Delta (n)+T_0c_2(n)+R_0c_3(n)\le 0.\nonumber \\ \end{aligned}$$
(5.39)

Setting

$$\begin{aligned} \varepsilon \triangleq \frac{1}{1-\xi }, \end{aligned}$$

we have \(\varepsilon >1\), and conditions (5.30)–(5.35) can be written as

$$\begin{aligned}{} & {} \mu \eta \alpha _0\varepsilon ^3+\alpha _1\varepsilon ^2(1-\varepsilon )+(\alpha _2\varepsilon +1-\varepsilon )(1-\varepsilon )^2\ge 0, \\{} & {} S_0\varepsilon ^2+S_1(1-\varepsilon )\varepsilon +S_2(1-\varepsilon )^2\ge 0,\\{} & {} \alpha _1\varepsilon ^2+2\alpha _2\varepsilon (1-\varepsilon )+3(1-\varepsilon )^2\ge 0,\\{} & {} S_1\varepsilon +2S_2(1-\varepsilon )\ge 0,\\{} & {} T_0\varepsilon +T_1(1-\varepsilon )\ge 0,\\{} & {} \varepsilon \alpha _2+3(1-\varepsilon )> 0. \end{aligned}$$

By arguments similar to those used in Theorem 4.2, now applied to (5.39) (that is, summing three times), we obtain the linear convergence of \(z(\cdot ).\) \(\square \)

5.2.2 Parameters Choices

Remark 5.10

If \(S_0,S_1,T_0\) satisfy

$$\begin{aligned} S_0,S_1,T_0>0, \end{aligned}$$
(5.40)

then conditions (5.30)–(5.35) are satisfied for all sufficiently small \(\xi >0.\)

The following result simplifies condition (5.40) in terms of lower and upper bounds on the coefficients \(\alpha _0,\alpha _1,\alpha _2\). There are common choices of parameters satisfying both Corollary 5.6 and Corollary 5.11 below.

Corollary 5.11

Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be defined in (5.6) and (5.25)–(5.26). Then \(z(\cdot )\) converges linearly to \(x_*\), provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy

$$\begin{aligned}{} & {} \alpha _2<2, \end{aligned}$$
(5.41)
$$\begin{aligned}{} & {} \max \{0,\alpha _2-1\}<\alpha _1<\frac{\alpha _2^2}{\alpha _2+2},\end{aligned}$$
(5.42)
$$\begin{aligned}{} & {} \alpha _0<\mu \cdot \min \left\{ \frac{\alpha _1^2}{\alpha _1+2\alpha _2},1-\alpha _2+\alpha _1\right\} . \end{aligned}$$
(5.43)

Proof

Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(T_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get

$$\begin{aligned} \alpha _1<\frac{\alpha _2^2}{\alpha _2+2}<\frac{\alpha _2}{2}, \end{aligned}$$
(5.44)

which gives (5.28). It follows from (5.44) that \(\alpha _1<\alpha _2\) and so

$$\begin{aligned} \frac{\mu \alpha _1}{3}>\frac{\mu \alpha _1^2}{\alpha _1+2\alpha _2}>\alpha _0. \end{aligned}$$

The last inequality proves (5.29). Thus, Assumption 5.2 holds. Note that

$$\begin{aligned} \alpha _1<\frac{\alpha _2^2}{\alpha _2+2}<\frac{2\alpha _2^2}{\alpha _2+3}, \end{aligned}$$

which gives

$$\begin{aligned} \frac{\mu \alpha _1\alpha _2}{2\alpha _2+3}>\frac{\mu \alpha _1^2}{\alpha _1+2\alpha _2}>\alpha _0 \end{aligned}$$

and then \(S_1>0.\) \(\square \)
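For illustration, here is a minimal sketch of (5.24) under these conditions, assuming forward differences \(z^{\Delta }(n)=z(n+1)-z(n)\) (and their iterates) and the toy instance \(A(x)=2x+b\) on \(C=[-1,1]^2\) with \(\omega =1\), so \(\mu =0.5\):

```python
# Sketch of the discrete scheme (5.24). Assumptions (ours): forward differences
# z^Delta(n) = z(n+1) - z(n), the toy VI A(x) = 2x + b on C = [-1,1]^2
# (gamma = L = 2, omega = 1, mu = 0.5), parameters chosen per Corollary 5.11.
import numpy as np

b = np.array([1.0, 1.0])
A = lambda x: 2.0 * x + b
P_C = lambda x: np.clip(x, -1.0, 1.0)
omega = 1.0
a2, a1, a0 = 1.5, 0.6, 0.04        # (5.41)-(5.43): a1 in (0.5, ~0.643), a0 < 0.05

z = [np.array([1.0, -1.0])] * 3    # three initial points z(0) = z(1) = z(2)
for n in range(200):
    res = z[-3] - P_C(z[-3] - omega * A(z[-3]))
    # solve (5.24) for z(n+3)
    z_next = (3 * z[-1] - 3 * z[-2] + z[-3]
              - a2 * (z[-1] - 2 * z[-2] + z[-3])
              - a1 * (z[-2] - z[-3])
              - a0 * res)
    z.append(z_next)

x_star = np.array([-0.5, -0.5])
print(np.linalg.norm(z[-1] - x_star))   # decays linearly (geometrically) in n
```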

Remark 5.12

We consider the following optimization problem

$$\begin{aligned} \min _{x \in C} f(x), \end{aligned}$$
(5.45)

where C is a nonempty, closed, and convex subset of \(\mathcal {H}\) and \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function, \(\gamma \)-strongly pseudo-convex on C, with an L-Lipschitz continuous gradient for some \(L>0\). Recall that a differentiable function f is called \(\gamma \)-strongly pseudo-convex if there exists \(\gamma >0\) such that

$$\begin{aligned} \left\langle {\nabla f(x)}, {y-x} \right\rangle \ge 0\Longrightarrow \left\langle {\nabla f(y)}, {y-x} \right\rangle \ge \gamma \Vert x-y\Vert ^2 \end{aligned}$$

for all \(x,y\in C\). For more details on generalized convex functions and their characterizations, the reader is referred to [29]. The optimization problem (5.45) is equivalent to the following strongly pseudo-monotone variational inequality: find \(x_*\in C\) such that

$$\begin{aligned} \left\langle {\nabla f(x_*)}, {y-x_*} \right\rangle \ge 0\quad \forall y\in C. \end{aligned}$$
(5.46)

As a consequence, all the results presented in this section can be applied directly to the pseudo-convex optimization problem (5.45).