Abstract
In this paper, we propose and analyze a third-order dynamical system for finding zeros of the sum of two generalized operators in a Hilbert space \(\mathcal {H}\). We establish the existence and uniqueness of the trajectories generated by the system under appropriate continuity conditions, and prove exponential convergence to the unique zero when the sum of the operators is strongly monotone. Additionally, we derive an explicit discretization of the dynamical system, which results in a forward–backward algorithm with double inertial effects and a larger range of stepsizes. We establish the linear convergence of the iterates to the unique solution using this algorithm. Furthermore, we provide a convergence analysis for the class of strongly pseudo-monotone variational inequalities. We illustrate the effectiveness of our approach by applying it to structured optimization and pseudo-convex optimization problems.
1 Introduction
In practical applications, many nonlinear phenomena can be represented as finding a zero of a monotone operator. This problem arises in various contexts, such as solving variational inequalities related to monotone operators, minimizing convex functions, finding fixed points of nonexpansive mappings, and more. One of the most widely used methods for solving this problem is the proximal point algorithm, which was originally proposed by Martinet and systematically studied by Rockafellar [39] in the context of Hilbert spaces.
Another important problem is to find a zero of the sum of two maximally monotone operators \(A,B:\mathcal {H}\rightrightarrows \mathcal {H}\):
$$\begin{aligned} \text {find } x_*\in \mathcal {H}\text { such that } 0\in A(x_*)+B(x_*). \end{aligned}$$(1.1)
Problem (1.1) arises in a wide range of applications such as convex optimization, image processing, and signal processing. A crucial special case of Problem (1.1) is the following variational inequality (VI) problem
$$\begin{aligned} \text {find } x_*\in C \text { such that } 0\in B(x_*)+N_C(x_*), \end{aligned}$$(1.2)
where C is a nonempty closed convex subset of \(\mathcal {H}\) and \(N_C(x_*)\) is the normal cone of C at \(x_*.\) When B is single-valued, the VI problem (1.2) is equivalent to finding a point \(x_*\in C\) such that
$$\begin{aligned} \left\langle {B(x_*)}, {y-x_*} \right\rangle \ge 0\quad \forall y\in C. \end{aligned}$$(1.3)
The Douglas-Rachford splitting algorithm [22], presented by Lions and Mercier [32], is a fundamental method to solve such problems. Under additional assumptions on the involved operators, linear rates of convergence for the algorithm are possible. Some other splitting methods are derived from the Douglas-Rachford algorithm (such as the primal-dual hybrid gradient method [35], the Alternating Direction Method of Multipliers (ADMM) [23], and Spingarn’s method of partial inverses [23]). There are many other methods for solving Problem (1.1), especially when one of the operators is single-valued. A popular method for solving this problem is the forward–backward algorithm, which consists of a forward step with one operator and a backward step with the other. The algorithm generates a sequence of iterates that converges to a solution under suitable assumptions on the operators. The forward–backward algorithm has been widely studied and applied in both finite-dimensional and infinite-dimensional settings [11, 24, 33, 38].
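As a concrete illustration (not taken from the paper itself), consider the VI special case, where the backward step with the normal cone operator \(N_C\) reduces to the metric projection onto C. The operator \(B(x)=x-2\) and the set \(C=[0,1]\) below are illustrative choices only:

```python
# Forward-backward iteration for 0 in B(x) + N_C(x):
# a forward (explicit) step with B, then a backward (resolvent) step
# with N_C, whose resolvent is the metric projection P_C.
# Illustrative toy data: B(x) = x - 2 (strongly monotone), C = [0, 1].

def B(x):
    return x - 2.0

def project_C(x):  # P_C for C = [0, 1]
    return min(max(x, 0.0), 1.0)

def forward_backward(x0, omega=0.5, iters=100):
    x = x0
    for _ in range(iters):
        x = project_C(x - omega * B(x))  # backward step after forward step
    return x

x_star = forward_backward(0.0)  # converges to the unique solution x* = 1
```

Here the iterates converge because \(x\mapsto x-\omega B(x)\) is a contraction for small \(\omega \) and the projection is nonexpansive.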
Nowadays, there is a growing interest in connecting and integrating optimization with other fields. This research direction has become increasingly attractive as it can provide new insights into optimization results and lead to interesting findings. Among the emerging research directions, there is a line of works that uses ordinary differential equations (ODEs) to design algorithms for optimization problems [2, 3, 9, 14], variational inequalities [18, 27, 34, 43], monotone inclusions [1, 5, 6], fixed point problems [15, 17] and equilibrium problems [20, 36, 42, 45]. The ODE interpretation not only provides a better understanding of Nesterov’s scheme, but also helps design new schemes with similar convergence rates. The reader can refer to [9, 10, 18, 40] and the references therein for more examples.
1.1 Some Historical Aspects
The Heavy Ball with Friction method is a popular optimization algorithm based on inertial dynamics. The algorithm was proposed by Polyak to accelerate the gradient method in optimization [37]. It introduces an inertial system with a fixed viscous damping coefficient \(\gamma >0\)
$$\begin{aligned} x^{(2)}(t)+\gamma x^{(1)}(t)+\nabla f(x(t))=0 \end{aligned}$$(1.4)
for minimizing a convex and differentiable function f. Note that when f has a Lipschitz continuous gradient, \(\nabla f\) is a co-coercive operator (see the definition in Sect. 2). Attouch and Alvarez extended the heavy ball dynamical system (1.4) to constrained optimization as well as to co-coercive operators in [4]. Recently, Boţ and Csetnek [15] studied the second order dynamical system with variable viscous damping coefficient
for finding a zero of a co-coercive operator B. The results were applied to second order forward–backward dynamical systems for monotone inclusion problems (1.1)
where A is maximally monotone and B is co-coercive. Here \(\mathcal {J}_A\triangleq (I+A)^{-1}\) is the resolvent of an operator A, where I stands for the identity operator. When the operator is merely monotone but not co-coercive, a second order forward–backward–forward dynamical system and its discretization have recently been proposed and investigated in [19]. In particular, when the operator \(A+B\) is strongly monotone, the exponential convergence rate of the second order dynamical system (1.5) was obtained in [16]. Under suitable conditions on the parameters, the authors established the convergence rate of \(O(e^{-t})\) for the trajectories.
Attouch, Chbani and Riahi were the first to study third order dynamical systems for minimizing a convex and differentiable function in Hilbert spaces [7, 8]. They proposed and studied the (TOGES) dynamical system [7]
Using temporal scaling techniques, the third order dynamical system (1.6) was reformulated as a second order dynamical system, and the convergence analysis was obtained using the Lyapunov energy function techniques developed for second order dynamical systems. The authors showed a convergence rate of the values of the order \(\frac{1}{t^3}\), i.e. \( f(x(t) + t x^{(1)}(t)) - \inf _{\mathcal {H}}f \le \frac{C}{t^3}\) for some constant \(C>0\), and obtained the convergence of the trajectories towards optimal solutions of \(\min _{x\in \mathcal {H}} f(x)\). When the objective function f is strongly convex, the authors established an exponential rate of convergence. Proximal-based algorithms obtained by temporal discretization of (TOGES) were also investigated. Nevertheless, the rate of the values of f(x(t)) in (TOGES) is only of order \(\frac{1}{t}\), i.e. \( f(x(t) ) - \inf _{\mathcal {H}}f \le \frac{C}{t}\), which is not completely satisfactory from the point of view of fast optimization. Hence, very recently, an improved version of (TOGES), called (TOGES-V), has been proposed and investigated by the same authors in [8]
where they obtained the rate \(\mathcal {O}\left( \frac{1}{t^3}\right) \) for \(f(x(t) ) - \inf _{\mathcal {H}} f\).
1.2 Our Contributions
In this paper, we propose for the first time a third order dynamical system for the monotone inclusion (1.1) and investigate its convergence properties in both continuous time and discrete time settings. The motivation for considering third (or higher) order dynamical systems comes from the fact that they can potentially provide faster convergence rates, as seen in optimization problems [7, 8]. This is also the case for the monotone inclusion problem studied in this paper. Indeed, we derive the convergence rate of \(O(e^{-\varepsilon t})\) for some \(\varepsilon > 1\) (in particular for \(\varepsilon = 2\)) under suitable choices of parameters, which is significantly faster than the classical results obtained in [16] for second order dynamical systems. In the discrete setting, the third order dynamical system provides a new forward–backward algorithm with double momentum and a larger range of stepsizes.
In contrast to the classical monotone inclusion problem, where each individual operator A and B is required to be (maximally) monotone, we only require A and B to be generalized monotone (see the definitions in Sect. 2). This approach allows us to handle not only the classical monotone inclusion problem but also the problem of finding zeros of the sum of a weakly monotone operator and a strongly monotone operator, as well as pseudo-monotone variational inequalities. Applications of these models include minimizing the sum of a weakly convex function and a strongly convex function [21, 26] or minimizing a pseudo-convex function. The convergence analysis developed in this paper relies purely on Lyapunov energy function techniques, in contrast to the temporal scaling technique used in [7, 8]. In summary, our contributions are as follows:
-
Propose a third order dynamical system for the sum of two generalized monotone operators.
-
Establish the existence and uniqueness of the trajectories generated by the proposed dynamical system.
-
Provide the exponential convergence analysis of the trajectories to the unique solution of the inclusion, and show that it is faster than classical results.
-
Investigate the temporal discretization of the system and prove the linear convergence of the corresponding forward–backward algorithm with double inertial effects.
-
Study the third order dynamical system for strongly pseudo-monotone variational inequalities.
The paper is structured as follows. In Sect. 2, we introduce some terminology and results that are necessary for the analysis presented in the subsequent sections. In Sects. 3 and 4, we focus on solving Problem (1.1) under the assumption of generalized monotonicity of the operators involved. In Sect. 3, we propose a third-order dynamical system and establish its exponential convergence to the zero of Problem (1.1). The explicit discretization of this system leads to a new forward–backward algorithm studied in Sect. 4. In Sect. 5, motivated by the third-order dynamical system, we find the solution of Problem (1.2) under the assumption of strong pseudo-monotonicity of the operator B.
2 Preliminaries
We start the section with listing the notations used. The set of integers is denoted by \(\mathbb Z\) and the set of real numbers is denoted by \(\mathbb R\). Let \(\mathbb Z_{\ge 1}=\{j\in \mathbb Z:j\ge 1\}\) and \(\mathbb R_{\ge 0}=\{t\in \mathbb R:t\ge 0\}\). The symbol \(g^{(k)}\) stands for the k-th derivative of the function g.
Throughout this work \(\mathcal {H}\) is a real Hilbert space with inner product \(\left<\cdot , \cdot \right>\) and induced norm \(\Vert \cdot \Vert \). We use the notation \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) to indicate that A is a set-valued operator defined on \(\mathcal {H}\), and \(A:\mathcal {H}\rightarrow \mathcal {H}\) to indicate that A is a single-valued operator on \(\mathcal {H}\).
Let A be an operator on \(\mathcal {H}\). The graph of A is \(\text {Gra}A = \{(x,u) \in \mathcal {H}\times \mathcal {H}, u \in A(x)\}\). The inverse of A, denoted by \(A^{-1}\), is the operator with graph \(\text {Gra}A^{-1} = \{(u,x) \in \mathcal {H}\times \mathcal {H}, u \in A(x)\}\).
2.1 Generalized Monotone Operators
We first recall some generalized versions of monotone operators defined and studied in [21, 29].
Definition 2.1
The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is called \(\gamma _A\)-monotone if there exists a scalar \(\gamma _A \in \mathbb {R}\) such that
The constant \(\gamma _A\) is referred to as the monotonicity modulus of A. We also say that A is maximally \(\gamma _A\)-monotone if it is \(\gamma _A\)-monotone and there is no \(\gamma _A\)-monotone operator whose graph strictly contains \(\text {Gra}A\).
Remark 2.2
Note that in the definition of generalized monotonicity, \(\gamma _A\) can be negative. If \(\gamma _A = 0\), then generalized monotonicity reduces to classical monotonicity. If \(\gamma _A>0\), then A is strongly monotone. Finally, if \(\gamma _A<0\), then A is called weakly monotone. For a more detailed discussion of (maximally) monotone operators and the connection to optimization problems, we refer the reader to [12, 13, 21].
Definition 2.3
The single-valued operator \(T:\mathcal {H}\rightarrow \mathcal {H}\) is called
-
1.
\(\gamma _T\)-strongly pseudo-monotone if \(\gamma _T>0\) and
$$\begin{aligned} \left\langle {T(x)}, {y-x} \right\rangle \ge 0\Longrightarrow \left\langle {T(y)}, {y-x} \right\rangle \ge \gamma _T\Vert x-y\Vert ^2 \end{aligned}$$for all \(x,y\in \mathcal {H}\).
-
2.
\(\gamma _T\)-co-coercive if \(\gamma _T>0\) and
$$\begin{aligned} \left\langle {T(x)-T(y)}, {x-y} \right\rangle \ge \gamma _T\Vert T(x)-T(y)\Vert ^2\quad \forall x,y\in \mathcal {H}. \end{aligned}$$ -
3.
\(L_T\)-Lipschitz continuous if \(L_T>0\) and
$$\begin{aligned} \Vert T(x)-T(y)\Vert \le L_T\Vert x-y\Vert \quad \forall x,y\in \mathcal {H}. \end{aligned}$$
Remark 2.4
It is clear from the Cauchy–Schwarz inequality that if T is \(\gamma _T\)-co-coercive then it is \(1/\gamma _T\)-Lipschitz continuous: indeed, \(\gamma _T\Vert T(x)-T(y)\Vert ^2\le \left\langle {T(x)-T(y)}, {x-y} \right\rangle \le \Vert T(x)-T(y)\Vert \,\Vert x-y\Vert \).
The resolvent of an operator A is denoted as \(\mathcal {J}_A\triangleq (I+A)^{-1}\), where I is the identity operator. We will need the following properties of resolvent operator.
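For instance (an illustrative computation, not part of the original text), the resolvent of the scalar operator \(A(x)=ax\) with \(a>0\) is \(\mathcal {J}_{\omega A}(x)=x/(1+\omega a)\), while the resolvent of the subdifferential of the absolute value function is the well-known soft-thresholding map:

```python
# Resolvents J_{omega A} = (I + omega A)^{-1} for two scalar examples.

def resolvent_linear(x, a, omega):
    # A(x) = a*x, so (I + omega*A)^{-1}(x) = x / (1 + omega*a)
    return x / (1.0 + omega * a)

def resolvent_abs(x, omega):
    # A = subdifferential of |.|: the resolvent solves y + omega*s = x
    # with s in sign(y), i.e. soft-thresholding at level omega.
    if x > omega:
        return x - omega
    if x < -omega:
        return x + omega
    return 0.0

# Check the defining relation x = (I + omega A)(J(x)) in the linear case:
x, a, omega = 3.0, 2.0, 0.5
y = resolvent_linear(x, a, omega)
assert abs(y + omega * a * y - x) < 1e-12
```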
Lemma 2.5
[21] Let \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) be a \(\gamma _A\)-monotone operator and let \(\omega >0 \) be such that \(1+\omega \gamma _A > 0\). Then the following hold:
-
1.
\(\mathcal {J}_{\omega A}\) is single-valued;
-
2.
\(\mathcal {J}_{\omega A}\) is \((1+\omega \gamma _A)\)-co-coercive;
-
3.
dom \(\mathcal {J}_{\omega A} = \mathcal {H}\) if and only if A is maximally \(\gamma _A\)-monotone.
2.2 Absolutely Continuous Functions
Definition 2.6
A function \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is called locally absolutely continuous if it is absolutely continuous on every compact interval, which means that for each interval \([t_0,t_1]\) there exists an integrable function \(g:[t_0,t_1]\rightarrow \mathbb R^d\) such that
Remark 2.7
If \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is a locally absolutely continuous function, then it is differentiable almost everywhere and its derivative agrees with its distributional derivative almost everywhere.
Proposition 2.8
For \(s,u\ge 0\) and \(m\in \mathbb Z_{\ge 1}\), it holds
Proof
The case when \(m=1\) is done by using integration by parts. Now we suppose that the conclusion holds for m and prove the case \(m+1.\) Indeed, we have
which together with the induction assumption completes the proof. \(\square \)
2.3 A Third Order Dynamical System
In this paper, we propose the following dynamical system for Problem (1.1):
$$\begin{aligned} y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0,\quad t\ge t_0, \end{aligned}$$(2.1)
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\) and \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
The solution of dynamical system (2.1) is understood in the following sense.
Definition 2.9
A function \(y(\cdot )\) is called a strong global solution of Eq. (2.1) if it holds:
-
1.
For every \(j\in \{0,1,2,3\}\), \(y^{(j)}:[t_0,+\infty )\rightarrow \mathcal {H}\) is locally absolutely continuous; in other words, absolutely continuous on each interval \([\delta ,\eta ]\) for \(\eta>\delta >t_0\).
-
2.
\(y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0\) for almost every \(t\ge t_0\).
-
3.
\(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
Proposition 2.10
(Equivalent form) Equation (2.1) is equivalent to the first order system \(x^{(1)}(t)=G(x(t))\), where \(G:\mathcal {H}\times \mathcal {H}\times \mathcal {H}\rightarrow \mathcal {H}\times \mathcal {H}\times \mathcal {H}\) is defined by
$$\begin{aligned} G(x_1,x_2,x_3)=\big (x_2,\,x_3,\,-\alpha _2x_3-\alpha _1x_2-\alpha _0[x_1-\mathcal {J}_{\omega A}(x_1-\omega B(x_1))]\big ), \end{aligned}$$
where \((x_1,x_2,x_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}.\)
Proof
The conclusion follows from the change of variables \(x_1(t)=y(t),\; x_2(t)=y^{(1)}(t),\; x_3(t)=y^{(2)}(t).\)
\(\square \)
Theorem 2.11
(Existence and uniqueness of a solution) Consider dynamical system (2.1), where \(\alpha _0,\alpha _1,\alpha _2,\omega >0\), the operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, and \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous, with \(1+\omega \gamma _A > 0\). Then for each \(v_0,v_1,v_2\in \mathcal {H}\) there exists a unique strong global solution of (2.1).
Proof
We endow \(\mathcal {H}\times \mathcal {H}\times \mathcal {H}\) with the scalar product
We show that the operator G is Lipschitz. Indeed, let \(y=(y_1,y_2,y_3),z=(z_1,z_2,z_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}\). We have
and so
By using the Cauchy–Picard theorem (see, for example, [28, Proposition 6.2.1]), we get the existence and uniqueness of a strong global solution. \(\square \)
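To illustrate the first-order reformulation numerically (a sketch under simplifying assumptions, not taken from the paper): with \(A=0\) (so \(\mathcal {J}_{\omega A}=I\)) and \(B(x)=x\), system (2.1) becomes a linear ODE, and an explicit Euler scheme on the equivalent system of Proposition 2.10 drives the trajectory to the unique zero \(x_*=0\). The coefficient values below are illustrative only:

```python
# Explicit Euler integration of the first-order reformulation
# x' = G(x) of the third-order system (2.1), for the toy case
# A = 0 (resolvent = identity) and B(x) = x, whose unique zero is 0.
# The coefficients make the characteristic polynomial
# s^3 + 6 s^2 + 11 s + 6 = (s+1)(s+2)(s+3), hence a stable system.

def simulate(y0, dy0, ddy0, a2=6.0, a1=11.0, a0=6.0, omega=1.0,
             h=1e-3, T=10.0):
    x1, x2, x3 = y0, dy0, ddy0          # x1 = y, x2 = y', x3 = y''
    for _ in range(int(T / h)):
        # x1 - J_{omega A}(x1 - omega*B(x1)) with J = identity, B = I
        resolvent_term = x1 - (x1 - omega * x1)
        x1, x2, x3 = (x1 + h * x2,
                      x2 + h * x3,
                      x3 + h * (-a2 * x3 - a1 * x2 - a0 * resolvent_term))
    return x1

y_final = simulate(1.0, 0.0, 0.0)
assert abs(y_final) < 1e-2   # trajectory has approached the zero x* = 0
```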
2.4 Difference Operators
In this section, we give the discrete counterpart of the dynamical system (2.1). To that aim, we recall the forward difference operation and its properties used in the convergence analysis. For \(z:\mathbb Z\rightarrow \mathcal {H}\) and \(\kappa \in \mathbb Z_{\ge 1}\), we denote
Remark 2.12
Let \(f,g,h:\mathbb Z\rightarrow \mathcal {H}\) and \(\theta \in \mathbb R\). It can be proven that
and consequently
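Assuming the standard forward difference \(\Delta z(n)=z(n+1)-z(n)\) with \(\Delta ^{\kappa }=\Delta (\Delta ^{\kappa -1})\), iterated differences expand binomially; the following quick numerical check is illustrative only:

```python
from math import comb

# Iterated forward differences of a sequence z : Z -> R, with
# Delta z(n) = z(n+1) - z(n) and Delta^k = Delta applied k times.

def delta(z, k=1):
    if k == 0:
        return z
    d = lambda n: z(n + 1) - z(n)
    return delta(d, k - 1)

def delta_binomial(z, k, n):
    # Binomial expansion: Delta^k z(n) = sum_j (-1)^(k-j) C(k,j) z(n+j)
    return sum((-1) ** (k - j) * comb(k, j) * z(n + j) for j in range(k + 1))

z = lambda n: n ** 3 - 2 * n          # any test sequence
for k in (1, 2, 3):
    for n in range(5):
        assert delta(z, k)(n) == delta_binomial(z, k, n)

# Third differences of a cubic are constant (= 3! = 6 here):
assert delta(z, 3)(0) == 6
```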
Consider the difference equation, which is the discrete version of (2.1):
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).
Proposition 2.13
(Equivalent form) Equation (2.2) has an equivalent form
Proof
The proof makes use of the facts that \(\Delta ^2z(n)=z(n+2)-2z(n+1)+z(n)\) and \(\Delta ^3z(n)=z(n+3)-3z(n+2)+3z(n+1)-z(n).\)
\(\square \)
Remark 2.14
The numerical scheme (2.3) can be re-written as
which is a forward–backward algorithm with double momentum.
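To make the double-momentum structure concrete, here is a sketch under the assumption that (2.2) reads \(\Delta ^3z(n)+\alpha _2\Delta ^2z(n)+\alpha _1\Delta z(n)+\alpha _0[z(n)-\mathcal {J}_{\omega A}(z(n)-\omega B(z(n)))]=0\) with forward differences; the coefficient values are illustrative choices for stability, not the paper's:

```python
# One possible discrete scheme: expanding forward differences in
#   Delta^3 z + a2*Delta^2 z + a1*Delta z + a0*[z - J(z - omega*B(z))] = 0
# and solving for z(n+3) gives a forward-backward step plus two
# momentum (inertial) terms.  Toy case: A = 0 (so J = identity) and
# B(x) = x, whose unique zero is 0.

def step(z0, z1, z2, a2=1.0, a1=0.31, a0=0.03, omega=1.0):
    # z0 = z(n), z1 = z(n+1), z2 = z(n+2)
    fb = z0 - (z0 - omega * z0)            # z - J(z - omega*B(z)), J = I
    return (3 * z2 - 3 * z1 + z0           # from Delta^3
            - a2 * (z2 - 2 * z1 + z0)      # from Delta^2
            - a1 * (z1 - z0)               # from Delta
            - a0 * fb)

z = [1.0, 1.0, 1.0]
for n in range(200):
    z.append(step(z[n], z[n + 1], z[n + 2]))

assert abs(z[-1]) < 1e-6   # iterates converge linearly to the zero
```

With these values the characteristic polynomial \((\lambda -1)^3+(\lambda -1)^2+0.31(\lambda -1)+0.03\) has roots \(\lambda =0.5,0.7,0.8\), all inside the unit disk, which is what produces the linear convergence seen above.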
3 Continuous Time Dynamical System
In this section, we will establish the exponential convergence of dynamical system (2.1) under the following assumption and notations.
Assumption 3.1
(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0\).
(ii) The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous such that
(iii) The parameter \(\omega >0\) satisfies
Remark 3.1
Condition (3.1) implies that the sum operator \(A+B\) is strongly monotone, while the individual operators A and B need not be. A similar condition was studied in [21]. A direct application of this model is to minimize the sum of a weakly convex function and a strongly convex function. Condition (3.2) is imposed to ensure that the resolvent operator \(\mathcal {J}_{\omega A}\) is single-valued. Finally, condition (3.3) means that the stepsize \(\omega \) must be bounded from above. Note that condition (3.3) gives
Hence, we can find \(\theta >0\) such that
The following notations are used.
3.1 Global Exponential Convergence
First, we consider the dynamical system (2.1), whose global convergence involves the following parameters
We denote the functions
Theorem 3.2
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Assume that there exists \(\varepsilon >0\) such that the following conditions hold
Then the trajectory \(y(\cdot )\) generated by dynamical system (2.1) converges exponentially to \(x_*\), i.e., there exist positive numbers \(\mu , \eta \) such that
Proof
In the next arguments, we often use the identities:
Since
we have
We observe
Using the definition of resolvent, equation (2.1) gives the following
which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies
Since the operator B is L-Lipschitz, we can estimate the right hand side of the inequality above and then
Note that by the Cauchy–Schwarz inequality
Thus, we get
Note that
Inserting the equality above into (3.16), we obtain
which implies, by (3.14) and (3.15), that
By (3.4), we have \(\ell >0\) and hence \(C_0>0\). Thus, we can write
Multiplying both sides by \(e^{\varepsilon (t-t_0)}\) and then using Proposition 2.8, we get
for some constant \(D_1\), which implies, after using (3.8), (3.9), (3.12) and (3.4), that
Integrating the above inequality with respect to the variable \(s\in [t_0,t]\), we deduce
for some constant \(D_2\). Using (3.10), (3.11), (3.4), we get
Note that equation (3.18) reduces to the following
for some constant \(D_3\).
-
If \(\alpha _2\ge 3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le e^{(\alpha _2-3\varepsilon )(t-t_0)}\), and so
$$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}D_3 +e^{-\varepsilon (t-t_0)}\int \limits _{t_0}^t(D_1s+D_2)\,ds. \end{aligned}$$(3.19) -
If \(2\varepsilon<\alpha _2<3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le 1\), and so
$$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}\left( D_3 +\int \limits _{t_0}^t(D_1s+D_2)\,ds\right) . \end{aligned}$$(3.20)
The arguments above show that \(y(\cdot )\) converges exponentially to \(x_*\). \(\square \)
Remark 3.3
It follows from (3.19) that a(t) converges to 0 with the rate \(O((Pt^2 +Qt +R)e^{-\varepsilon t})\) for some constants P, Q, R, while the rate obtained from (3.20) is \(O((Pt^2 +Qt +R)e^{-(\alpha _2 - 2 \varepsilon ) t})\). With a suitable choice of \(\varepsilon \) and \(\alpha _2\), these rates can be controlled so that they are faster than the rate \(O(e^{-t})\) of the second order dynamical systems established in [16].
3.2 Parameters Choices
We now discuss the question “how to find \(\varepsilon \)?”. It can be seen from (3.19) that a larger \(\varepsilon \) yields a faster rate. Finding the maximal value of \(\varepsilon \) is cumbersome as it depends on many other parameters. However, we will discuss how to find a "good enough" \(\varepsilon \) in this section. The following remark offers one way, in terms of the coefficients.
Remark 3.4
If \(A_0,A_1,B_0\) satisfy
then conditions (3.8)–(3.13) can be obtained by letting \(\varepsilon \rightarrow 0^+.\)
In the following result, we simplify the assumption (3.21) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)
Corollary 3.5
Consider equation (2.1) under Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Let us first examine Theorem 3.2 when \(\varepsilon =1\), in which case it matches the rate obtained for the second order dynamical system established in [16].
Theorem 3.6
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and
Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
First, we show that (3.25) ensures the validity of (3.26); that is \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (3.25) that \(\alpha _2>3\) and so \(2\alpha _2-3<\frac{\alpha _2(\alpha _2-1)}{2}\). Also from (3.25), we have \(\alpha _2>4\varphi \) and then
Next, we show that (3.25)–(3.26) ensure the validity of (3.27); that is \(\underline{q}<\overline{p}\). It results from (3.25) that \(\alpha _2>3\varphi +2\) and so
Meanwhile, by (3.26), we have \(\alpha _1>\varphi (2\alpha _2-3)\), which gives
Now we can obtain the exponential convergence of \(y(\cdot )\) by using Theorem 3.2 for \(\varepsilon =1\). \(\square \)
Now let us examine Theorem 3.2 when \(\varepsilon =2\). In this case, we will obtain from (3.19) that the convergence rate of a(t) is
which is faster than the rate \(O(e^{-t})\) obtained in [16] for the second order dynamical system.
Theorem 3.7
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and (3.24). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
As in Theorem 3.6, we must check \(\underline{\beta }<\overline{\beta }\). Indeed, we have
Next is to prove \(\underline{q}<\overline{p}\). It follows from (3.29) that
which gives
Again using (3.29), we get \(\alpha _1>4\varphi (\alpha _2-3)\), which gives
We observe \(\alpha _1(\alpha _2-6\varphi -4)+12\varphi (\alpha _2-2)>0\), which is equivalent to saying that
Hence, the inequality \(\underline{q}<\overline{p}\) follows from (3.33)–(3.34). We leave it to the reader to check (3.8)–(3.13). Thus, \(y(\cdot )\) converges exponentially to \(x_*\). Moreover, it follows from (3.19) that the convergence rate is
for some constants P, Q, R. \(\square \)
4 Discrete Time Dynamical System
In this section, we establish the linear convergence of the numerical scheme (2.3) for solving (1.1) under the following additional assumption.
Assumption 4.1
The coefficients \(\alpha _0,\alpha _1,\alpha _2\) satisfy
where \(\ell \) is defined in (3.5).
We denote the following parameters
and
Remark 4.1
Under Assumption 4.1, we have \(F_0,E_1,D_2>0.\) Note also that under Assumption 3.1, the stepsize \(\omega \) must be bounded from above, i.e.
This upper bound on \(\omega \) is larger than that of the classical forward–backward algorithm, which is \(\omega < \frac{2\gamma }{L^2}\) (see e.g. [13, Proposition 25.9]) when A is maximally monotone and B is \(\gamma \)-strongly monotone and L-Lipschitz continuous.
4.1 Global Linear Convergence
Theorem 4.2
Suppose that the operators A and B satisfy Assumption 3.1 and that Assumption 4.1 holds. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5), (4.4), (4.5). Assume that there exists \(\xi >0,\xi \ne 1\) such that the following conditions hold
Then z(n) converges linearly to \(x_*\), i.e. there exist \(M>0\) and \(q \in (0,1)\) such that
Proof
Since
we have
We observe
Using the definition of resolvent, equation (2.2) gives
which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies
Since the operator B is L-Lipschitz, we can estimate the right hand side of the inequality above and then
Note that by the Cauchy–Schwarz inequality
Thus, we get
Note that
Inserting the equality above into (4.15), we get
which implies, by (4.13) and (4.14), that
By (4.1), the inequality above gives
Set
Then \(\varepsilon >1\) and conditions (4.7)–(4.12) can be written as
Multiplying both sides by \(\varepsilon ^{n+3}\) and then using Remark 2.12, we obtain
Let \(m\in \mathbb Z_{\ge 1}\). After summing from \(n=0\) to \(n=m-1\),
where \(M_1\) is some positive constant. Again using Remark 2.12,
Let \(\kappa \in \mathbb Z_{\ge 2}\). After summing from \(m=1\) to \(m=\kappa -1\),
where \(M_2\) is some positive constant. Again using Remark 2.12,
which implies, after summing from \(\kappa =2\) to \(\kappa =n-1\), that
Here \(n\in \mathbb Z_{\ge 3}\) and \(M_3,M_4\) are some positive constants. Let q be such that \(1<q<\varepsilon \). We have
where \(M_5\) is some constant. The inequality above means that z(n) converges linearly to \(x_*\). \(\square \)
4.2 Parameters Choices
Let us now discuss how to choose the parameters fulfilling all assumptions of Theorem 4.2. Note that if \(D_0,D_1,E_0\) satisfy
then conditions (4.16)–(4.21) hold by letting \(\xi \rightarrow 0^+.\)
The following result simplifies the assumption (4.22) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2\).
Corollary 4.3
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5), (4.4), (4.5). Then z(n) converges linearly to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(E_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get
which gives (4.2). It follows from (4.26) that \(\alpha _1<\alpha _2\) and so
The last inequality proves (4.3). Thus, Assumption 4.1 holds. Note that
which gives
and then \(D_1>0.\) \(\square \)
Remark 4.4
Note that there are common choices of parameters satisfying both Corollary 3.5 (as \(\varepsilon \rightarrow 0\)) and Corollary 4.3 (as \(\xi \rightarrow 0\)). The reader can check the following selection
Remark 4.5
An important application of the monotone inclusion (1.1) is the following optimization problem
where \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function with L-Lipschitz continuous gradient for some \(L>0\) and \(g: \mathcal {H}\rightarrow \mathbb {R} \cup \{+\infty \}\) is a proper and lower semicontinuous function.
Recall that the Fréchet subdifferential of g at x is defined by
It is well known that if g is differentiable at x, then \(\hat{\partial } g(x) = \{\nabla g(x)\}\). When g is a convex function, the Fréchet subdifferential coincides with the classical convex subdifferential, i.e.
We notice that if g is proper, \(\gamma _g\)-convex and lower semicontinuous, then \(\hat{\partial } g\) is maximally \(\gamma _g\)-monotone. We assume that f and g are respectively \(\gamma _f\)-convex and \(\gamma _g\)-convex functions such that \(\gamma = \gamma _f +\gamma _g >0\). Then the set of minimizers of (4.27) coincides with the solution set of the following monotone inclusion problem
for which the results obtained from previous Sections can be applied.
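As a toy instance (illustrative, not from the original text), take \(f(x)=\tfrac{1}{2}(x-3)^2\) (strongly convex with 1-Lipschitz gradient) and \(g(x)=|x|\). The forward–backward iteration then alternates a gradient step on f with the resolvent step on \(\hat{\partial } g\), i.e. soft-thresholding, and converges linearly to the minimizer \(x_*=2\):

```python
# Forward-backward (proximal gradient) method for min f(x) + g(x) with
# f(x) = 0.5*(x-3)^2 and g(x) = |x|.  The backward step is the
# resolvent of the subdifferential of g, i.e. soft-thresholding.
# The minimizer solves 0 in (x - 3) + sign(x), giving x* = 2.

def grad_f(x):
    return x - 3.0

def prox_g(x, omega):          # soft-thresholding = J_{omega * dg}
    if x > omega:
        return x - omega
    if x < -omega:
        return x + omega
    return 0.0

def solve(x0=0.0, omega=0.5, iters=100):
    x = x0
    for _ in range(iters):
        x = prox_g(x - omega * grad_f(x), omega)
    return x

assert abs(solve() - 2.0) < 1e-10
```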
5 Strongly Pseudo-monotone Variational Inequality
Let C be a nonempty and closed convex subset of \(\mathcal {H}\). The normal cone of C at x is defined as
which is maximally monotone [13]. In this section, we focus on the special case of Problem (1.1) of the form
Note that, if A is \(\gamma _A\)-monotone, then (5.1) is a special case of (1.1). Indeed, the sum of two monotone operators is still monotone [13]. However, this is not the case if A is non-monotone (e.g. only pseudo-monotone). For example, the operator
is pseudo-monotone but \(A+\epsilon I\) is not (pseudo)-monotone for any \(\epsilon > 0\) (see [41, Counterexample 2.1]).
In this section, we will consider the case when A is \(\gamma \)-strongly pseudo-monotone and hence the results obtained in the previous Sections cannot be directly applied. Problem (5.1) is equivalent to the variational inequality VI(A, C): find \(x_*\in C\) such that
For each \(x\in \mathcal {H}\), there exists a unique point in C (see, e.g., [31]), denoted by \(P_{C}(x)\), such that
Some well-known properties of the metric projection \(P_{C}: \mathcal {H}\rightarrow C\) are given in the following lemma [25, 31].
Lemma 5.1
Assume that the set C is a closed convex subset of \(\mathcal {H}\). Then we have the following:
-
(a)
\(P_{C}(.)\) is a nonexpansive operator, i.e., for all \(x,y\in \mathcal {H}\), it holds that
$$\begin{aligned} \Vert P_{C}(x)-P_{C}(y)\Vert \le \Vert x-y\Vert . \end{aligned}$$ -
(b)
For any \(x\in \mathcal {H}\) and \(y\in C\), it holds that
$$\begin{aligned} \left\langle x-P_{C}(x), y-P_{C}(x)\right\rangle \le 0. \end{aligned}$$
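Both properties of Lemma 5.1 can be checked numerically for a simple set; below is a sketch with C the closed unit ball in \(\mathbb R^2\) (an illustrative choice):

```python
import math
import random

# Metric projection onto the closed unit ball in R^2 and numerical
# checks of Lemma 5.1: (a) nonexpansiveness, and (b) the variational
# characterization <x - P_C(x), y - P_C(x)> <= 0 for all y in C.

def project_ball(x):
    norm = math.hypot(x[0], x[1])
    if norm <= 1.0:
        return x
    return (x[0] / norm, x[1] / norm)

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    px, py = project_ball(x), project_ball(y)
    # (a) nonexpansiveness
    assert dist(px, py) <= dist(x, y) + 1e-12
    # (b) variational inequality, tested against a point c of C
    c = project_ball(y)
    inner = (x[0] - px[0]) * (c[0] - px[0]) + (x[1] - px[1]) * (c[1] - px[1])
    assert inner <= 1e-9
```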
Assumption 5.1
(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0.\)
(ii) The operator \(A:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz continuous.
(iii) The parameter \(\omega >0\) satisfies
Remark 5.2
Under Assumption 5.1 (ii) and (iii), the problem VI(A, C) has a unique solution [30].
We will need the following important estimate and error bounds.
Proposition 5.3
[44] Let \(C\subset \mathcal {H}\) be a nonempty closed convex subset. Let A be an operator that is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz on C. Let \(x_*\) be the unique solution of Problem (5.2). For every \(\omega >0\) and \(x\in \mathcal {H},\) we have
and
Throughout this section, we denote
5.1 Continuous Time
In this case, we consider
where \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
Denote
5.1.1 Global Exponential Convergence
Theorem 5.4
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8). Assume that there exists \(\varepsilon >0\) such that the following conditions hold
Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\).
Proof
Consider the functions in (3.7). As in (3.14), we also have
On one hand, by (5.4), we can estimate
On the other hand, by (5.4) and (5.5), we get
Thus, using (5.16) and (5.17), we estimate (5.15) as follows
By arguments similar to those used in Theorem 3.2, now applied to (5.18) (namely, integrating three times), we obtain the exponential convergence of \(y(\cdot ).\) \(\square \)
5.1.2 Parameters Choices
Remark 5.5
If \(G_0,G_1,H_0\) satisfy
then conditions (5.9)–(5.14) can be obtained by letting \(\varepsilon \rightarrow 0^+.\)
In the following result, we simplify assumption (5.19) in terms of the upper and lower bounds of the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)
Corollary 5.6
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8). Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Now we examine Theorem 5.4 when \(\varepsilon =1.\)
Corollary 5.7
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8) and
Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Proof
First, we show that (5.21) ensures the validity of (5.22); that is, \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (5.21) that \(\alpha _2>3\) and hence \(2\alpha _2-3\le \frac{\alpha _2(\alpha _2-1)}{2}\), since this inequality is equivalent to \((\alpha _2-2)(\alpha _2-3)\ge 0\). Also from (5.21), we have \(\alpha _2>4\psi \) and then
Next, we show that (5.21)–(5.22) ensure the validity of (5.23); that is, \(\underline{q}<\overline{p}\). It follows from (5.21) that \(\alpha _2>3\psi +2\) and so
Meanwhile, by (5.22), we have \(\alpha _1>\psi (2\alpha _2-3)\), which gives
Now we can prove the exponential convergence of \(y(\cdot )\) by using Theorem 5.4 for \(\varepsilon =1\). \(\square \)
5.2 Discrete Time
We consider the difference equation
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).
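The explicit recursion (5.24) is given in the paper. In the spirit of the forward–backward scheme with double inertial effects, a generic sketch (the extrapolation form, the parameter values, and the test data are assumptions, not the paper's exact update) reads:

```python
import numpy as np

project_box = lambda x: np.clip(x, -1.0, 1.0)            # C = [-1, 1]^n
M = np.array([[2.0, 0.5], [0.5, 2.0]]); qv = np.array([1.0, -3.0])
A = lambda x: M @ x + qv   # strongly monotone, hence strongly pseudo-monotone

# Hypothetical double-inertial projected iteration (not the paper's exact (5.24)):
#   u_k     = z_k + t1 (z_k - z_{k-1}) + t2 (z_{k-1} - z_{k-2})   # two inertial terms
#   z_{k+1} = P_C( u_k - w A(u_k) )                               # forward-backward step
t1, t2, w = 0.1, 0.02, 0.1                               # illustrative parameters
z_pp = z_p = z = np.zeros(2)                             # z_{k-2}, z_{k-1}, z_k
for _ in range(3000):
    u = z + t1 * (z - z_p) + t2 * (z_p - z_pp)
    z_pp, z_p, z = z_p, z, project_box(u - w * A(u))

# The limit satisfies the fixed-point characterization z = P_C(z - w A(z)).
print(np.linalg.norm(z - project_box(z - w * A(z))))     # ~ 0
```

The two extrapolation terms use the last three iterates, which is the discrete counterpart of the third-order (in time) structure of the continuous dynamics.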
Denote
5.2.1 Global Exponential Convergence
Assumption 5.2
The coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
where \(\mu \) is defined in (5.6).
Remark 5.8
Under Assumption 5.2, we have \(R_0,T_1,S_2\ge 0.\)
Theorem 5.9
Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.25)–(5.26). Assume that there exists \(\xi >0,\xi \ne 1\) such that the following conditions hold
Then the sequence \(z(\cdot )\) generated by (5.24) converges linearly to \(x_*\).
Proof
Consider the functions (4.6). As in (4.13), we also have
On one hand, by (5.4), we can estimate
On the other hand, by (5.4) and (5.5), we get
Thus, using (5.37) and (5.38), we estimate (5.36) as follows
Setting
then \(\varepsilon >1\) and conditions can be written as
By arguments similar to those used in Theorem 4.2, now applied to (5.39) (namely, summing three times), we obtain the linear convergence of \(z(\cdot ).\) \(\square \)
5.2.2 Parameters Choices
Remark 5.10
If \(S_0,S_1,T_0\) satisfy
then the conditions of Theorem 5.9 can be obtained by letting \(\xi \rightarrow 0^+.\)
The following result simplifies condition (5.40) in terms of the lower and upper bounds of the coefficients \(\alpha _0,\alpha _1,\alpha _2\). There are common parameter choices satisfying both Corollary 5.6 and Corollary 5.11 below.
Corollary 5.11
Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.25)–(5.26). Then \(z(\cdot )\) converges linearly to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(T_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get
which gives (5.28). It follows from (5.44) that \(\alpha _1<\alpha _2\) and so
The last inequality proves (5.29). Thus, Assumption 5.2 holds. Note that
which gives
and then \(S_1>0.\) \(\square \)
Remark 5.12
We consider the following optimization problem
where C is a nonempty, closed, and convex subset of \(\mathcal {H}\) and \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function that is \(\gamma \)-strongly pseudo-convex on C, with L-Lipschitz continuous gradient for some \(L>0\). Recall that a differentiable function f is called \(\gamma \)-strongly pseudo-convex if there exists \(\gamma >0\) such that
for all \(x,y\in C\). For more details on generalized convex functions and their characterizations, we refer the reader to [29]. The optimization problem (5.45) is equivalent to the following strongly pseudo-monotone variational inequality: find \(x_*\in C\) such that
$$\begin{aligned} \left\langle \nabla f(x_*), y-x_*\right\rangle \ge 0 \quad \text {for all } y\in C. \end{aligned}$$
As a consequence, all the results presented in this section can be applied directly to the pseudo-convex optimization problem (5.45).
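As a sanity check of this equivalence, one can minimize a strongly convex quadratic (a particular strongly pseudo-convex function) over a box by solving VI(\(\nabla f\), C) with a basic projected fixed-point iteration; all data below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Illustrative strongly convex quadratic (hence strongly pseudo-convex on C)
# minimized over the box C = [-1, 1]^2.
M = np.array([[2.0, 0.5], [0.5, 2.0]]); qv = np.array([1.0, -3.0])
f      = lambda x: 0.5 * x @ M @ x + qv @ x
grad_f = lambda x: M @ x + qv
project_box = lambda x: np.clip(x, -1.0, 1.0)

# Solve VI(grad f, C) by the projected-gradient fixed-point iteration.
w, z = 0.1, np.zeros(2)
for _ in range(2000):
    z = project_box(z - w * grad_f(z))

# The limit should minimize f over C: no sampled feasible point does better.
rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=(10_000, 2))
print(all(f(z) <= f(y) + 1e-9 for y in samples))   # True
```

Here the constraint is active in the second coordinate, so the minimizer is a boundary point of C characterized by the variational inequality rather than by \(\nabla f(x_*)=0\).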
References
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Antipin, A.S.: Continuous and iterative processes with projection operators and projection-like operators. AN SSSR, Scientific Counsel on the Complex Problem Cybernetics, Moscow, pp. 5–43 (1989)
Antipin, A.S.: Minimization of convex functions on convex sets by means of differential equations. (Russian) Differentsial’nye Uravneniya 30(9), 1475–1486 (1994); translation in Differential Equations 30, 1365–1375 (1994)
Attouch, H., Alvarez, F.: The heavy ball with friction dynamical system for convex constrained minimization problems. In: Optimization (Namur, 1998), Lecture Notes in Economics and Mathematical Systems 481. Springer, Berlin, pp. 25–35 (2000)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Math. Program. 184, 243–287 (2020)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial forward–backward algorithm for structured monotone inclusions. Appl. Math. Optim. 80, 547–598 (2019)
Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via a third-order in time evolution equation. Optimization 71, 1275–1304 (2022)
Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via a third-order in time evolution equation: TOGES-V an improved version of TOGES. Optimization (2022). https://doi.org/10.1080/02331934.2022.2119084
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method. I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2, 1–34 (2000)
Attouch, H., Peypouquet, J., Redont, P.: Backward–forward algorithms for structured monotone inclusions in Hilbert spaces. J. Math. Anal. Appl. 457, 1095–1117 (2018)
Avriel, M., Diewert, W.E., Schaible, S., Zang, I.: Generalized Concavity. Society for Industrial and Applied Mathematics (2010)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)
Bolte, J.: Continuous gradient projection method in Hilbert spaces. J. Optim. Theory Appl. 119, 235–259 (2003)
Boţ, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)
Boţ, R.I., Csetnek, E.R.: Convergence rates for forward–backward dynamical systems associated with strongly monotone inclusions. J. Math. Anal. Appl. 457, 1135–1152 (2018)
Boţ, R.I., Csetnek, E.R.: A dynamical system associated with the fixed points set of a nonexpansive operator. J. Dyn. Differ. Equat. 29, 155–168 (2017)
Boţ, R.I., Csetnek, E.R., Vuong, P.T.: The forward–backward–forward method from discrete and continuous perspective for pseudo-monotone variational inequalities in Hilbert spaces. Eur. J. Oper. Res. 287, 49–60 (2020)
Boţ, R.I., Sedlmayer, M., Vuong, P.T.: A relaxed inertial forward–backward–forward algorithm for solving monotone inclusions with application to GANs. J. Mach. Learn. Res. 24, 1–37 (2023)
Cavazzuti, E., Pappalardo, P., Passacantando, M.: Nash equilibria, variational inequalities, and dynamical systems. J. Optim. Theory Appl. 114, 491–506 (2002)
Dao, M., Phan, H.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators. SIAM J. Optim. 29, 2697–2724 (2019)
Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York (2003)
Goebel, K., Reich, S.: Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings. Marcel Dekker, New York (1984)
Guo, K., Han, D., Yuan, X.: Convergence analysis of Douglas–Rachford splitting method for strongly + weakly convex programming. SIAM J. Numer. Anal. 55, 1549–1577 (2017)
Ha, N.T.T., Strodiot, J.J., Vuong, P.T.: On the global exponential stability of a projected dynamical system for strongly pseudomonotone variational inequalities. Optim. Lett. 12, 1625–1638 (2018)
Haraux, A.: Systemes Dynamiques Dissipatifs et Applications, Recherches en Mathematiques Appliquees 17. Masson, Paris (1991)
Karamardian, S., Schaible, S.: Seven kinds of monotone maps. J. Optim. Theory Appl. 66, 37–46 (1990)
Kim, D.S., Vuong, P.T., Khanh, P.D.: Qualitative properties of strongly pseudomonotone variational inequalities. Optim. Lett. 10, 1669–1679 (2016)
Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic, New York (1980)
Lions, P., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Nagurney, A., Zhang, D.: Projected Dynamical Systems and Variational Inequalities with Applications. Kluwer Academic, Norwell (1996)
O’Connor, D., Vandenberghe, L.: On the equivalence of the primal–dual hybrid gradient method and Douglas–Rachford splitting. Math. Program. 179, 85–108 (2020)
Pappalardo, M., Passacantando, M.: Stability for equilibrium problems: from variational inequalities to dynamical systems. J. Optim. Theory Appl. 113, 567–582 (2002)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Su, W., Boyd, S., Candes, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. In: Advances in Neural Information Processing Systems (NIPS) 27 (2014)
Tam, N.N., Yao, J.C., Yen, N.D.: Solution methods for pseudomonotone variational inequalities. J. Optim. Theory Appl. 138, 253–273 (2008)
Vinh, L.V., Tran, V.N., Vuong, P.T.: A second-order dynamical system for equilibrium problems. Numer. Algorithms 91, 327–351 (2022)
Vuong, P.T.: The global exponential stability of a dynamical system for solving variational inequalities. Netw. Spat. Econ. 22, 395–407 (2022)
Vuong, P.T.: A second order dynamical system and its discretization for strongly pseudo-monotone variational inequalities. SIAM J. Control Optim. 59, 2875–2897 (2021)
Vuong, P.T., Strodiot, J.J.: A dynamical system for strongly pseudo-monotone equilibrium problems. J. Optim. Theory Appl. 185, 767–784 (2020)
Acknowledgements
The authors are grateful to two anonymous reviewers for their constructive comments, which helped to significantly improve the presentation of the paper. Additionally, the authors would like to extend their appreciation to the Vietnam Institute for Advanced Study in Mathematics (VIASM) for organizing the International Conference "New Trends in Numerical Optimization and Applications" in December 2021. It was during this event that the authors first met and initiated the fruitful discussions that ultimately led to this research project. P. T. Vuong thanks the London Mathematical Society (LMS) for supporting his visit to P. V. Hai at the Hanoi University of Science and Technology in February 2024, during which the authors completed the final version of this paper.
Funding
P.V. Hai is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 101.02-2021.24.
Communicated by Boris S. Mordukhovich.
Dedicated to Professor Pham Ky Anh (Vietnam National University) on the occasion of his 75th birthday.
Hai, P.V., Vuong, P.T. Third Order Dynamical Systems for the Sum of Two Generalized Monotone Operators. J Optim Theory Appl 202, 519–553 (2024). https://doi.org/10.1007/s10957-024-02437-y
Keywords
- Monotone inclusion
- Dynamical system
- Generalized monotonicity
- Variational inequality
- Exponential convergence
- Linear convergence