Abstract
In this paper, we propose and analyze a third-order dynamical system for finding zeros of the sum of two generalized operators in a Hilbert space \(\mathcal {H}\). We establish the existence and uniqueness of the trajectories generated by the system under appropriate continuity conditions, and prove exponential convergence to the unique zero when the sum of the operators is strongly monotone. Additionally, we derive an explicit discretization of the dynamical system, which results in a forward–backward algorithm with double inertial effects and a larger range of stepsizes. We establish the linear convergence of the iterates to the unique solution using this algorithm. Furthermore, we provide a convergence analysis for the class of strongly pseudo-monotone variational inequalities. We illustrate the effectiveness of our approach by applying it to structured optimization and pseudo-convex optimization problems.
1 Introduction
In practical applications, many nonlinear phenomena can be represented as finding a zero of a monotone operator. This problem arises in various contexts, such as solving variational inequalities related to monotone operators, minimizing convex functions, finding fixed points of nonexpansive mappings, and more. One of the most widely used methods for solving this problem is the proximal point algorithm, which was originally proposed by Martinet and systematically studied by Rockafellar [39] in the context of Hilbert spaces.
Another important problem is to find a zero of the sum of two maximally monotone operators \(A,B:\mathcal {H}\rightrightarrows \mathcal {H}\):
$$\begin{aligned} \text {find } x_*\in \mathcal {H}\text { such that } 0\in A(x_*)+B(x_*). \end{aligned}$$(1.1)
Problem (1.1) arises in a wide range of applications such as convex optimization, image processing, and signal processing. A crucial special case of Problem (1.1) is the following variational inequality (VI) problem
$$\begin{aligned} \text {find } x_*\in C \text { such that } 0\in B(x_*)+N_C(x_*), \end{aligned}$$(1.2)
where C is a nonempty closed convex subset of \(\mathcal {H}\) and \(N_C(x_*)\) is the normal cone of C at \(x_*.\) When B is single-valued, the VI problem (1.2) is equivalent to finding a point \(x_*\in C\) such that
$$\begin{aligned} \left\langle {B(x_*)}, {y-x_*} \right\rangle \ge 0\quad \forall y\in C. \end{aligned}$$(1.3)
The Douglas-Rachford splitting algorithm [22], presented by Lions and Mercier [32], is a fundamental method to solve such problems. Under additional assumptions on the involved operators, linear rates of convergence for the algorithm are possible. Some other splitting methods are derived from the Douglas-Rachford algorithm (such as the primal-dual hybrid gradient method [35], the Alternating Direction Method of Multipliers (ADMM) [23], and Spingarn’s method of partial inverses [23]). There are many other methods for solving Problem (1.1), especially when one of the operators is single-valued. A popular method for solving this problem is the forward–backward algorithm, which consists of a forward step with one operator and a backward step with the other. The algorithm generates a sequence of iterates that converges to a solution under suitable assumptions on the operators. The forward–backward algorithm has been widely studied and applied in both finite-dimensional and infinite-dimensional settings [11, 24, 33, 38].
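As a concrete illustration (not taken from the paper itself), consider the VI special case, where the backward step with the normal cone operator \(N_C\) reduces to the metric projection onto C. The operator \(B(x)=x-2\) and the set \(C=[0,1]\) below are illustrative choices only:

```python
# Forward-backward iteration for 0 in B(x) + N_C(x):
# a forward (explicit) step with B, then a backward (resolvent) step
# with N_C, whose resolvent is the metric projection P_C.
# Illustrative toy data: B(x) = x - 2 (strongly monotone), C = [0, 1].

def B(x):
    return x - 2.0

def project_C(x):  # P_C for C = [0, 1]
    return min(max(x, 0.0), 1.0)

def forward_backward(x0, omega=0.5, iters=100):
    x = x0
    for _ in range(iters):
        x = project_C(x - omega * B(x))  # backward step after forward step
    return x

x_star = forward_backward(0.0)  # converges to the unique solution x* = 1
```

Here the iterates converge because \(x\mapsto x-\omega B(x)\) is a contraction for small \(\omega \) and the projection is nonexpansive.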
Nowadays, there is a growing interest in connecting and integrating optimization with other fields. This research direction has become increasingly attractive as it can provide new insights into optimization results and lead to interesting findings. Among the emerging research directions, there is a line of works that uses ordinary differential equations (ODEs) to design algorithms for optimization problems [2, 3, 9, 14], variational inequalities [18, 27, 34, 43], monotone inclusions [1, 5, 6], fixed point problems [15, 17] and equilibrium problems [20, 36, 42, 45]. The ODE interpretation not only provides a better understanding of Nesterov’s scheme, but also helps design new schemes with similar convergence rates. The reader can refer to [9, 10, 18, 40] and the references therein for more examples.
1.1 Some Historical Aspects
The Heavy Ball with Friction method is a popular optimization algorithm based on inertial dynamics. The algorithm was proposed by Polyak to accelerate the gradient method in optimization [37]. It introduces an inertial system with a fixed viscous damping coefficient \(\gamma >0\)
$$\begin{aligned} x^{(2)}(t)+\gamma x^{(1)}(t)+\nabla f(x(t))=0 \end{aligned}$$(1.4)
for minimizing a convex and differentiable function f. Note that when f has a Lipschitz continuous gradient, \(\nabla f\) is a co-coercive operator (see the definition in Sect. 2). Attouch and Alvarez extended the heavy ball dynamical system (1.4) to constrained optimization as well as to co-coercive operators in [4]. Recently, Boţ and Csetnek [15] studied the second order dynamical system with variable viscous damping coefficient
for finding a zero of a co-coercive operator B. The results were applied to second order forward–backward dynamical systems for monotone inclusion problems (1.1)
where A is maximally monotone and B is co-coercive. Here \(\mathcal {J}_A\triangleq (I+A)^{-1}\) is the resolvent of an operator A, where I stands for the identity operator. When the operator is merely monotone but not co-coercive, a second order forward–backward–forward dynamical system and its discretization have recently been proposed and investigated in [19]. In particular, when the operator \(A+B\) is strongly monotone, the exponential convergence rate of the second order dynamical system (1.5) was obtained in [16]. Under suitable conditions on the parameters, the authors established the convergence rate of \(O(e^{-t})\) for the trajectories.
Attouch, Chbani and Riahi were the first to study third order dynamical systems for minimizing a convex and differentiable function in Hilbert spaces [7, 8]. They proposed and studied the (TOGES) dynamical system [7]
Using temporal scaling techniques, the third order dynamical system (1.6) was reformulated as a second order dynamical system, and the convergence analysis was obtained using the Lyapunov energy function techniques developed for second order dynamical systems. The authors showed a convergence rate of the values of the order \(\frac{1}{t^3}\), i.e. \( f(x(t) + t x^{(1)}(t)) - \inf _{\mathcal {H}}f \le \frac{C}{t^3}\) for some constant \(C>0\), and obtained the convergence of the trajectories towards optimal solutions of \(\min _{x\in \mathcal {H}} f(x)\). When the objective function f is strongly convex, the authors established an exponential rate of convergence. Proximal-based algorithms obtained by temporal discretization of (TOGES) were also investigated. Nevertheless, the rate of the values of f(x(t)) in (TOGES) is only of order \(\frac{1}{t}\), i.e. \( f(x(t) ) - \inf _{\mathcal {H}}f \le \frac{C}{t}\), which is not completely satisfactory from the point of view of fast optimization. Hence, very recently, an improved version of (TOGES), called (TOGES-V), has been proposed and investigated by the same authors in [8]
where they obtained the rate \(\mathcal {O}\left( \frac{1}{t^3}\right) \) for \(f(x(t) ) - \inf _{\mathcal {H}} f\).
1.2 Our Contributions
In this paper, we propose for the first time a third order dynamical system for the monotone inclusion (1.1) and investigate its convergence properties in both continuous time and discrete time settings. The motivation for considering third (or higher) order dynamical systems comes from the fact that they can potentially provide faster convergence rates, as seen in optimization problems [7, 8]. This is also the case for the monotone inclusion problem studied in this paper. Indeed, we derive the convergence rate of \(O(e^{-\varepsilon t})\) for some \(\varepsilon > 1\) (in particular for \(\varepsilon = 2\)) under suitable choices of parameters, which is significantly faster than the classical results obtained in [16] for second order dynamical systems. In the discrete setting, the third order dynamical system provides a new forward–backward algorithm with double momentum and a larger range of stepsizes.
In contrast to the classical monotone inclusion problem, where each individual operator A and B is required to be (maximally) monotone, we only require A and B to be generalized monotone (see the definitions in Sect. 2). This approach allows us to handle not only the classical monotone inclusion problem but also the problem of finding zeros of the sum of a weakly monotone operator and a strongly monotone operator, as well as pseudo-monotone variational inequalities. Applications of these models include minimizing the sum of a weakly convex function and a strongly convex function [21, 26] or minimizing a pseudo-convex function. The convergence analysis developed in this paper relies purely on Lyapunov energy function techniques, in contrast to the temporal scaling technique used in [7, 8]. In summary, our contributions are as follows:
-
Propose a third order dynamical system for the sum of two generalized monotone operators.
-
Establish the existence and uniqueness of the trajectories generated by the proposed dynamical system.
-
Provide the exponential convergence analysis of the trajectories to the unique solution of the inclusion, and show that it is faster than classical results.
-
Investigate the temporal discretization of the system and prove the linear convergence of the corresponding forward–backward algorithm with double inertial effects.
-
Study the third order dynamical system for strongly pseudo-monotone variational inequalities.
The paper is structured as follows. In Sect. 2, we introduce some terminology and results that are necessary for the analysis presented in the subsequent sections. In Sects. 3 and 4, we focus on solving Problem (1.1) under the assumption of generalized monotonicity of the operators involved. In Sect. 3, we propose a third-order dynamical system and establish its exponential convergence to the zero of Problem (1.1). The explicit discretization of this system leads to a new forward–backward algorithm studied in Sect. 4. In Sect. 5, motivated by the third-order dynamical system, we find the solution of Problem (1.2) under the assumption of strong pseudo-monotonicity of the operator B.
2 Preliminaries
We start the section with listing the notations used. The set of integers is denoted by \(\mathbb Z\) and the set of real numbers is denoted by \(\mathbb R\). Let \(\mathbb Z_{\ge 1}=\{j\in \mathbb Z:j\ge 1\}\) and \(\mathbb R_{\ge 0}=\{t\in \mathbb R:t\ge 0\}\). The symbol \(g^{(k)}\) stands for the k-th derivative of the function g.
Throughout this work \(\mathcal {H}\) is a real Hilbert space with inner product \(\left<\cdot , \cdot \right>\) and induced norm \(\Vert \cdot \Vert \). We use the notation \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) to indicate that A is a set-valued operator defined on \(\mathcal {H}\), and \(A:\mathcal {H}\rightarrow \mathcal {H}\) to indicate that A is a single-valued operator on \(\mathcal {H}\).
Let A be an operator on \(\mathcal {H}\). The graph of A is \(\text {Gra}A = \{(x,u) \in \mathcal {H}\times \mathcal {H}, u \in A(x)\}\). The inverse of A, denoted by \(A^{-1}\), is the operator with graph \(\text {Gra}A^{-1} = \{(u,x) \in \mathcal {H}\times \mathcal {H}, u \in A(x)\}\).
2.1 Generalized Monotone Operators
We first recall some generalized versions of monotone operators defined and studied in [21, 29].
Definition 2.1
The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is called \(\gamma _A\)-monotone if there exists a scalar \(\gamma _A \in \mathbb {R}\) such that
The constant \(\gamma _A\) is referred to as the monotonicity modulus of A. We also say that A is maximally \(\gamma _A\)-monotone if it is \(\gamma _A\)-monotone and there is no \(\gamma _A\)-monotone operator whose graph strictly contains \(\text {Gra}A\).
Remark 2.2
Note that in the definition of generalized monotonicity, \(\gamma _A\) can be negative. If \(\gamma _A = 0\), then generalized monotonicity reduces to classical monotonicity. If \(\gamma _A>0\), then A is strongly monotone. Finally, if \(\gamma _A<0\), then A is called weakly monotone. For a more detailed discussion of (maximally) monotone operators and the connection to optimization problems, we refer the reader to [12, 13, 21].
Definition 2.3
The single-valued operator \(T:\mathcal {H}\rightarrow \mathcal {H}\) is called
-
1.
\(\gamma _T\)-strongly pseudo-monotone if \(\gamma _T>0\) and
$$\begin{aligned} \left\langle {T(x)}, {y-x} \right\rangle \ge 0\Longrightarrow \left\langle {T(y)}, {y-x} \right\rangle \ge \gamma _T\Vert x-y\Vert ^2 \end{aligned}$$for all \(x,y\in \mathcal {H}\).
-
2.
\(\gamma _T\)-co-coercive if \(\gamma _T>0\) and
$$\begin{aligned} \left\langle {T(x)-T(y)}, {x-y} \right\rangle \ge \gamma _T\Vert T(x)-T(y)\Vert ^2\quad \forall x,y\in \mathcal {H}. \end{aligned}$$ -
3.
\(L_T\)-Lipschitz continuous if \(L_T>0\) and
$$\begin{aligned} \Vert T(x)-T(y)\Vert \le L_T\Vert x-y\Vert \quad \forall x,y\in \mathcal {H}. \end{aligned}$$
Remark 2.4
It is clear from the Cauchy–Schwarz inequality that if T is \(\gamma _T\)-co-coercive then it is \(1/\gamma _T\)-Lipschitz continuous: indeed, \(\gamma _T\Vert T(x)-T(y)\Vert ^2\le \left\langle {T(x)-T(y)}, {x-y} \right\rangle \le \Vert T(x)-T(y)\Vert \,\Vert x-y\Vert \).
The resolvent of an operator A is denoted as \(\mathcal {J}_A\triangleq (I+A)^{-1}\), where I is the identity operator. We will need the following properties of resolvent operator.
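For instance (an illustrative computation, not part of the original text), the resolvent of the scalar operator \(A(x)=ax\) with \(a>0\) is \(\mathcal {J}_{\omega A}(x)=x/(1+\omega a)\), while the resolvent of the subdifferential of the absolute value function is the well-known soft-thresholding map:

```python
# Resolvents J_{omega A} = (I + omega A)^{-1} for two scalar examples.

def resolvent_linear(x, a, omega):
    # A(x) = a*x, so (I + omega*A)^{-1}(x) = x / (1 + omega*a)
    return x / (1.0 + omega * a)

def resolvent_abs(x, omega):
    # A = subdifferential of |.|: the resolvent solves y + omega*s = x
    # with s in sign(y), i.e. soft-thresholding at level omega.
    if x > omega:
        return x - omega
    if x < -omega:
        return x + omega
    return 0.0

# Check the defining relation x = (I + omega A)(J(x)) in the linear case:
x, a, omega = 3.0, 2.0, 0.5
y = resolvent_linear(x, a, omega)
assert abs(y + omega * a * y - x) < 1e-12
```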
Lemma 2.5
[21] Let \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) be a \(\gamma _A\)-monotone operator and let \(\omega >0 \) be such that \(1+\omega \gamma _A > 0\). Then the following hold:
-
1.
\(\mathcal {J}_{\omega A}\) is single-valued;
-
2.
\(\mathcal {J}_{\omega A}\) is \((1+\omega \gamma _A)\)-co-coercive;
-
3.
dom \(\mathcal {J}_{\omega A} = \mathcal {H}\) if and only if A is maximally \(\gamma _A\)-monotone.
2.2 Absolutely Continuous Functions
Definition 2.6
A function \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is called locally absolutely continuous if it is absolutely continuous on every compact interval, which means that for each interval \([t_0,t_1]\) there exists an integrable function \(g:[t_0,t_1]\rightarrow \mathbb R^d\) such that
Remark 2.7
If \(h:\mathbb R_{\ge 0}\rightarrow \mathbb R^d\) is a locally absolutely continuous function, then it is differentiable almost everywhere and its derivative agrees with its distributional derivative almost everywhere.
Proposition 2.8
For \(s,u\ge 0\) and \(m\in \mathbb Z_{\ge 1}\), it holds
Proof
The case when \(m=1\) is done by using integration by parts. Now we suppose that the conclusion holds for m and prove the case \(m+1.\) Indeed, we have
which together with the induction assumption completes the proof. \(\square \)
2.3 A Third Order Dynamical System
In this paper, we propose the following dynamical system for Problem (1.1):
$$\begin{aligned} y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0,\quad t\ge t_0, \end{aligned}$$(2.1)
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\) and \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
The solution of dynamical system (2.1) is understood in the following sense.
Definition 2.9
A function \(y(\cdot )\) is called a strong global solution of Eq. (2.1) if it holds:
-
1.
For every \(j\in \{0,1,2,3\}\), \(y^{(j)}:[t_0,+\infty )\rightarrow \mathcal {H}\) is locally absolutely continuous; in other words, absolutely continuous on each interval \([\delta ,\eta ]\) for \(\eta>\delta >t_0\).
-
2.
\(y^{(3)}(t)+\alpha _2y^{(2)}(t)+\alpha _1y^{(1)}(t)+\alpha _0[y(t)-\mathcal {J}_{\omega A}(y(t)-\omega B(y(t)))]=0\) for almost every \(t\ge t_0\).
-
3.
\(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
Proposition 2.10
(Equivalent form) Equation (2.1) is equivalent to the first order system \(x^{(1)}(t)=G(x(t))\), where \(G:\mathcal {H}\times \mathcal {H}\times \mathcal {H}\rightarrow \mathcal {H}\times \mathcal {H}\times \mathcal {H}\) is defined by
$$\begin{aligned} G(x_1,x_2,x_3)=\big (x_2,\,x_3,\,-\alpha _2x_3-\alpha _1x_2-\alpha _0[x_1-\mathcal {J}_{\omega A}(x_1-\omega B(x_1))]\big ), \end{aligned}$$
where \((x_1,x_2,x_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}.\)
Proof
The conclusion follows from the change of variables \(x_1(t)=y(t),\; x_2(t)=y^{(1)}(t),\; x_3(t)=y^{(2)}(t).\)
\(\square \)
Theorem 2.11
(Existence and uniqueness of a solution) Consider dynamical system (2.1), where \(\alpha _0,\alpha _1,\alpha _2,\omega >0\), the operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, and \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous, with \(1+\omega \gamma _A > 0\). Then for each \(v_0,v_1,v_2\in \mathcal {H}\) there exists a unique strong global solution of (2.1).
Proof
We endow \(\mathcal {H}\times \mathcal {H}\times \mathcal {H}\) with the scalar product
We show that the operator G is Lipschitz. Indeed, let \(y=(y_1,y_2,y_3),z=(z_1,z_2,z_3)\in \mathcal {H}\times \mathcal {H}\times \mathcal {H}\). We have
and so
By using the Cauchy–Picard theorem (see, for example, [28, Proposition 6.2.1]), we get the existence and uniqueness of a strong global solution. \(\square \)
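To illustrate the first-order reformulation numerically (a sketch under simplifying assumptions, not taken from the paper): with \(A=0\) (so \(\mathcal {J}_{\omega A}=I\)) and \(B(x)=x\), system (2.1) becomes a linear ODE, and an explicit Euler scheme on the equivalent system of Proposition 2.10 drives the trajectory to the unique zero \(x_*=0\). The coefficient values below are illustrative only:

```python
# Explicit Euler integration of the first-order reformulation
# x' = G(x) of the third-order system (2.1), for the toy case
# A = 0 (resolvent = identity) and B(x) = x, whose unique zero is 0.
# The coefficients make the characteristic polynomial
# s^3 + 6 s^2 + 11 s + 6 = (s+1)(s+2)(s+3), hence a stable system.

def simulate(y0, dy0, ddy0, a2=6.0, a1=11.0, a0=6.0, omega=1.0,
             h=1e-3, T=10.0):
    x1, x2, x3 = y0, dy0, ddy0          # x1 = y, x2 = y', x3 = y''
    for _ in range(int(T / h)):
        # x1 - J_{omega A}(x1 - omega*B(x1)) with J = identity, B = I
        resolvent_term = x1 - (x1 - omega * x1)
        x1, x2, x3 = (x1 + h * x2,
                      x2 + h * x3,
                      x3 + h * (-a2 * x3 - a1 * x2 - a0 * resolvent_term))
    return x1

y_final = simulate(1.0, 0.0, 0.0)
assert abs(y_final) < 1e-2   # trajectory has approached the zero x* = 0
```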
2.4 Difference Operators
In this section, we give the discrete counterpart of the dynamical system (2.1). To that aim, we recall the forward difference operation and its properties used in the convergence analysis. For \(z:\mathbb Z\rightarrow \mathcal {H}\) and \(\kappa \in \mathbb Z_{\ge 1}\), we denote
Remark 2.12
Let \(f,g,h:\mathbb Z\rightarrow \mathcal {H}\) and \(\theta \in \mathbb R\). It can be proven that
and consequently
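Assuming the standard forward difference \(\Delta z(n)=z(n+1)-z(n)\) with \(\Delta ^{\kappa }=\Delta (\Delta ^{\kappa -1})\), iterated differences expand binomially; the following quick numerical check is illustrative only:

```python
from math import comb

# Iterated forward differences of a sequence z : Z -> R, with
# Delta z(n) = z(n+1) - z(n) and Delta^k = Delta applied k times.

def delta(z, k=1):
    if k == 0:
        return z
    d = lambda n: z(n + 1) - z(n)
    return delta(d, k - 1)

def delta_binomial(z, k, n):
    # Binomial expansion: Delta^k z(n) = sum_j (-1)^(k-j) C(k,j) z(n+j)
    return sum((-1) ** (k - j) * comb(k, j) * z(n + j) for j in range(k + 1))

z = lambda n: n ** 3 - 2 * n          # any test sequence
for k in (1, 2, 3):
    for n in range(5):
        assert delta(z, k)(n) == delta_binomial(z, k, n)

# Third differences of a cubic are constant (= 3! = 6 here):
assert delta(z, 3)(0) == 6
```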
Consider the difference equation, which is the discrete version of (2.1):
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).
Proposition 2.13
(Equivalent form) Equation (2.2) has an equivalent form
Proof
The proof makes use of the facts that \(\Delta ^2z(n)=z(n+2)-2z(n+1)+z(n)\) and \(\Delta ^3z(n)=z(n+3)-3z(n+2)+3z(n+1)-z(n).\)
\(\square \)
Remark 2.14
The numerical scheme (2.3) can be re-written as
which is a forward–backward algorithm with double momentum.
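To make the double-momentum structure concrete, here is a sketch under the assumption that (2.2) reads \(\Delta ^3z(n)+\alpha _2\Delta ^2z(n)+\alpha _1\Delta z(n)+\alpha _0[z(n)-\mathcal {J}_{\omega A}(z(n)-\omega B(z(n)))]=0\) with forward differences; the coefficient values are illustrative choices for stability, not the paper's:

```python
# One possible discrete scheme: expanding forward differences in
#   Delta^3 z + a2*Delta^2 z + a1*Delta z + a0*[z - J(z - omega*B(z))] = 0
# and solving for z(n+3) gives a forward-backward step plus two
# momentum (inertial) terms.  Toy case: A = 0 (so J = identity) and
# B(x) = x, whose unique zero is 0.

def step(z0, z1, z2, a2=1.0, a1=0.31, a0=0.03, omega=1.0):
    # z0 = z(n), z1 = z(n+1), z2 = z(n+2)
    fb = z0 - (z0 - omega * z0)            # z - J(z - omega*B(z)), J = I
    return (3 * z2 - 3 * z1 + z0           # from Delta^3
            - a2 * (z2 - 2 * z1 + z0)      # from Delta^2
            - a1 * (z1 - z0)               # from Delta
            - a0 * fb)

z = [1.0, 1.0, 1.0]
for n in range(200):
    z.append(step(z[n], z[n + 1], z[n + 2]))

assert abs(z[-1]) < 1e-6   # iterates converge linearly to the zero
```

With these values the characteristic polynomial \((\lambda -1)^3+(\lambda -1)^2+0.31(\lambda -1)+0.03\) has roots \(\lambda =0.5,0.7,0.8\), all inside the unit disk, which is what produces the linear convergence seen above.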
3 Continuous Time Dynamical System
In this section, we will establish the exponential convergence of dynamical system (2.1) under the following assumption and notations.
Assumption 3.1
(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0\).
(ii) The operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is maximally \(\gamma _A\)-monotone, \(B:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma _B\)-monotone and L-Lipschitz continuous such that
(iii) The parameter \(\omega >0\) satisfies
Remark 3.1
Condition (3.1) implies that the sum operator \(A+B\) is strongly monotone, while the individual operators A and B need not be. A similar condition was studied in [21]. A direct application of this model is to minimize the sum of a weakly convex function and a strongly convex function. Condition (3.2) is imposed to ensure that the resolvent operator \(\mathcal {J}_{\omega A}\) is single-valued. Finally, condition (3.3) means that the stepsize \(\omega \) must be bounded from above. Note that condition (3.3) gives
Hence, we can find \(\theta >0\) such that
The following notations are used.
3.1 Global Exponential Convergence
First, we consider the dynamical system (2.1), whose global convergence involves the following parameters
We denote the functions
Theorem 3.2
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Assume that there exists \(\varepsilon >0\) such that the following conditions hold
Then the trajectory \(y(\cdot )\) generated by dynamical system (2.1) converges exponentially to \(x_*\), i.e., there exist positive numbers \(\mu , \eta \) such that
Proof
In the next arguments, we often use the identities:
Since
we have
We observe
Using the definition of resolvent, equation (2.1) gives the following
which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies
Since the operator B is L-Lipschitz, we can estimate the right hand side of the inequality above and then
Note that by the Cauchy–Schwarz inequality
Thus, we get
Note that
Inserting the equality above into (3.16), we obtain
which implies, by (3.14) and (3.15), that
By (3.4), we have \(\ell >0\) and hence \(C_0>0\). Thus, we can write
Multiplying both sides by \(e^{\varepsilon (t-t_0)}\) and then using Proposition 2.8, we get
for some constant \(D_1\), which implies, after using (3.8), (3.9), (3.12) and (3.4), that
Integrating the above inequality with respect to the variable \(s\in [t_0,t]\), we deduce
for some constant \(D_2\). Using (3.10), (3.11), (3.4), we get
Note that equation (3.18) reduces to the following
for some constant \(D_3\).
-
If \(\alpha _2\ge 3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le e^{(\alpha _2-3\varepsilon )(t-t_0)}\), and so
$$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}D_3 +e^{-\varepsilon (t-t_0)}\int \limits _{t_0}^t(D_1s+D_2)\,ds. \end{aligned}$$(3.19) -
If \(2\varepsilon<\alpha _2<3\varepsilon \), then \(e^{(\alpha _2-3\varepsilon )(s-t_0)}\le 1\), and so
$$\begin{aligned} a(t)\le e^{-(\alpha _2-2\varepsilon )(t-t_0)}\left( D_3 +\int \limits _{t_0}^t(D_1s+D_2)\,ds\right) . \end{aligned}$$(3.20)
The arguments above show that \(y(\cdot )\) converges exponentially to \(x_*\). \(\square \)
Remark 3.3
It follows from (3.19) that a(t) converges to 0 with the rate \(O((Pt^2 +Qt +R)e^{-\varepsilon t})\) for some constants P, Q, R, while the rate obtained from (3.20) is \(O((Pt^2 +Qt +R)e^{-(\alpha _2 - 2 \varepsilon ) t})\). With a suitable choice of \(\varepsilon \) and \(\alpha _2\), these rates can be controlled so that they are faster than the rate \(O(e^{-t})\) of the second order dynamical systems established in [16].
3.2 Parameters Choices
We now discuss the question “how to find \(\varepsilon \)?”. It can be seen from (3.19) that a larger \(\varepsilon \) yields a faster rate. Finding the maximal value of \(\varepsilon \) is cumbersome as it depends on many other parameters. However, we will discuss how to find a "good enough" \(\varepsilon \) in this section. The following remark offers one way, in terms of the coefficients.
Remark 3.4
If \(A_0,A_1,B_0\) satisfy
then conditions (3.8)–(3.13) can be obtained by letting \(\varepsilon \rightarrow 0^+.\)
In the following result, we simplify the assumption (3.21) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)
Corollary 3.5
Consider equation (2.1) under Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Let us first examine Theorem 3.2 when \(\varepsilon =1\), in which case it matches the rate obtained for the second order dynamical system established in [16].
Theorem 3.6
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and
Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
First, we show that (3.25) ensures the validity of (3.26); that is \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (3.25) that \(\alpha _2>3\) and so \(2\alpha _2-3<\frac{\alpha _2(\alpha _2-1)}{2}\). Also from (3.25), we have \(\alpha _2>4\varphi \) and then
Next, we show that (3.25)–(3.26) ensure the validity of (3.27); that is \(\underline{q}<\overline{p}\). It results from (3.25) that \(\alpha _2>3\varphi +2\) and so
Meanwhile, by (3.26), we have \(\alpha _1>\varphi (2\alpha _2-3)\), which gives
Now we can obtain the exponential convergence of \(y(\cdot )\) by using Theorem 3.2 for \(\varepsilon =1\). \(\square \)
Now let us examine Theorem 3.2 when \(\varepsilon =2\). In this case, we will obtain from (3.19) that the convergence rate of a(t) is
which is faster than the rate \(O(e^{-t})\) obtained in [16] for the second order dynamical system.
Theorem 3.7
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5)–(3.6) and (3.24). Then \(y(\cdot )\) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
As in Theorem 3.6, we must check \(\underline{\beta }<\overline{\beta }\). Indeed, we have
Next is to prove \(\underline{q}<\overline{p}\). It follows from (3.29) that
which gives
Again using (3.29), we get \(\alpha _1>4\varphi (\alpha _2-3)\), which gives
We observe \(\alpha _1(\alpha _2-6\varphi -4)+12\varphi (\alpha _2-2)>0\), which is equivalent to saying that
Hence, the inequality \(\underline{q}<\overline{p}\) follows from (3.33)–(3.34). We leave it to the reader to check (3.8)–(3.13). Thus, \(y(\cdot )\) converges exponentially to \(x_*\). Moreover, it follows from (3.19) that the convergence rate is
for some constants P, Q, R. \(\square \)
4 Discrete Time Dynamical System
In this section, we establish the linear convergence of the numerical scheme (2.3) for solving (1.1) under the following additional assumption.
Assumption 4.1
The coefficients \(\alpha _0,\alpha _1,\alpha _2\) satisfy
where \(\ell \) is defined in (3.5).
We denote the following parameters
and
Remark 4.1
Under Assumption 4.1, we have \(F_0,E_1,D_2>0.\) Note also that under Assumption 3.1, the stepsize \(\omega \) must be bounded from above, i.e.
This upper bound on \(\omega \) is larger than that of the classical forward–backward algorithm, which is \(\omega < \frac{2\gamma }{L^2}\) (see e.g. [13, Proposition 25.9]) when A is maximally monotone and B is \(\gamma \)-strongly monotone and L-Lipschitz continuous.
4.1 Global Linear Convergence
Theorem 4.2
Suppose that the operators A and B satisfy Assumption 3.1 and that Assumption 4.1 holds. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5), (4.4), (4.5). Assume that there exists \(\xi >0,\xi \ne 1\) such that the following conditions hold
Then z(n) converges linearly to \(x_*\), i.e. there exist \(M>0\) and \(q \in (0,1)\) such that
Proof
Since
we have
We observe
Using the definition of resolvent, equation (2.2) gives
which combined with \(0\in (A+B)(x_*)\) and the \(\gamma \)-monotonicity of \(A+B\) implies
Since the operator B is L-Lipschitz, we can estimate the right hand side of the inequality above and then
Note that by the Cauchy–Schwarz inequality
Thus, we get
Note that
Inserting the equality above into (4.15), we get
which implies, by (4.13) and (4.14), that
By (4.1), the inequality above gives
Set
Then \(\varepsilon >1\) and conditions (4.7)–(4.12) can be written as
Multiplying both sides by \(\varepsilon ^{n+3}\) and then using Remark 2.12, we obtain
Let \(m\in \mathbb Z_{\ge 1}\). After summing from \(n=0\) to \(n=m-1\),
where \(M_1\) is some positive constant. Again using Remark 2.12,
Let \(\kappa \in \mathbb Z_{\ge 2}\). After summing from \(m=1\) to \(m=\kappa -1\),
where \(M_2\) is some positive constant. Again using Remark 2.12,
which implies, after summing from \(\kappa =2\) to \(\kappa =n-1\), that
Here \(n\in \mathbb Z_{\ge 3}\) and \(M_3,M_4\) are some positive constants. Let q be such that \(1<q<\varepsilon \). We have
where \(M_5\) is some constant. The inequality above means that z(n) converges linearly to \(x_*\). \(\square \)
4.2 Parameters Choices
Let us now discuss how to choose the parameters fulfilling all assumptions of Theorem 4.2. Note that if \(D_0,D_1,E_0\) satisfy
then conditions (4.16)–(4.21) hold by letting \(\xi \rightarrow 0^+.\)
The following result simplifies the assumption (4.22) in algebraic terms of the coefficients \(\alpha _0,\alpha _1,\alpha _2\).
Corollary 4.3
Suppose that the operators A and B satisfy Assumption 3.1. Let \(x_*\) be the unique solution of Problem (1.1). Let \(\theta \) satisfy (3.4) and denote the parameters as in (3.5), (4.4), (4.5). Then z(n) converges linearly to \(x_*\) provided that the coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(E_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get
which gives (4.2). It follows from (4.26) that \(\alpha _1<\alpha _2\) and so
The last inequality proves (4.3). Thus, Assumption 4.1 holds. Note that
which gives
and then \(D_1>0.\) \(\square \)
Remark 4.4
Note that there are common choices of parameters satisfying both Corollary 3.5 (as \(\varepsilon \rightarrow 0\)) and Corollary 4.3 (as \(\xi \rightarrow 0\)). The reader can check the following selection
Remark 4.5
An important application of the monotone inclusion (1.1) is the following optimization problem
where \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function with L-Lipschitz continuous gradient for some \(L>0\) and \(g: \mathcal {H}\rightarrow \mathbb {R} \cup \{+\infty \}\) is a proper and lower semicontinuous function.
Recall that the Fréchet subdifferential of g at x is defined by
It is well known that if g is differentiable at x, then \(\hat{\partial } g(x) = \{\nabla g(x)\}\). When g is a convex function, the Fréchet subdifferential coincides with the classical convex subdifferential, i.e.
We notice that if g is proper, \(\gamma _g\)-convex and lower semicontinuous, then \(\hat{\partial } g\) is maximally \(\gamma _g\)-monotone. We assume that f and g are respectively \(\gamma _f\)-convex and \(\gamma _g\)-convex functions such that \(\gamma = \gamma _f +\gamma _g >0\). Then the set of minimizers of (4.27) coincides with the solution set of the following monotone inclusion problem
for which the results obtained from previous Sections can be applied.
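As a toy instance (illustrative, not from the original text), take \(f(x)=\tfrac{1}{2}(x-3)^2\) (strongly convex with 1-Lipschitz gradient) and \(g(x)=|x|\). The forward–backward iteration then alternates a gradient step on f with the resolvent step on \(\hat{\partial } g\), i.e. soft-thresholding, and converges linearly to the minimizer \(x_*=2\):

```python
# Forward-backward (proximal gradient) method for min f(x) + g(x) with
# f(x) = 0.5*(x-3)^2 and g(x) = |x|.  The backward step is the
# resolvent of the subdifferential of g, i.e. soft-thresholding.
# The minimizer solves 0 in (x - 3) + sign(x), giving x* = 2.

def grad_f(x):
    return x - 3.0

def prox_g(x, omega):          # soft-thresholding = J_{omega * dg}
    if x > omega:
        return x - omega
    if x < -omega:
        return x + omega
    return 0.0

def solve(x0=0.0, omega=0.5, iters=100):
    x = x0
    for _ in range(iters):
        x = prox_g(x - omega * grad_f(x), omega)
    return x

assert abs(solve() - 2.0) < 1e-10
```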
5 Strongly Pseudo-monotone Variational Inequality
Let C be a nonempty and closed convex subset of \(\mathcal {H}\). The normal cone of C at x is defined as
which is maximally monotone [13]. In this section, we focus on the special case of Problem (1.1) of the form
Note that, if A is \(\gamma _A\)-monotone, then (5.1) is a special case of (1.1). Indeed, the sum of two monotone operators is still monotone [13]. However, this is not the case if A is non-monotone (e.g. only pseudo-monotone). For example, the operator
is pseudo-monotone but \(A+\epsilon I\) is not (pseudo)-monotone for any \(\epsilon > 0\) (see [41, Counterexample 2.1]).
In this section, we will consider the case when A is \(\gamma \)-strongly pseudo-monotone and hence the results obtained in the previous Sections cannot be directly applied. Problem (5.1) is equivalent to the variational inequality VI(A, C): find \(x_*\in C\) such that
For each \(x\in \mathcal {H}\), there exists a unique point in C (see, e.g., [31]), denoted by \(P_{C}(x)\), such that
Some well-known properties of the metric projection \(P_{C}: \mathcal {H}\rightarrow C\) are given in the following lemma [25, 31].
Lemma 5.1
Assume that the set C is a closed convex subset of \(\mathcal {H}\). Then we have the following:
-
(a)
\(P_{C}(.)\) is a nonexpansive operator, i.e., for all \(x,y\in \mathcal {H}\), it holds that
$$\begin{aligned} \Vert P_{C}(x)-P_{C}(y)\Vert \le \Vert x-y\Vert . \end{aligned}$$ -
(b)
For any \(x\in \mathcal {H}\) and \(y\in C\), it holds that
$$\begin{aligned} \left\langle x-P_{C}(x), y-P_{C}(x)\right\rangle \le 0. \end{aligned}$$
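Both properties of Lemma 5.1 can be checked numerically for a simple set; below is a sketch with C the closed unit ball in \(\mathbb R^2\) (an illustrative choice):

```python
import math
import random

# Metric projection onto the closed unit ball in R^2 and numerical
# checks of Lemma 5.1: (a) nonexpansiveness, and (b) the variational
# characterization <x - P_C(x), y - P_C(x)> <= 0 for all y in C.

def project_ball(x):
    norm = math.hypot(x[0], x[1])
    if norm <= 1.0:
        return x
    return (x[0] / norm, x[1] / norm)

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    px, py = project_ball(x), project_ball(y)
    # (a) nonexpansiveness
    assert dist(px, py) <= dist(x, y) + 1e-12
    # (b) variational inequality, tested against a point c of C
    c = project_ball(y)
    inner = (x[0] - px[0]) * (c[0] - px[0]) + (x[1] - px[1]) * (c[1] - px[1])
    assert inner <= 1e-9
```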
Assumption 5.1
(i) The coefficients \(\alpha _0,\alpha _1,\alpha _2>0.\)
(ii) The operator \(A:\mathcal {H}\rightarrow \mathcal {H}\) is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz continuous.
(iii) The parameter \(\omega >0\) satisfies
Remark 5.2
Under Assumption 5.1 (ii) and (iii), the problem VI(A, C) has a unique solution [30].
We will need the following important estimate and error bounds.
Proposition 5.3
[44] Let \(C\subset \mathcal {H}\) be a nonempty closed convex subset. Let A be an operator that is \(\gamma \)-strongly pseudo-monotone and L-Lipschitz on C. Let \(x_*\) be the unique solution of Problem (5.2). For every \(\omega >0\) and \(x\in \mathcal {H},\) we have
and
Throughout this section, we denote
5.1 Continuous Time
In this case, we consider
where \(y^{(j)}(t_0)=v_j,\,j\in \{0,1,2\}\).
Denote
5.1.1 Global Exponential Convergence
Theorem 5.4
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8). Assume that there exists \(\varepsilon >0\) such that the following conditions hold
Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\).
Proof
Consider the functions in (3.7). As in (3.14), we also have
On one hand, by (5.4), we can estimate
On the other hand, by (5.4) and (5.5), we get
Thus, using (5.16) and (5.17), we estimate (5.15) as follows
By arguments similar to those used in Theorem 3.2, now applied to (5.18) (namely, integrating three times), we obtain the exponential convergence of \(y(\cdot ).\) \(\square \)
5.1.2 Parameters Choices
Remark 5.5
If \(G_0,G_1,H_0\) satisfy
then conditions (5.9)–(5.14) can be obtained by letting \(\varepsilon \rightarrow 0^+.\)
In the following result, we simplify assumption (5.19) in terms of the upper and lower bounds of the coefficients \(\alpha _0,\alpha _1,\alpha _2.\)
Corollary 5.6
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8). Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Now we examine Theorem 5.4 when \(\varepsilon =1.\)
Corollary 5.7
Suppose that Assumption 5.1 is satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.8) and
Then the trajectory \(y(\cdot )\) generated by dynamical system (5.7) converges exponentially to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy the following conditions
Proof
First, we show that (5.21) ensures the validity of (5.22); that is, \(\underline{\beta }<\overline{\beta }\). Indeed, it follows from (5.21) that \(\alpha _2>3\) and hence \(2\alpha _2-3\le \frac{\alpha _2(\alpha _2-1)}{2}\), since this inequality is equivalent to \((\alpha _2-2)(\alpha _2-3)\ge 0\). Also from (5.21), we have \(\alpha _2>4\psi \) and then
Next, we show that (5.21)–(5.22) ensure the validity of (5.23); that is, \(\underline{q}<\overline{p}\). It follows from (5.21) that \(\alpha _2>3\psi +2\) and so
Meanwhile, by (5.22), we have \(\alpha _1>\psi (2\alpha _2-3)\), which gives
Now we can prove the exponential convergence of \(y(\cdot )\) by using Theorem 5.4 for \(\varepsilon =1\). \(\square \)
5.2 Discrete Time
We consider the difference equation
where \(\alpha _2,\alpha _1,\alpha _0,\omega >0\).
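The explicit recursion (5.24) is given in the paper. In the spirit of the forward–backward scheme with double inertial effects, a generic sketch (the extrapolation form, the parameter values, and the test data are assumptions, not the paper's exact update) reads:

```python
import numpy as np

project_box = lambda x: np.clip(x, -1.0, 1.0)            # C = [-1, 1]^n
M = np.array([[2.0, 0.5], [0.5, 2.0]]); qv = np.array([1.0, -3.0])
A = lambda x: M @ x + qv   # strongly monotone, hence strongly pseudo-monotone

# Hypothetical double-inertial projected iteration (not the paper's exact (5.24)):
#   u_k     = z_k + t1 (z_k - z_{k-1}) + t2 (z_{k-1} - z_{k-2})   # two inertial terms
#   z_{k+1} = P_C( u_k - w A(u_k) )                               # forward-backward step
t1, t2, w = 0.1, 0.02, 0.1                               # illustrative parameters
z_pp = z_p = z = np.zeros(2)                             # z_{k-2}, z_{k-1}, z_k
for _ in range(3000):
    u = z + t1 * (z - z_p) + t2 * (z_p - z_pp)
    z_pp, z_p, z = z_p, z, project_box(u - w * A(u))

# The limit satisfies the fixed-point characterization z = P_C(z - w A(z)).
print(np.linalg.norm(z - project_box(z - w * A(z))))     # ~ 0
```

The two extrapolation terms use the last three iterates, which is the discrete counterpart of the third-order (in time) structure of the continuous dynamics.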
Denote
5.2.1 Global Exponential Convergence
Assumption 5.2
The coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
where \(\mu \) is defined in (5.6).
Remark 5.8
Under Assumption 5.2, we have \(R_0,T_1,S_2\ge 0.\)
Theorem 5.9
Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.25)–(5.26). Assume that there exists \(\xi >0,\xi \ne 1\) such that the following conditions hold
Then the sequence \(z(\cdot )\) generated by (5.24) converges linearly to \(x_*\).
Proof
Consider the functions (4.6). As in (4.13), we also have
On one hand, by (5.4), we can estimate
On the other hand, by (5.4) and (5.5), we get
Thus, using (5.37) and (5.38), we estimate (5.36) as follows
Setting
then \(\varepsilon >1\) and conditions can be written as
By arguments similar to those used in Theorem 4.2, now applied to (5.39) (namely, summing three times), we obtain the linear convergence of \(z(\cdot ).\) \(\square \)
5.2.2 Parameters Choices
Remark 5.10
If \(S_0,S_1,T_0\) satisfy
then the conditions of Theorem 5.9 can be obtained by letting \(\xi \rightarrow 0^+.\)
The following result simplifies condition (5.40) in terms of the lower and upper bounds of the coefficients \(\alpha _0,\alpha _1,\alpha _2\). There are common parameter choices satisfying both Corollary 5.6 and Corollary 5.11 below.
Corollary 5.11
Suppose that Assumptions 5.1 and 5.2 are satisfied. Let \(x_*\) be the unique solution of Problem (5.2). Let the parameters be denoted by (5.6) and (5.25)–(5.26). Then \(z(\cdot )\) converges linearly to \(x_*\) provided that coefficients \(\alpha _0,\alpha _1,\alpha _2,\omega \) satisfy
Proof
Since \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\), we have \(T_0>\alpha _2+3>0\). Also using \(\alpha _1<\frac{\alpha _2^2}{\alpha _2+2}\) and the fact that \(\alpha _2<2\), we get
which gives (5.28). It follows from (5.44) that \(\alpha _1<\alpha _2\) and so
The last inequality proves (5.29). Thus, Assumption 5.2 holds. Note that
which gives
and then \(S_1>0.\) \(\square \)
Remark 5.12
We consider the following optimization problem
where C is a nonempty, closed, and convex subset of \(\mathcal {H}\) and \(f: \mathcal {H}\rightarrow \mathbb {R} \) is a differentiable function that is \(\gamma \)-strongly pseudo-convex on C, with L-Lipschitz continuous gradient for some \(L>0\). Recall that a differentiable function f is called \(\gamma \)-strongly pseudo-convex if there exists \(\gamma >0\) such that
for all \(x,y\in C\). For more details on generalized convex functions and their characterizations, we refer the reader to [29]. The optimization problem (5.45) is equivalent to the following strongly pseudo-monotone variational inequality: find \(x_*\in C\) such that
$$\begin{aligned} \left\langle \nabla f(x_*), y-x_*\right\rangle \ge 0 \quad \text {for all } y\in C. \end{aligned}$$
As a consequence, all the results presented in this section can be applied directly to the pseudo-convex optimization problem (5.45).
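As a sanity check of this equivalence, one can minimize a strongly convex quadratic (a particular strongly pseudo-convex function) over a box by solving VI(\(\nabla f\), C) with a basic projected fixed-point iteration; all data below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Illustrative strongly convex quadratic (hence strongly pseudo-convex on C)
# minimized over the box C = [-1, 1]^2.
M = np.array([[2.0, 0.5], [0.5, 2.0]]); qv = np.array([1.0, -3.0])
f      = lambda x: 0.5 * x @ M @ x + qv @ x
grad_f = lambda x: M @ x + qv
project_box = lambda x: np.clip(x, -1.0, 1.0)

# Solve VI(grad f, C) by the projected-gradient fixed-point iteration.
w, z = 0.1, np.zeros(2)
for _ in range(2000):
    z = project_box(z - w * grad_f(z))

# The limit should minimize f over C: no sampled feasible point does better.
rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=(10_000, 2))
print(all(f(z) <= f(y) + 1e-9 for y in samples))   # True
```

Here the constraint is active in the second coordinate, so the minimizer is a boundary point of C characterized by the variational inequality rather than by \(\nabla f(x_*)=0\).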
References
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Antipin, A.S.: Continuous and iterative processes with projection operators and projection-like operators. AN SSSR, Scientific Counsel on the Complex Problem Cybernetics, Moscow, pp. 5–43 (1989)
Antipin, A.S.: Minimization of convex functions on convex sets by means of differential equations. (Russian) Differentsial’nye Uravneniya 30(9), 1475–1486 (1994); translation in Differential Equations 30, 1365–1375 (1994)
Attouch, H., Alvarez, F.: The heavy ball with friction dynamical system for convex constrained minimization problems. In: Optimization (Namur, 1998), Lecture Notes in Economics and Mathematical Systems 481. Springer, Berlin, pp. 25–35 (2000)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Math. Program. 184, 243–287 (2020)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial forward–backward algorithm for structured monotone inclusions. Appl. Math. Optim. 80, 547–598 (2019)
Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via a third-order in time evolution equation. Optimization 71, 1275–1304 (2022)
Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via a third-order in time evolution equation: TOGES-V an improved version of TOGES. Optimization (2022). https://doi.org/10.1080/02331934.2022.2119084
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method. I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2, 1–34 (2000)
Attouch, H., Peypouquet, J., Redont, P.: Backward–forward algorithms for structured monotone inclusions in Hilbert spaces. J. Math. Anal. Appl. 457, 1095–1117 (2018)
Avriel, M., Diewert, W.E., Schaible, S., Zang, I.: Generalized Concavity. Society for Industrial and Applied Mathematics (2010)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)
Bolte, J.: Continuous gradient projection method in Hilbert spaces. J. Optim. Theory Appl. 119, 235–259 (2003)
Boţ, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)
Boţ, R.I., Csetnek, E.R.: Convergence rates for forward–backward dynamical systems associated with strongly monotone inclusions. J. Math. Anal. Appl. 457, 1135–1152 (2018)
Boţ, R.I., Csetnek, E.R.: A dynamical system associated with the fixed points set of a nonexpansive operator. J. Dyn. Differ. Equat. 29, 155–168 (2017)
Boţ, R.I., Csetnek, E.R., Vuong, P.T.: The forward–backward–forward method from discrete and continuous perspective for pseudo-monotone variational inequalities in Hilbert spaces. Eur. J. Oper. Res. 287, 49–60 (2020)
Boţ, R.I., Sedlmayer, M., Vuong, P.T.: A relaxed inertial forward–backward–forward algorithm for solving monotone inclusions with application to GANs. J. Mach. Learn. Res. 24, 1–37 (2023)
Cavazzuti, E., Pappalardo, P., Passacantando, M.: Nash equilibria, variational inequalities, and dynamical systems. J. Optim. Theory Appl. 114, 491–506 (2002)
Dao, M., Phan, H.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators. SIAM J. Optim. 29, 2697–2724 (2019)
Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York (2003)
Goebel, K., Reich, S.: Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings. Marcel Dekker, New York (1984)
Guo, K., Han, D., Yuan, X.: Convergence analysis of Douglas–Rachford splitting method for strongly + weakly convex programming. SIAM J. Numer. Anal. 55, 1549–1577 (2017)
Ha, N.T.T., Strodiot, J.J., Vuong, P.T.: On the global exponential stability of a projected dynamical system for strongly pseudomonotone variational inequalities. Optim. Lett. 12, 1625–1638 (2018)
Haraux, A.: Systemes Dynamiques Dissipatifs et Applications, Recherches en Mathematiques Appliquees 17. Masson, Paris (1991)
Karamardian, S., Schaible, S.: Seven kinds of monotone maps. J. Optim. Theory Appl. 66, 37–46 (1990)
Kim, D.S., Vuong, P.T., Khanh, P.D.: Qualitative properties of strongly pseudomonotone variational inequalities. Optim. Lett. 10, 1669–1679 (2016)
Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic, New York (1980)
Lions, P., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Nagurney, A., Zhang, D.: Projected Dynamical Systems and Variational Inequalities with Applications. Kluwer Academic, Norwell (1996)
O’Connor, D., Vandenberghe, L.: On the equivalence of the primal–dual hybrid gradient method and Douglas–Rachford splitting. Math. Program. 179, 85–108 (2020)
Pappalardo, M., Passacantando, M.: Stability for equilibrium problems: from variational inequalities to dynamical systems. J. Optim. Theory Appl. 113, 567–582 (2002)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Su, W., Boyd, S., Candes, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. In: Advances in Neural Information Processing Systems (NIPS) 27 (2014)
Tam, N.N., Yao, J.C., Yen, N.D.: Solution methods for pseudomonotone variational inequalities. J. Optim. Theory Appl. 138, 253–273 (2008)
Vinh, L.V., Tran, V.N., Vuong, P.T.: A second-order dynamical system for equilibrium problems. Numer. Algorithms 91, 327–351 (2022)
Vuong, P.T.: The global exponential stability of a dynamical system for solving variational inequalities. Netw. Spat. Econ. 22, 395–407 (2022)
Vuong, P.T.: A second order dynamical system and its discretization for strongly pseudo-monotone variational inequalities. SIAM J. Control Optim. 59, 2875–2897 (2021)
Vuong, P.T., Strodiot, J.J.: A dynamical system for strongly pseudo-monotone equilibrium problems. J. Optim. Theory Appl. 185, 767–784 (2020)
Acknowledgements
The authors are grateful to two anonymous reviewers for their constructive comments, which helped to significantly improve the presentation of the paper. Additionally, the authors would like to extend their appreciation to the Vietnam Institute for Advanced Study in Mathematics (VIASM) for organizing the International Conference "New Trends in Numerical Optimization and Applications" in December 2021. It was during this event that the authors first met and initiated the fruitful discussions that ultimately led to this research project. P. T. Vuong thanks the London Mathematical Society (LMS) for supporting his visit to P. V. Hai at the Hanoi University of Science and Technology in February 2024, during which the authors completed the final version of this paper.
Funding
P.V. Hai is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 101.02-2021.24.
Communicated by Boris S. Mordukhovich.
Dedicated to Professor Pham Ky Anh (Vietnam National University) on the occasion of his 75th birthday.
Hai, P.V., Vuong, P.T. Third Order Dynamical Systems for the Sum of Two Generalized Monotone Operators. J Optim Theory Appl 202, 519–553 (2024). https://doi.org/10.1007/s10957-024-02437-y
Keywords
- Monotone inclusion
- Dynamical system
- Generalized monotonicity
- Variational inequality
- Exponential convergence
- Linear convergence