1 Introduction

Let X be a reflexive, strictly convex and smooth Banach space with dual space \(X^{*}\), let \(A:X\rightrightarrows X^{*}\) be a general maximal monotone operator, and let C be a closed convex set in X. We denote by \(N_{C}\) the normal cone to C. In this work, we study the following variational inequality problem: find \(x\in X\) such that

$$ 0\in A(x)+N_{C}(x). $$
(1)

Problem (1) provides a unified format for many concrete problems in machine learning, linear inverse problems, and nonlinear problems such as convex programming and the split feasibility problem; see [3, 4, 15, 17]. The set of solutions of (1) is assumed to be nonempty and is denoted by \(\mathcal{S}\). In [3], the authors provided a generic framework, the so-called backward–backward splitting method, for solving (1) in a Hilbert space:

$$ x_{n+1}=(I+\lambda _{n}\beta _{n} \partial \varPsi )^{-1}(I+\lambda _{n}A)^{-1}(x _{n}), $$
(2)

where Ψ is a penalization function and \(\lambda _{n}\), \(\beta _{n}\) are sequences of positive parameters. In [3], convergence results have been obtained for the backward–backward splitting method (2) under the key Fenchel conjugate assumption that

$$ \sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\biggl[\varPsi ^{*}\biggl(\frac{p^{*}}{\beta _{n}} \biggr)- \sigma _{C}\biggl(\frac{p^{*}}{\beta _{n}}\biggr)\biggr]< +\infty,\quad \forall p ^{*}\in R( N_{C}), $$

where \(\varPsi ^{*}\) is the Fenchel conjugate of Ψ, \(\sigma _{C}\) is the support function of C and \(R(N_{C})\) denotes the range of \(N_{C}\). This condition somehow relates the growth of the sequence \(\{\beta _{n}\}\) to the shape of Ψ around its minimizing set C. The reader is referred to [14] for further discussion.

When the penalization function Ψ is Gâteaux differentiable, it is rather natural to consider the following forward–backward splitting method (FBS):

$$ x_{n+1} =(I+\lambda _{n}A)^{-1} \bigl(x_{n}-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\bigr). $$
(3)

The forward–backward method has the advantage of being easier to compute than the backward–backward method: each iteration has a lower computational cost and can often be computed exactly, which broadens its applicability to real-life problems. A special case of this method is the projected subgradient algorithm for constrained minimization problems. There have been many works on finding zeros of the sum of two monotone operators. For further information on forward–backward splitting methods and their applications, see [4, 6, 9, 11, 12].
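To fix ideas, the following is a minimal sketch of one step of (3) in the Hilbert case \(X=\mathbb{R}^{d}\); the callables grad_psi and resolvent_A are hypothetical placeholders, and when \(A=\partial \varPhi \) the resolvent \((I+\lambda _{n}A)^{-1}\) is the proximal mapping of Φ.

```python
import numpy as np

def fbs_step(x, lam, beta, grad_psi, resolvent_A):
    """One forward-backward step (3) on X = R^d: an explicit (forward)
    step on the penalization Psi followed by a resolvent (backward)
    step on A; grad_psi and resolvent_A are user-supplied placeholders."""
    y = x - lam * beta * grad_psi(x)   # forward step on the penalization
    return resolvent_A(y, lam)         # backward step, e.g. prox of Phi
```

For instance, taking \(A=N_{K}\) for a closed convex set K makes resolvent_A the metric projection onto K, which recovers the projected subgradient scheme mentioned above.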

Let \(X=H\) be a Hilbert space. If \(A=\partial \varPhi \) is the subdifferential of a proper, lower-semicontinuous and convex function \(\varPhi: H\rightarrow (-\infty,+\infty ] \), the variational inequality problem (1) becomes the following minimization problem:

$$ x\in \operatorname{Argmin}\bigl\{ \varPhi (z):z\in \operatorname{Argmin}\varPsi \bigr\} . $$

It is convenient to reformulate method (3) as

$$ \frac{x_{n}-x_{n+1}}{\lambda _{n}}-\beta _{n}\nabla \varPsi (x_{n})\in \partial \varPhi (x_{n+1}). $$

This is equivalent to

$$ x_{n+1} = \mathop{\operatorname{argmin}}_{y\in X}\biggl\{ \frac{1}{2} \Vert x_{n}-y \Vert ^{2}+\lambda _{n}\beta _{n}\bigl\langle \nabla \varPsi (x_{n}) ,y\bigr\rangle + \lambda _{n}\varPhi (y) \biggr\} . $$

In [4], the authors prove that every sequence generated by the forward–backward splitting method converges weakly to a solution of the minimization problem, provided that either the penalization function or the objective function is inf-compact. This inf-compactness assumption is, however, not necessary: in [13], the authors prove the same weak convergence without it.

A generalization of this method from Hilbert to Banach space is not immediate. The main difficulties are due to the fact that the inner product structure of a Hilbert space is missing in a Banach space. In [18], the authors prove that every sequence generated by a projection iterative method converges strongly to a common minimum norm solution of a variational inequality problem for an inverse strongly monotone mapping in Banach spaces.

In this paper, we extend the forward–backward splitting method (3) to a Banach space, that is,

$$ x_{n+1}=(cJ+\lambda _{n}A)^{-1} \bigl(cJ(x_{n})-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\bigr), $$
(4)

where J is the duality mapping and \(c\) is a positive constant. If \(A=\partial \varPhi \) is the subdifferential of a proper, lower-semicontinuous and convex function \(\varPhi: X\rightarrow (-\infty,+\infty ]\), the forward–backward splitting method (4) becomes

$$ \frac{cJ(x_{n})-cJ(x_{n+1})}{\lambda _{n}}-\beta _{n}\nabla \varPsi (x_{n}) \in \partial \varPhi (x_{n+1}). $$
(5)

Iterative formula (5) can be rewritten, since \(\nabla _{y}\frac{c}{2}W(x_{n},y)=cJ(y)-cJ(x_{n})\), as

$$ x_{n+1} = \mathop{\operatorname{argmin}}_{y\in X}\biggl\{ \frac{c}{2}W(x_{n},y)+\lambda _{n} \beta _{n}\bigl\langle \nabla \varPsi (x_{n}) ,y\bigr\rangle + \lambda _{n}\varPhi (y)\biggr\} . $$

Throughout this paper, let \(A:X\rightrightarrows X^{*}\) be a maximal monotone operator and let the monotone operator \(T_{A,C} = A + N_{C}\) be also maximal monotone. Let \(\varPsi:X\rightarrow (-\infty,+\infty ]\) be a proper, lower-semicontinuous and convex function with \(C= \operatorname{argmin}(\varPsi )\neq \emptyset \) and \(\min (\varPsi )=0\). We assume that Ψ is Gâteaux differentiable and ∇Ψ is L-Lipschitz continuous on the domain of Ψ. We also assume that there exists \(c>0\) such that

$$ cW(x,y)\geq \Vert x-y \Vert ^{2}, \quad\forall x,y \in X. $$
(6)

The objective of the present paper is to propose a forward–backward splitting method for solving problem (1), so far limited to Hilbert spaces, in the general framework of Banach spaces. The paper is organized as follows. In Sect. 2, we provide some preliminary results. We present the forward–backward splitting method and prove its convergence in Sect. 3. Section 4 is devoted to an application of our result to the convex minimization problem. Finally, in Sect. 5, we prove a convergence result without the Fenchel conjugate assumption.

2 Preliminaries

Let f be a proper, lower-semicontinuous and convex function on a Banach space X. The subdifferential of f at \(x\in X\) is the convex set

$$ \partial f(x)=\bigl\{ x^{*}\in X^{*}:\bigl\langle x^{*},y-x\bigr\rangle \leq f(y)-f(x), \forall y\in X\bigr\} . $$

It is easy to verify that \(0\in \partial f(\hat{x})\) if and only if \(f(\hat{x})=\min_{x\in X}f(x)\). We denote by \(f^{*}\) the Fenchel conjugate of f:

$$ f^{*}\bigl(x^{*}\bigr)=\sup_{x\in X}\bigl\{ \bigl\langle x^{*},x\bigr\rangle -f(x)\bigr\} . $$

Given a nonempty closed convex set \(C\subset X\), its indicator function is defined as \(\delta _{C}(x)=0\) if \(x\in C\), and +∞ otherwise. The support function of C at a point \(x^{*}\) is \(\sigma _{C}(x^{*})= \sup_{y\in C}\langle x^{*},y\rangle \). Then the normal cone of C at \(x\in X\) is \(N_{C}(x)=\partial \delta _{C}(x)=\{x^{*}\in X^{*}| \langle x^{*},y-x\rangle \leq 0, \forall y\in C\}\).
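As a toy illustration (our own finite-dimensional sketch, with the box \(C=[0,1]^{3}\) chosen for definiteness), for \(x\in C\) one has \(x^{*}\in N_{C}(x)\) if and only if \(\sigma _{C}(x^{*})=\langle x^{*},x\rangle \), which is easy to check numerically:

```python
import numpy as np

# Toy check in R^3 with C = [0,1]^3: sigma_C(x*) = sum_i max(x*_i, 0),
# and for x in C, x* lies in N_C(x) iff sigma_C(x*) = <x*, x>.
sigma_C = lambda xs: np.maximum(xs, 0.0).sum()
x  = np.array([0.0, 1.0, 0.3])     # a point of C
xs = np.array([-2.0, 5.0, 0.0])    # candidate normal: <= 0 where x_i = 0,
                                   # >= 0 where x_i = 1, = 0 at interior coordinates
assert np.isclose(sigma_C(xs), xs @ x)   # so xs belongs to N_C(x)
```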

The duality mapping \(J \colon X \rightrightarrows X^{*}\) is defined by

$$ J(x)=\bigl\{ x^{*}\in X^{*}|\bigl\langle x^{*},x \bigr\rangle = \bigl\Vert x^{*} \bigr\Vert \Vert x \Vert , \bigl\Vert x^{*} \bigr\Vert = \Vert x \Vert \bigr\} ,\quad \forall x\in X. $$

The Hahn–Banach theorem guarantees that \(J(x)\neq \emptyset \) for every \(x\in X\). It is clear that \(J(x)=\partial (\frac{1}{2}\|\cdot \|^{2})(x)\) for all \(x\in X\). It is well known that if X is smooth, then J is single-valued and is norm-to-weak star continuous. It is also well known that if a Banach space X is reflexive, strictly convex and smooth, then the duality mapping \(J^{*}\) from \(X^{*} \) into X is the inverse of J, that is, \(J^{-1}=J^{*}\). Properties of the duality mapping have been given in [1, 2, 8, 17].
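For \(X=l^{p}\), \(1<p<+\infty \), the duality mapping has the explicit form \(J(x)=\|x\|_{p}^{2-p}\bigl(|x_{i}|^{p-1}\operatorname{sign}(x_{i})\bigr)_{i}\). The following sketch (in \(\mathbb{R}^{d}\) equipped with the p-norm, for illustration only) checks the defining identities and that \(J^{-1}=J^{*}\) is the duality mapping associated with the conjugate exponent \(q=p/(p-1)\):

```python
import numpy as np

def duality_map(x, p):
    """Duality mapping J = grad(0.5*||.||_p^2) for the p-norm on R^d:
    J(x) = ||x||_p^{2-p} * |x|^{p-1} * sign(x), componentwise."""
    nrm = np.linalg.norm(x, p)
    if nrm == 0.0:
        return np.zeros_like(x)
    return nrm ** (2.0 - p) * np.abs(x) ** (p - 1.0) * np.sign(x)

rng = np.random.default_rng(0)
x, p = rng.standard_normal(6), 1.5
q = p / (p - 1.0)                                    # conjugate exponent
jx = duality_map(x, p)
assert np.isclose(jx @ x, np.linalg.norm(x, p) ** 2)            # <J(x), x> = ||x||^2
assert np.isclose(np.linalg.norm(jx, q), np.linalg.norm(x, p))  # ||J(x)||_q = ||x||_p
assert np.allclose(duality_map(jx, q), x)                       # J^{-1} = J^*
```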

Let X be a smooth Banach space. Alber [1, 2] considered the following Lyapunov distance function:

$$ W(x,y)= \Vert x \Vert ^{2}-2\bigl\langle J(x),y\bigr\rangle + \Vert y \Vert ^{2},\quad \forall x,y \in X. $$

It is obvious from the definition of W that

$$ \bigl( \Vert x \Vert - \Vert y \Vert \bigr)^{2}\leq W(x,y)\leq \bigl( \Vert x \Vert + \Vert y \Vert \bigr)^{2}, \quad\forall x,y\in X. $$

We also know that

$$ W(x,y)=W(x,z)+W(z,y)+2\bigl\langle J(x)-J(z),z-y\bigr\rangle , \quad\forall x,y,z \in X. $$
(7)
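Identity (7) follows by expanding the definition of W on both sides; a quick numerical sanity check in the same \(l^{p}\)-style setting as above:

```python
import numpy as np

def duality_map(x, p):
    nrm = np.linalg.norm(x, p)
    if nrm == 0.0:
        return np.zeros_like(x)
    return nrm ** (2.0 - p) * np.abs(x) ** (p - 1.0) * np.sign(x)

def W(x, y, p):
    """Lyapunov distance W(x, y) = ||x||^2 - 2<J(x), y> + ||y||^2."""
    return (np.linalg.norm(x, p) ** 2 - 2.0 * duality_map(x, p) @ y
            + np.linalg.norm(y, p) ** 2)

rng = np.random.default_rng(1)
p = 3.0
x, y, z = rng.standard_normal((3, 5))
lhs = W(x, y, p)
rhs = W(x, z, p) + W(z, y, p) + 2.0 * (duality_map(x, p) - duality_map(z, p)) @ (z - y)
assert np.isclose(lhs, rhs)   # the three-point identity (7)
```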

A set-valued mapping \(A:X\rightrightarrows X^{*}\) is said to be a monotone operator if \(\langle x^{*}-y^{*},x-y\rangle \geq 0\) for all \(x^{*}\in A(x)\) and all \(y^{*}\in A(y)\). It is maximal monotone if its graph is not properly contained in the graph of any other monotone operator. The subdifferential of a proper, lower-semicontinuous and convex function is maximal monotone. The inverse \(A^{-1}:X^{*}\rightrightarrows X\) of A is defined by \(x\in A^{-1}(x^{*})\Leftrightarrow x^{*} \in A(x)\). By Minty's theorem, the operator \(J +\lambda A\) is surjective for any maximal monotone operator \(A:X\rightrightarrows X^{*}\) and any \(\lambda > 0\), so the operator \((J +\lambda A)^{-1}\) is everywhere defined. If X is a reflexive, strictly convex and smooth Banach space and A is a maximal monotone operator, then for each \(\lambda >0\) and \(x\in X\), there is a unique element \(\bar{x}\in X\) satisfying \(J(x)\in J(\bar{x})+\lambda A(\bar{x})\), so \((J+\lambda A)^{-1}\) is single-valued (see [16]). An operator \(A:X\rightrightarrows X^{*}\) is strongly monotone with parameter \(\alpha >0\) if

$$ \bigl\langle x^{*}-y^{*},x-y\bigr\rangle \geq \alpha \Vert x-y \Vert ^{2}, \quad\forall x^{*}\in A(x), y^{*}\in A(y). $$

Observe that the set of zeroes of a maximal monotone operator which is strongly monotone must contain exactly one element.

Let \(A:X\rightrightarrows X^{*}\) be a maximal monotone operator. Suppose the operator \(T_{A,C} = A + N_{C} \) is maximal monotone and \(\mathcal{S} = (T_{A,C})^{-1}(0)\neq \emptyset \). By maximal monotonicity of \(T_{A,C}\), we know that

$$ z\in \mathcal{S}\quad\Longleftrightarrow\quad \bigl\langle 0-\omega ^{*}, z-u \bigr\rangle \geq 0,\quad\forall \bigl(u,\omega ^{*}\bigr)\in \operatorname{gra}(T_{A,C}), $$

that is,

$$ z\in \mathcal{S}\quad\Longleftrightarrow\quad \bigl\langle 0-\omega ^{*}, z-u \bigr\rangle \geq 0, \quad\forall u\in \operatorname{dom}(T_{A,C})=C\cap \operatorname{dom} A, \forall \omega ^{*}\in T_{A,C}(u). $$

If \(A=\partial \varPhi \) is the subdifferential of a proper, lower-semicontinuous and convex function \(\varPhi: X\rightarrow (- \infty,+\infty ]\) and if \(u\in \mathcal{S}\), then there exists \(u^{*}\in N_{C}(u)\) such that \(-u^{*}\in \partial \varPhi (u)\). Hence, by \(u^{*}\in N_{C}(u)\Rightarrow \sigma _{C}(u^{*})=\langle u^{*},u\rangle \), one has

$$ \varPhi (x)\geq \varPhi (u)+\bigl\langle -u^{*},x-u\bigr\rangle = \varPhi (u)+\sigma _{C}\bigl(u ^{*}\bigr)-\bigl\langle u^{*},x\bigr\rangle \geq \varPhi (u), \quad\forall x\in C. $$

Thus, when \(A=\partial \varPhi \), the maximal monotonicity of \(T_{A,C}\) implies

$$ \mathcal{S} =\operatorname{Argmin} \bigl\{ \varPhi (x): x \in C\bigr\} . $$

The following result will be used subsequently.

Lemma 2.1

([4])

Let \(\{a_{n}\}, \{b_{n}\}\) and \(\{\epsilon _{n}\}\) be real sequences. Assume that \(\{a_{n}\}\) is bounded from below, \(\{b_{n}\}\) is nonnegative, \(\sum_{n=1}^{\infty }|\epsilon _{n}|< +\infty \) and \(a_{n+1}-a_{n}+b_{n}\leq \epsilon _{n}\). Then \(\{a_{n}\}\) converges and \(\sum_{n=1}^{\infty }b_{n}< +\infty\).

3 The FBS method for variational inequalities

In this section, we first extend the Baillon–Haddad theorem (see [5]) to Banach spaces.

Lemma 3.1

Let \(\varPsi:X\rightarrow (-\infty, +\infty ]\) be a proper, lower-semicontinuous and convex function and let Ψ be Gâteaux differentiable on the domain of Ψ. The following are equivalent:

  1. (i)

    \(\nabla \varPsi \) is Lipschitz continuous with constant L.

  2. (ii)

    \(\varPsi (y)-\varPsi (x)-\langle \nabla \varPsi (x), y-x \rangle \leq \frac{L}{2}\|y-x\|^{2}, \forall x,y\in \operatorname{dom} \varPsi\).

  3. (iii)

    \(\varPsi (x)+\langle \nabla \varPsi (x), y-x\rangle +\frac{1}{2L}\|\nabla \varPsi (x)-\nabla \varPsi (y)\|^{2}\leq \varPsi (y), \forall x,y\in \operatorname{dom}\varPsi\).

  4. (iv)

    \(\nabla \varPsi \) is \(\frac{1}{L}\)-cocoercive, that is,

    $$ \bigl\langle \nabla \varPsi (x)-\nabla \varPsi (y), x-y\bigr\rangle \geq \frac{1}{L} \bigl\Vert \nabla \varPsi (x)-\nabla \varPsi (y) \bigr\Vert ^{2},\quad \forall x,y \in \operatorname{dom} \varPsi. $$

Proof

\(\mathrm{(i)}\Rightarrow {\mathrm{(ii)}}\). By the fundamental theorem of calculus applied to \(t\mapsto \varPsi (x+t(y-x))\), we have

$$\begin{aligned} \varPsi (y)-\varPsi (x)-\bigl\langle \nabla \varPsi (x),y-x\bigr\rangle &= \int _{0}^{1} \bigl\langle \nabla \varPsi (x)-\nabla \varPsi \bigl(x+t(y-x)\bigr),x-y\bigr\rangle \,\mathrm{d}t \\ &\leq \int _{0}^{1} \bigl\Vert \nabla \varPsi \bigl(x+t(y-x)\bigr)-\nabla \varPsi (x) \bigr\Vert \Vert x-y \Vert \,\mathrm{d}t \\ &\leq \int _{0}^{1}L \Vert x-y \Vert ^{2}t\,\mathrm{d}t \\ &\leq \frac{L}{2} \Vert x-y \Vert ^{2}. \end{aligned}$$

\(\mathrm{(ii)}\Rightarrow {\mathrm{(iii)}}\). Let us fix \(x_{0}\in \operatorname{dom}\varPsi \). Consider the function

$$ F(y)=\varPsi (y)-\bigl\langle \nabla \varPsi (x_{0}),y\bigr\rangle . $$

Note that F is a proper, lower-semicontinuous, convex and Gâteaux differentiable function with \(\nabla F(y)=\nabla \varPsi (y)-\nabla \varPsi (x_{0})\) and \(\operatorname{dom}F=\operatorname{dom}\varPsi \). Since F and Ψ differ only by a continuous linear functional, F inherits property (ii) from Ψ:

$$ F(u)-F(v)-\bigl\langle \nabla F(v), u-v\bigr\rangle \leq \frac{L}{2} \Vert u-v \Vert ^{2},\quad \forall u,v\in \operatorname{dom}F. $$
(8)

By the definition of F, we have \(\nabla F(x_{0})=0\), so \(x_{0}\in \operatorname{Argmin}_{x\in X}F(x)\). Applying (8) with \(u=y-\frac{1}{L}J^{-1}\nabla F(y)\) and \(v=y\), and using \(\langle \nabla F(y), J^{-1}\nabla F(y)\rangle =\|\nabla F(y)\|^{2}\) and \(\|J^{-1}\nabla F(y)\|=\|\nabla F(y)\|\), we have

$$ F(x_{0})\leq F\biggl(y-\frac{1}{L}J^{-1}\nabla F(y) \biggr)\leq F(y)-\frac{L}{2} \biggl\Vert \biggl(y- \frac{1}{L}J^{-1} \nabla F(y)\biggr)-y \biggr\Vert ^{2}=F(y)-\frac{1}{2L} \bigl\Vert \nabla F(y) \bigr\Vert ^{2}. $$

Hence, by \(\nabla F(y)=\nabla \varPsi (y)-\nabla \varPsi (x_{0})\), we get (iii).

\(\mathrm{(iii)}\Rightarrow {\mathrm{(iv)}}\). For any \(x,y\in \operatorname{dom} \varPsi \), by \(\mathrm{(iii)}\), we have

$$ \varPsi (x)+\bigl\langle \nabla \varPsi (x), y-x\bigr\rangle +\frac{1}{2L} \bigl\Vert \nabla \varPsi (x)-\nabla \varPsi (y) \bigr\Vert ^{2}\leq \varPsi (y) $$

and

$$ \varPsi (y)+\bigl\langle \nabla \varPsi (y), x-y\bigr\rangle +\frac{1}{2L} \bigl\Vert \nabla \varPsi (x)-\nabla \varPsi (y) \bigr\Vert ^{2}\leq \varPsi (x). $$

Adding the two inequalities, we get

$$ \bigl\langle \nabla \varPsi (x)-\nabla \varPsi (y), x-y\bigr\rangle \geq \frac{1}{L} \bigl\Vert \nabla \varPsi (x)-\nabla \varPsi (y) \bigr\Vert ^{2}. $$

\(\mathrm{(iv)}\Rightarrow {\mathrm{(i)}}\). By (iv) and the estimate \(\langle x^{*},x\rangle \leq \Vert x^{*} \Vert \Vert x \Vert \), we get \(\frac{1}{L}\|\nabla \varPsi (x)-\nabla \varPsi (y)\|^{2}\leq \|\nabla \varPsi (x)-\nabla \varPsi (y)\| \|x-y\|\), that is, \(\|\nabla \varPsi (x)-\nabla \varPsi (y)\|\leq L \|x-y\|\). □
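A quick numerical sanity check of Lemma 3.1 (a sketch with the quadratic \(\varPsi (x)=\frac{1}{2}x^{\top }Qx\) on \(\mathbb{R}^{d}\), chosen by us for illustration, for which \(\nabla \varPsi (x)=Qx\) is Lipschitz with \(L=\lambda _{\max }(Q)\)):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
Q = M @ M.T                                  # symmetric positive semidefinite
L = np.linalg.eigvalsh(Q).max()              # Lipschitz constant of grad Psi
psi = lambda v: 0.5 * v @ Q @ v
grad = lambda v: Q @ v

for _ in range(100):
    x, y = rng.standard_normal((2, 5))
    g = grad(x) - grad(y)
    # (ii): descent inequality with constant L/2
    assert psi(y) - psi(x) - grad(x) @ (y - x) <= 0.5 * L * (y - x) @ (y - x) + 1e-9
    # (iv): 1/L-cocoercivity of the gradient
    assert g @ (x - y) >= g @ g / L - 1e-9
```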

Iterative Method 3.1

Given \(x_{0}\in X\), set

$$ x_{n+1}=(cJ+\lambda _{n}A)^{-1} \bigl(cJ(x_{n})-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\bigr), $$
(9)

where \(\{\lambda _{n}\},\{\beta _{n}\}\) are two sequences of positive real numbers with \(\sum_{n=1}^{\infty }\lambda _{n}=+\infty, \sum_{n=1} ^{\infty }\lambda _{n}^{2}<+\infty \) and \(\lambda _{n}\beta _{n}< \frac{2c}{L}\).
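For illustration, here is a hedged end-to-end sketch of Iterative Method 3.1 in the Hilbert case \(X=\mathbb{R}^{d}\) (so \(J=I\) and \(c=1\)); the data \(\varPhi (x)=\frac{1}{2}\|x-d\|^{2}\), \(C=\{x:x\geq 0\}\), \(\varPsi =\frac{1}{2}\operatorname{dist}(\cdot,C)^{2}\) and the parameters \(\lambda _{n}=n^{-0.9}\), \(\beta _{n}=n^{0.9}\) are our own choices, under which \(L=1\), \(\lambda _{n}\beta _{n}=1<\frac{2c}{L}\), and by Remark 3.2 below the Fenchel conjugate condition reduces to \(\sum \lambda _{n}^{2}<+\infty \):

```python
import numpy as np

# Sketch of Iterative Method 3.1 in the Hilbert case X = R^d (J = I, c = 1):
# A = dPhi with Phi(x) = 0.5*||x - d||^2, C = {x : x >= 0}, and
# Psi(x) = 0.5*dist(x, C)^2, so grad Psi(x) = x - max(x, 0) and L = 1.
d = np.array([1.0, -2.0, 0.5, -0.3])
grad_psi = lambda x: x - np.maximum(x, 0.0)
prox_phi = lambda v, lam: (v + lam * d) / (1.0 + lam)   # (I + lam*dPhi)^{-1}

x = np.zeros_like(d)
for n in range(1, 50_001):
    lam, beta = n ** -0.9, n ** 0.9     # sum lam = inf, sum lam^2 < inf, lam*beta = 1
    x = prox_phi(x - lam * beta * grad_psi(x), lam)     # iteration (9)
print(x)   # close to P_C(d) = max(d, 0), the unique point of S
```

Since Φ is strongly convex here, Theorem 3.3 below predicts strong convergence of the whole sequence.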

For our results, the following Fenchel conjugate assumption will be used subsequently:

$$ \sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\biggl[\varPsi ^{*}\biggl( \frac{p^{*}}{\beta _{n}} \biggr)-\sigma _{C}\biggl(\frac{p^{*}}{\beta _{n}}\biggr) \biggr]< +\infty,\quad \forall p ^{*}\in R(N_{C}). $$
(10)

Remark 3.1

Since \(\varPsi (x)\leq \delta _{C}(x)\) for all \(x\in X\), we obtain \(\varPsi ^{*}(x^{*} )-\sigma _{C}(x^{*})\geq 0\) for all \(x^{*}\in X^{*}\). Hence, the terms in the sum are nonnegative.

Remark 3.2

If \(\varPsi (x)=\frac{1}{2}\operatorname{dist}(x,C)^{2}\), then we have \(\varPsi ^{*}(x^{*})-\sigma _{C}(x^{*})=\frac{1}{2}\|x^{*}\|^{2}\) for all \(x^{*}\in X^{*}\) and so

$$ \sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\biggl[\varPsi ^{*}\biggl(\frac{p^{*}}{\beta _{n}} \biggr)- \sigma _{C}\biggl(\frac{p^{*}}{\beta _{n}}\biggr)\biggr]< +\infty,\quad \forall p^{*} \in R( N_{C})\quad\Longleftrightarrow\quad \sum _{n=1}^{+\infty }\frac{\lambda _{n}}{\beta _{n}}< +\infty. $$

It is easy to see that, if the sequence \(\{\beta _{n}\}\) is chosen so that \(\limsup_{n\rightarrow +\infty }\lambda _{n}\beta _{n}<+ \infty \) and \(\liminf_{n\rightarrow +\infty }\lambda _{n}\beta _{n}>0\), then, since \(\frac{\lambda _{n}}{\beta _{n}}=\frac{\lambda _{n}^{2}}{\lambda _{n}\beta _{n}}\),

$$ \sum_{n=1}^{+\infty }\frac{\lambda _{n}}{\beta _{n}}< +\infty \quad\Longleftrightarrow \quad\sum_{n=1}^{+\infty }\lambda _{n}^{2}< +\infty. $$

For instance, \(\lambda _{n}=n^{-0.9}\) and \(\beta _{n}=n^{0.9}\) satisfy \(\sum \lambda _{n}=+\infty \), \(\sum \lambda _{n}^{2}<+\infty \) and \(\lambda _{n}\beta _{n}\equiv 1\).

Proposition 3.1

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9). Take \(u\in C\cap\operatorname{dom}A\) and \(v^{*}\in A(u)\). Then for all \(t\geq 0\), we have

$$\begin{aligned} &c W(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1})- \frac{1}{1+t} \Vert x_{n}-x _{n+1} \Vert ^{2}+\frac{2t}{1+t}\lambda _{n}\beta _{n} \varPsi (x_{n}) \\ &\quad\leq \lambda _{n}\beta _{n} \biggl((1+t)\lambda _{n}\beta _{n}- \frac{2}{L(1+t)} \biggr) \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+2\lambda _{n} \bigl\langle v^{*}, u-x_{n+1}\bigr\rangle . \end{aligned}$$
(11)

Proof

Since \(v^{*}\in A(u)\) and \(cJ(x_{n})-cJ(x_{n+1})- \lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\in \lambda _{n}A(x_{n+1})\), the monotonicity of A implies

$$ \bigl\langle cJ(x_{n})-cJ(x_{n+1})-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})- \lambda _{n}v^{*}, x_{n+1}-u\bigr\rangle \geq 0, $$
(12)

and so

$$ \bigl\langle cJ(x_{n})-cJ(x_{n+1}), u-x_{n+1}\bigr\rangle \leq \bigl\langle \lambda _{n}\beta _{n}\nabla \varPsi (x_{n})+\lambda _{n}v^{*}, u-x_{n+1}\bigr\rangle , $$

which, in view of identity (7), gives

$$ cW(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1}) \leq 2\lambda _{n}\bigl\langle \beta _{n}\nabla \varPsi (x_{n})+v^{*}, u-x_{n+1}\bigr\rangle . $$

Hence, we have that

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1}) \\ &\quad \leq 2\lambda _{n}\bigl\langle \beta _{n}\nabla \varPsi (x_{n}), u-x_{n} \bigr\rangle +2\lambda _{n} \bigl\langle \beta _{n}\nabla \varPsi (x_{n}), x_{n}-x _{n+1}\bigr\rangle +2\lambda _{n}\bigl\langle v^{*}, u-x_{n+1}\bigr\rangle . \end{aligned}$$
(13)

On the one hand, since ∇Ψ is Lipschitz continuous, by Lemma 3.1, we have

$$ \bigl\langle \nabla \varPsi (x_{n})-\nabla \varPsi (u), x_{n}-u\bigr\rangle \geq \frac{1}{L} \bigl\Vert \nabla \varPsi (x_{n})-\nabla \varPsi (u) \bigr\Vert ^{2}. $$

Hence, by \(\nabla \varPsi (u)=0\), we obtain

$$ \bigl\langle \nabla \varPsi (x_{n}), u-x_{n}\bigr\rangle \leq -\frac{1}{L} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}. $$
(14)

Since \(\varPsi (u)=0\), by \(\varPsi (u)\geq \varPsi (x_{n})+\langle \nabla \varPsi (x_{n}), u-x_{n}\rangle \), we have

$$ \bigl\langle \nabla \varPsi (x_{n}), u-x_{n}\bigr\rangle \leq -\varPsi (x_{n}). $$
(15)

For any \(t\geq 0\), by taking a convex combination of inequalities (14) and (15), we have

$$ 2\lambda _{n}\beta _{n}\bigl\langle \nabla \varPsi (x_{n}), u-x_{n}\bigr\rangle \leq - \frac{2}{L(1+t)}\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}- \frac{2t}{1+t}\lambda _{n}\beta _{n}\varPsi (x_{n}). $$
(16)

On the other hand, for the remaining term \(2\lambda _{n}\beta _{n} \langle \nabla \varPsi (x_{n}), x_{n}-x_{n+1}\rangle \), we have

$$\begin{aligned} &2\lambda _{n}\beta _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n}-x_{n+1}\bigr\rangle \\ &\quad= 2 \biggl\langle \lambda _{n}\beta _{n}\sqrt{1+t}\nabla \varPsi (x_{n}), \frac{1}{ \sqrt{1+t}}(x_{n}-x_{n+1})\biggr\rangle \\ &\quad\leq 2 \bigl\Vert \lambda _{n}\beta _{n}\sqrt{1+t} \nabla \varPsi (x_{n}) \bigr\Vert \biggl\Vert \frac{1}{ \sqrt{1+t}}(x_{n}-x_{n+1}) \biggr\Vert \\ &\quad\leq (1+t)\lambda _{n}^{2}\beta _{n}^{2} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+ \frac{1}{(1+t)} \Vert x_{n}-x_{n+1} \Vert ^{2}. \end{aligned}$$
(17)

Inequalities (13), (16) and (17) together give

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1})- \frac{1}{1+t} \Vert x_{n}-x _{n+1} \Vert ^{2}+\frac{2t}{1+t}\lambda _{n}\beta _{n} \varPsi (x_{n}) \\ &\quad \leq \lambda _{n}\beta _{n} \biggl((1+t)\lambda _{n}\beta _{n}- \frac{2}{L(1+t)} \biggr) \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+2\lambda _{n} \bigl\langle v^{*}, u-x_{n+1}\bigr\rangle . \end{aligned}$$

 □

Proposition 3.2

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9). Then, there exist \(a>0\) and \(b>0\), such that for any \(u\in C\cap \operatorname{dom} A\) and any \(v^{*}\in A(u)\), we have

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+a\bigl( \Vert x_{n}-x_{n+1} \Vert ^{2} +\lambda _{n} \beta _{n}\varPsi (x_{n})+\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}\bigr) \\ &\quad\leq b\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2}+2\lambda _{n}\bigl\langle v^{*}, u-x _{n}\bigr\rangle . \end{aligned}$$
(18)

Proof

Using the fact that \(\langle p^{*},p\rangle \leq \frac{s}{2}\|p^{*}\|^{2}+\frac{1}{2s}\|p\|^{2}\) for any \(p^{*}\in X ^{*}\), \(p\in X\) and any \(s>0\) yields

$$\begin{aligned} &2\lambda _{n}\bigl\langle v^{*}, u-x_{n+1}\bigr\rangle \\ &\quad=2\lambda _{n}\bigl\langle v^{*}, x_{n}-x_{n+1} \bigr\rangle +2\lambda _{n} \bigl\langle v^{*}, u-x_{n}\bigr\rangle \\ &\quad\leq \frac{t}{2(1+t)} \Vert x_{n}-x_{n+1} \Vert ^{2}+\frac{2(1+t)}{t}\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +2\lambda _{n}\bigl\langle v^{*}, u-x_{n}\bigr\rangle . \end{aligned}$$

Then, we get from (11) that

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1}) \\ &\qquad{}-\frac{1}{1+t} \Vert x_{n}-x_{n+1} \Vert ^{2}+\frac{2t}{1+t}\lambda _{n}\beta _{n} \varPsi (x_{n})+\frac{t}{1+t}\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x _{n}) \bigr\Vert ^{2} \\ &\quad\leq \lambda _{n}\beta _{n} \biggl((1+t)\lambda _{n}\beta _{n}- \frac{2}{L(1+t)}+\frac{t}{1+t} \biggr) \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2} \\ &\qquad{}+\frac{t}{2(1+t)} \Vert x_{n}-x_{n+1} \Vert ^{2}+\frac{2(1+t)}{t}\lambda _{n} ^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +2\lambda _{n}\bigl\langle v^{*}, u-x_{n}\bigr\rangle . \end{aligned}$$

Hence, by \(cW(x_{n},x_{n+1})\geq \|x_{n}-x_{n+1}\|^{2}\), we have

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+ \frac{t}{2+2t} \Vert x_{n}-x_{n+1} \Vert ^{2} + \frac{2t}{1+t}\lambda _{n}\beta _{n} \varPsi (x_{n})\\ &\qquad{}+\frac{t}{1+t}\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2} \\ &\quad\leq \lambda _{n}\beta _{n} \biggl((1+t)\lambda _{n}\beta _{n}- \frac{2}{L(1+t)}+\frac{t}{1+t} \biggr) \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2} + \frac{2(1+t)}{t}\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2}\\ &\qquad{} +2\lambda _{n}\bigl\langle v ^{*}, u-x_{n}\bigr\rangle . \end{aligned}$$

Since \(\lambda _{n}\beta _{n}<\frac{2c}{L}\), we have

$$ \lim_{t\rightarrow 0}\lambda _{n}\beta _{n} \biggl((1+t)\lambda _{n}\beta _{n}-\frac{2}{L(1+t)}+ \frac{t}{1+t} \biggr) =\lambda _{n}\beta _{n}\biggl( \lambda _{n}\beta _{n}-\frac{2}{L}\biggr)< 0. $$

Therefore, it suffices to take \(t_{0}>0\) small enough, then set

$$ a=\frac{t_{0}}{2(1+t_{0})},\qquad b=\frac{2(1+t_{0})}{t_{0}} $$

to obtain (18). □

Proposition 3.3

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9) and let \(u\in C\cap \operatorname{dom} A\). Take \(\omega ^{*} \in T_{A,C}(u),v^{*}\in A(u)\) and \(p^{*}\in N_{C}(u)\), such that \(v^{*} =\omega ^{*}-p^{*}\). The following inequality holds:

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u)+a\biggl( \Vert x_{n}-x_{n+1} \Vert ^{2} +\frac{\lambda _{n}\beta _{n}}{2} \varPsi (x_{n})+\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x _{n}) \bigr\Vert ^{2}\biggr) \\ &\quad \leq b\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +\frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*}\biggl(\frac{4p ^{*}}{a\beta _{n}} \biggr)-\sigma _{C}\biggl( \frac{4p^{*}}{a\beta _{n}}\biggr) \biggr). \end{aligned}$$
(19)

Proof

First observe that

$$\begin{aligned} &2\lambda _{n}\bigl\langle v^{*}, u-x_{n}\bigr\rangle -\frac{a\lambda _{n}\beta _{n}}{2}\varPsi (x_{n}) \\ &\quad =2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +2\lambda _{n}\bigl\langle p^{*}, x_{n}\bigr\rangle -\frac{a\lambda _{n}\beta _{n}}{2}\varPsi (x_{n})-2 \lambda _{n}\bigl\langle p^{*}, u\bigr\rangle \\ &\quad=2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +\frac{a\lambda _{n} \beta _{n}}{2} \biggl(\biggl\langle \frac{4}{a\beta _{n}} p^{*}, x_{n}\biggr\rangle -\varPsi (x_{n})-\biggl\langle \frac{4}{a\beta _{n}} p^{*}, u\biggr\rangle \biggr) \\ &\quad\leq 2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +\frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*} \biggl(\frac{4p^{*}}{a\beta _{n}} \biggr)-\biggl\langle \frac{4p^{*}}{a\beta _{n}}, u\biggr\rangle \biggr). \end{aligned}$$

Since \(\frac{4p^{*}}{a\beta _{n}}\in N_{C}(u)\), the support function satisfies

$$ \sigma _{C}\biggl(\frac{4p^{*}}{a\beta _{n}}\biggr)=\biggl\langle \frac{4p^{*}}{a\beta _{n}},u\biggr\rangle , $$

whence

$$\begin{aligned} & 2\lambda _{n}\bigl\langle v^{*}, u-x_{n}\bigr\rangle \\ &\quad\leq \frac{a\lambda _{n}\beta _{n}}{2}\varPsi (x_{n})+2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +\frac{a \lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*} \biggl(\frac{4p^{*}}{a\beta _{n}} \biggr)- \sigma _{C}\biggl(\frac{4p^{*}}{a\beta _{n}} \biggr) \biggr). \end{aligned}$$
(20)

Hence by (18) and (20), we obtain (19). □

Theorem 3.1

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9). Then, we have the following:

  1. (i)

    For each \(u\in \mathcal{S},\lim_{n\rightarrow +\infty }W(x_{n},u)\) exists.

  2. (ii)

    The series \(\sum_{n=1}^{+\infty }\|x_{n}-x_{n+1} \|^{2},\sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\varPsi (x_{n})\) and \(\sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\|\nabla \varPsi (x_{n})\|^{2}\) are convergent.

In particular, \(\lim_{n\rightarrow +\infty }\|x_{n}-x_{n+1}\|=0\). If, moreover, \(\liminf_{n\rightarrow +\infty }\lambda _{n}\beta _{n}>0\), then \(\lim_{n\rightarrow +\infty }\varPsi (x_{n})= \lim_{n\rightarrow +\infty }\|\nabla \varPsi (x_{n})\|=0\) and every weak cluster point of \(\{x_{n}\}\) lies in C.

Proof

Since \(u\in \mathcal{S}\), one can take \(\omega ^{*}=0\) in (19). Because \(\sum_{n=1}^{+\infty }\lambda _{n}^{2}<+\infty \) and \(\frac{4p^{*}}{a}\in R(N_{C})\), assumption (10) makes the right-hand side of (19) summable, so Lemma 2.1 applied with \(a_{n}=cW(x_{n},u)\) yields (i) and (ii). In particular, \(\sum_{n=1}^{+\infty }\|x_{n}-x_{n+1}\|^{2}<+\infty \) gives \(\lim_{n\rightarrow +\infty }\|x_{n}-x_{n+1}\|=0\). If \(\liminf_{n\rightarrow +\infty }\lambda _{n}\beta _{n}>0\), then (ii) forces \(\lim_{n\rightarrow +\infty }\varPsi (x_{n})=\lim_{n\rightarrow +\infty }\|\nabla \varPsi (x_{n})\|=0\), and the weak lower-semicontinuity of Ψ gives \(\varPsi (\bar{x})\leq \liminf_{k\rightarrow +\infty }\varPsi (x_{n_{k}})=0\) for every weak cluster point \(\bar{x}\) of \(\{x_{n}\}\), so \(\bar{x}\in C\). □

Theorem 3.2

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9), and let \(\{z_{k}\}\) be the sequence of weighted averages

$$ z_{k}=\frac{1}{\gamma _{k}}\sum_{n=1}^{k} \lambda _{n}x_{n}, \quad\textit{where } \gamma _{k}=\sum _{n=1}^{k}\lambda _{n}. $$

Then every weak cluster point of \(\{z_{k}\}\) lies in \(\mathcal{S}\).

Proof

Let \(u\in C\cap \operatorname{dom}A\). Take \(\omega ^{*} \in T_{A,C}(u)\), \(v^{*}\in A(u)\) and \(p^{*}\in N_{C}(u)\), so that \(v^{*} =\omega ^{*}-p^{*}\). By Proposition 3.3, we have

$$\begin{aligned} &cW(x_{n+1},u)-cW(x_{n},u) \\ &\quad \leq b\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle + \frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*}\biggl(\frac{4p^{*}}{a\beta _{n}} \biggr)-\sigma _{C}\biggl(\frac{4p^{*}}{a\beta _{n}}\biggr) \biggr). \end{aligned}$$

Hence, we obtain

$$\begin{aligned} &{-}c\frac{W(x_{1},u)}{2\gamma _{k}} \\ &\quad\leq \frac{\sum_{n=1}^{k}b\lambda _{n}^{2} \Vert v^{*} \Vert ^{2} +\sum_{n=1} ^{k}\frac{a\lambda _{n}\beta _{n}}{2} (\varPsi ^{*}(\frac{4p^{*}}{a \beta _{n}} )-\sigma _{C}(\frac{4p^{*}}{a\beta _{n}}) )}{2\gamma _{k}} +\frac{ 2\sum_{n=1}^{k}\langle \omega ^{*}, \lambda _{n}u-\lambda _{n}x_{n}\rangle }{2\gamma _{k}} \\ &\quad =\frac{\sum_{n=1}^{k}b\lambda _{n}^{2} \Vert v^{*} \Vert ^{2} +\sum_{n=1}^{k}\frac{a \lambda _{n}\beta _{n}}{2} (\varPsi ^{*}(\frac{4p^{*}}{a\beta _{n}} )- \sigma _{C}(\frac{4p^{*}}{a\beta _{n}}) )}{2\gamma _{k}} +\biggl\langle \omega ^{*}, u-\frac{\sum_{n=1}^{k}\lambda _{n}x_{n}}{\gamma _{k}} \biggr\rangle . \end{aligned}$$
(21)

Then by (10), (21) and using that \(\gamma _{k}\rightarrow +\infty \) as \(k\rightarrow +\infty \), we obtain

$$ \liminf_{k\rightarrow +\infty }\bigl\langle \omega ^{*}, u-z_{k}\bigr\rangle \geq 0. $$

Finally, if z is any weak sequential cluster point of the sequence \(\{z_{k}\}\), then \(\langle \omega ^{*}, u-z\rangle \geq 0\). Since this holds for every \(u\in \operatorname{dom}(T_{A,C})\) and every \(\omega ^{*}\in T_{A,C}(u)\), the maximal monotonicity of \(T_{A,C}\) yields \(0\in T_{A,C}(z)\), that is, \(z\in \mathcal{S}\). □

Theorem 3.3

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (9) and let A be a maximal monotone and strongly monotone operator. Then the sequence \(\{x_{n}\}\) converges strongly as \(n\rightarrow +\infty \) to a point in \(\mathcal{S}\).

Proof

Take \(u\in \mathcal{S}\subset C\cap \operatorname{dom} A\), \(v^{*}\in A(u)\), \(\omega ^{*}\in T_{A,C}(u)\) and \(p^{*}\in N_{C}(u)\), so that \(v^{*} =\omega ^{*}-p^{*}\). Since \(v^{*}\in A(u)\) and \(cJ(x_{n})-cJ(x _{n+1})-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\in \lambda _{n}A(x_{n+1})\), the strong monotonicity of A implies

$$ \bigl\langle cJ(x_{n})-cJ(x_{n+1})-\lambda _{n} \beta _{n}\nabla \varPsi (x_{n})- \lambda _{n}v^{*}, x_{n+1}-u\bigr\rangle \geq \lambda _{n}\alpha \Vert x_{n+1}-u \Vert ^{2}. $$

We follow the arguments in the proof of Proposition 3.3 to obtain successively

$$\begin{aligned} &2\lambda _{n}\alpha \Vert x_{n+1}-u \Vert ^{2}+cW(x_{n+1},u)-cW(x_{n},u) \\ &\qquad{}+a\biggl( \Vert x _{n}-x_{n+1} \Vert ^{2} + \frac{\lambda _{n}\beta _{n}}{2}\varPsi (x_{n})+\lambda _{n}\beta _{n} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}\biggr) \\ &\quad \leq b\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +2\lambda _{n}\bigl\langle \omega ^{*}, u-x_{n}\bigr\rangle +\frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*}\biggl(\frac{4p ^{*}}{a\beta _{n}} \biggr)-\sigma _{C}\biggl( \frac{4p^{*}}{a\beta _{n}}\biggr) \biggr). \end{aligned}$$
(22)

Since \(u\in \mathcal{S}\), one can take \(\omega ^{*}=0\) in (22). By \(a(\|x_{n}-x_{n+1}\|^{2} +\frac{\lambda _{n}\beta _{n}}{2}\varPsi (x_{n})+ \lambda _{n}\beta _{n}\|\nabla \varPsi (x_{n})\|^{2})\geq 0\), we have

$$\begin{aligned} &2\lambda _{n}\alpha \Vert x_{n+1}-u \Vert ^{2}+cW(x_{n+1},u)-cW(x_{n},u)\\ &\quad \leq b \lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +\frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*}\biggl( \frac{4p^{*}}{a\beta _{n}} \biggr)-\sigma _{C}\biggl(\frac{4p^{*}}{a\beta _{n}}\biggr) \biggr). \end{aligned}$$

Summation gives

$$\begin{aligned} &2\alpha \sum_{n=1}^{+\infty }\lambda _{n} \Vert x_{n+1}-u \Vert ^{2} \\ &\quad \leq cW(x _{1},u)+b\sum_{n=1}^{+\infty }\lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} +\sum_{n=1} ^{+\infty } \frac{a\lambda _{n}\beta _{n}}{2} \biggl(\varPsi ^{*}\biggl(\frac{4p ^{*}}{a\beta _{n}} \biggr)- \sigma _{C}\biggl(\frac{4p^{*}}{a\beta _{n}}\biggr) \biggr). \end{aligned}$$

Since \(\sum_{n=1}^{+\infty }\lambda _{n}=+\infty \), there exists a subsequence \(\{x_{n_{k}}\}\) of \(\{x_{n}\}\) such that \(\lim_{k\rightarrow +\infty }\|x_{n_{k}}-u\|=0\). By the norm-to-weak\(^{*}\) continuity of J, this gives \(\lim_{k\rightarrow +\infty }W(x_{n_{k}},u)=0\). Since \(\lim_{n\rightarrow +\infty }W(x_{n},u)\) exists by Theorem 3.1(i), we must have \(\lim_{n\rightarrow +\infty }W(x _{n},u)=0\). Hence, by \(cW(x_{n},u)\geq \|x_{n}-u\|^{2}\), we have \(\lim_{n\rightarrow +\infty }\|x_{n}-u\|=0\). □

4 The FBS method for the minimization

In this section, we consider the forward–backward splitting method in the special case where \(A=\partial \varPhi \) is the subdifferential of a proper, lower-semicontinuous and convex function \(\varPhi:X\rightarrow (-\infty,+\infty ]\). The solution set \(\mathcal{S}\) is equal to

$$ (\partial \varPhi +N_{C})^{-1}(0)= \mathop{\operatorname{Argmin}}_{C} \varPhi. $$

Iterative Method 4.1

Given \(x_{0}\in X\), set

$$ x_{n+1}=(cJ+\lambda _{n}\partial \varPhi )^{-1}\bigl(cJx_{n}-\lambda _{n}\beta _{n}\nabla \varPsi (x_{n})\bigr), $$
(23)

where \(\{\lambda _{n}\},\{\beta _{n}\}\) are two sequences of positive real numbers with \(\sum_{n=1}^{\infty }\lambda _{n}=+\infty \), \(\sum_{n=1}^{\infty }\lambda _{n}^{2}<+\infty \), \(\beta _{n+1}-\beta _{n}\leq K\) for some \(K>0\), and \(0< \bar{c}\leq \lambda _{n}\beta _{n}<\frac{2c}{L}\).
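As an illustration of (23), consider the following sketch (again in the Hilbert case \(J=I\), \(c=1\), with our own data: \(\varPhi (x)=\|x\|_{1}\), whose proximal mapping is soft-thresholding, \(C=\{x:Ax=b\}\) and \(\varPsi =\frac{1}{2}\operatorname{dist}(\cdot,C)^{2}\), so that \(\nabla \varPsi (x)=A^{+}(Ax-b)\) and \(L=1\)); the limit points then solve the basis-pursuit-type problem \(\min \{\|x\|_{1}:Ax=b\}\):

```python
import numpy as np

# Sketch of Iterative Method 4.1 in the Hilbert case (J = I, c = 1):
# Phi(x) = ||x||_1, C = {x : Ax = b}, Psi(x) = 0.5*dist(x, C)^2.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 8))
x_true = np.zeros(8); x_true[[1, 5]] = [1.5, -2.0]
b = A @ x_true
A_pinv = np.linalg.pinv(A)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1
grad_psi = lambda x: A_pinv @ (A @ x - b)                        # = x - P_C(x)

x = np.zeros(8)
for n in range(1, 100_001):
    lam, beta = n ** -0.9, n ** 0.9   # lam*beta = 1 < 2c/L; beta_{n+1}-beta_n bounded
    x = soft(x - lam * beta * grad_psi(x), lam)                  # iteration (23)
print(x, np.linalg.norm(A @ x - b))   # approaches a minimal-l1 solution of Ax = b
```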

We shall also make the following Fenchel conjugate assumption:

$$ \sum_{n=1}^{+\infty }\lambda _{n}\beta _{n}\biggl[\varPsi ^{*}\biggl(\frac{p^{*}}{\beta _{n}} \biggr)- \sigma _{C}\biggl(\frac{p^{*}}{\beta _{n}}\biggr)\biggr]< +\infty, \quad\forall p ^{*}\in R(N_{C}). $$

The analysis relies on the study of the sequence \(\{H_{n}(x_{n})\}\), where \(H_{n}\) is the penalized function given by \(H_{n}=\varPhi +\beta _{n}\varPsi \) for \(n\geq 1\).

Proposition 4.1

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (23). Then the sequence \(\{H_{n}(x_{n})\}\) converges as \(n\rightarrow +\infty\).

Proof

Recall that \(\frac{cJ(x_{n})-cJ(x_{n+1})}{\lambda _{n}}-\beta _{n}\nabla \varPsi (x_{n})\in \partial \varPhi (x_{n+1})\). The subdifferential inequality for Φ gives

$$ \varPhi (x_{n})\geq \varPhi (x_{n+1})+\biggl\langle \frac{cJ(x_{n})-cJ(x_{n+1})}{ \lambda _{n}}-\beta _{n}\nabla \varPsi (x_{n}), x_{n}-x_{n+1}\biggr\rangle , $$

and so

$$ \varPhi (x_{n+1})-\varPhi (x_{n})+ \frac{1}{2\lambda _{n}} \bigl(cW(x_{n+1},x_{n})+cW(x _{n},x_{n+1})\bigr)\leq \beta _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n}-x_{n+1} \bigr\rangle . $$
(24)

Then, by Lemma 3.1(ii), we have

$$ \varPsi (x_{n+1})\leq \varPsi (x_{n})+\bigl\langle \nabla \varPsi (x_{n}), x_{n+1}-x _{n}\bigr\rangle +\frac{L}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}, $$
(25)

whence

$$\begin{aligned} &\beta _{n+1}\varPsi (x_{n+1})-\beta _{n}\varPsi (x_{n}) \\ &\quad \leq \beta _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n+1}-x_{n}\bigr\rangle +\frac{L \beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}+(\beta _{n+1}- \beta _{n})\varPsi (x_{n+1}). \end{aligned}$$
(26)

Adding (24) and (26), we obtain

$$\begin{aligned} & H_{n+1}(x_{n+1})-H_{n}(x_{n})+ \frac{1}{2\lambda _{n}} \bigl(cW(x_{n+1},x _{n})+cW(x_{n},x_{n+1}) \bigr)-\frac{L\beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2} \\ &\quad \leq (\beta _{n+1}-\beta _{n})\varPsi (x_{n+1}). \end{aligned}$$
(27)

Since \(\lambda _{n}\beta _{n}<\frac{2c}{L} \) and \(cW(x,y)\geq \|x-y\| ^{2}\), \(\forall x,y \in X\), we have

$$ \frac{1}{2\lambda _{n}} \bigl(cW(x_{n+1},x_{n})+cW(x_{n},x_{n+1}) \bigr)-\frac{L \beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}\geq 0. $$
(28)

Since \(\beta _{n+1}-\beta _{n}\leq K \), by (27) and (28),

$$ H_{n+1}(x_{n+1})-H_{n}(x_{n})\leq K \varPsi (x_{n+1}). $$

By Theorem 3.1(i), we deduce that \(\{x_{n}\}\) is bounded, so \(\{\varPhi (x_{n})\}\) is bounded from below, and hence the sequence \(\{H_{n}(x_{n})\}\) is also bounded from below. Moreover, since \(\lambda _{n}\beta _{n}\geq \bar{c}>0\), Theorem 3.1(ii) implies \(\sum_{n=1}^{+\infty }\varPsi (x_{n+1})<+\infty \), so the right-hand side is summable, whence Lemma 2.1 implies that \(\lim_{n\rightarrow +\infty }H_{n}(x_{n})\) exists. □

Proposition 4.2

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (23). For each \(u\in C\), we have \(\sum_{n=1}^{+\infty } \lambda _{n}(H_{n+1}(x_{n+1})-\varPhi (u))<+\infty\).

Proof

First observe that

$$\begin{aligned} &H_{n+1}(x_{n+1})-\varPhi (u) \\ &\quad=\varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})- \varPhi (u)+(\beta _{n+1}-\beta _{n})\varPsi (x_{n+1})+\beta _{n}\bigl(\varPsi (x_{n+1})-\varPsi (x_{n})\bigr) \\ &\quad \leq \varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})-\varPhi (u)+K\varPsi (x_{n+1})+ \beta _{n} \bigl(\varPsi (x_{n+1})-\varPsi (x_{n})\bigr). \end{aligned}$$
(29)

Using (25), we obtain

$$\begin{aligned} \beta _{n}\bigl(\varPsi (x_{n+1})- \varPsi (x_{n})\bigr) &\leq \beta _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n+1}-x_{n}\bigr\rangle + \frac{L\beta _{n}}{2} \Vert x_{n+1}-x _{n} \Vert ^{2} \\ &\leq \beta _{n} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert \Vert x_{n+1}-x_{n} \Vert +\frac{L\beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2} \\ &\leq \frac{\beta _{n}}{2} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+\frac{(L+1)\beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}. \end{aligned}$$
(30)

Inequalities (29) and (30) give

$$\begin{aligned} &\lambda _{n}\bigl(H_{n+1}(x_{n+1})- \varPhi (u)\bigr) \\ &\quad\leq \lambda _{n}\bigl(\varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})-\varPhi (u)\bigr) \\ &\qquad{}+\lambda _{n}K\varPsi (x_{n+1})+\frac{\lambda _{n}\beta _{n}}{2} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+ \frac{(L+1)\lambda _{n}\beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}. \end{aligned}$$

Since the sequence \(\{\lambda _{n}\}\) is bounded and \(0<\bar{c}\leq \lambda _{n}\beta _{n}<\frac{2c}{L} \), Theorem 3.1 implies

$$ \sum_{n=1}^{+\infty }\biggl(\lambda _{n}K\varPsi (x_{n+1})+\frac{\lambda _{n}\beta _{n}}{2} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2} + \frac{(L+1)\lambda _{n}\beta _{n}}{2} \Vert x_{n+1}-x_{n} \Vert ^{2}\biggr)< +\infty. $$

On the other hand, the subdifferential inequality for Φ at points u and \(x_{n+1}\) gives

$$ \varPhi (u)\geq \varPhi (x_{n+1})+\biggl\langle \frac{cJx_{n}-cJx_{n+1}}{\lambda _{n}}-\beta _{n}\nabla \varPsi (x_{n}), u-x_{n+1}\biggr\rangle . $$
(31)

Since \(\varPsi (u)=0\), the subdifferential inequality for Ψ at points u and \(x_{n}\) gives

$$ 0\geq \varPsi (x_{n})+\bigl\langle \nabla \varPsi (x_{n}), u-x_{n}\bigr\rangle = \varPsi (x_{n})+\bigl\langle \nabla \varPsi (x_{n}), u-x_{n+1}\bigr\rangle +\bigl\langle \nabla \varPsi (x_{n}), x_{n+1}-x_{n}\bigr\rangle . $$
(32)

Combining (31) and (32), we obtain

$$ 2\lambda _{n}\bigl(\varPhi (x_{n+1})+\beta _{n} \varPsi (x_{n})-\varPhi (u)\bigr)\leq 2 \langle cJx_{n}-cJx_{n+1}, x_{n+1}-u\rangle +2\lambda _{n}\beta _{n} \bigl\langle \nabla \varPsi (x_{n}), x_{n}-x_{n+1}\bigr\rangle . $$

However,

$$ 2\langle cJx_{n}-cJx_{n+1}, x_{n+1}-u\rangle =cW(x_{n},u)-cW(x_{n+1},u)-cW(x _{n},x_{n+1}) $$

and

$$ 2\lambda _{n}\beta _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n}-x_{n+1}\bigr\rangle \leq \frac{4}{L^{2}} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+ \Vert x_{n}-x_{n+1} \Vert ^{2}. $$

Hence,

$$\begin{aligned} &2\lambda _{n}\bigl(\varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})-\varPhi (u)\bigr) \\ &\quad\leq cW(x_{n},u)-cW(x_{n+1},u)-cW(x_{n},x_{n+1})+ \frac{4}{L^{2}} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+ \Vert x_{n}-x_{n+1} \Vert ^{2} \\ &\quad\leq cW(x_{n},u)-cW(x_{n+1},u)+\frac{4}{L^{2}} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}. \end{aligned}$$

We conclude that

$$ \sum_{n=1}^{m}2\lambda _{n} \bigl(\varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})- \varPhi (u)\bigr) \leq cW(x_{1},u)-cW(x_{m+1},u)+ \frac{4}{L^{2}}\sum_{n=1} ^{m} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2} $$

for \(m\geq 1\). In view of Theorem 3.1, this shows

$$ \sum_{n=1}^{+\infty }\lambda _{n}\bigl( \varPhi (x_{n+1})+\beta _{n}\varPsi (x_{n})- \varPhi (u)\bigr)< +\infty, $$

which, together with the first estimate, completes the proof. □

The duality mapping J is said to be weakly continuous on a smooth Banach space if \(x_{n}\rightharpoonup x\) implies \(J(x_{n})\rightharpoonup J(x)\). This happens, for example, if X is a Hilbert space, or finite-dimensional and smooth, or \(l^{p},1< p<+\infty \). This property of Banach spaces was introduced by Browder [7]. More information can be found in [10].

Theorem 4.1

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (23). Then every weak cluster point of \(\{x_{n}\}\) lies in \(\mathcal{S}\). If the duality mapping J is weakly continuous, then \(\{x_{n}\}\) converges weakly as \(n\rightarrow +\infty \) to a point in \(\mathcal{S}\).

Proof

Since \(\sum_{n=1}^{+\infty }\lambda _{n}=+\infty \), Propositions 4.1 and 4.2 imply \(\lim_{n\rightarrow +\infty }H_{n}(x_{n})\leq \varPhi (u)\) whenever \(u\in C\). Suppose that a subsequence \(\{x_{n_{k}}\}\) of \(\{x_{n}\}\) converges weakly to some \(\hat{x}\) as \(k\rightarrow +\infty \). Then \(\hat{x}\in C\) by Theorem 3.1. The weak lower-semicontinuity of Φ and \(\varPhi =H_{n}-\beta _{n}\varPsi \leq H_{n}\) then give

$$ \varPhi (\hat{x})\leq \liminf_{k\rightarrow +\infty }\varPhi (x_{n_{k}}) \leq \liminf_{k\rightarrow +\infty }H_{n_{k}}(x_{n_{k}})= \lim _{n\rightarrow +\infty }H_{n}(x_{n})\leq \varPhi (u). $$

Therefore, \(\hat{x}\) minimizes Φ on C, and so \(\hat{x}\in \mathcal{S}\).

Clearly, the sequence \(\{x_{n}\}\) is bounded (see Theorem 3.1(i)). The space being reflexive, it suffices to prove that \(\{x_{n}\}\) has only one weak cluster point as \(n\rightarrow +\infty \). Suppose otherwise that \(x_{n_{l}}\rightharpoonup \bar{x}\) and \(x_{n_{k}}\rightharpoonup \hat{x}\) with \(\bar{x}\neq \hat{x}\). Since

$$ 2\bigl\langle J(x_{n}),\bar{x}-\hat{x}\bigr\rangle =W(x_{n},\hat{x})-W(x_{n}, \bar{x})- \Vert \hat{x} \Vert ^{2}+ \Vert \bar{x} \Vert ^{2}, $$

we deduce the existence of \(\lim_{n\rightarrow +\infty }2 \langle J(x_{n}),\bar{x}-\hat{x}\rangle \). Hence,

$$ \lim_{l\rightarrow +\infty }\bigl\langle J(x_{n_{l}}),\bar{x}-\hat{x} \bigr\rangle -\lim_{k\rightarrow +\infty }\bigl\langle J(x_{n_{k}}), \bar{x}-\hat{x}\bigr\rangle =0. $$

Since the duality mapping J is weakly continuous, we have

$$ \bigl\langle J(\bar{x})-J(\hat{x}),\bar{x}-\hat{x}\bigr\rangle =0. $$

Since X is strictly convex, we have that \(\bar{x}=\hat{x}\). □

If \(\varPhi:X\rightarrow (-\infty,+\infty ]\) is also a strongly convex function, that is, there exists \(\lambda >0\) such that for any \(0< t<1\) and any \(x,y\in \operatorname{dom} \varPhi \), \(t\varPhi (x)+(1-t)\varPhi (y) \geq \varPhi (tx+(1-t)y)+\lambda t(1-t)\|x-y\|^{2}\), then ∂Φ is strongly monotone. Hence, the following theorem follows immediately from Theorem 3.3.

Theorem 4.2

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (23) and let Φ be a proper, lower semicontinuous and strongly convex function. Then the sequence \(\{x_{n}\}\) converges strongly as \(n\rightarrow +\infty \) to a point in \(\mathcal{S}\).

5 Additional result

The purpose of this section is to prove a convergence result without the Fenchel conjugate assumption.

Iterative Method 5.1

Given \(x_{0}\in X\), set

$$ x_{n+1}=(cJ+\lambda _{n}A)^{-1} \bigl(cJx_{n}-\lambda _{n}\nabla \varPsi (x_{n}) \bigr), $$
(33)

where \(\{\lambda _{n}\}\) is a sequence of positive real numbers with \(\sum_{n=1}^{\infty }\lambda _{n}=+\infty, \sum_{n=1}^{\infty } \lambda _{n}^{\frac{4}{3}}<+\infty\).

Keeping the notation of the preceding section, set \(z_{k}=\frac{1}{ \gamma _{k}}\sum_{n=1}^{k}\lambda _{n}x_{n}\), where \(\gamma _{k}=\sum_{n=1}^{k}\lambda _{n}\). The following result gives the weak ergodic convergence of the sequence \(\{x_{n}\}\) generated by (33).

Proposition 5.1

Let \(\{x_{n}\}\) be a sequence generated by iterative formula (33). Assume that the sequence \(\{\lambda _{n}^{\frac{1}{3}} \nabla \varPsi (x_{n})\}\) is bounded. Then every weak cluster point of \(\{z_{k}\}\) lies in \(\mathcal{S}\).

Proof

Take \(u\in C\cap \operatorname{dom} A\) and \(v^{*}\in A(u)\); since \(0\in N_{C}(u)\), we have \(v^{*}= v^{*}+0\in A(u)+ N_{C}(u)=T_{A,C}(u)\). Since \(v^{*}\in A(u)\) and \(cJ(x_{n})-cJ(x_{n+1})-\lambda _{n}\nabla \varPsi (x _{n})\in \lambda _{n}A(x_{n+1})\), the monotonicity of A implies

$$ \bigl\langle cJ(x_{n})-cJ(x_{n+1})-\lambda _{n} \nabla \varPsi (x_{n})-\lambda _{n}v^{*}, x_{n+1}-u\bigr\rangle \geq 0, $$

and so

$$ \bigl\langle cJ(x_{n})-cJ(x_{n+1}), u-x_{n+1}\bigr\rangle \leq \bigl\langle \lambda _{n} \nabla \varPsi (x_{n})+\lambda _{n}v^{*}, u-x_{n+1} \bigr\rangle . $$

Then, we get from (7) that

$$ cW(x_{n+1},u)-cW(x_{n},u)+cW(x_{n},x_{n+1}) \leq 2\lambda _{n}\bigl\langle \nabla \varPsi (x_{n})+v^{*}, u-x_{n+1}\bigr\rangle . $$

By developing the right-hand side, we deduce the following inequality:

$$\begin{aligned} &2\lambda _{n}\bigl\langle \nabla \varPsi (x_{n})+v^{*}, x_{n}-u\bigr\rangle +2 \lambda _{n}\bigl\langle \nabla \varPsi (x_{n})+v^{*}, x_{n+1}-x_{n}\bigr\rangle \\ &\quad \leq cW(x_{n},u)-cW(x_{n+1},u)-cW(x_{n},x_{n+1}). \end{aligned}$$
(34)

Now, combining the facts that

$$ 2\lambda _{n}\bigl\langle \nabla \varPsi (x_{n})+v^{*}, x_{n+1}-x_{n}\bigr\rangle \geq - \Vert x_{n+1}-x_{n} \Vert ^{2}-\lambda _{n}^{2} \bigl\Vert \nabla \varPsi (x_{n})+v ^{*} \bigr\Vert ^{2} $$

and

$$ \bigl\langle \nabla \varPsi (x_{n})+v^{*}, x_{n}-u\bigr\rangle =\bigl\langle \nabla \varPsi (x_{n}), x_{n}-u\bigr\rangle +\bigl\langle v^{*}, x_{n}-u \bigr\rangle , $$

we derive from (34) that

$$\begin{aligned} &2\lambda _{n}\bigl\langle \nabla \varPsi (x_{n}), x_{n}-u\bigr\rangle +2\lambda _{n}\bigl\langle v^{*}, x_{n}-u\bigr\rangle -\lambda _{n}^{2} \bigl\Vert \nabla \varPsi (x _{n})+v^{*} \bigr\Vert ^{2} \\ &\quad\leq cW(x_{n},u)-cW(x_{n+1},u)-cW(x_{n},x_{n+1})+ \Vert x_{n+1}-x_{n} \Vert ^{2}. \end{aligned}$$
(35)

Since \(u\in C\cap \operatorname{dom} A\) and \(C=\operatorname{argmin}(\varPsi )\), we have \(\nabla \varPsi (u)=0\). Hence,

$$ \bigl\langle \nabla \varPsi (x_{n}), x_{n}-u\bigr\rangle =\bigl\langle \nabla \varPsi (x _{n})- \nabla \varPsi (u), x_{n}-u\bigr\rangle \geq 0. $$
(36)

Since \(cW(x_{n},x_{n+1})\geq \|x_{n+1}-x_{n}\|^{2}\), then by (35) and (36) we have

$$ 2\lambda _{n}\bigl\langle v^{*}, x_{n}-u\bigr\rangle -\lambda _{n}^{2} \bigl\Vert \nabla \varPsi (x_{n})+v^{*} \bigr\Vert ^{2} \leq cW(x_{n},u)-cW(x_{n+1},u). $$

Summing up these inequalities over n from 1 to k, and dividing by \(\gamma _{k}\) gives

$$ 2\bigl\langle v^{*}, z_{k}-u\bigr\rangle \leq \frac{cW(x_{1},u)}{\gamma _{k}}+\frac{ \sum_{n=1}^{k}\lambda _{n}^{2} \Vert \nabla \varPsi (x_{n})+v^{*} \Vert ^{2}}{\gamma _{k}}. $$
(37)

Since \(\{\lambda _{n}^{\frac{1}{3}}\nabla \varPsi (x_{n})\}\) is bounded and \(\sum_{n=1}^{\infty }\lambda _{n}^{\frac{4}{3}}<+\infty \), writing \(\lambda _{n}^{2}\|\nabla \varPsi (x_{n})\|^{2}= \lambda _{n}^{ \frac{4}{3}}\bigl(\lambda _{n}^{\frac{1}{3}}\|\nabla \varPsi (x_{n})\|\bigr)^{2}\), we have

$$ \sum_{n=1}^{+\infty }\lambda _{n}^{2} \bigl\Vert \nabla \varPsi (x_{n})+v^{*} \bigr\Vert ^{2} \leq 2 \Biggl(\sum_{n=1}^{+\infty } \lambda _{n}^{2} \bigl\Vert \nabla \varPsi (x_{n}) \bigr\Vert ^{2}+\sum_{n=1}^{+\infty } \lambda _{n}^{2} \bigl\Vert v^{*} \bigr\Vert ^{2} \Biggr)< + \infty. $$

Finally, since \(\gamma _{k}\rightarrow +\infty \) as \(k\rightarrow + \infty \), we conclude that

$$ \lim_{k\rightarrow +\infty }\frac{\sum_{n=1}^{k}\lambda _{n}^{2} \Vert \nabla \varPsi (x_{n})+v^{*} \Vert ^{2}}{\gamma _{k}}=0. $$

Consequently, if z is any weak sequential cluster point of the sequence \(\{z_{k}\}\), letting \(k\rightarrow +\infty \) on both sides of (37) yields

$$ \bigl\langle v^{*}, z-u\bigr\rangle \leq 0. $$

Then by maximal monotonicity of \(A+N_{C}\), we conclude that \(0\in (A+N_{C})(z)\), that is, \(z\in \mathcal{S}\). □

6 Concluding remarks

In this paper, we considered a class of forward–backward splitting methods based on the Lyapunov distance for variational inequalities and convex minimization problems in a reflexive, strictly convex and smooth Banach space. Weak and strong convergence results were obtained for the forward–backward splitting method under the key Fenchel conjugate assumption. Finally, we also obtained a weak convergence result without the Fenchel conjugate assumption.