1 Introduction

Let H be a real Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \) and induced norm \(\Vert \cdot \Vert \), and let C be a nonempty closed convex subset of H. Let \(\Gamma _{0}(H)\) denote the class of functions on H that are proper, convex, and lower semicontinuous. We consider unconstrained convex optimization problems of the following type:

$$ \min_{x\in H}f(x)+g(x), $$

where \(f,g\in \Gamma _{0}(H)\). It is often the case that f is differentiable and g is only subdifferentiable.

Problem (1.1) was first studied in 1978 in [13], and it provides a natural framework for studying various generic optimization models. In recent years, many researchers have proposed algorithms for solving problem (1.1) and have established weak and strong convergence results; see, for example, [1, 6, 12, 23, 25]. Many important optimization problems can be cast in this form. See, for instance, [23], where the author studied properties of and iterative methods for the lasso as a special case of (1.1); the \(l_{1}\) norm involved there promotes sparsity of the solution.

The following proposition is useful for constructing iterative algorithms.

Proposition 1.1

(see [23])

Let \(f,g\in \Gamma _{0}(H)\). Let \(x^{*}\in H\) and \(\lambda >0\). Assume that f is finite-valued and differentiable on H. Then \(x^{*}\) is a solution to (1.1) if and only if \(x^{*}\) solves the fixed point equation

$$ x^{*}=\bigl(\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\bigr)x^{*}. $$
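
As a hedged illustration of Proposition 1.1, the following Python sketch takes the one-dimensional lasso-type instance \(f(x)=\frac{1}{2}(x-b)^{2}\), \(g(x)=\mu \vert x\vert \). This toy problem is an assumption made for the example and does not come from the original text; its minimizer is the soft-thresholding of b at level μ. The sketch checks numerically that this minimizer satisfies the fixed point equation for several values of λ. The names `soft_threshold`, `forward_backward`, `b`, and `mu` are illustrative.

```python
def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|.| evaluated at v (soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(x: float, lam: float, b: float, mu: float) -> float:
    """Apply prox_{lam*g}(I - lam*grad f) once, for the assumed toy problem
    f(x) = 0.5*(x - b)**2 and g(x) = mu*|x|."""
    grad = x - b
    return soft_threshold(x - lam * grad, lam * mu)


b, mu = 3.0, 1.0
x_star = soft_threshold(b, mu)  # known minimizer of f + g for this toy problem

# Fixed point residual |V(x*) - x*| for several step sizes lam > 0.
residuals = [abs(forward_backward(x_star, lam, b, mu) - x_star)
             for lam in (0.1, 0.5, 1.0, 1.9)]
print(residuals)
```

The residuals vanish up to rounding for every tested λ, while a point that is not a minimizer (for instance \(x=0\)) is moved by the same operator.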

On the other hand, errors are often produced in the course of computation, so it is an important property of an algorithm that its iterates still converge under summable errors. Many authors have studied algorithms with perturbations and their convergence; some related results can be found in [35]. In 2011, Boikanyo and Morosanu [2] introduced a proximal point algorithm with an error sequence. Under a summability condition on the errors and some additional conditions on the parameters, they obtained a strong convergence theorem.

In 2016, Jin, Censor, and Jiang [11] presented the projected scaled gradient (PSG) method with bounded perturbations in a finite dimensional setting for solving the following minimization problem:

$$ \min_{x\in C}f(x), $$

where f is a continuously differentiable, convex function. More precisely, the method generates a sequence according to

$$ x_{n+1}=P_{C}\bigl(x_{n}-\lambda _{n}D(x_{n})\nabla f(x_{n})+e(x_{n}) \bigr),\quad n\geq 0,$$

and converges to a solution of problem (1.3) under suitable conditions, where \(D(x_{n})\) is a diagonal scaling matrix.

In 2017, Xu [24] extended the method to infinite dimensional spaces and applied superiorization techniques to the relaxed PSG. The following iterative step was introduced:

$$ x_{n+1}=(1-\tau _{n})x_{n}+\tau _{n}P_{C}\bigl(x_{n}-\gamma _{n}D(x_{n}) \nabla f(x_{n})+e(x_{n})\bigr),\quad n\geq 0,$$

where \(\tau _{n}\in [0,1]\). A weak convergence theorem was obtained in [24].

Quite recently, Guo and Cui [8] considered the modified proximal gradient method:

$$ x_{n+1}=\alpha _{n} h(x_{n})+(1- \alpha _{n})\operatorname{prox}_{\lambda _{n} g}(I- \lambda _{n} \nabla f) (x_{n})+e(x_{n}),\quad n\geq 0,$$

where h is a contractive mapping. The algorithm converges strongly to a solution of problem (1.1).

To accelerate the convergence of iteration methods, Polyak [19] introduced the following algorithm that can speed up gradient descent:

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=y_{n}-\lambda _{n}\nabla F(x_{n}). \end{cases} $$

This modification was made immensely popular by Nesterov’s accelerated gradient algorithm [18]. In general, an inertial iteration for an operator \(\mathbf{P}\) reads

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\mathbf{P}(y_{n}). \end{cases} $$
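
The generic inertial iteration above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the operator `P` is taken to be a gradient step for the toy function \(F(x)=\frac{1}{2}x^{2}\) with step size 0.1, the inertial parameter is held constant, and the function name `inertial_iteration` is hypothetical.

```python
from typing import Callable, List


def inertial_iteration(P: Callable[[float], float], x0: float, x1: float,
                       delta: float, steps: int) -> List[float]:
    """Run y_n = x_n + delta*(x_n - x_{n-1}), x_{n+1} = P(y_n)."""
    xs = [x0, x1]
    for _ in range(steps):
        y = xs[-1] + delta * (xs[-1] - xs[-2])  # extrapolation step
        xs.append(P(y))                          # operator applied at y_n
    return xs


def P(y: float) -> float:
    """Assumed example operator: gradient step for F(x) = 0.5*x**2."""
    return y - 0.1 * y


plain = inertial_iteration(P, 5.0, 5.0, 0.0, 30)     # delta = 0: no inertia
inertial = inertial_iteration(P, 5.0, 5.0, 0.3, 30)  # constant inertia
print(abs(plain[-1]), abs(inertial[-1]))
```

For this particular example the inertial run ends closer to the minimizer 0 than the run without inertia; this is an observation about the toy problem, not a general guarantee.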

In 2009, Beck and Teboulle [1] proposed a fast iterative shrinkage-thresholding algorithm for linear inverse problems. With the inertial technique, the iterative shrinkage operator is applied not to the previous point \(x_{n-1}\) but to a point \(y_{n}\) formed as a very specific linear combination of the previous two points \(x_{n-1}\) and \(x_{n-2}\). As a result, the convergence of the algorithm is greatly accelerated.

In 2015, for solving the maximal monotone inclusion problem, Mu and Peng [17] introduced alternated inertial proximal point iterates as follows:

$$ x_{n+1}=J_{\lambda T}(y_{n}), $$

where \(y_{n}\) is defined as

$$ y_{n}= \textstyle\begin{cases} x_{n}+\delta _{n}(x_{n}-x_{n-1}), &n=\mathit{odd}, \\ x_{n}, &n=\mathit{even}. \end{cases} $$

In equation (1.9), T is a set-valued maximal monotone operator and \(\lambda >0\). This alternated form is much less common than general inertia; however, it has good convergence properties and performance.
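
The alternated inertial scheme can be sketched as follows. As an assumption for illustration only, we take the single-valued maximal monotone operator \(T(x)=x\) on the real line, whose resolvent is \(J_{\lambda T}(y)=y/(1+\lambda )\), and a constant inertial parameter; the function name `alternated_inertial_ppa` is hypothetical.

```python
from typing import Callable, List


def alternated_inertial_ppa(resolvent: Callable[[float], float],
                            x0: float, x1: float,
                            delta: float, steps: int) -> List[float]:
    """Run x_{n+1} = J(y_n), with inertia applied only when n is odd."""
    xs = [x0, x1]
    for n in range(1, steps + 1):
        x_n, x_prev = xs[n], xs[n - 1]
        if n % 2 == 1:
            y_n = x_n + delta * (x_n - x_prev)  # odd n: inertial extrapolation
        else:
            y_n = x_n                           # even n: plain step
        xs.append(resolvent(y_n))
    return xs


lam = 1.0


def resolvent(y: float) -> float:
    """Resolvent J_{lam*T} of the assumed operator T(x) = x."""
    return y / (1.0 + lam)


xs = alternated_inertial_ppa(resolvent, 4.0, 4.0, 0.5, 20)
print(xs[:6], xs[-1])
```

In this toy run the iterates tend to 0, the unique zero of T.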

In 2017, Iutzeler and Hendrickx [10] proposed a generic acceleration scheme for optimization algorithms via relaxation and inertia; they also used alternated inertial acceleration in their algorithm. They proved convergence of the iterative sequence under suitable assumptions.

Very recently, Shehu and Gibali [21] studied a new alternated inertial procedure for solving split feasibility problems. Under some mild assumptions, they showed that the sequence converges strongly.

In this paper, mainly inspired and motivated by the above works, we introduce several iterative algorithms. Firstly, we combine the contractive mapping and proximal operator to propose an inertial acceleration proximal gradient method with errors for solving problem (1.1). Under more general and flexible conditions, we prove that the sequence converges strongly. Further, we extend the algorithm to a more generalized viscosity inertial acceleration method. Secondly, we propose a kind of alternating inertial proximal point algorithm with errors to solve problem (1.1), then we prove that the sequence converges weakly under appropriate conditions. Finally, we present several numerical examples to illustrate the effectiveness of our iterative schemes.

2 Preliminaries

We start by recalling some lemmas, definitions, and propositions needed in the proof of the main results.

Recall that, given a nonempty closed convex subset C of a real Hilbert space H, for any \(x\in H\) there exists a unique nearest point in C, denoted by \(P_{C}x\), such that

$$ \Vert x-P_{C}x \Vert \leq \Vert x-y \Vert , \quad \forall y \in C. $$

The mapping \(P_{C}\) is called the metric projection of H onto C.

Lemma 2.1

(see [14])

Let C be a nonempty closed convex subset of a real Hilbert space H. Given \(x\in H\) and \(y\in C\), we have \(y=P_{C}x\) if and only if

$$ \langle x-y,y-z\rangle \geq 0, \quad \forall z\in C. $$
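
The characterization in Lemma 2.1 can be checked numerically in a simple case. The sketch below assumes \(H=\mathbb{R}^{2}\) and takes C to be the unit box \([0,1]^{2}\), for which the projection is a componentwise clamp; the helper names `project_box` and `inner` are illustrative.

```python
from itertools import product
from typing import List


def project_box(x: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Metric projection onto the box [lo, hi]^d (componentwise clamp)."""
    return [min(max(xi, lo), hi) for xi in x]


def inner(u: List[float], v: List[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


x = [1.7, -0.4]
y = project_box(x)

# Evaluate <x - y, y - z> on a grid of points z in C = [0, 1]^2.
grid = [i / 10 for i in range(11)]
values = [inner([x[0] - y[0], x[1] - y[1]], [y[0] - z0, y[1] - z1])
          for z0, z1 in product(grid, grid)]
print(y, min(values))
```

All sampled values are nonnegative, as the lemma predicts for \(y=P_{C}x\).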

Lemma 2.2

Let H be a real Hilbert space. Then the following statements hold:

  1. (i)

    \(\Vert x+y\Vert ^{2}=\Vert x\Vert ^{2}+2\langle x,y\rangle +\Vert y\Vert ^{2}\), \(\forall x,y \in H \).

  2. (ii)

    \(\Vert x+y\Vert ^{2}\leq \Vert x\Vert ^{2}+2\langle x+y,y\rangle \), \(\forall x,y \in H \).

  3. (iii)

    \(\Vert \alpha x+(1-\alpha )y\Vert ^{2}=\alpha \Vert x\Vert ^{2}+(1-\alpha )\Vert y\Vert ^{2}- \alpha (1-\alpha )\Vert x-y\Vert ^{2}\) for all \(\alpha \in \mathbb{R}\) and \(x,y\in H \).
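
As a quick sanity check of Lemma 2.2, the following sketch evaluates the three relations on random vectors in \(\mathbb{R}^{3}\) (an assumed finite dimensional stand-in for H), with α sampled also outside \([0,1]\). The variable names are illustrative.

```python
import random
from typing import List


def inner(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm_sq(u: List[float]) -> float:
    return inner(u, u)


random.seed(0)
gap_i = 0.0                # largest violation of identity (i)
slack_ii = float("inf")    # smallest value of right side minus left side in (ii)
gap_iii = 0.0              # largest violation of identity (iii)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    a = random.uniform(-2.0, 3.0)
    s = [xi + yi for xi, yi in zip(x, y)]
    d = [xi - yi for xi, yi in zip(x, y)]
    c = [a * xi + (1 - a) * yi for xi, yi in zip(x, y)]
    gap_i = max(gap_i, abs(norm_sq(s)
                           - (norm_sq(x) + 2 * inner(x, y) + norm_sq(y))))
    slack_ii = min(slack_ii, norm_sq(x) + 2 * inner(s, y) - norm_sq(s))
    gap_iii = max(gap_iii, abs(norm_sq(c)
                               - (a * norm_sq(x) + (1 - a) * norm_sq(y)
                                  - a * (1 - a) * norm_sq(d))))
print(gap_i, slack_ii, gap_iii)
```

The gaps in (i) and (iii) are at rounding level, and the slack in (ii) stays nonnegative up to rounding (it equals \(\Vert y\Vert ^{2}\)).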

Definition 2.3

A mapping \(F:H\rightarrow H\) is said to be

  1. (i)

    Lipschitzian if there exists a positive constant L such that

    $$ \Vert F x-Fy \Vert \leq L \Vert x-y \Vert ,\quad \forall x,y\in H. $$

    In particular, if \(L=1\), F is called nonexpansive. If \(L\in [0,1)\), F is called contractive.

  2. (ii)

    α-averaged (α-av for short) if

    $$ F=(1-\alpha )I+\alpha T, $$

    where \(\alpha \in (0,1)\) and \(T:H\rightarrow H\) is nonexpansive.

Proposition 2.4


  1. (i)

    If \(T_{1}, T_{2},\ldots, T_{n} \) are averaged mappings, then the composite \(T_{n}T_{n-1}\cdots T_{1}\) is averaged. In particular, if \(T_{i}\) is \(\alpha _{i}\)-av for \(i=1,2\), where \(\alpha _{i} \in (0,1)\), then \(T_{2}T_{1}\) is \((\alpha _{2}+\alpha _{1}-\alpha _{2}\alpha _{1})\)-av.

  2. (ii)

    If the mappings \(\{T_{i}\}^{N}_{i=1}\) are averaged and have a common fixed point, then we have

    $$ \bigcap^{N}_{i=1}\operatorname{Fix}(T_{i})= \operatorname{Fix}(T_{1}\cdots T_{N}). $$

    Here, the notation \(\operatorname{Fix}(T)\) denotes the set of fixed points of the mapping T; that is, \(\operatorname{Fix}(T) := \{x \in H : Tx = x\}\).

  3. (iii)

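    If T is ν-ism (that is, ν-inverse strongly monotone: \(\langle x-y,Tx-Ty\rangle \geq \nu \Vert Tx-Ty\Vert ^{2}\) for all \(x,y\in H\)), then, for any \(\tau >0\), τT is \(\frac{\nu }{\tau }\)-ism.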

  4. (iv)

    T is averaged if and only if \(I-T\) is ν-ism for some \(\nu >\frac{1}{2}\). Indeed, for any \(0<\alpha <1\), T is α-averaged if and only if \(I-T\) is \(\frac{1}{2\alpha }\)-ism.
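
As a worked illustration of how parts (i), (iii), and (iv) combine, consider the operator \(\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\) from Proposition 1.1. The computation below is a sketch under assumptions that are not stated at this point of the text: \(\nabla f\) is \(\frac{1}{L}\)-ism for some \(L>0\), \(0<\lambda <\frac{2}{L}\), and \(\operatorname{prox}_{\lambda g}\) is \(\frac{1}{2}\)-av (firmly nonexpansive), which is a standard fact.

$$\begin{aligned} &\lambda \nabla f \text{ is } \tfrac{1}{\lambda L}\text{-ism} \quad \text{(by (iii))}, \\ &I-\lambda \nabla f \text{ is } \tfrac{\lambda L}{2}\text{-av} \quad \text{(by (iv), since } \tfrac{1}{2\alpha }=\tfrac{1}{\lambda L} \text{ gives } \alpha =\tfrac{\lambda L}{2}\in (0,1)\text{)}, \\ &\operatorname{prox}_{\lambda g}(I-\lambda \nabla f) \text{ is } \biggl(\tfrac{1}{2}+\tfrac{\lambda L}{2}-\tfrac{1}{2}\cdot \tfrac{\lambda L}{2}\biggr)=\tfrac{2+\lambda L}{4}\text{-av} \quad \text{(by (i))}. \end{aligned}$$

Under these assumptions the averagedness constant lies in \((\frac{1}{2},1)\) exactly when \(0<\lambda <\frac{2}{L}\).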

Definition 2.5

(see [16])

The proximal operator of \(\varphi \in \Gamma _{0}(H)\) is defined by

$$ \operatorname{prox}_{\varphi }(x)=\arg \min_{\nu \in H}\biggl\{ \varphi (\nu )+\frac{1}{2} \Vert \nu -x \Vert ^{2}\biggr\} , \quad x\in H. $$

The proximal operator of φ of order \(\lambda >0\) is defined as the proximal operator of λφ, that is,

$$ \operatorname{prox}_{\lambda \varphi }(x)=\arg \min_{\nu \in H}\biggl\{ \varphi (\nu )+ \frac{1}{2\lambda } \Vert \nu -x \Vert ^{2}\biggr\} , \quad x\in H . $$
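
To make Definition 2.5 concrete, the sketch below takes \(\varphi =\vert \cdot \vert \) on the real line (an assumed example), for which \(\operatorname{prox}_{\lambda \varphi }\) is the well-known soft-thresholding map, and compares it with a brute-force grid minimization of \(\varphi (\nu )+\frac{1}{2\lambda }(\nu -x)^{2}\). The helper `prox_by_grid` is hypothetical and only meant as a check.

```python
from typing import Callable


def soft_threshold(x: float, lam: float) -> float:
    """Closed-form prox of lam*|.| at x."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def prox_by_grid(phi: Callable[[float], float], x: float, lam: float,
                 lo: float = -5.0, hi: float = 5.0, n: int = 100001) -> float:
    """Approximate argmin of phi(v) + (v - x)^2 / (2*lam) on a uniform grid."""
    best_v, best_val = lo, float("inf")
    for i in range(n):
        v = lo + (hi - lo) * i / (n - 1)
        val = phi(v) + (v - x) ** 2 / (2.0 * lam)
        if val < best_val:
            best_v, best_val = v, val
    return best_v


cases = [(2.0, 0.5), (-1.2, 1.0), (0.3, 0.7)]
errors = [abs(prox_by_grid(abs, x, lam) - soft_threshold(x, lam))
          for x, lam in cases]
print(errors)
```

The two computations agree up to the grid resolution for the tested pairs \((x,\lambda )\).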

Lemma 2.6

The proximal identity

$$ \operatorname{prox}_{\lambda \varphi }x=\operatorname{prox}_{\mu \varphi } \biggl(\frac{\mu }{\lambda }x+\biggl(1- \frac{\mu }{\lambda }\biggr) \operatorname{prox}_{\lambda \varphi }x\biggr) $$

holds for \(\varphi \in \Gamma _{0}(H)\), \(\lambda >0 \) and \(\mu >0\).
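
The proximal identity of Lemma 2.6 can be verified numerically in a simple case. The sketch assumes \(\varphi =\vert \cdot \vert \) on the real line, so that \(\operatorname{prox}_{\lambda \varphi }\) is soft-thresholding; the names `prox_abs` and `identity_gap` are illustrative.

```python
def prox_abs(x: float, lam: float) -> float:
    """prox of lam*|.| at x (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def identity_gap(x: float, lam: float, mu: float) -> float:
    """|prox_{lam phi} x - prox_{mu phi}((mu/lam) x + (1 - mu/lam) prox_{lam phi} x)|."""
    p = prox_abs(x, lam)
    return abs(p - prox_abs((mu / lam) * x + (1.0 - mu / lam) * p, mu))


gaps = [identity_gap(x, lam, mu)
        for x in (-3.0, -0.4, 0.0, 1.0, 3.0)
        for lam in (0.5, 2.0)
        for mu in (0.25, 1.0, 4.0)]
print(max(gaps))
```

The gap is zero up to rounding for all tested triples, including cases with \(\mu >\lambda \).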

Lemma 2.7

(Demiclosedness principle, see [7])

Let H be a real Hilbert space, and let \(T:H\rightarrow H \) be a nonexpansive mapping with \(\operatorname{Fix}(T)\neq \emptyset \). If \(\{x_{n}\}\) is a sequence in H weakly converging to x and if \(\{(I-T)x_{n}\}\) converges strongly to y, then \((I-T)x=y\); in particular, if \(y=0\), then \(x\in \operatorname{Fix}(T)\).

Lemma 2.8

(see [9])

Assume that \(\{s_{n}\}\) is a sequence of nonnegative real numbers such that

$$\begin{aligned}& s_{n+1}\leq (1-\gamma _{n})s_{n}+\gamma _{n}\mu _{n}, \quad n\geq 0, \\& s_{n+1}\leq s_{n}-\eta _{n}+\varphi _{n}, \quad n\geq 0, \end{aligned}$$

where \(\{\gamma _{n}\}\) is a sequence in \((0,1)\), \(\{\eta _{n}\}\) is a sequence of nonnegative real numbers and \(\{\mu _{n}\}\) and \(\{\varphi _{n}\}\) are two sequences in \(\mathbb{R}\) such that

  1. (i)

    \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \),

  2. (ii)

    \(\lim_{n\rightarrow \infty }\varphi _{n}=0\),

  3. (iii)

    \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\) for any subsequence \(\{n_{k}\}\subset \{n\}\).

Then \(\lim_{n\rightarrow \infty }s_{n}=0\).

Lemma 2.9

(see [7])

Let C be a nonempty closed convex subset of a real Hilbert space H. Let \(\{x_{n}\}\) be a sequence in H satisfying the properties:

  1. (i)

    \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists for each \(z\in C\),

  2. (ii)

    \(\omega _{w}(x_{n})\subset C\), where \(\omega _{w}(x_{n}):=\{x : \exists x_{n_{j}}\rightharpoonup x\}\) (\(\{x_{n_{j}} \}\) is a subsequence of \(\{x_{n}\}\)) denotes the weak ω-limit set of \(\{x_{n}\}\).

Then \(\{x_{n} \}\) converges weakly to a point in C.

Lemma 2.10

(see [20])

Let \(\{s_{n}\}\) be a sequence of nonnegative numbers satisfying the generalized nonincreasing property

$$ s_{n+1}\leq s_{n}+\sigma _{n},\quad n\geq 0, $$

where \(\{\sigma _{n}\}\) is a sequence of nonnegative numbers such that \(\sum_{n=0}^{\infty }\sigma _{n}<\infty \). Then \(\{s_{n}\}\) is bounded and \(\lim_{n\rightarrow \infty }s_{n} \) exists.

3 Main results

3.1 Inertial proximal gradient algorithm

In this section, we use a viscosity iterative method to approximate the unique solution of the following variational inequality problem (VIP for short):

$$ \bigl\langle (I-h)x^{*},\tilde{x}-x^{*}\bigr\rangle \geq 0,\quad \forall \tilde{x}\in \operatorname{Fix}(V_{\lambda }), $$

where \(h: H\rightarrow H\) is ρ-contractive and \(V_{\lambda }\) is nonexpansive.

We propose an inertial acceleration algorithm.

Algorithm 1

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\alpha _{n}h(y_{n})+(1- \alpha _{n}) \bigl(\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}- \lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr)\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n= n+1\) and go to 2.
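
The steps of Algorithm 1 can be sketched in Python on a small test problem. Everything specific in this sketch is an assumption made for illustration and is not taken from the original: \(H=\mathbb{R}^{2}\), \(f(x)=\frac{1}{2}\Vert x-b\Vert ^{2}\) (so \(L=1\)), \(g(x)=\mu \Vert x\Vert _{1}\), \(D\) equal to the identity, \(h(x)=\rho x\), \(\alpha _{n}=1/(n+1)\), perturbation \(e(y_{n})\) with all components \(1/n^{2}\), and \(\delta _{n}\) capped so that \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert \leq 1/n^{2}\). The function name `algorithm1` is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Componentwise prox of t*||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def algorithm1(b: Vector, mu: float, lam: float, rho: float,
               x0: Vector, x1: Vector,
               eps: float = 1e-6, max_iter: int = 5000) -> Tuple[Vector, int]:
    """Sketch of Algorithm 1 under the assumptions listed in the lead-in."""
    x_prev, x = list(x0), list(x1)
    for n in range(1, max_iter + 1):
        alpha = 1.0 / (n + 1)
        step = math.dist(x, x_prev)
        # Assumed rule: keep delta_n * ||x_n - x_{n-1}|| <= 1/n^2 (summable).
        delta = 0.5 if step == 0.0 else min(0.5, 1.0 / (n * n * step))
        # Step 2: inertial extrapolation.
        y = [xi + delta * (xi - pi) for xi, pi in zip(x, x_prev)]
        # Step 3: perturbed forward-backward step combined with h(y_n) = rho*y_n.
        err = 1.0 / (n * n)
        forward = [yi - lam * (yi - bi) + err for yi, bi in zip(y, b)]
        backward = soft_threshold(forward, lam * mu)
        x_next = [alpha * rho * yi + (1.0 - alpha) * pi
                  for yi, pi in zip(y, backward)]
        # Step 4: stopping test.
        if math.dist(x, x_next) < eps:
            return x_next, n
        x_prev, x = x, x_next
    return x, max_iter


b = [3.0, -0.5]
x_star = soft_threshold(b, 1.0)  # minimizer of this toy problem
x_out, n_stop = algorithm1(b, mu=1.0, lam=0.5, rho=0.5,
                           x0=[0.0, 0.0], x1=[0.0, 0.0])
print(n_stop, math.dist(x_out, x_star))
```

In this toy run the iterates approach the unique minimizer slowly, at a rate governed by \(\alpha _{n}\), which is typical of viscosity-type schemes with \(\alpha _{n}\rightarrow 0\).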

Rewrite iteration (3.3) as follows:

$$\begin{aligned} x_{n+1}&=\alpha _{n}h(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+\hat{e}_{n}\bigr) \\ &=\alpha _{n}h(y_{n})+(1-\alpha _{n}) \bigl( \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n} \nabla f(y_{n})\bigr)+\tilde{e}_{n}\bigr), \end{aligned}$$

where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and

$$ \tilde{e}_{n}=\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+ \hat{e}_{n}\bigr)- \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n} \nabla f(y_{n})\bigr). $$

Since the proximal operator is nonexpansive, \(\Vert \tilde{e}_{n}\Vert \leq \Vert \hat{e}_{n}\Vert \leq \Vert e(y_{n})\Vert +\lambda _{n} \Vert \theta (y_{n})\Vert \). Because \(\{\lambda _{n}\}\) is bounded, \(\sum_{n=0}^{\infty }\Vert \tilde{e}_{n}\Vert <\infty \) then follows from conditions (iii)–(iv) of Theorem 3.1. We use S to denote the solution set of problem (1.1).
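
The bound \(\Vert \tilde{e}_{n}\Vert \leq \Vert \hat{e}_{n}\Vert \) rests on the nonexpansiveness of the proximal operator. The following sketch checks this numerically under the assumption \(g=\Vert \cdot \Vert _{1}\) on \(\mathbb{R}^{4}\), where the prox is componentwise soft-thresholding; the vectors `u` and `e_hat` are random stand-ins for \(y_{n}-\lambda _{n}\nabla f(y_{n})\) and \(\hat{e}_{n}\).

```python
import math
import random
from typing import List


def soft_threshold(v: List[float], t: float) -> List[float]:
    """Componentwise prox of t*||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


random.seed(1)
worst_ratio = 0.0  # largest observed ||e_tilde|| / ||e_hat||
for _ in range(1000):
    u = [random.uniform(-3, 3) for _ in range(4)]
    e_hat = [random.uniform(-0.5, 0.5) for _ in range(4)]
    t = random.uniform(0.1, 2.0)
    shifted = [ui + ei for ui, ei in zip(u, e_hat)]
    e_tilde = [p - q for p, q in zip(soft_threshold(shifted, t),
                                     soft_threshold(u, t))]
    if norm(e_hat) > 0.0:
        worst_ratio = max(worst_ratio, norm(e_tilde) / norm(e_hat))
print(worst_ratio)
```

The observed ratio never exceeds 1 beyond rounding, in line with the inequality used in the text.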

Theorem 3.1

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Let h be a ρ-contractive self-map of H with \(0\leq \rho <1\), and let \(\nabla f\) be L-Lipschitzian. Assume that D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 1, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that

  1. (i)

    \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);

  2. (ii)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  5. (v)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).


Proof. We divide the proof into several steps.

Step 1. Show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),

$$\begin{aligned} \Vert y_{n}-z \Vert &= \bigl\Vert x_{n}+\delta _{n}(x_{n}-x_{n-1})-z \bigr\Vert \\ &\leq \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert . \end{aligned}$$

Put \(V_{\lambda _{n}}:=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)\), from (3.4) and (3.5), we have

$$\begin{aligned} & \Vert x_{n+1}-z \Vert \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert \\ &\quad = \bigl\Vert \alpha _{n}\bigl(h(y_{n})-z \bigr)+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}-z)+(1- \alpha _{n})\tilde{e}_{n} \bigr\Vert \\ &\quad \leq \alpha _{n} \bigl\Vert h(y_{n})-h(z) \bigr\Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert +(1-\alpha _{n}) \Vert V_{\lambda _{n}}y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \alpha _{n}\rho \Vert y_{n}-z \Vert + \alpha _{n} \bigl\Vert h(z)-z \bigr\Vert +(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad =\bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert y_{n}-z \Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad =\bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert x_{n}-z \Vert +\alpha _{n}(1-\rho ) \frac{ \Vert h(z)-z \Vert +(\delta _{n} \Vert x_{n}-x_{n-1} \Vert + \Vert \tilde{e}_{n} \Vert )/\alpha _{n}}{1-\rho }. \end{aligned}$$

From conditions (iii)–(v) and \(\alpha _{n}>0\), we get that \(\{(\delta _{n}\Vert x_{n}-x_{n-1}\Vert +\Vert \tilde{e}_{n}\Vert )/\alpha _{n}\}\) is bounded. Thus there exists some \(M_{1}>0\) such that

$$ M_{1}\geq \sup \bigl\{ \bigl\Vert h(z)-z \bigr\Vert +\bigl(\delta _{n} \Vert x_{n}-x_{n-1} \Vert + \Vert \tilde{e}_{n} \Vert \bigr)/\alpha _{n}\bigr\} $$

for all \(n\geq 0\). Then induction implies that

$$ \Vert x_{n}-z \Vert \leq \max \biggl\{ \Vert x_{0}-z \Vert , \frac{M_{1}}{1-\rho }\biggr\} . $$

Therefore, the sequence \(\{x_{n}\}\) is bounded and so are \(\{y_{n}\}\), \(\{h(y_{n})\}\), and \(\{V_{\lambda _{n}}y_{n}\}\).

Step 2. Show that \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies

$$ \lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert =0 $$

for any subsequence \(\{n_{k}\}\subset \{n\}\). First, fix \(z\in S\). We have

$$\begin{aligned} \Vert y_{n}-z \Vert ^{2}&= \bigl\Vert x_{n}+\delta _{n}(x_{n}-x_{n-1})-z \bigr\Vert ^{2} \\ &\leq \Vert x_{n}-z \Vert ^{2}+2\bigl\langle x_{n}-z+\delta _{n}(x_{n}-x_{n-1}), \delta _{n}(x_{n}-x_{n-1})\bigr\rangle \\ &\leq \Vert x_{n}-z \Vert ^{2}+2\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert + \delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr). \end{aligned}$$

Then from (3.4) we get

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert ^{2} \\ &\quad \leq \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n})V_{\lambda _{n}}y_{n}-z \bigr\Vert ^{2} +2(1-\alpha _{n})\bigl\langle \alpha _{n}h(y_{n})+(1- \alpha _{n})V_{ \lambda _{n}}y_{n}-z,\tilde{e}_{n} \bigr\rangle \\ &\quad \quad {}+ \Vert \tilde{e}_{n} \Vert ^{2} \\ &\quad \leq \alpha _{n}^{2} \bigl\Vert h(y_{n})-z \bigr\Vert ^{2}+(1-\alpha _{n})^{2} \Vert V_{ \lambda _{n}}y_{n}-z \Vert ^{2}+2\alpha _{n}(1-\alpha _{n})\bigl\langle h(y_{n})-z,V_{ \lambda _{n}}y_{n}-z \bigr\rangle \\ &\quad \quad {} +\bigl(2\alpha _{n} \bigl\Vert h(y_{n})-z \bigr\Vert +2(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \bigr) \Vert \tilde{e}_{n} \Vert \\ &\quad \leq 2\alpha _{n}^{2}\bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert ^{2}+ \bigl\Vert h(z)-z \bigr\Vert ^{2}\bigr)+(1- \alpha _{n})^{2} \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} + 2\alpha _{n}(1-\alpha _{n})\bigl\langle h(y_{n})-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq 2\alpha _{n}^{2}\bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert ^{2}+ \bigl\Vert h(z)-z \bigr\Vert ^{2}\bigr)+(1- \alpha _{n})^{2} \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} + 2\alpha _{n}(1-\alpha _{n}) \bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert \Vert y_{n}-z \Vert + \bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z\bigr\rangle \bigr)+M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \bigl(1-\alpha _{n}\bigl(2-\alpha _{n} \bigl(1+2\rho ^{2}\bigr)-2(1-\alpha _{n}) \rho \bigr)\bigr) \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} +2\alpha _{n}(1-\alpha _{n})\bigl\langle h(z)-z,V_{ \lambda _{n}}y_{n}-z\bigr\rangle +2\alpha _{n}^{2} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert , \end{aligned}$$

where \(M_{2}\) is some constant such that

$$ M_{2}\geq \sup \bigl\{ 2\alpha _{n} \bigl\Vert h(y_{n})-z \bigr\Vert +2(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \bigr\} . $$

Put \(\gamma _{n}:=\alpha _{n}(2-\alpha _{n}(1+2\rho ^{2})-2(1-\alpha _{n}) \rho )\). Using (3.4) and (3.7), we deduce that

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad \leq (1-\gamma _{n}) \Vert x_{n}-z \Vert ^{2}+2\delta _{n}(1-\gamma _{n}) \Vert x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr) \\ &\quad\quad {} +2\alpha _{n}(1-\alpha _{n})\bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle +2\alpha _{n}^{2} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

Secondly, since \(V_{\lambda _{n}} \) is \(\frac{2+\lambda _{n}L}{4} \)-av by Proposition 2.4, we can rewrite

$$\begin{aligned} V_{\lambda _{n}}=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)=(1-w_{n})I+w_{n}T_{n}, \end{aligned}$$

where \(w_{n}=\frac{2+\lambda _{n}L}{4}\), \(T_{n}\) is nonexpansive and, by condition (ii), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.4), (3.8), and (3.10), we obtain

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert ^{2} \\ &\quad \leq \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n})V_{\lambda _{n}}y_{n}-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \bigl\Vert V_{\lambda _{n}}y_{n}-z+\alpha _{n} \bigl(h(y_{n})-V_{\lambda _{n}}y_{n}\bigr) \bigr\Vert ^{2} +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \Vert V_{\lambda _{n}}y_{n}-z \Vert ^{2}+{ \alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\Vert ^{2}+2\alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \bigl\Vert (1-w_{n})y_{n}+w_{n}T_{n}y_{n}-z \bigr\Vert ^{2}+{\alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {} +2\alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad =(1-w_{n}) \Vert y_{n}-z \Vert ^{2}+w_{n} \Vert T_{n}y_{n}-T_{n}z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2} \\ &\quad \quad {} +{\alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2}+2 \alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \Vert y_{n}-z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}+ \alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {}+2\alpha _{n}\bigl\langle V_{ \lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \Vert x_{n}-z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}+ \alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {}+2\alpha _{n}\bigl\langle V_{ \lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle \\ & \quad \quad {} +2\delta _{n} \Vert 
x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr)+M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

Set


$$\begin{aligned}& s_{n}= \Vert x_{n}-z \Vert ^{2},\quad\quad \eta _{n}=w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}, \\& \begin{aligned} \mu _{n}={}&\frac{1}{2-\alpha _{n}(1+2{\rho }^{2})-2(1-\alpha _{n})\rho }\biggl(2{ \alpha _{n}} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \frac{ \Vert \tilde{e}_{n} \Vert }{\alpha _{n}} \\ &{}+ \frac{2\delta _{n} \Vert x_{n}-x_{n-1} \Vert ( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert )}{\alpha _{n}} \\ &{} +2(1-\alpha _{n})\bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle \biggr), \end{aligned} \\& \varphi _{n}=\alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2}+2 \alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

Since \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \) and \(\varphi _{n}\rightarrow 0\) hold obviously, in order to complete the proof by using Lemma 2.8, it suffices to verify that \(\eta _{n_{k}}\rightarrow 0\) (\(k\rightarrow \infty \)) implies

$$ \limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0 $$

for any subsequence \(\{n_{k}\}\subset \{n\}\).

Indeed, as \(k\rightarrow \infty \), \(\eta _{n_{k}}\rightarrow 0\) implies \(\Vert T_{n_{k}}y_{n_{k}}-y_{n_{k}}\Vert \rightarrow 0\), since \(w_{n_{k}}(1-w_{n_{k}})\) is bounded away from zero. From (3.10), we then have

$$\begin{aligned} \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert =w_{n_{k}} \Vert y_{n_{k}}-T_{n_{k}}y_{n_{k}} \Vert \rightarrow 0. \end{aligned}$$

Due to condition (v), it follows that

$$\begin{aligned} \Vert y_{n_{k}}-x_{n_{k}} \Vert =\delta _{n_{k}} \Vert x_{n_{k}}-x_{n_{k}-1} \Vert \rightarrow 0. \end{aligned}$$

Thus, we have

$$\begin{aligned} &\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad =\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-y_{n_{k}}+y_{n_{k}}-V_{ \lambda _{n_{k}}}y_{n_{k}}+V_{\lambda _{n_{k}}}y_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad \leq \lim_{k\rightarrow \infty } \Vert x_{n_{k}}-y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert V_{\lambda _{n_{k}}}y_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad \leq \lim_{k\rightarrow \infty }2 \Vert x_{n_{k}}-y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert . \end{aligned}$$

It follows from (3.12) and (3.13) that

$$\begin{aligned} &\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert =0. \end{aligned}$$

Step 3. Show that

$$\begin{aligned} \omega _{w}(x_{n_{k}})\subset S. \end{aligned}$$

Take \(\tilde{x} \in \omega _{w}(x_{n_{k}})\) and assume that \(\{x_{n_{k_{j}}}\}\) is a subsequence of \(\{x_{n_{k}}\}\) weakly converging to \(\tilde{x}\). Without loss of generality, we still use \(\{x_{n_{k}}\}\) to denote \(\{x_{n_{k_{j}}}\}\). Passing to a further subsequence if necessary, we may assume that \(\lambda _{n_{k}}\rightarrow \lambda \); then \(0<\lambda <\frac{2}{L}\) by condition (ii). Set \(V_{\lambda }=\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\); then \(V_{\lambda }\) is nonexpansive. Set

$$ t_{k}=x_{n_{k}}-\lambda _{n_{k}}\nabla f(x_{n_{k}}), \quad\quad z_{k}=x_{n_{k}}- \lambda \nabla f(x_{n_{k}}). $$

Using the proximal identity of Lemma 2.6, we deduce that

$$\begin{aligned} & \Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \\ &\quad = \Vert \operatorname{prox}_{\lambda _{n_{k}}g}t_{k}- \operatorname{prox}_{\lambda g}z_{k} \Vert \\ &\quad = \biggl\Vert \operatorname{prox}_{\lambda g}\biggl( \frac{\lambda }{\lambda _{n_{k}}}t_{k}+ \biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr) \operatorname{prox}_{{\lambda _{n_{k}}}g}t_{k}\biggr)-\operatorname{prox}_{ \lambda g}z_{k} \biggr\Vert \\ &\quad \leq \biggl\Vert \frac{\lambda }{\lambda _{n_{k}}}t_{k}+\biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr)\operatorname{prox}_{\lambda _{n_{k}}g}t_{k}-z_{k} \biggr\Vert \\ &\quad \leq \frac{\lambda }{\lambda _{n_{k}}} \Vert t_{k}-z_{k} \Vert +\biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr) \Vert \operatorname{prox}_{\lambda _{n_{k}}g}t_{k}-z_{k} \Vert \\ &\quad =\frac{\lambda }{\lambda _{n_{k}}} \vert \lambda _{n_{k}}-\lambda \vert \bigl\Vert \nabla f(x_{n_{k}}) \bigr\Vert +\biggl(1-\frac{\lambda }{\lambda _{n_{k}}} \biggr) \Vert \operatorname{prox}_{ \lambda _{n_{k}}g}t_{k}-z_{k} \Vert . \end{aligned}$$

Since \(\{x_{n}\}\) is bounded, ∇f is Lipschitz continuous, and \(\lambda _{n_{k}}\rightarrow \lambda \), we immediately derive from the last relation that \(\Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}}\Vert \rightarrow 0\). As a result, we find

$$ \Vert x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \leq \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert + \Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \rightarrow 0. $$

Using Lemma 2.7, we get \(\omega _{w}(x_{n_{k}})\subset S\). Meanwhile, we have

$$\begin{aligned} &\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},V_{\lambda _{n_{k}}}y_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},V_{\lambda _{n_{k}}}x_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},x_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\bigl\langle h\bigl(x^{*}\bigr)-x^{*}, \tilde{x}-x^{*}\bigr\rangle ,\quad \forall \tilde{x} \in S. \end{aligned}$$

Also, since \(x^{*}\) is the unique solution of variational inequality problem (3.1), we get

$$ \limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},x_{n_{k}}-x^{*} \bigr\rangle \leq 0, $$

and hence \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\). □

Furthermore, we extend Algorithm 1 to a more general viscosity iterative algorithm. Suppose that the sequence of contractive mappings \(\{h_{n}\}\) is uniformly convergent on every bounded subset B of H. Assuming that the solution set \(S\neq \emptyset \), we next prove that the sequence \(\{x_{n}\}\) generated by Algorithm 2 converges strongly to a point \(x^{*}\in S\), which also solves variational inequality (3.1).

A more general inertial iterative algorithm is as follows.

Algorithm 2

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\alpha _{n}h_{n}(y_{n})+(1- \alpha _{n}) \bigl(\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n}) \bigr)\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n=n+1\) and go to 2.

Theorem 3.2

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent. Let \(\{h_{n}\}\) be a sequence of \(\rho _{n}\)-contractive self-mappings of H with \(0<\rho _{l}=\liminf_{n\rightarrow \infty }\rho _{n}\leq \limsup_{n \rightarrow \infty }\rho _{n}=\rho _{u}<1\), and let \(\{h_{n}\}\) be uniformly convergent on every bounded subset B of H. Assume that \(\nabla f\) is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), define the sequence \(\{x_{n}\}\) by Algorithm 2, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that

  1. (i)

    \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);

  2. (ii)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  5. (v)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).


Using the uniform convergence of the sequence of contractive mappings \(\{h_{n}\}\) and consulting [6], we have \(\lim_{n\rightarrow \infty }h_{n}=h\). The proof can then be completed by techniques similar to those used for Theorem 3.1. □

3.2 Alternated inertial proximal gradient algorithm

In light of the ideas in [10, 17, 21] and related references, we combine alternated inertia with the proximal gradient method and consider the following algorithm.

Algorithm 3

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}= \textstyle\begin{cases} x_{n}+\delta _{n}(x_{n}-x_{n-1}), &n=\mathit{odd}, \\ x_{n}, &n=\mathit{even}. \end{cases} $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}-\lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n=n+1\) and go to 2.
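
Algorithm 3 can be sketched in Python on a small test problem. The specific choices below are assumptions for illustration and are not part of the original: \(H=\mathbb{R}^{2}\), \(f(x)=\frac{1}{2}\Vert x-b\Vert ^{2}\) (so \(L=1\)), \(g(x)=\mu \Vert x\Vert _{1}\), \(D\) equal to the identity, perturbation \(e(y_{n})\) with all components \(1/n^{2}\), and \(\delta _{n}\) capped so that \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert \leq 1/n^{2}\). The function name `algorithm3` is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Componentwise prox of t*||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def algorithm3(b: Vector, mu: float, lam: float, x0: Vector, x1: Vector,
               eps: float = 1e-8, max_iter: int = 5000) -> Tuple[Vector, int]:
    """Sketch of Algorithm 3 under the assumptions listed in the lead-in."""
    x_prev, x = list(x0), list(x1)
    for n in range(1, max_iter + 1):
        # Step 2: inertia only at odd n, plain point at even n.
        if n % 2 == 1:
            step = math.dist(x, x_prev)
            delta = 0.5 if step == 0.0 else min(0.5, 1.0 / (n * n * step))
            y = [xi + delta * (xi - pi) for xi, pi in zip(x, x_prev)]
        else:
            y = list(x)
        # Step 3: perturbed forward-backward step.
        err = 1.0 / (n * n)
        forward = [yi - lam * (yi - bi) + err for yi, bi in zip(y, b)]
        x_next = soft_threshold(forward, lam * mu)
        # Step 4: stopping test.
        if math.dist(x, x_next) < eps:
            return x_next, n
        x_prev, x = x, x_next
    return x, max_iter


b = [3.0, -0.5]
x_star = soft_threshold(b, 1.0)  # minimizer of this toy problem
x_out, n_stop = algorithm3(b, mu=1.0, lam=0.5, x0=[0.0, 0.0], x1=[0.0, 0.0])
print(n_stop, math.dist(x_out, x_star))
```

In finite dimensions weak and strong convergence coincide, so this toy run only illustrates that the iterates approach a minimizer despite the summable perturbations.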

Similar to (3.3), we rewrite (3.23) as follows:

$$ x_{n+1}=\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}-\lambda _{n}\nabla f(y_{n})\bigr)+ \tilde{e}_{n}, $$

where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and

$$ \tilde{e}_{n}=\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+ \hat{e}_{n}\bigr)- \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n} \nabla f(y_{n})\bigr). $$
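As a supplementary remark (not spelled out in the original argument), the size of \(\tilde{e}_{n}\) can be estimated directly from the nonexpansiveness of the proximal operator and the definition of \(\hat{e}_{n}\):

$$ \Vert \tilde{e}_{n} \Vert \leq \Vert \hat{e}_{n} \Vert \leq \lambda _{n} \bigl\Vert \theta (y_{n}) \bigr\Vert + \bigl\Vert e(y_{n}) \bigr\Vert . $$

Hence, whenever \(\{\lambda _{n}\}\) is bounded and both \(\sum_{n}\Vert \theta (y_{n})\Vert \) and \(\sum_{n}\Vert e(y_{n})\Vert \) are finite, we also have \(\sum_{n}\Vert \tilde{e}_{n}\Vert <\infty \).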

Theorem 3.3

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Assume that \(\nabla f\) is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 3, where \(\lambda _{n}\in (0,\frac{2}{L})\). Suppose that

  1. (i)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  2. (ii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges weakly to a solution of the minimization problem (1.1).


Step 1. Show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),

$$\begin{aligned} \Vert x_{2n+2}-z \Vert &= \Vert V_{\lambda _{2n+1}}y_{2n+1}+ \tilde{e}_{2n+1}-z \Vert \\ &\leq \Vert y_{2n+1}-z \Vert + \Vert \tilde{e}_{2n+1} \Vert \\ &= \bigl\Vert x_{2n+1}+\delta _{2n+1}(x_{2n+1}-x_{2n})-z \bigr\Vert + \Vert \tilde{e}_{2n+1} \Vert \\ &\leq \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert + \Vert \tilde{e}_{2n+1} \Vert . \end{aligned}$$

By conditions (ii), (iii), and (iv), the terms \(\delta _{2n+1}\Vert x_{2n+1}-x_{2n}\Vert \) and \(\Vert \tilde{e}_{2n+1}\Vert \) are summable. Moreover,

$$\begin{aligned} \Vert x_{2n+1}-z \Vert &= \Vert V_{\lambda _{2n}}y_{2n}+ \tilde{e}_{2n}-z \Vert \\ &= \Vert V_{\lambda _{2n}}x_{2n}+\tilde{e}_{2n}-z \Vert \\ &\leq \Vert x_{2n}-z \Vert + \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Combining (3.25) and (3.26), we see that \(\{x_{n}\}\) is bounded, and so are \(\{y_{n}\}\) and \(\{V_{\lambda _{n}}y_{n}\}\). It also follows from (3.25) and (3.26) that \(\{x_{n}\}\) is quasi-Fejér monotone with respect to S. By Lemma 2.10, \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists.

Step 2. Show that \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\) and \(\lim_{n\rightarrow \infty }\Vert x_{n}-V_{\lambda _{n}}x_{n}\Vert =0\). First, fix \(z\in S\). By Lemma 2.2 and the Cauchy-Schwarz inequality, we have

$$\begin{aligned} \Vert y_{2n+1}-z \Vert ^{2}&= \bigl\Vert x_{2n+1}+\delta _{2n+1}(x_{2n+1}-x_{2n})-z \bigr\Vert ^{2} \\ &\leq \Vert x_{2n+1}-z \Vert ^{2}+2\bigl\langle x_{2n+1}-z+\delta _{2n+1}(x_{2n+1}-x_{2n}), \delta _{2n+1}(x_{2n+1}-x_{2n})\bigr\rangle \\ &\leq \Vert x_{2n+1}-z \Vert ^{2} \\ &\quad{} +2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr). \end{aligned}$$
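The first inequality in the display above rests on an elementary estimate, which is presumably what Lemma 2.2 provides. For any \(a,b\in H\),

$$ \Vert a+b \Vert ^{2}= \Vert a \Vert ^{2}+2\langle a+b,b\rangle - \Vert b \Vert ^{2}\leq \Vert a \Vert ^{2}+2\langle a+b,b\rangle , $$

applied here with \(a=x_{2n+1}-z\) and \(b=\delta _{2n+1}(x_{2n+1}-x_{2n})\).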

Since \(V_{\lambda _{n}}\) is \(\frac{2+\lambda _{n}L}{4}\)-averaged, we see that

$$\begin{aligned} V_{\lambda _{n}}=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)=(1-w_{n})I+w_{n}T_{n}, \end{aligned}$$

where \(w_{n}=\frac{2+\lambda _{n}L}{4}\) and \(T_{n}\) is nonexpansive. From condition (i), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.23) and (3.26), we obtain

$$\begin{aligned} \Vert x_{2n+2}-z \Vert ^{2}&= \Vert V_{\lambda _{2n+1}}y_{2n+1}+ \tilde{e}_{2n+1}-z \Vert ^{2} \\ &= \Vert V_{\lambda _{2n+1}}y_{2n+1}-z \Vert ^{2}+2\langle V_{\lambda _{2n+1}}y_{2n+1}-z, \tilde{e}_{2n+1}\rangle + \Vert \tilde{e}_{2n+1} \Vert ^{2} \\ &\leq \Vert y_{2n+1}-z \Vert ^{2}+ \Vert \tilde{e}_{2n+1} \Vert \bigl(2 \Vert y_{2n+1}-z \Vert + \Vert \tilde{e}_{2n+1} \Vert \bigr) \\ &\leq \Vert x_{2n+1}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ &\quad{} +M_{3} \Vert \tilde{e}_{2n+1} \Vert , \end{aligned}$$

where \(M_{3}=\sup \{2\Vert y_{2n+1}-z\Vert +\Vert \tilde{e}_{2n+1}\Vert \}\).

With the help of equality (3.28), we have

$$\begin{aligned} & \Vert x_{2n+1}-z \Vert ^{2} \\ &\quad = \Vert V_{\lambda _{2n}}y_{2n}+\tilde{e}_{2n}-z \Vert ^{2} \\ &\quad = \bigl\Vert (1-w_{2n})x_{2n}+w_{2n}T_{2n}x_{2n}-z \bigr\Vert ^{2}+2\langle V_{\lambda _{2n}}x_{2n}-z, \tilde{e}_{2n}\rangle + \Vert \tilde{e}_{2n} \Vert ^{2} \\ &\quad \leq (1-w_{2n}) \Vert x_{2n}-z \Vert ^{2}+w_{2n} \Vert T_{2n}x_{2n}-T_{2n}z \Vert ^{2}-w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} \\ & \quad \quad {} +\bigl(2 \Vert x_{2n}-z \Vert + \Vert \tilde{e}_{2n} \Vert \bigr) \Vert \tilde{e}_{2n} \Vert \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}-w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2}+M_{4} \Vert \tilde{e}_{2n} \Vert , \end{aligned}$$

where \(M_{4}=\sup \{2\Vert x_{2n}-z\Vert +\Vert \tilde{e}_{2n}\Vert \}\).
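For the reader's convenience, the third line of the estimate above can be traced back to a standard Hilbert space identity: for any \(a,b\in H\) and \(w\in [0,1]\),

$$ \bigl\Vert (1-w)a+wb \bigr\Vert ^{2}=(1-w) \Vert a \Vert ^{2}+w \Vert b \Vert ^{2}-w(1-w) \Vert a-b \Vert ^{2}. $$

It is applied with \(a=x_{2n}-z\), \(b=T_{2n}x_{2n}-z\), and \(w=w_{2n}\). Since \(z\in S\) is a fixed point of \(V_{\lambda _{2n}}\) and \(w_{2n}>0\), we have \(T_{2n}z=z\), so the nonexpansiveness of \(T_{2n}\) gives \(\Vert T_{2n}x_{2n}-T_{2n}z\Vert \leq \Vert x_{2n}-z\Vert \), which yields the last line.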

Substituting (3.30) into (3.29), we get

$$\begin{aligned} & \Vert x_{2n+2}-z \Vert ^{2} \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ & \quad \quad {} -w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2}+M_{3} \Vert \tilde{e}_{2n+1} \Vert +M_{4} \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Hence, we have the following result:

$$\begin{aligned} &w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}- \Vert x_{2n+2}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ & \quad \quad {} +M_{3} \Vert \tilde{e}_{2n+1} \Vert +M_{4} \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Noting the fact that \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\), we deduce from (3.32) that

$$\begin{aligned} \sum_{n=0}^{\infty } \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} < \infty . \end{aligned}$$
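The summability can be seen by a telescoping argument, sketched here under the bounds already established. Choose \(c>0\) and \(n_{0}\) such that \(w_{2n}(1-w_{2n})\geq c\) for all \(n\geq n_{0}\). Summing (3.32) from \(n_{0}\) to N gives

$$\begin{aligned} c\sum_{n=n_{0}}^{N} \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} &\leq \Vert x_{2n_{0}}-z \Vert ^{2}+\sum_{n=n_{0}}^{N}2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ &\quad{} +\sum_{n=n_{0}}^{N}\bigl(M_{3} \Vert \tilde{e}_{2n+1} \Vert +M_{4} \Vert \tilde{e}_{2n} \Vert \bigr). \end{aligned}$$

The right-hand side is bounded uniformly in N because \(\{x_{n}\}\) is bounded and the sequences \(\{\delta _{n}\Vert x_{n}-x_{n-1}\Vert \}\) and \(\{\Vert \tilde{e}_{n}\Vert \}\) are summable.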

In particular, \(\lim_{n\rightarrow \infty }\Vert T_{2n}x_{2n}-x_{2n}\Vert =0\). Now we have

$$\begin{aligned} \Vert x_{2n+1}-x_{2n} \Vert \leq w_{2n} \Vert T_{2n}x_{2n}-x_{2n} \Vert + \Vert \tilde{e}_{2n} \Vert \rightarrow 0. \end{aligned}$$

Similarly, we argue that

$$\begin{aligned} \sum_{n=0}^{\infty } \Vert T_{2n+1}y_{2n+1}-y_{2n+1} \Vert ^{2} < \infty . \end{aligned}$$

Observe that

$$\begin{aligned} x_{2n+2}=(1-w_{2n+1})y_{2n+1}+w_{2n+1}T_{2n+1}y_{2n+1}+ \tilde{e}_{2n+1}. \end{aligned}$$

From (3.35) and condition (ii), we get

$$\begin{aligned} \Vert x_{2n+2}-y_{2n+1} \Vert \leq w_{2n+1} \Vert T_{2n+1}y_{2n+1}-y_{2n+1} \Vert + \Vert \tilde{e}_{2n+1} \Vert \rightarrow 0. \end{aligned}$$

It follows from (3.36) and condition (iv) that

$$\begin{aligned} \Vert x_{2n+2}-x_{2n+1} \Vert &\leq \Vert x_{2n+2}-y_{2n+1} \Vert + \Vert y_{2n+1}-x_{2n+1} \Vert \\ &= \Vert x_{2n+2}-y_{2n+1} \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \rightarrow 0. \end{aligned}$$

Combining (3.34) and (3.38), we obtain \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\). This yields

$$\begin{aligned} \Vert x_{n}-V_{\lambda _{n}}x_{n} \Vert &\leq \Vert x_{n}-x_{n+1} \Vert + \Vert x_{n+1}-V_{ \lambda _{n}}y_{n} \Vert + \Vert V_{\lambda _{n}}y_{n}-V_{\lambda _{n}}x_{n} \Vert \\ &\leq \Vert x_{n}-x_{n+1} \Vert + \Vert \tilde{e}_{n} \Vert + \Vert y_{n}-x_{n} \Vert \rightarrow 0. \end{aligned}$$

Step 3. Show that

$$\begin{aligned} \omega _{w}(x_{n})\subset S. \end{aligned}$$

Since \(\{\lambda _{n}\}\) is bounded, we may assume that a subsequence \(\{\lambda _{n_{k}}\}\) converges to some λ. Arguing as in Step 3 of the proof of Theorem 3.1, we conclude that (3.40) holds. By Lemma 2.9, \(\{x_{n}\}\) converges weakly to a point in S. □

4 Numerical illustrations

In this section, we consider two examples to demonstrate the effectiveness of the algorithms and the convergence results of Theorem 3.1 and Theorem 3.3.

Example 4.1

Let \(H=\mathbb{R}^{N}\). Define \(h(x)=\frac{1}{10}x\). Take \(f(x)=\frac{1}{2}\Vert Ax-b\Vert ^{2}\); then \(\nabla f(x)=A^{T}(Ax-b)\) is Lipschitz continuous with constant \(L=\Vert A^{T}A\Vert \), where \(A^{T}\) denotes the transpose of A. Take \(g(x)=\Vert x\Vert _{1}\); then

$$ \operatorname{prox}_{\lambda g}x=\arg \min_{v\in H}\biggl\{ \frac{1}{2\lambda } \Vert v-x \Vert ^{2}+ \Vert v \Vert _{1}\biggr\} . $$

From [15], we know that

$$ \operatorname{prox}_{\lambda _{n}\Vert \cdot \Vert _{1}}x=\bigl[\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(1), \operatorname{prox}_{ \lambda _{n}\vert \cdot \vert }x(2),\ldots, \operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(N) \bigr]^{T}, $$

where \(\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(i)=\max \{\vert x(i)\vert -\lambda _{n},0\}\operatorname{sign}(x(i))\), and \(x(i)\) denotes the ith element of x, \(i=1,2,\ldots,N\). Let D be a diagonal matrix with diagonal entries \(y_{n}(i)\), that is, \(D_{ii}=y_{n}(i)\), \(i=1,2,\ldots, N\). Let \(\alpha _{n}=\frac{1}{100n}\), \(\lambda _{n}=\frac{1}{30L}\frac{n+1}{n+2}\), and

$$ \delta _{n}= \textstyle\begin{cases} \frac{1}{n^{2} \Vert x_{n}-x_{n-1} \Vert }, & \Vert x_{n}-x_{n-1} \Vert \neq 0, \\ 0, & \Vert x_{n}-x_{n-1} \Vert = 0 \end{cases} $$

for every \(n\geq 1\). Generate an \(M\times N\) random matrix A whose entries are sampled independently from a uniform distribution, and generate a random vector b from a Gaussian distribution with zero mean and unit variance.
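The componentwise proximal formula and the inertial parameter \(\delta _{n}\) described above translate directly into code. The following Python sketch is illustrative only; the function names `prox_l1` and `inertial_delta` are hypothetical and do not come from the original MATLAB implementation.

```python
import math

def prox_l1(x, lam):
    """prox of lam * ||.||_1: max(|x_i| - lam, 0) * sign(x_i), componentwise."""
    return [math.copysign(max(abs(xi) - lam, 0.0), xi) for xi in x]

def inertial_delta(n, x_n, x_prev):
    """delta_n = 1 / (n^2 * ||x_n - x_{n-1}||) if the iterates differ, else 0."""
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_n, x_prev)))
    return 0.0 if gap == 0.0 else 1.0 / (n * n * gap)

# Entries with magnitude below lam are set to zero; the others shrink by lam.
print(prox_l1([3.0, -0.2, -1.5], 0.5))
print(inertial_delta(2, [1.0, 0.0], [0.0, 0.0]))
```

With this choice, \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert =\frac{1}{n^{2}}\) whenever \(x_{n}\neq x_{n-1}\), so the summability condition on the inertial terms holds by construction.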

According to the iterative process of Theorem 3.1, the sequence \(\{x_{n}\}\) is generated by

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\alpha _{n}h(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}(y_{n}- \lambda _{n}D(y_{n})A^{T}(Ay_{n}-b)+e(y_{n})). \end{cases} $$
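To make the iteration above concrete, here is a small Python sketch of it. It is not the authors' MATLAB code and makes several assumptions that should be read as such: the problem size is reduced to \(M=5\), \(N=8\); the scaling is taken as \(D=I\) and the error as \(e(y_{n})=0\); the Lipschitz constant is replaced by the squared Frobenius norm of A, which is an upper bound on \(\Vert A^{T}A\Vert \); and the function names are hypothetical.

```python
import math
import random

rng = random.Random(0)            # fixed seed so the sketch is reproducible
M, N = 5, 8                       # tiny sizes (the experiment uses M=100, N=1000)
A = [[rng.random() for _ in range(N)] for _ in range(M)]
b = [rng.gauss(0.0, 1.0) for _ in range(M)]

def residual(x):
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

def grad_f(x):
    # Gradient of 0.5 * ||Ax - b||^2, i.e. A^T (A x - b).
    r = residual(x)
    return [sum(A[i][j] * r[i] for i in range(M)) for j in range(N)]

def objective(x):
    return 0.5 * sum(ri * ri for ri in residual(x)) + sum(abs(xi) for xi in x)

def prox_l1(v, lam):
    return [math.copysign(max(abs(vi) - lam, 0.0), vi) for vi in v]

# Squared Frobenius norm: an upper bound on ||A^T A||, used as a conservative L.
L = sum(a * a for row in A for a in row)

def inertial_viscosity_pg(x0, x1, iters):
    """Sketch of the inertial iteration with h(x) = x/10, D = I, e = 0 (assumed)."""
    x_prev, x = list(x0), list(x1)
    for n in range(1, iters + 1):
        alpha = 1.0 / (100 * n)
        lam = (n + 1) / (30.0 * L * (n + 2))
        gap = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, x_prev)))
        delta = 0.0 if gap == 0.0 else 1.0 / (n * n * gap)
        y = [p + delta * (p - q) for p, q in zip(x, x_prev)]
        g = grad_f(y)
        prox_pt = prox_l1([yi - lam * gi for yi, gi in zip(y, g)], lam)
        x_next = [alpha * 0.1 * yi + (1 - alpha) * pi for yi, pi in zip(y, prox_pt)]
        x_prev, x = x, x_next
    return x

x_start = [1.0] * N
x_final = inertial_viscosity_pg(x_start, x_start, 200)
print(objective(x_start), objective(x_final))
```

Using an upper bound for L only makes the step size smaller, so the requirement \(\lambda _{n}\in (0,\frac{2}{L})\) remains satisfied.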

Next, we use MATLAB for the numerical implementation. Set \(M=100\), \(N=1000\). Under the same parameters, we compare with the iterative algorithm (4.2) from [6]. For different error tolerances ϵ, we obtain the numerical results in Table 1, where n and t denote the number of iterations and the running time (measured with tic/toc), respectively. We use \(\Vert x_{n+1}-x_{n}\Vert <\epsilon \) as the stopping criterion.

$$\begin{aligned} x_{n+1}=\alpha _{n}h(x_{n})+(1- \alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(x_{n}- \lambda _{n}DA^{T}(Ax_{n}-b)+e(x_{n}) \bigr). \end{aligned}$$

Table 1 Comparison of Algorithm 1 (IA) with the algorithm without inertia step (UA) for Example 4.1. \(x_{0}=\operatorname{randn}(N,1)\)

In addition, we compare the values of \(\Vert x_{n+1}-x_{n}\Vert \) of (4.1) and (4.2) at the same number of iterations. The results are shown in Fig. 1. We also present the running time and the number of iterations for different stopping tolerances ϵ; see Fig. 2.

Figure 1 The comparison of \(\Vert x_{n+1}-x_{n}\Vert \) of inertial acceleration (IA) and without inertial acceleration (UA) for \((M,N)= (100,1000)\) of Example 4.1

Figure 2 The comparison of running time and iteration steps of inertial acceleration (IA) and without inertial acceleration (UA) with the same stopping criteria for \((M,N)= (100,1000)\) of Example 4.1

It can be seen from Table 1, Fig. 1, and Fig. 2 that Algorithm 1 is faster than the iterative formula (4.2) without the inertial step. Under the same stopping criterion, the values of \(\Vert x_{n+1}-x_{n}\Vert \) and \(\Vert Ax_{n}-b\Vert \) of Algorithm 1 are smaller.

In what follows, we give an example in an infinite dimensional space.

Example 4.2

Suppose that \(H=L^{2}([0,1])\) with the norm \(\Vert x\Vert =(\int _{0}^{1}(x(t))^{2}\,dt)^{\frac{1}{2}}\) and the inner product \(\langle x,y\rangle =\int _{0}^{1}x(t)y(t)\,dt\), \(\forall x,y\in H\). Define \(h(x)=\frac{1}{2}x\) and \(Ax(t)=tx(t)\). Let \(f(x)=\frac{1}{2}\Vert Ax-u\Vert ^{2}\) and let g be the indicator function of C, where \(u\in H\) is a fixed function and \(C=\{x\in H\mid \Vert x\Vert \leq 1\}\).

By the definition of f and g, we obtain

$$ \nabla f(x)=A^{*}(Ax-u) $$

and

$$ \operatorname{prox}_{\lambda g}x=\arg \min_{v\in H}\biggl\{ \frac{1}{2\lambda } \Vert v-x \Vert ^{2}+ \iota _{C}(v) \biggr\} =P_{C}(x), $$

where \(\iota _{C}\) denotes the indicator function and

$$ \iota _{C}(x)= \textstyle\begin{cases} 0, & \text{if } x\in C, \\ \infty ,& \text{if } x\notin C. \end{cases} $$

We also deduce that the adjoint operator of A is A itself, i.e., \(A^{*}=A\). Take \(D(x_{n})=I\) and set the parameters \(\alpha _{n}=\frac{1}{1000n}\) and \(\lambda _{n}=\frac{n}{L(n+1)}\). According to the iterative algorithm of Theorem 3.1, we get the following sequence \(\{x_{n}\}\):

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\frac{1}{1000n}\frac{1}{2}y_{n}+(1-\frac{1}{1000n})P_{C}(y_{n}- \frac{n}{L(n+1)}A(Ay_{n}-u)). \end{cases} $$
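The ingredients of this iteration can be verified by short computations, recorded here as a supplementary check. For the multiplication operator \(Ax(t)=tx(t)\) and any \(x,y\in H\),

$$ \langle Ax,y\rangle = \int _{0}^{1}tx(t)y(t)\,dt=\langle x,Ay\rangle , \qquad \Vert Ax \Vert ^{2}= \int _{0}^{1}t^{2}\bigl(x(t)\bigr)^{2} \,dt\leq \Vert x \Vert ^{2}, $$

so \(A^{*}=A\) and \(\Vert A\Vert \leq 1\); consequently \(\nabla f\) is Lipschitz continuous with constant at most 1, and \(L=1\) is an admissible choice. Moreover, the projection onto the closed unit ball C has the closed form

$$ P_{C}(x)=\frac{x}{\max \{1, \Vert x \Vert \}}. $$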

The numerical integration method used in this example is the trapezoidal rule. We test Algorithm 1 and Algorithm 3 with different stopping criteria. The numerical results are shown in Table 2.
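A discretized version of this example can be sketched in Python as follows. The sketch is illustrative and rests on assumptions not stated in the text: a uniform grid with \(K=200\) intervals, \(L=1\), no error term, and the same rule for \(\delta _{n}\) as in Example 4.1 (the text does not specify \(\delta _{n}\) for this example). The names `norm`, `project_ball`, and `run` are hypothetical. The data \(u=e^{t}\), \(x_{0}=t\), \(x_{1}=t^{2}\) follow the caption of Table 2.

```python
import math

K = 200                                   # number of grid intervals (assumed)
t = [i / K for i in range(K + 1)]
w = [1.0 / K] * (K + 1)                   # trapezoidal quadrature weights
w[0] = w[-1] = 0.5 / K

def norm(x):
    # L^2([0,1]) norm approximated by the trapezoidal rule.
    return math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))

def project_ball(x):
    # P_C(x) = x / max(1, ||x||), the projection onto the unit ball.
    r = norm(x)
    return list(x) if r <= 1.0 else [xi / r for xi in x]

def run(u, x0, x1, iters, L=1.0):
    """Inertial iteration of Example 4.2 with D = I and e = 0 (assumed)."""
    x_prev, x = list(x0), list(x1)
    for n in range(1, iters + 1):
        alpha = 1.0 / (1000 * n)
        lam = n / (L * (n + 1))
        gap = norm([p - q for p, q in zip(x, x_prev)])
        delta = 0.0 if gap == 0.0 else 1.0 / (n * n * gap)
        y = [p + delta * (p - q) for p, q in zip(x, x_prev)]
        # Gradient A(Ay - u) with (Ay)(t) = t * y(t).
        grad = [ti * (ti * yi - ui) for ti, yi, ui in zip(t, y, u)]
        proj = project_ball([yi - lam * gi for yi, gi in zip(y, grad)])
        x_next = [alpha * 0.5 * yi + (1 - alpha) * pi for yi, pi in zip(y, proj)]
        x_prev, x = x, x_next
    return x

u = [math.exp(ti) for ti in t]
x0 = list(t)
x1 = [ti * ti for ti in t]
x_final = run(u, x0, x1, 100)
print(norm(x_final))
```

Because the trapezoidal weights are positive, `norm` is a genuine weighted Euclidean norm on the grid values, and scaling by `1 / norm(x)` is the exact projection onto the corresponding discrete unit ball.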

Table 2 Comparison of Algorithm 3 (AIA) with Algorithm 1 (IA) for Example 4.2. \(u=e^{t}\), \(x_{0}=t\), \(x_{1}=t^{2}\)

In what follows, we present a comparison of the inertial proximal gradient algorithm (IA) and the alternated inertial proximal gradient algorithm (AIA). Setting \(e(y_{n})=\frac{1}{n^{2}}\) as the outer perturbation, we report the numerical results in Table 3.

Table 3 Comparison of Algorithm 3 (AIA) with Algorithm 1 (IA) for Example 4.2. \(u=\sin t\), \(x_{0}=t\), \(x_{1}=2t\)

It is observed that the norm of \(x_{n}\) approaches 1 as the number of iterations increases. In this example, the alternated inertial algorithm needs fewer iterations and less running time than the inertial algorithm, although the difference between the two algorithms is small.