1 Introduction

The split feasibility problem has received much attention due to its applications in signal processing and image reconstruction [1], with particular progress in intensity-modulated radiation therapy [2]. Recently, the split feasibility problem (1.3) has been studied extensively by many authors (see, for instance, [3–16]).

The purpose of the present manuscript is to study the more general proximal split minimization problem by introducing new algorithms based on a regularization technique.

In the sequel, we assume that \(H_{1}\) and \(H_{2}\) are two real Hilbert spaces, \(f:H_{1}\to\mathcal{R}\cup\{+\infty\}\) and \(g:H_{2}\to\mathcal {R}\cup\{+\infty\}\) are two proper and lower semi-continuous convex functions and \(A:H_{1}\to H_{2}\) is a bounded linear operator.

Now, we focus on the following minimization problem:

$$ \min_{x^{\dagger}\in H_{1}}\bigl\{ f\bigl(x^{\dagger} \bigr)+g_{\lambda}\bigl(Ax^{\dagger}\bigr)\bigr\} , $$
(1.1)

where \(g_{\lambda}\) stands for the Moreau-Yosida approximation of g with parameter \(\lambda>0\), that is,

$$g_{\lambda}(x)=\min_{y\in H_{2}} \biggl\{ g(y)+\frac{1}{2\lambda} \|x-y\|^{2} \biggr\} . $$
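As a concrete illustration (our own choice, not taken from the source), take \(H_{2}=\mathbb{R}\) and \(g(y)=|y|\). Then \(\operatorname{prox}_{\lambda g}\) is soft-thresholding and \(g_{\lambda}\) is the Huber function. The NumPy sketch below checks two things. First, it compares the value of \(g_{\lambda}\) obtained by plugging in the minimizer \(\operatorname{prox}_{\lambda g}(x)\) against a brute-force grid minimization. Second, it uses finite differences to check the gradient formula \(\nabla g_{\lambda}=(I-\operatorname{prox}_{\lambda g})/\lambda\) that is used in (1.5). The function names are hypothetical.

```python
import numpy as np


def prox_abs(x, lam):
    """prox_{lam g} for the illustrative choice g(y) = |y| (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def envelope_abs(x, lam):
    """Moreau-Yosida approximation g_lam(x), evaluated at the minimizer y = prox_{lam g}(x)."""
    p = prox_abs(x, lam)
    return np.abs(p) + (x - p) ** 2 / (2.0 * lam)


def grad_envelope_abs(x, lam):
    """Gradient formula (x - prox_{lam g}(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


lam = 0.5
xs = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])

# Brute-force minimization over a fine grid of y, to cross-check the closed form.
ys = np.linspace(-5.0, 5.0, 200001)
brute = np.array([np.min(np.abs(ys) + (x - ys) ** 2 / (2.0 * lam)) for x in xs])
err_value = float(np.max(np.abs(brute - envelope_abs(xs, lam))))

# Central finite differences for the gradient formula.
eps = 1e-6
fd = (envelope_abs(xs + eps, lam) - envelope_abs(xs - eps, lam)) / (2.0 * eps)
err_grad = float(np.max(np.abs(fd - grad_envelope_abs(xs, lam))))
print(err_value, err_grad)
```

For instance, `envelope_abs(2.0, 0.5)` equals \(2-0.5/2=1.75\), matching the Huber formula \(|x|-\lambda/2\) for \(|x|>\lambda\).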

Remark 1.1

(1) Problem (1.1) includes the split feasibility problem as a special case. Indeed, choose f and g as the indicator functions of two nonempty closed convex sets \(C\subset H_{1}\) and \(Q\subset H_{2}\), that is,

$$ f\bigl(x^{\dagger}\bigr)=\delta_{C}\bigl(x^{\dagger}\bigr)= \textstyle\begin{cases} 0, & \mbox{if } x^{\dagger}\in C, \\ +\infty, &\mbox{otherwise} \end{cases} $$

and

$$ g(y)=\delta_{Q}(y)= \textstyle\begin{cases} 0, & \mbox{if } y\in Q, \\ +\infty, &\mbox{otherwise}. \end{cases} $$

Then the problem (1.1) reduces to

$$ \min_{x^{\dagger}\in H_{1}}\bigl\{ \delta_{C}\bigl(x^{\dagger} \bigr)+(\delta_{Q})_{\lambda}\bigl(Ax^{\dagger }\bigr)\bigr\} , $$

which equals

$$ \min_{x^{\dagger}\in C} \biggl\{ \frac{1}{2\lambda}\bigl\Vert (I-\operatorname{proj}_{Q}) \bigl(Ax^{\dagger}\bigr)\bigr\Vert ^{2} \biggr\} . $$
(1.2)

(2) Provided \(C\cap A^{-1}(Q)\ne\emptyset\), solving (1.2) is exactly solving the split feasibility problem of finding \(x^{\dagger}\) such that

$$ x^{\dagger}\in C \quad \mbox{and} \quad Ax^{\dagger}\in Q. $$
(1.3)

To solve (1.3), one of the key ideas is to use a fixed point technique: \(x^{\dagger}\) solves (1.3) if and only if

$$ x^{\dagger}=\operatorname{proj}_{C}\bigl(I-\gamma A^{*}(I- \operatorname{proj}_{Q})A\bigr)x^{\dagger}, $$

where \(\gamma>0\) is a constant and \(\operatorname{proj}_{C}\) and \(\operatorname{proj}_{Q}\) stand for the orthogonal projections onto the closed convex sets C and Q, respectively.

Based on this fixed point equation, a popular algorithm for solving the split feasibility problem is the CQ method [4]:

$$ x_{n+1}=\operatorname{proj}_{C}\bigl(x_{n}- \tau_{n} A^{*}(I-\operatorname{proj}_{Q})Ax_{n}\bigr), $$

where the step size \(\tau_{n} \in(0, 2/\|A\|^{2})\).
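Below is a minimal NumPy sketch of the CQ iteration. It assumes \(H_{1}=\mathbb{R}^{n}\), \(H_{2}=\mathbb{R}^{m}\), A is a matrix (so \(A^{*}=A^{T}\)), and C and Q are boxes, whose projections are coordinate-wise clipping. The toy data are hypothetical. The fixed step \(\tau=1/\|A\|^{2}\) requires the spectral norm of A.

```python
import numpy as np


def cq_method(A, proj_C, proj_Q, x0, tau, n_iter=100):
    """CQ iteration x_{n+1} = proj_C(x_n - tau * A^T (I - proj_Q) A x_n) with a fixed step tau."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        Ax = A @ x
        x = proj_C(x - tau * (A.T @ (Ax - proj_Q(Ax))))
    return x


# Hypothetical example: C = [0,1]^2, Q = [1.5, 2], A = (1 1), so A x = x_1 + x_2.
A = np.array([[1.0, 1.0]])
tau = 1.0 / np.linalg.norm(A, 2) ** 2   # about 1/2, inside (0, 2/||A||^2)
x = cq_method(A, lambda v: np.clip(v, 0.0, 1.0), lambda v: np.clip(v, 1.5, 2.0),
              x0=[0.0, 1.0], tau=tau)
print(x)  # approaches (0.5, 1.0), a point of C with x_1 + x_2 in Q
```

From \(x_{0}=(0,1)\) the first coordinate follows \(t\mapsto t-\tau(t-0.5)\), while the second stays clipped at 1. The iterates therefore converge geometrically to \((0.5,1)\).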

However, choosing the step size \(\tau_{n}\) requires the operator norm \(\|A\|\) (equivalently, the largest eigenvalue of \(A^{*}A\)), which in general is not easy to compute in practice. To overcome this difficulty, self-adaptive methods were developed, in which the step size \(\tau_{n}\) is selected adaptively without prior knowledge of \(\|A\|\).

Self-adaptive algorithm ([17])

Let \(x_{0}\in H_{1}\) be an arbitrary initial point. Assume that the iterate \(x_{n}\) has been constructed and that \(\nabla{\bar{h}}(x_{n})\ne0\). Compute \(x_{n+1}\) via the rule

$$ x_{n+1}=\operatorname{proj}_{C}\bigl(x_{n}- \tau_{n} A^{*}(I-\operatorname{proj}_{Q})Ax_{n}\bigr), $$
(1.4)

where \(\tau_{n}=\rho_{n}\frac{\bar{h}(x_{n})}{\|\nabla{\bar{h}}(x_{n})\|^{2}}\) with \(0<\rho_{n}<4\), \(\bar{h}(x)=\frac{1}{2}\|(I-\operatorname{proj}_{Q})Ax\|^{2}\) and \(\nabla\bar{h}(x)=A^{*}(I-\operatorname{proj}_{Q})Ax\).

If \(\nabla{\bar{h}}(x_{n})=0\), then \(x_{n+1}=x_{n}\) is a solution of problem (1.3) and the iterative process stops. Otherwise, set \(n:=n+1\) and return to (1.4).
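The self-adaptive rule replaces \(1/\|A\|^{2}\) by a quantity computed from the current iterate. The sketch below works under hypothetical data: \(C=[0,1]^{2}\), \(Q=[1.5,2]\), \(A=(1\ 2)\), and the constant choice \(\rho_{n}\equiv2\). The exact test \(\nabla\bar{h}(x_{n})=0\) is replaced by a small numerical tolerance, which is our assumption.

```python
import numpy as np


def self_adaptive_cq(A, proj_C, proj_Q, x0, rho=2.0, n_iter=200, tol=1e-15):
    """Iteration (1.4) with tau_n = rho * h_bar(x_n) / ||grad h_bar(x_n)||^2 (no ||A|| needed)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        Ax = A @ x
        r = Ax - proj_Q(Ax)             # (I - proj_Q) A x_n
        h_bar = 0.5 * float(r @ r)      # h_bar(x_n)
        grad = A.T @ r                  # grad h_bar(x_n) = A^T (I - proj_Q) A x_n
        g2 = float(grad @ grad)
        if g2 <= tol:                   # gradient numerically zero: stop
            break
        tau = rho * h_bar / g2          # self-adaptive step size
        x = proj_C(x - tau * grad)
    return x


A = np.array([[1.0, 2.0]])
x = self_adaptive_cq(A, lambda v: np.clip(v, 0.0, 1.0), lambda v: np.clip(v, 1.5, 2.0),
                     x0=[0.0, 0.0])
print(x)  # (0.3, 0.6)
```

Here the first step has \(\bar{h}=1.125\) and \(\|\nabla\bar{h}\|^{2}=11.25\), so \(\tau_{0}=0.2\). The iterate lands exactly on the line \(x_{1}+2x_{2}=1.5\) at \((0.3,0.6)\), which already solves the problem.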

Our main purpose in the present manuscript is to solve problem (1.1) by combining the fixed point technique with self-adaptive step sizes. First, since the Moreau-Yosida approximation \(g_{\lambda}\) is differentiable with \(\nabla g_{\lambda}=\frac{I-\operatorname{prox}_{\lambda g}}{\lambda}\), we have

$$\begin{aligned} \partial\bigl(f\bigl(x^{\dagger}\bigr)+g_{\lambda} \bigl(Ax^{\dagger}\bigr)\bigr) =&\partial f\bigl(x^{\dagger }\bigr)+A^{*} \nabla g_{\lambda}\bigl(Ax^{\dagger}\bigr) \\ =&\partial f\bigl(x^{\dagger}\bigr)+A^{*} \biggl(\frac{I-\operatorname{prox}_{\lambda g}}{\lambda} \biggr) \bigl(Ax^{\dagger}\bigr), \end{aligned}$$
(1.5)

where \(\partial f(x^{\dagger})\) denotes the subdifferential of f at \(x^{\dagger}\) and \(\operatorname{prox}_{\lambda g}\) is the proximal mapping of g, that is,

$$\partial f\bigl(x^{\dagger}\bigr)=\bigl\{ x^{*}\in H_{1}:f \bigl(x^{\ddagger}\bigr)\ge f\bigl(x^{\dagger}\bigr)+\bigl\langle x^{*},x^{\ddagger}-x^{\dagger}\bigr\rangle , \forall x^{\ddagger}\in H_{1}\bigr\} $$

and

$$\operatorname{prox}_{\lambda g}(y)=\arg\min _{u\in H_{2}} \biggl\{ g(u)+\frac{1}{2\lambda} \Vert u-y\Vert ^{2} \biggr\} ,\quad y\in H_{2}. $$

Consequently, the optimality condition for problem (1.1) reads

$$ 0\in\partial f\bigl(x^{\dagger}\bigr)+A^{*} \biggl(\frac{I-\operatorname{prox}_{\lambda g}}{\lambda} \biggr) \bigl(Ax^{\dagger}\bigr), $$

which can be rewritten as

$$ 0\in\mu\lambda\partial f\bigl(x^{\dagger}\bigr)+\mu A^{*}(I- \operatorname{prox}_{\lambda g}) \bigl(Ax^{\dagger}\bigr), $$
(1.6)

which is equivalent to the fixed point equation:

$$ x^{\dagger}=\operatorname{prox}_{\mu\lambda f}\bigl(x^{\dagger}- \mu A^{*}(I-\operatorname{prox}_{\lambda g}) \bigl(Ax^{\dagger}\bigr)\bigr) $$
(1.7)

for all \(\mu>0\).
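The equivalence of (1.6) and (1.7) rests on the identity \(\operatorname{prox}_{\mu\lambda f}=(I+\mu\lambda\partial f)^{-1}\). This identity holds for proper lower semi-continuous convex f, for which the resolvent is single-valued. Written out step by step:

$$\begin{aligned} 0\in\mu\lambda\partial f\bigl(x^{\dagger}\bigr)+\mu A^{*}(I-\operatorname{prox}_{\lambda g})\bigl(Ax^{\dagger}\bigr) &\iff x^{\dagger}-\mu A^{*}(I-\operatorname{prox}_{\lambda g})\bigl(Ax^{\dagger}\bigr)\in(I+\mu\lambda\partial f)\bigl(x^{\dagger}\bigr) \\ &\iff x^{\dagger}=(I+\mu\lambda\partial f)^{-1}\bigl(x^{\dagger}-\mu A^{*}(I-\operatorname{prox}_{\lambda g})\bigl(Ax^{\dagger}\bigr)\bigr) \\ &\iff x^{\dagger}=\operatorname{prox}_{\mu\lambda f}\bigl(x^{\dagger}-\mu A^{*}(I-\operatorname{prox}_{\lambda g})\bigl(Ax^{\dagger}\bigr)\bigr). \end{aligned}$$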

If \(\arg\min f\cap A^{-1}(\arg\min g)\ne\emptyset\), then (1.1) reduces to the following proximal split feasibility problem: find \(x^{\dagger}\) such that

$$ x^{\dagger}\in\arg\min f \quad \mbox{and}\quad Ax^{\dagger}\in\arg\min g, $$
(1.8)

where

$$\arg\min f=\bigl\{ x^{*}\in H_{1}: f\bigl(x^{*}\bigr)\le f \bigl(x^{\dagger}\bigr), \forall x^{\dagger}\in H_{1}\bigr\} $$

and

$$\arg\min g=\bigl\{ x^{\dagger}\in H_{2}: g\bigl(x^{\dagger} \bigr)\le g(x), \forall x\in H_{2}\bigr\} . $$

In the sequel, we use Γ to denote the solution set of the problem (1.8).

Recently, to solve problem (1.8), Moudafi and Thakur [18] proposed the following split proximal algorithm. Its step-size rule requires no prior knowledge of the operator norm \(\|A\|\).

Self-adaptive split proximal algorithm

For an initial point \(x_{0}\in H_{1}\), assume that the iterate \(x_{n}\in H_{1}\) has been constructed and that \(\theta(x_{n})\ne0\). Compute \(x_{n+1}\) via

$$ x_{n+1}=\operatorname{prox}_{\mu_{n}\lambda f}\bigl(x_{n}- \mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr) $$
(1.9)

for all \(n\ge0\), where the step size \(\mu_{n}=\rho_{n}\frac {h(x_{n})+l(x_{n})}{\theta^{2}(x_{n})}\) in which \(0<\rho_{n}<4\),

$$h(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr\Vert ^{2},\qquad l(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\mu_{n} \lambda f})x_{n}\bigr\Vert ^{2} $$

and

$$\theta(x_{n})=\sqrt{\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}+\bigl\Vert \nabla l(x_{n})\bigr\Vert ^{2}}. $$

If \(\theta(x_{n})=0\), then \(x_{n+1}=x_{n}\) is a solution of problem (1.8) and the iterative process stops. Otherwise, set \(n:=n+1\) and return to (1.9).

They established the following weak convergence result for this split proximal algorithm.

Theorem 1.2

Suppose that \(\Gamma\ne\emptyset\). Assume that the parameters satisfy the condition:

$$ \epsilon\le\rho_{n}\le\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\epsilon $$

for some \(\epsilon>0\) small enough. Then the sequence \(\{x_{n}\}\) generated by (1.9) weakly converges to a solution of the problem (1.8).

Note that Theorem 1.2 guarantees only weak convergence. A natural question therefore arises:

Can one design a new algorithm that converges strongly?

In this paper, our main purpose is to modify algorithm (1.9) by means of a regularization technique so that strong convergence is guaranteed.

2 Preliminaries

Let H be a real Hilbert space with inner product \(\langle\cdot,\cdot\rangle\) and norm \(\|\cdot\|\), and let C be a nonempty closed convex subset of H.

Recall that a mapping \(T:C\to C\) is said to be:

(1) L-Lipschitz if there exists \(L>0\) such that

$$\|Tx-Ty\|\le L\|x-y\| $$

for all \(x,y\in C\). If \(L\in(0,1)\), then T is called an L-contraction; if \(L=1\), then T is called nonexpansive.

(2) Firmly nonexpansive if

$$\|Tx-Ty\|^{2}\le\|x-y\|^{2}-\bigl\Vert (I-T)x-(I-T)y\bigr\Vert ^{2} $$

for all \(x,y\in C\), where I denotes the identity mapping; equivalently,

$$\|Tx-Ty\|^{2}\le\langle Tx-Ty, x-y\rangle $$

for all \(x,y\in C\). If T is firmly nonexpansive, then so is \(I-T\).

(3) Strongly positive if there exists a constant \(\zeta>0\) such that

$$\langle Tx,x\rangle\ge\zeta\|x\|^{2} $$

for all \(x\in C\).

Note that the proximal mapping of g is firmly nonexpansive, namely,

$$\langle \operatorname{prox}_{\lambda g}x-\operatorname{prox}_{\lambda g}y,x-y\rangle \ge\| \operatorname{prox}_{\lambda g}x-\operatorname{prox}_{\lambda g}y\|^{2} $$

for all \(x,y\in H_{2}\), and the same holds for the complement \(I-\operatorname{prox}_{\lambda g}\). Thus \(A^{*}(I-\operatorname{prox}_{\lambda g})A\) is cocoercive with coefficient \(\frac{1}{\|A\|^{2}}\). Recall that a mapping \(B:H_{1}\to H_{1}\) is cocoercive if there exists \(\alpha>0\) such that

$$\langle Bx-By, x-y\rangle\ge\alpha\|Bx-By\|^{2} $$

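for all \(x,y\in H_{1}\). Consequently, if \(\mu\in (0,\frac{1}{\|A\|^{2}} )\), then \(I-\mu A^{*}(I-\operatorname{prox}_{\lambda g})A\) is nonexpansive. Indeed, for an α-cocoercive mapping B one has \(\|(I-\mu B)x-(I-\mu B)y\|^{2}\le\|x-y\|^{2}-\mu(2\alpha-\mu)\|Bx-By\|^{2}\). Hence the conclusion even holds for every \(\mu\in(0,\frac{2}{\|A\|^{2}}]\).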
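The following is a quick numerical sanity check of the cocoercivity and nonexpansiveness claims. It assumes \(H_{1}=\mathbb{R}^{5}\), \(H_{2}=\mathbb{R}^{3}\), a random matrix A, and the illustrative choice \(g=\|\cdot\|_{1}\), so that \(\operatorname{prox}_{\lambda g}\) is soft-thresholding. Random sampling proves nothing, but it shows the inequalities at work.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))     # hypothetical operator A: R^5 -> R^3
lam = 0.7
L = np.linalg.norm(A, 2) ** 2       # ||A||^2 (squared spectral norm)


def prox_l1(y, t):
    """prox of t*||.||_1, i.e. prox_{lam g} for the illustrative choice g = ||.||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)


def grad_op(x):
    """The map x -> A^T (I - prox_{lam g}) A x."""
    Ax = A @ x
    return A.T @ (Ax - prox_l1(Ax, lam))


mu = 1.0 / L
worst_ratio, worst_slack = 0.0, np.inf
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    d = grad_op(x) - grad_op(y)
    # Cocoercivity with coefficient 1/||A||^2: <d, x - y> - ||d||^2 / ||A||^2 >= 0.
    worst_slack = min(worst_slack, float(d @ (x - y) - d @ d / L))
    # Nonexpansiveness of I - mu * A^T (I - prox_{lam g}) A.
    step = np.linalg.norm((x - mu * grad_op(x)) - (y - mu * grad_op(y)))
    worst_ratio = max(worst_ratio, float(step / np.linalg.norm(x - y)))
print(worst_ratio, worst_slack)
```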

For every \(x\in H\), there exists a unique nearest point in C, denoted by \(\operatorname{proj}_{C}x\), such that

$$ \|x-\operatorname{proj}_{C}x\|\leq\|x-y\| $$

for all \(y\in C\). The mapping \(\operatorname{proj}_{C}\) is called the metric projection of H onto C. It is well known that \(\operatorname{proj}_{C}\) is a nonexpansive mapping and is characterized by the following property:

$$ \langle x-\operatorname{proj}_{C}x,y-\operatorname{proj}_{C}x \rangle\leq0 $$
(2.1)

for all \(x\in H\) and \(y\in C\).
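Characterization (2.1) can be checked numerically for a set whose projection is explicit. The sketch below assumes C is the closed unit ball of \(\mathbb{R}^{4}\). For that set, \(\operatorname{proj}_{C}x=x\) if \(\|x\|\le1\) and \(x/\|x\|\) otherwise.

```python
import numpy as np


def proj_ball(x, radius=1.0):
    """Metric projection onto the closed Euclidean ball {v : ||v|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x


rng = np.random.default_rng(1)
worst = -np.inf
for _ in range(2000):
    x = 3.0 * rng.standard_normal(4)          # arbitrary point of H = R^4
    y = proj_ball(rng.standard_normal(4))     # arbitrary point of C
    p = proj_ball(x)
    worst = max(worst, float((x - p) @ (y - p)))   # left-hand side of (2.1)
print(worst)  # should be <= 0 up to rounding
```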

We will use the following two lemmas to prove our main results.

Lemma 2.1 ([19])

Let \(\{a_{n}\}\) be a sequence of nonnegative real numbers satisfying the following relation:

$$a_{n+1}\le(1-\alpha_{n})a_{n}+ \alpha_{n}\sigma_{n}+\delta_{n} $$

for all \(n\ge0\), where

(a) \(\{\alpha_{n}\}_{n\in\mathbb{N}}\subset[0,1]\) and \(\sum_{n=1}^{\infty}\alpha_{n}=\infty\);

(b) \(\limsup_{n\to\infty}\sigma_{n}\le0\);

(c) \(\sum_{n=1}^{\infty}\delta_{n}<\infty\).

Then \(\lim_{n\to\infty}a_{n}=0\).
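To see Lemma 2.1 at work, the sketch below iterates the recursion with equality. It uses the hypothetical choices \(\alpha_{n}=1/(n+2)\), \(\sigma_{n}=1/(n+1)\) and \(\delta_{n}=1/(n+1)^{2}\), which satisfy (a)-(c). One can check by hand that \((n+1)a_{n}=a_{0}+\sum_{k=1}^{n}(2/k+1/k^{2})\), so \(a_{n}\) tends to 0 roughly like \(2\ln n/n\).

```python
def simulate_lemma_2_1(n_steps, a0=1.0):
    """Iterate a_{n+1} = (1 - alpha_n) a_n + alpha_n sigma_n + delta_n (with equality)
    for the illustrative choices alpha_n = 1/(n+2), sigma_n = 1/(n+1), delta_n = 1/(n+1)^2."""
    a = a0
    for n in range(n_steps):
        alpha = 1.0 / (n + 2)
        sigma = 1.0 / (n + 1)
        delta = 1.0 / (n + 1) ** 2
        a = (1.0 - alpha) * a + alpha * sigma + delta
    return a


for n_steps in (10, 1000, 100000):
    print(n_steps, simulate_lemma_2_1(n_steps))
```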

Lemma 2.2 ([20])

Let \(\{\gamma_{n}\}\) be a sequence of real numbers that has a subsequence \(\{\gamma_{n_{i}}\}\) with \(\gamma _{n_{i}}<\gamma_{n_{i}+1}\) for all \(i\geq1\). Then there exists a nondecreasing sequence \(\{m_{k}\}\) of positive integers such that \(\lim_{k\to\infty}m_{k}=\infty\) and the following properties hold for all (sufficiently large) positive integers k:

$$\gamma_{m_{k}}\le\gamma_{m_{k}+1},\qquad \gamma_{k}\le \gamma_{m_{k}+1}. $$

In fact, \(m_{k}\) is the largest number n in the set \(\{1, \ldots, k\}\) such that the condition \(\gamma_{n}<\gamma_{n+1}\) holds.
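The index \(m_{k}\) in Lemma 2.2 is constructive. The sketch below computes it for a finite, hypothetical sequence and checks the two inequalities. It uses a Python list with \(\gamma_{0}\) unused, so that the indices match the lemma.

```python
def mainge_indices(gamma):
    """Return {k: m_k}, where m_k is the largest n in {1, ..., k} with
    gamma[n] < gamma[n + 1]; k runs over 1..len(gamma)-2 and is skipped
    until the first such n exists."""
    m, last = {}, None
    for k in range(1, len(gamma) - 1):
        if gamma[k] < gamma[k + 1]:
            last = k
        if last is not None:
            m[k] = last
    return m


gamma = [5.0, 3.0, 4.0, 2.0, 1.0, 6.0, 0.0, 0.5, 0.2]
m = mainge_indices(gamma)
for k, mk in m.items():
    # The two properties of Lemma 2.2.
    assert gamma[mk] <= gamma[mk + 1] and gamma[k] <= gamma[mk + 1]
print(m)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6, 7: 6}
```

The second inequality holds because \(\gamma_{n}\ge\gamma_{n+1}\) for every n strictly between \(m_{k}\) and k, so \(\gamma_{m_{k}+1}\ge\gamma_{k}\).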

3 Main results

Now, we first introduce our self-adaptive algorithm. Let \(H_{1}\) and \(H_{2}\) be two real Hilbert spaces. Let \(f:H_{1}\to\mathcal{R}\cup\{ +\infty\}\) and \(g:H_{2}\to\mathcal{R}\cup\{+\infty\}\) be two proper and lower semi-continuous convex functions and \(A:H_{1}\to H_{2}\) be a bounded linear operator. Let \(\psi:H_{1}\to H_{1}\) be a κ-contraction and \(B:H_{1}\to H_{1}\) be a strongly positive bounded linear operator with coefficient \(\zeta>\kappa\).

Algorithm 3.1

Set

$$h(x)=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda g})Ax\bigr\Vert ^{2}, \qquad l(x)=\frac{1}{2}\bigl\Vert (I- \operatorname{prox}_{\lambda f})x\bigr\Vert ^{2} $$

and

$$\theta(x)=\sqrt{\bigl\Vert \nabla h(x)\bigr\Vert ^{2}+\bigl\Vert \nabla l(x)\bigr\Vert ^{2}} $$

for all \(x\in H_{1}\). For an initial point \(x_{0}\in H_{1}\), assume that the iterate \(x_{n}\in H_{1}\) has been constructed and that \(\theta(x_{n})\ne0\).

Compute \(x_{n+1}\) via

$$ x_{n+1}=\alpha_{n} \psi(x_{n})+(I- \alpha_{n}B)\operatorname{prox}_{\lambda f}\bigl(x_{n}-\mu _{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr) $$
(3.1)

for all \(n\ge0\), where \(\{\alpha_{n}\}\subset[0,1]\) is a sequence of real numbers and the step size \(\mu_{n}\) is given by \(\mu_{n}=\rho_{n}\frac {h(x_{n})+l(x_{n})}{\theta^{2}(x_{n})}\) with \(0<\rho_{n}<4\).

If \(\theta(x_{n})=0\), then \(x_{n}\) is a solution of problem (1.8) and the iterative process stops. Otherwise, set \(n:=n+1\) and return to (3.1).
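Below is a minimal NumPy sketch of Algorithm 3.1 for \(H_{1}=\mathbb{R}^{d}\) and \(H_{2}=\mathbb{R}^{m}\). A is a matrix, and the proximal maps, ψ, B and the parameter sequences are passed in as arguments; all names are hypothetical. The sketch evaluates \(\nabla h(x_{n})\) as \(A^{T}(I-\operatorname{prox}_{\lambda g})Ax_{n}\) and \(\nabla l(x_{n})\) as \((I-\operatorname{prox}_{\lambda f})x_{n}\). The exact test \(\theta(x_{n})=0\) is replaced by a small tolerance.

The usage example takes f and g to be the indicator functions of \(C=[0,1]^{2}\) and \(Q=[1.5,2]\), with \(A=(1\ 1)\), \(\psi\equiv0\) (a κ-contraction for any κ in (0,1)) and \(B=I\). Then \(\Gamma=\{x\in C:1.5\le x_{1}+x_{2}\le2\}\), and the limit predicted by Theorem 3.2 is \(\operatorname{proj}_{\Gamma}(0)=(0.75,0.75)\).

```python
import numpy as np


def algorithm_3_1(A, prox_f, prox_g, psi, B, x0, alpha, rho, n_iter=2000, tol=1e-14):
    """x_{n+1} = alpha_n psi(x_n) + (I - alpha_n B) prox_{lam f}(x_n - mu_n A^T (I - prox_{lam g}) A x_n)."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        Ax = A @ x
        r = Ax - prox_g(Ax)               # (I - prox_{lam g}) A x_n
        s = x - prox_f(x)                 # (I - prox_{lam f}) x_n, used as grad l(x_n)
        grad_h = A.T @ r                  # A^T (I - prox_{lam g}) A x_n, used as grad h(x_n)
        h, l = 0.5 * float(r @ r), 0.5 * float(s @ s)
        theta2 = float(grad_h @ grad_h + s @ s)
        if theta2 <= tol:                 # theta(x_n) = 0 (numerically): stop
            break
        mu = rho(n) * (h + l) / theta2    # self-adaptive step size mu_n
        y = prox_f(x - mu * grad_h)
        a = alpha(n)
        x = a * psi(x) + y - a * (B @ y)  # alpha_n psi(x_n) + (I - alpha_n B) y_n
    return x


A = np.array([[1.0, 1.0]])
z = algorithm_3_1(
    A,
    prox_f=lambda v: np.clip(v, 0.0, 1.0),   # projection onto C
    prox_g=lambda v: np.clip(v, 1.5, 2.0),   # projection onto Q
    psi=lambda v: np.zeros_like(v),
    B=np.eye(2),
    x0=[0.0, 0.0],
    alpha=lambda n: 1.0 / (n + 2),
    rho=lambda n: 2.0,
)
print(z)  # close to (0.75, 0.75), the minimum-norm point of Gamma
```

For this data one can check by hand that \(x_{n}=0.75\frac{n}{n+1}(1,1)\). The slow \(O(1/n)\) approach to the limit reflects the choice \(\alpha_{n}=1/(n+2)\).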

Theorem 3.2

Suppose that \(\Gamma\ne\emptyset\). Assume the parameters \(\{\alpha_{n}\}\) and \(\{\rho_{n}\}\) satisfy the conditions:

(C1) \(\lim_{n\to\infty}\alpha_{n}=0\);

(C2) \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

(C3) \(\epsilon\le\rho_{n}\le\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\epsilon\) for some \(\epsilon>0\) small enough.

Then the sequence \(\{x_{n}\}\) generated by (3.1) converges strongly to a point \(z=\operatorname{proj}_{\Gamma}(\psi+I-B)(z)\).

Proof

By (2.1), \(z=\operatorname{proj}_{\Gamma}(\psi +I-B)(z)\) if and only if

$$\bigl\langle (\psi+I-B) (z)-z, x-z\bigr\rangle \le0 $$

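for all \(x\in\Gamma\); this variational inequality has a unique solution. Let \(x^{*}\in\Gamma\). The minimizers of a proper lower semi-continuous convex function are exactly the fixed points of its proximal mapping, so \(x^{*}=\operatorname{prox}_{\lambda f}x^{*}\) and \(Ax^{*}=\operatorname{prox}_{\lambda g}Ax^{*}\). We also use the estimate \(\|I-\alpha_{n}B\|\le1-\zeta\alpha_{n}\), which is valid when B is self-adjoint and \(0<\alpha_{n}\le\|B\|^{-1}\); by (C1), the latter holds for all large n. From (3.1), this estimate and the nonexpansiveness of \(\operatorname{prox}_{\lambda f}\), we can derive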

$$\begin{aligned}& \bigl\Vert x_{n+1}-x^{*}\bigr\Vert \\& \quad =\bigl\Vert \alpha_{n} \psi(x_{n})+(I- \alpha_{n}B)\operatorname{prox}_{\lambda f}\bigl(x_{n}-\mu _{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr)-x^{*} \bigr\Vert \\& \quad =\bigl\Vert (I-\alpha_{n}B) \bigl(\operatorname{prox}_{\lambda f} \bigl(x_{n}-\mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr)-x^{*}\bigr) +\alpha_{n} \bigl(\psi(x_{n})-Bx^{*}\bigr) \bigr\Vert \\& \quad \le\Vert I-\alpha_{n}B\Vert \bigl\Vert \operatorname{prox}_{\lambda f}\bigl(x_{n}-\mu_{n}A^{*}(I- \operatorname{prox}_{\lambda g})Ax_{n}\bigr)-\operatorname{prox}_{\lambda f}x^{*} \bigr\Vert \\& \qquad {} +\alpha_{n} \bigl\Vert \psi(x_{n})-\psi \bigl(x^{*}\bigr)\bigr\Vert +\alpha_{n}\bigl\Vert \psi\bigl(x^{*} \bigr)-Bx^{*}\bigr\Vert \\& \quad \le\alpha_{n}\kappa\bigl\Vert x_{n}-x^{*}\bigr\Vert +\alpha_{n}\bigl\Vert \psi\bigl(x^{*}\bigr)-Bx^{*}\bigr\Vert \\& \qquad {} +(1-\zeta\alpha_{n})\bigl\Vert \operatorname{prox}_{\lambda f} \bigl(x_{n}-\mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr)-\operatorname{prox}_{\lambda f}x^{*}\bigr\Vert \\& \quad \le\alpha_{n}\kappa\bigl\Vert x_{n}-x^{*}\bigr\Vert +\alpha_{n}\bigl\Vert \psi\bigl(x^{*}\bigr)-Bx^{*}\bigr\Vert \\& \qquad {} +(1-\zeta\alpha_{n})\bigl\Vert x_{n}- \mu_{n} A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}-x^{*}\bigr\Vert . \end{aligned}$$

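The weights \(\kappa\alpha_{n}\), \((\zeta-\kappa)\alpha_{n}\) and \(1-\zeta\alpha_{n}\) are nonnegative and sum to one. Hence the convexity of \(t\mapsto t^{2}\) gives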

$$\begin{aligned} \bigl\Vert x_{n+1}-x^{*}\bigr\Vert ^{2} \le& \alpha_{n}\kappa\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+(\zeta-\kappa)\alpha_{n}\frac{\Vert \psi (x^{*})-Bx^{*}\Vert ^{2}}{(\zeta-\kappa)^{2}} \\ &{} +(1-\zeta\alpha_{n})\bigl\Vert x_{n}- \mu_{n} A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}-x^{*}\bigr\Vert ^{2}. \end{aligned}$$
(3.2)

Since \(\operatorname{prox}_{\lambda g}\) is firmly nonexpansive, we deduce that \(I-\operatorname{prox}_{\lambda g}\) is also firmly nonexpansive. Hence we have

$$\begin{aligned}& \bigl\langle A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}, x_{n}-x^{*}\bigr\rangle \\& \quad =\bigl\langle (I-\operatorname{prox}_{\lambda g})Ax_{n}, Ax_{n}-Ax^{*}\bigr\rangle \\& \quad =\bigl\langle (I-\operatorname{prox}_{\lambda g})Ax_{n}-(I- \operatorname{prox}_{\lambda g})Ax^{*}, Ax_{n}-Ax^{*}\bigr\rangle \\& \quad \ge\bigl\Vert (I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr\Vert ^{2} \\& \quad =2h(x_{n}). \end{aligned}$$
(3.3)

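As in (1.9), we write \(\nabla h(x_{n})=A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\) and \(\nabla l(x_{n})=(I-\operatorname{prox}_{\lambda f})x_{n}\). These are the gradients of \(x\mapsto\lambda g_{\lambda}(Ax)\) and of \(\lambda f_{\lambda}\), respectively. They coincide with the gradients of h and l when f and g are indicator functions. Thus it follows from (3.3) that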

$$\begin{aligned}& \bigl\Vert x_{n}-\mu_{n}A^{*}(I- \operatorname{prox}_{\lambda g})Ax_{n}-x^{*}\bigr\Vert ^{2} \\& \quad =\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+ \mu_{n}^{2}\bigl\Vert A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr\Vert ^{2}-2\mu_{n}\bigl\langle A^{*}(I- \operatorname{prox}_{\lambda g})Ax_{n}, x_{n}-x^{*}\bigr\rangle \\& \quad =\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+ \mu_{n}^{2}\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}-2\mu_{n}\bigl\langle \nabla h(x_{n}), x_{n}-x^{*}\bigr\rangle \\& \quad \le\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+ \mu_{n}^{2}\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}-4\mu_{n}h(x_{n}) \\& \quad =\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+ \rho_{n}^{2}\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{4}(x_{n})}\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}-4\rho_{n} \frac{h(x_{n})+l(x_{n})}{\theta^{2}(x_{n})}h(x_{n}) \\& \quad \le\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+ \rho_{n}^{2}\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})} -4\rho_{n} \frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})}\frac {h(x_{n})}{h(x_{n})+l(x_{n})} \\& \quad =\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}- \rho_{n} \biggl(\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho_{n} \biggr) \frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})}. \end{aligned}$$
(3.4)

By the condition (C3), without loss of generality, we can assume that

$$\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\frac{\rho_{n}}{1-\alpha_{n}}\ge0 $$

for all \(n\ge0\). Thus, from (3.2) and (3.4), we obtain

$$\begin{aligned}& \bigl\Vert x_{n+1}-x^{*}\bigr\Vert ^{2} \\& \quad \le\alpha_{n}\kappa\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}+(\zeta-\kappa)\alpha_{n}\frac{\Vert \psi (x^{*})-Bx^{*}\Vert ^{2}}{(\zeta-\kappa)^{2}} \\& \qquad {} +(1-\zeta\alpha_{n}) \biggl[\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2}-\rho_{n} \biggl( \frac {4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho_{n} \biggr)\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta ^{2}(x_{n})} \biggr] \\& \quad =(\zeta-\kappa)\alpha_{n}\frac{\Vert \psi(x^{*})-Bx^{*}\Vert ^{2}}{(\zeta-\kappa )^{2}}+\bigl[1-(\zeta- \kappa)\alpha_{n}\bigr]\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2} \\& \qquad {} -(1-\zeta\alpha_{n})\rho_{n} \biggl( \frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho_{n} \biggr)\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})} \\& \quad \le(\zeta-\kappa)\alpha_{n}\frac{\Vert \psi(x^{*})-Bx^{*}\Vert ^{2}}{(\zeta-\kappa )^{2}}+\bigl[1-(\zeta- \kappa)\alpha_{n}\bigr]\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2} \\& \quad \le\max \biggl\{ \frac{\Vert \psi(x^{*})-Bx^{*}\Vert ^{2}}{(\zeta-\kappa)^{2}},\bigl\Vert x_{n}-x^{*}\bigr\Vert ^{2} \biggr\} . \end{aligned}$$
(3.5)

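By induction, \(\|x_{n}-x^{*}\|^{2}\le\max\{\frac{\|\psi(x^{*})-Bx^{*}\|^{2}}{(\zeta-\kappa)^{2}},\|x_{0}-x^{*}\|^{2}\}\) for all n. Hence \(\{x_{n}\}\) is bounded.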

Let \(z=\operatorname{proj}_{\Gamma}(\psi+I-B)z\). Taking \(x^{*}=z\) in (3.5), we deduce

$$\begin{aligned} 0 \le&(1-\zeta\alpha_{n})\rho_{n} \biggl( \frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho _{n} \biggr)\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})} \\ \le&(\zeta-\kappa)\alpha_{n}\frac{\|\psi(z)-Bz\|^{2}}{(\zeta-\kappa )^{2}}+\bigl[1-(\zeta- \kappa)\alpha_{n}\bigr]\|x_{n}-z\|^{2}- \|x_{n+1}-z\|^{2}. \end{aligned}$$
(3.6)

We consider the following two cases.

Case 1. \(\|x_{n+1}-z\|\le\|x_{n}-z\|\) for all \(n\ge n_{0}\) large enough.

In this case, \(\lim_{n\to\infty}\|x_{n}-z\|\) exists and is finite, and hence

$$\lim_{n\to\infty}\bigl(\Vert x_{n+1}-z\Vert -\Vert x_{n}-z\Vert \bigr)=0. $$

This, together with (3.6), (C1) and the boundedness of \(\{x_{n}\}\), implies that

$$\rho_{n} \biggl(\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho_{n} \biggr) \frac {(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})}\to0. $$

Since \(\rho_{n} (\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\rho_{n} )\ge\epsilon ^{2}\) by the condition (C3), we have

$$\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})}\to0. $$

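Since \(\nabla h\) and \(\nabla l\) are Lipschitz continuous and \(\{x_{n}\}\) is bounded, \(\theta^{2}(x_{n})=\|\nabla h(x_{n})\|^{2}+\|\nabla l(x_{n})\|^{2}\) is bounded. Writing \((h(x_{n})+l(x_{n}))^{2}=\theta^{2}(x_{n})\cdot\frac{(h(x_{n})+l(x_{n}))^{2}}{\theta^{2}(x_{n})}\), we deduce immediately that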

$$\lim_{n\to\infty}\bigl(h(x_{n})+l(x_{n}) \bigr)=0. $$

Therefore, we have

$$ \lim_{n\to\infty}h(x_{n})=\lim _{n\to\infty}l(x_{n})=0. $$
(3.7)

Next, we prove that

$$\limsup_{n\to\infty}\bigl\langle (\psi+I-B)z-z, x_{n}-z \bigr\rangle \le0. $$

Since \(\{x_{n}\}\) is bounded, there exists a subsequence \(\{x_{n_{i}}\}\) of \(\{x_{n}\}\) such that \(x_{n_{i}}\rightharpoonup z^{\dagger}\) and

$$\limsup_{n\to\infty}\bigl\langle (\psi+I-B)z-z, x_{n}-z \bigr\rangle =\lim_{i\to \infty}\bigl\langle (\psi+I-B)z-z, x_{n_{i}}-z\bigr\rangle . $$

Since \(Ax_{n_{i}}\rightharpoonup Az^{\dagger}\) and, by (3.7), \(\|(I-\operatorname{prox}_{\lambda g})Ax_{n_{i}}\|=\sqrt{2h(x_{n_{i}})}\to0\), the demiclosedness principle for the nonexpansive mapping \(\operatorname{prox}_{\lambda g}\) yields

$$h\bigl(z^{\dagger}\bigr)=\frac{1}{2}\bigl\Vert (I- \operatorname{prox}_{\lambda g})Az^{\dagger}\bigr\Vert ^{2}=0, $$

that is, \(Az^{\dagger}\) is a fixed point of the proximal mapping of g or, equivalently, \(0 \in\partial g(Az^{\dagger})\). In other words, \(Az^{\dagger}\) is a minimizer of g.

Similarly, since \(x_{n_{i}}\rightharpoonup z^{\dagger}\) and \(\|(I-\operatorname{prox}_{\lambda f})x_{n_{i}}\|=\sqrt{2l(x_{n_{i}})}\to0\), the demiclosedness principle applied to \(\operatorname{prox}_{\lambda f}\) gives

$$l\bigl(z^{\dagger}\bigr)=\frac{1}{2}\bigl\Vert (I- \operatorname{prox}_{\lambda f})z^{\dagger}\bigr\Vert ^{2}=0, $$

that is, \(z^{\dagger}\) is a fixed point of the proximal mapping of f or, equivalently, \(0 \in\partial f(z^{\dagger})\). In other words, \(z^{\dagger}\) is a minimizer of f. Hence \(z^{\dagger}\in\Gamma\). Therefore, we have

$$\begin{aligned} \limsup_{n\to\infty}\bigl\langle (\psi+I-B)z-z, x_{n}-z\bigr\rangle &=\lim_{i\to \infty}\bigl\langle ( \psi+I-B)z-z, x_{n_{i}}-z\bigr\rangle \\ &=\bigl\langle (\psi+I-B)z-z, z^{\dagger}-z\bigr\rangle \le0. \end{aligned}$$
(3.8)

By the nonexpansiveness of \(\operatorname{prox}_{\lambda f}\) and (3.4) with \(x^{*}=z\), we have

$$\bigl\Vert \operatorname{prox}_{\lambda f}\bigl[x_{n}- \mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr]-z \bigr\Vert \le \Vert x_{n}-z\Vert . $$

Thus it follows from (3.1) that

$$\begin{aligned}& \Vert x_{n+1}-z\Vert ^{2} \\& \quad =\alpha_{n}\bigl\langle \psi(x_{n})-\psi(z), x_{n+1}-z\bigr\rangle +\alpha_{n}\bigl\langle \psi(z)-Bz,x_{n+1}-z\bigr\rangle \\& \qquad {} +\bigl\langle (I-\alpha_{n}B)\bigl(\operatorname{prox}_{\lambda f} \bigl(x_{n}-\mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr)-z\bigr), x_{n+1}-z\bigr\rangle \\& \quad \le\alpha_{n}\bigl\Vert \psi(x_{n})-\psi(z)\bigr\Vert \Vert x_{n+1}-z\Vert +\alpha_{n}\bigl\langle \psi (z)-Bz,x_{n+1}-z\bigr\rangle \\& \qquad {} +\Vert I-\alpha_{n}B\Vert \bigl\Vert \operatorname{prox}_{\lambda f}\bigl(x_{n}-\mu_{n}A^{*}(I- \operatorname{prox}_{\lambda g})Ax_{n}\bigr)-z\bigr\Vert \Vert x_{n+1}-z\Vert \\& \quad \le\alpha_{n}\kappa \Vert x_{n}-z\Vert \Vert x_{n+1}-z\Vert +\alpha_{n}\bigl\langle \psi (z)-Bz,x_{n+1}-z\bigr\rangle \\& \qquad {} +(1-\zeta\alpha_{n})\Vert x_{n}-z\Vert \Vert x_{n+1}-z\Vert \\& \quad \le\frac{1-(\zeta-\kappa)\alpha_{n}}{2}\Vert x_{n}-z\Vert ^{2}+ \frac{1}{2}\Vert x_{n+1}-z\Vert ^{2}+ \alpha_{n}\bigl\langle \psi(z)-Bz,x_{n+1}-z\bigr\rangle . \end{aligned}$$

Thus it follows that

$$ \Vert x_{n+1}-z\Vert ^{2}\le\bigl[1-(\zeta- \kappa)\alpha_{n}\bigr]\Vert x_{n}-z\Vert ^{2}+2\alpha _{n}\bigl\langle \psi(z)-Bz,x_{n+1}-z \bigr\rangle . $$
(3.9)

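Note that \((\psi+I-B)z-z=\psi(z)-Bz\). We apply Lemma 2.1 to (3.9) with \((\zeta-\kappa)\alpha_{n}\) in place of \(\alpha_{n}\), \(\sigma_{n}=\frac{2}{\zeta-\kappa}\langle\psi(z)-Bz,x_{n+1}-z\rangle\) and \(\delta_{n}=0\). By (3.8), \(\limsup_{n\to\infty}\sigma_{n}\le0\), so we deduce that \(x_{n}\to z\).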

Case 2. There exists a subsequence \(\{\|x_{n_{j}}-z\|\}\) of \(\{\| x_{n}-z\|\}\) such that

$$\|x_{n_{j}}-z\|< \|x_{n_{j}+1}-z\| $$

for all \(j\ge1\). By Lemma 2.2, there exists a nondecreasing sequence \(\{m_{k}\}\) of positive integers such that \(\lim_{k\to\infty}m_{k}=+\infty\) and the following properties are satisfied for all sufficiently large \(k\in\mathbb{N}\):

$$ \|x_{m_{k}}-z\|\le\|x_{m_{k}+1}-z\|, \qquad \|x_{k}-z\|\le\|x_{m_{k}+1}-z\|. $$
(3.10)

Moreover, by (3.1), \(\|x_{n+1}-z\|\le\alpha_{n}\|\psi(x_{n})-Bz\|+(1-\zeta\alpha_{n})\|x_{n}-z\|\), and \(\{\psi(x_{n})\}\) is bounded. Consequently, we have

$$\begin{aligned} 0&\le\liminf_{k\to\infty}\bigl(\Vert x_{m_{k}+1}-z\Vert - \Vert x_{m_{k}}-z\Vert \bigr)\le\limsup_{k\to\infty}\bigl(\Vert x_{m_{k}+1}-z\Vert - \Vert x_{m_{k}}-z\Vert \bigr) \\ &\le\limsup_{n\to\infty}\bigl(\Vert x_{n+1}-z\Vert - \Vert x_{n}-z\Vert \bigr) \\ &\le\limsup_{n\to\infty}\alpha_{n}\bigl(\Vert \psi(x_{n})-Bz\Vert -\zeta\Vert x_{n}-z\Vert \bigr)=0 \end{aligned}$$

and so

$$ \lim_{k\to\infty}\bigl(\Vert x_{m_{k}+1}-z \Vert -\Vert x_{m_{k}}-z\Vert \bigr)=0. $$
(3.11)

By a similar argument to Case 1, we can prove that

$$\limsup_{k\to\infty}\bigl\langle (\psi+I-B)z-z, x_{m_{k}}-z \bigr\rangle \le0 $$

and

$$ \|x_{m_{k}+1}-z\|^{2}\le\bigl[1-(\zeta-\kappa) \alpha_{m_{k}}\bigr]\|x_{m_{k}}-z\|^{2}+\alpha _{m_{k}}\sigma_{m_{k}}, $$

where \(\sigma_{m_{k}}=2\langle\psi(z)-Bz,x_{m_{k}+1}-z\rangle\). In particular, we have

$$\begin{aligned} (\zeta-\kappa)\alpha_{m_{k}}\|x_{m_{k}}-z\|^{2} \le& \|x_{m_{k}}-z\|^{2}-\| x_{m_{k}+1}-z\|^{2}+\alpha_{m_{k}} \sigma_{m_{k}} \\ \le&\alpha_{m_{k}}\sigma_{m_{k}}. \end{aligned}$$

Dividing by \((\zeta-\kappa)\alpha_{m_{k}}>0\), we obtain

$$ \limsup_{k\to\infty}\|x_{m_{k}}-z\|^{2}\le\frac{1}{\zeta-\kappa}\limsup _{k\to\infty}\sigma _{m_{k}}\le0. $$

Thus it follows from (3.10) and (3.11) that

$$ \limsup_{k\to\infty}\|x_{k}-z\|\le\limsup _{k\to\infty}\|x_{m_{k}+1}-z\|=0, $$

which implies that \(x_{n}\to z\). This completes the proof. □

Algorithm 3.3

Let \(x_{0}\in H_{1}\) be an arbitrary initial point. Given \(x_{n}\), set

$$h(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr\Vert ^{2}, \qquad l(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda f})x_{n}\bigr\Vert ^{2} $$

and

$$\theta(x_{n})=\sqrt{\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}+\bigl\Vert \nabla l(x_{n})\bigr\Vert ^{2}} $$

for all \(n\in\mathbb{N}\).

If \(\theta(x_{n})\ne0\), then compute \(x_{n+1}\) via

$$ x_{n+1}=\alpha_{n} \psi(x_{n})+(1- \alpha_{n})\operatorname{prox}_{ \lambda f}\bigl(x_{n}-\mu _{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n}\bigr) $$
(3.12)

for all \(n\ge0\), where \(\{\alpha_{n}\}_{n\in\mathbb{N}}\subset[0,1]\) is a sequence of real numbers and the step size \(\mu_{n}\) is given by

$$\mu_{n}=\rho_{n}\frac{h(x_{n})+l(x_{n})}{\theta^{2}(x_{n})} $$

with \(0<\rho_{n}<4\).

If \(\theta(x_{n})=0\), then \(x_{n}\) is a solution of the problem (1.8) and the iterative process stops. Otherwise, we set \(n:=n+1\) and go to (3.12).

Taking \(B=I\) in Theorem 3.2 (so that \(\zeta=1\)), we obtain the following corollary.

Corollary 3.4

Suppose that \(\Gamma\ne\emptyset\). Assume the parameters \(\{\alpha_{n}\}\) and \(\{\rho_{n}\}\) satisfy the conditions:

(C1) \(\lim_{n\to\infty}\alpha_{n}=0\);

(C2) \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

(C3) \(\epsilon\le\rho_{n}\le\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\epsilon\) for some \(\epsilon>0\) small enough.

Then the sequence \(\{x_{n}\}\) generated by (3.12) converges strongly to a point \(z=\operatorname{proj}_{\Gamma}\psi(z)\).

Algorithm 3.5

Let \(x_{0}\in H_{1}\) be an arbitrary initial point. Given \(x_{n}\), set

$$h(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr\Vert ^{2}, \qquad l(x_{n})=\frac{1}{2}\bigl\Vert (I-\operatorname{prox}_{\lambda f})x_{n}\bigr\Vert ^{2} $$

and

$$\theta(x_{n})=\sqrt{\bigl\Vert \nabla h(x_{n})\bigr\Vert ^{2}+\bigl\Vert \nabla l(x_{n})\bigr\Vert ^{2}} $$

for all \(n\in\mathbb{N}\).

If \(\theta(x_{n})\ne0\), then compute \(x_{n+1}\) via

$$ x_{n+1}=(1-\alpha_{n})\operatorname{prox}_{\lambda f} \bigl(x_{n}-\mu_{n}A^{*}(I-\operatorname{prox}_{\lambda g})Ax_{n} \bigr) $$
(3.13)

for all \(n\ge0\), where \(\{\alpha_{n}\}_{n\in\mathbb{N}}\subset[0,1]\) is a sequence of real numbers and the step size \(\mu_{n}\) is given by

$$\mu_{n}=\rho_{n}\frac{h(x_{n})+l(x_{n})}{\theta^{2}(x_{n})} $$

with \(0<\rho_{n}<4\).

If \(\theta(x_{n})=0\), then \(x_{n}\) is a solution of the problem (1.8) and the iterative process stops. Otherwise, we set \(n:=n+1\) and go to (3.13).

Corollary 3.6

Suppose that \(\Gamma\ne\emptyset\). Assume the parameters \(\{\alpha_{n}\}\) and \(\{\rho_{n}\}\) satisfy the following conditions:

(C1) \(\lim_{n\to\infty}\alpha_{n}=0\);

(C2) \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

(C3) \(\epsilon\le\rho_{n}\le\frac{4h(x_{n})}{h(x_{n})+l(x_{n})}-\epsilon\) for some \(\epsilon>0\) small enough.

Then the sequence \(\{x_{n}\}\) generated by (3.13) converges strongly to a point \(z=\operatorname{proj}_{\Gamma}(0)\), which is the minimum norm element in Γ.

Remark 3.7

When the bounded linear operator A is the identity operator, problem (1.8) is nothing other than the problem of finding a common minimizer of f and g. In this case (3.1) reduces to the following relaxed split proximal algorithm:

$$ x_{n+1}=\alpha_{n} \psi(x_{n})+(I- \alpha_{n}B)\operatorname{prox}_{\lambda f}\bigl((1-\mu _{n})x_{n}+\mu_{n}\operatorname{prox}_{\lambda g}x_{n} \bigr) $$

for all \(n\ge0\).
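With \(A=I\), the point inside \(\operatorname{prox}_{\lambda f}\) is a relaxation between \(x_{n}\) and \(\operatorname{prox}_{\lambda g}x_{n}\). The sketch below runs this relaxed iteration on hypothetical data. f is the indicator of the box \([0,1]^{2}\) and g is the indicator of the half-plane \(\{v:v_{1}+v_{2}\ge1.5\}\), with \(\psi\equiv0\), \(B=I\), \(\alpha_{n}=1/(n+2)\) and \(\rho_{n}\equiv2\). The common minimizers form \(\Gamma=[0,1]^{2}\cap\{v_{1}+v_{2}\ge1.5\}\), whose minimum-norm point is \((0.75,0.75)\).

```python
import numpy as np


def relaxed_split_proximal(prox_f, prox_g, psi, B, x0, alpha, rho, n_iter=2000, tol=1e-14):
    """x_{n+1} = alpha_n psi(x_n) + (I - alpha_n B) prox_{lam f}((1 - mu_n) x_n + mu_n prox_{lam g} x_n)."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        r = x - prox_g(x)                  # grad h(x_n) when A = I
        s = x - prox_f(x)                  # grad l(x_n)
        h, l = 0.5 * float(r @ r), 0.5 * float(s @ s)
        theta2 = float(r @ r + s @ s)
        if theta2 <= tol:                  # theta(x_n) = 0 (numerically): stop
            break
        mu = rho(n) * (h + l) / theta2     # self-adaptive step size mu_n
        y = prox_f((1.0 - mu) * x + mu * prox_g(x))
        a = alpha(n)
        x = a * psi(x) + y - a * (B @ y)
    return x


def proj_halfplane(v, c=1.5):
    """Projection onto the half-plane {v : v_1 + v_2 >= c}."""
    gap = c - v.sum()
    return v + max(gap, 0.0) / 2.0 * np.ones_like(v)


z = relaxed_split_proximal(
    prox_f=lambda v: np.clip(v, 0.0, 1.0),
    prox_g=proj_halfplane,
    psi=lambda v: np.zeros_like(v),
    B=np.eye(2),
    x0=[0.0, 0.0],
    alpha=lambda n: 1.0 / (n + 2),
    rho=lambda n: 2.0,
)
print(z)  # close to (0.75, 0.75)
```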