1 Introduction

Let E be a real Banach space with dual \(E^*\), and let \(\langle f,g \rangle \) denote the duality pairing between \(f \in E\) and \(g \in E^*.\) Let C be a nonempty, closed and convex subset of E and let \(F:E \rightarrow E^*\) be a nonlinear operator. We consider the variational inequality problem (VIP) in the sense of Fichera (1963; 1984), which is defined as finding a point \(x^* \in C\) such that

$$\begin{aligned} \langle Fx^*,y-x^* \rangle \ge 0,~~\forall ~y~ \in C. \end{aligned}$$
(1.1)

We denote the set of solutions of the VIP (1.1) by VI(C, F). The theory of the VIP has been researched widely due to its applications in modelling various real-life problems. For instance, it is used as a model for obstacle and contact problems in mechanics, traffic equilibrium problems and so on (see, e.g. Alber 1996). It also finds application in several fields of study such as optimization theory, nonlinear programming, science and engineering (Facchinei and Pang 2003; Hartman and Stampacchia 1966; Kinderlehrer and Stampacchia 2000; Lions and Stampacchia 1967).

The problem of approximating a solution of the VIP has received a lot of attention in recent published articles in the area of optimization theory (see the recent works of Hieu 2017; Jolaoso et al. 2021a, b; Oyewole et al. 2021b, a). The most popular of these methods is the Extragradient method (EGM), which was first introduced by Korpelevich (1976) for solving the saddle point problem. The EGM is known to be characterized by some drawbacks in its use for the VIP (Jolaoso et al. 2021b; Jolaoso and Aphane 2020). For instance, the EGM only converges weakly when the underlying operator is monotone. Also, the execution of the EGM depends on calculating a projection onto the feasible set twice per iteration. Another drawback of the EGM is the dependence of the method on the Lipschitz constant of the cost operator. In this direction, Censor et al. (2011a, b) introduced the subgradient extragradient method (SEGM), where the second projection is replaced by a projection onto a carefully selected constructible halfspace. It is known that this method still inherits one drawback of the EGM, namely the dependence of the method on the Lipschitz constant of the cost operator. To overcome this, authors have devised several means, which include the use of an Armijo linesearch rule (a linesearch is a technique where the outer loop is determined by some inner calculation) (see Shehu and Iyiola 2019). Another means of avoiding the dependence of the method on the Lipschitz constant is the use of a self-adaptive method (see Oyewole et al. 2021a and the references therein).

The approximation of a common solution of a system of variational inequality problems (CSVIP) has also been considered in the literature. We define the CSVIP as follows: for \(i=1,2, \dots , N,\) let C be a nonempty, closed and convex subset of a real reflexive Banach space E and let \(F^i : C \rightarrow E^*\) be nonlinear mappings. The CSVIP consists of finding a point \(x^* \in C\) such that

$$\begin{aligned} \langle F^ix^*,y-x^* \rangle \ge 0,~~\forall ~ y \in C, ~~i=1,2,\dots ,N. \end{aligned}$$
(1.2)

We denote the solution set of the CSVIP (1.2) by \(\Gamma ,\) i.e., \(\Gamma := \{z \in C: \langle F^iz,y-z \rangle \ge 0,~~ \forall ~y \in C, ~~ {i=1,2,\dots ,N} \}.\) Note that, if \(N=1\) in (1.2), then the CSVIP (1.2) reduces to the VIP (1.1). The motivation for studying the CSVIP stems from its applications to several problems in the applied sciences whose constraints can be modelled as a CSVIP, for instance, the generalized Nash equilibrium problem and utility-based bandwidth allocation problems (see, e.g., Jolaoso et al. 2021).

For the approximation of solutions of the CSVIP in the setting of real reflexive Banach spaces, Kassay et al. (2011) first proposed a Bregman projection technique for the case when \(F^i\) is a Bregman-inverse strongly monotone operator. Censor et al. (2011a) introduced the subgradient extragradient method as an improvement of the extragradient method introduced by Korpelevich (1976) and Antipin (1976) in the 1970s for solving the classical VIP (1.1) in finite dimensional spaces. Their results have been generalized to infinite dimensional real Hilbert spaces by many authors, see, e.g., (Censor et al. 2011b; Khanh and Vuong 2014; Solodov and Svaiter 1999; Lin et al. 2005; Malitsky 2015). Censor et al. (2012) also proposed a hybrid method for the CSVIP (1.2) which requires the minimum of the Lipschitz constants of the cost operators as an input parameter. This can be problematic in two ways: first, if \(F^i\) has a complex structure (as in optimal control models), and second, if the feasible set is not simple enough for calculating the projection onto the intersection of two constructed half-spaces. The algorithm of Censor et al. (2012) was modified by Hieu (2017) using a parallel method; however, the new method inherits the two drawbacks mentioned earlier. Other similar methods for solving the CSVIP can be found in Anh and Phuong (2018). Recently, Kitisak et al. (2020) proposed a modified hybrid parallel extragradient method using an Armijo line-search technique which computes the stepsize in each iteration implicitly. It is known that the line-search idea uses an inner iteration which consumes additional time and memory. Jolaoso et al. (2021b) recently proposed a new self-adaptive technique as an improvement of the previous methods for solving the CSVIP with monotone operators in real Hilbert spaces.

We observe that most of the above cited results on the CSVIP used the Euclidean norm and metric projections. The Bregman distance is a key substitute for, and generalization of, the Euclidean distance, and it is induced by a chosen convex function. It has found numerous applications in optimization theory, nonlinear analysis, inverse problems, and recently machine learning; see, for instance, (Chambolle 2004; Reem et al. 2019). In addition, the use of the Bregman distance allows the consideration of a more general feasible set structure for the variational inequalities. In particular, we can choose the Kullback-Leibler divergence (the Bregman distance induced by the negative entropy) and obtain an explicitly calculated projection onto a simplex; see the sketch after this paragraph. For solving the VIP, Nemirovski (2004) introduced the Nemirovski prox method, a variant of the EGM where the projection is understood in the sense of the Bregman divergence. Gibali (2018) also proposed a Bregman projection algorithm for solving the VIP (1.1) in finite dimensional spaces using the extragradient technique. Moreover, other methods involving the Bregman distance for solving the VIP (1.1) can be found in Jolaoso and Aphane (2020), Jolaoso et al. (2021) and references therein.
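As a concrete instance of the simplex remark above, here is a minimal sketch (ours, not from the paper): under the Kullback-Leibler divergence, the Bregman projection onto the unit simplex has the closed form \(x=y/\sum _j y_j,\) whereas the Euclidean projection needs an iterative routine.

```python
import numpy as np

def kl_projection_simplex(y):
    """Bregman projection of y > 0 onto the unit simplex under the KL
    divergence: argmin_x { sum_j x_j log(x_j/y_j) - x_j + y_j : sum_j x_j = 1 }.
    The optimality condition log(x_j/y_j) + lam = 0 gives x = y / sum(y)."""
    return y / y.sum()

y = np.array([0.2, 1.5, 0.8])
x = kl_projection_simplex(y)
print(x, x.sum())   # x lies on the simplex: nonnegative entries summing to 1
```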

Motivated by the above results, we introduce a self-adaptive algorithm for approximating a common solution of a finite family of VIPs. Using the Bregman distance and Bregman projections, we prove a strong convergence theorem under a pseudomonotonicity assumption on the cost operators. In particular, the following highlights the contribution of this article to the literature:

  1. (i)

We note that in the work of Jolaoso et al. (2021a), the authors studied the approximation of solutions of the CSVIP using the metric projection. Moreover, their cost operators are monotone and weakly sequentially continuous. In this paper, using a Bregman projection technique, we study the CSVIP in the setting of reflexive Banach spaces, and the cost operators are pseudomonotone with a weaker continuity assumption.

  2. (ii)

In Kitisak et al. (2020), the authors proposed Algorithm MPHSEM, which uses a line-search technique known to consume additional computational time and memory. In our algorithm, the stepsize is determined by a self-adaptive process and does not require a line-search procedure.

  3. (iii)

Also, in Hieu (2017), Anh and Phuong (2018) and Kitisak et al. (2020), the algorithms require constructing the two sets \(C_n\) and \(Q_n\) and then computing the projection of a vector onto their intersection, which can be complicated. Our method in this paper requires neither the construction of the sets \(C_n\) and \(Q_n\) nor the projection onto their intersection.

2 Preliminaries

In this section, we give some preliminary results and definitions that will be used in the sequel. Throughout this paper, unless stated otherwise, let C be a nonempty, closed and convex subset of a real Banach space E. We denote the weak and strong convergence of a sequence \(\{x^k\} \subset E\) to a point \(x \in E\) by \(x^k \rightharpoonup x\) and \(x^k \rightarrow x\) respectively.

Let \(f : E \rightarrow (-\infty , \infty ]\) be a function and let \(x \in \) int(domf). For any \(y \in E\), the directional derivative of f at x in the direction y is defined by

$$\begin{aligned} f^{o}(x,y) = \lim _{t \rightarrow 0^{+}}\dfrac{f(x+ty)-f(x)}{t}. \end{aligned}$$
(2.1)

If the limit as \(t \rightarrow 0^{+}\) in (2.1) exists for each y, then f is said to be Gâteaux differentiable at x. In this case, the gradient of f at x is the linear functional \(\nabla f(x) \in E^*\), which is defined by

$$\begin{aligned} \langle \nabla f(x),y \rangle = f^{o}(x,y), \end{aligned}$$

for all \(y \in E\). The function f is said to be Gâteaux differentiable if it is Gâteaux differentiable at each \(x \in \) int(domf). When the limit as \(t \rightarrow 0^{+}\) in (2.1) is attained uniformly for any \(y \in E\) with \(||y||=1\), we say that f is Fréchet differentiable at x.

Let \(f: E \rightarrow {\mathbb {R}} \cup \{+\infty \}\) be a convex function.

  1. (a)

    The function f is called proper if the domain of f denoted domf defined by

    $$\begin{aligned}dom f=\{x \in E: f(x) < +\infty \}\ne \emptyset . \end{aligned}$$
  2. (b)

    The subdifferential set of f at a point x,  denoted \(\partial f(x)\) is given by

    $$\begin{aligned}\partial f(x)=\{z \in E^*: f(y)-f(x)\ge \langle z,y-x \rangle , ~~\forall ~ y \in E \}. \end{aligned}$$

    An element \(z \in \partial f(x)\) is called a subgradient of f. If f is continuously differentiable, then \(\partial f(x)=\{\nabla f(x) \}.\)

  3. (c)

    The Fenchel conjugate of f is the convex function \(f^* : E^* \rightarrow {\mathbb {R}} \cup \{+\infty \}\) defined by

    $$\begin{aligned}f^*(y)=\sup \{\langle y,x \rangle -f(x): x \in E \}. \end{aligned}$$
  4. (d)

    The function f is said to be Legendre if and only if the following hold:

    1. (L1)

      \(int (dom f)\ne \emptyset \) and \(\partial f\) is single-valued on its domain;

    2. (L2)

      \(int (dom f^*)\ne \emptyset \) and \(\partial f^*\) is single-valued on its domain.

The Bregman distance (see Bregman 1967) corresponding to the function f is defined by

$$\begin{aligned} D_f(x,y)=f(x)-f(y)-\langle \nabla f(y),x-y \rangle , ~~\forall ~ x \in dom f,~~ \forall ~ y \in int (dom f). \end{aligned}$$

It is well documented that the Bregman distance is not a metric in the usual sense, because it fails to be symmetric and does not satisfy the triangle inequality. However, it possesses the following useful property, referred to as the three point identity: for \(x \in dom f\) and \(y,z \in int (dom f),\) we have

$$\begin{aligned} D_f(x,y)+D_f(y,z) = D_f(x,z)+\langle \nabla f(z)-\nabla f(y),x-y \rangle . \end{aligned}$$
(2.2)
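Indeed, (2.2) follows by expanding each Bregman distance from its definition:

$$\begin{aligned} D_f(x,y)+D_f(y,z)&=f(x)-f(y)-\langle \nabla f(y),x-y \rangle +f(y)-f(z)-\langle \nabla f(z),y-z \rangle \\&=f(x)-f(z)-\langle \nabla f(z),x-z \rangle +\langle \nabla f(z)-\nabla f(y),x-y \rangle \\&=D_f(x,z)+\langle \nabla f(z)-\nabla f(y),x-y \rangle . \end{aligned}$$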

The Bregman distance finds applications in solving many important optimization problems (see e.g Bauschke et al. 2001; Jolaoso et al. 2021a) and the references therein. The following are common Bregman functions with their corresponding Bregman distance functions (See Beck 2017).

Example 2.1

  1. (i)

    If \(f^{KL}(x)=\sum \limits _{j=1}^{m}x_j \log x_j\) (the negative entropy) with domain \({\mathbb {R}}^m_{++}=\{x \in {\mathbb {R}}^m: x_j >0\},\) then

    $$\begin{aligned}D_f^{KL}(x,y)=\sum \limits _{j=1}^{m}\left( x_j \log \left( \frac{x_j}{y_j}\right) -x_j \right) +\sum \limits _{j=1}^{m}y_j\end{aligned}$$

    which is the Kullback-Leibler distance (KLD).

  2. (ii)

    If \(f^{IS}(x) = -\sum \limits _{j=1}^{m}\log (x_j)\) with domain \({\mathbb {R}}^m_{++}=\{x \in {\mathbb {R}}^m: x_j >0\},\) then we have

    $$\begin{aligned} D_f^{IS}(x,y)=\sum \limits _{j=1}^{m}\left( \frac{x_j}{y_j}-\log \left( \frac{x_j}{y_j}\right) -1 \right) \end{aligned}$$

    which is called Itakura-Saito Distance (ISD).

  3. (iii)

    If \(f^{SE}(x)=\frac{1}{2}\Vert x\Vert ^2\) with domain \({\mathbb {R}}^m,\) then \(D_f^{SE}(x,y)=\frac{1}{2}\Vert x-y\Vert ^2\) is the squared Euclidean distance (SED).
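To make Example 2.1 concrete, here is a small numerical sketch (ours) evaluating the three distances; each is nonnegative and vanishes exactly when \(x=y:\)

```python
import numpy as np

def D_kl(x, y):   # generated by f(x) = sum_j x_j log x_j
    return np.sum(x * np.log(x / y) - x + y)

def D_is(x, y):   # generated by f(x) = -sum_j log x_j
    return np.sum(x / y - np.log(x / y) - 1)

def D_se(x, y):   # generated by f(x) = ||x||^2 / 2
    return 0.5 * np.sum((x - y) ** 2)

x, y = np.array([0.3, 0.7]), np.array([0.5, 0.5])
for D in (D_kl, D_is, D_se):
    assert D(x, y) >= 0 and abs(D(x, x)) < 1e-12
    print(D.__name__, D(x, y))
```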

The function f is called \(\sigma \)-strongly convex if

$$\begin{aligned} D_f(x,y) \ge \frac{\sigma }{2}\Vert x-y\Vert ^2, ~~\forall ~x \in dom f,~~y \in int (dom f). \end{aligned}$$
(2.3)

The Bregman projection with respect to f of \(x \in int (dom f)\) for a nonempty, closed and convex set \(C \subset int (dom f)\) is the unique vector \(\Pi _C (x) \in C\) satisfying

$$\begin{aligned} {D_f(\Pi _C(x),x)}:=\inf \{D_f(y,x): y \in C \}. \end{aligned}$$

The Bregman projection is characterized by the following identities, which are analogues of those of the metric projection (see Reich and Sabach 2009):

$$\begin{aligned} z=\Pi _C(x) \iff \langle \nabla f(x)-\nabla f(z),z-y \rangle \ge 0,~~\forall ~y \in C \end{aligned}$$
(2.4)

and

$$\begin{aligned} D_f(y,\Pi _C x)+D_f(\Pi _C x,x) \le D_f(y,x), ~~\forall ~ y \in C,~ x \in int (dom f). \end{aligned}$$
(2.5)

We also require the function \(V_f : E \times E^* \rightarrow [0, \infty )\) associated with f (see Alber 1996; Censor and Lent 1981) defined by

$$\begin{aligned} V_f(x,y)=f(x)-\langle x,y \rangle + f^*(y), ~~ \forall ~x \in E,~ y \in E^*.\end{aligned}$$

It is known that \(V_f\) is nonnegative and \(V_f(x,y)=D_f(x, \nabla f^*(y))\) for all \(x \in E\) and \(y \in E^*.\) Moreover, by the subdifferential inequality, it is easy to see (e.g., Kohsaka and Takahashi 2005) that

$$\begin{aligned} V_f(x,y)+\langle z, \nabla f^*(y)-x \rangle \le V_f(x,y+z), ~~\forall ~x \in E,~ y,z \in E^*. \end{aligned}$$
(2.6)

In sum, if \(f: E \rightarrow {\mathbb {R}} \cup \{+ \infty \}\) is a proper, convex and lower semicontinuous function, then \(f^* : E^* \rightarrow {\mathbb {R}} \cup \{+ \infty \}\) is a proper, convex and weak* lower semicontinuous function (see Phelps 1993), and thus \(V_f\) is convex in the second variable; that is,

$$\begin{aligned} D_f \left( z, \nabla f^* \left( \sum \limits _{i=1}^{N} \lambda _i\nabla f(x_i) \right) \right) \le \sum \limits _{i=1}^{N} \lambda _i D_f(z,x_i), \end{aligned}$$
(2.7)

where \(\{x_i\}_{i=1}^{N} \subset E\) and \(\{\lambda _i \}_{i=1}^{N} \subset (0,1)\) with \(\sum \limits _{i=1}^{N}\lambda _i=1.\)

Lemma 2.2

Naraghirad and Yao (2013) Let \(r > 0\) be a constant and let \(f: E \rightarrow {\mathbb {R}}\) be a continuous, uniformly convex function on bounded subsets of E. Then,

$$\begin{aligned} f\left( \sum \limits _{k=0}^{\infty }\alpha ^kx^k \right) \le \sum \limits _{k=0}^{\infty }\alpha ^kf(x^k)-\alpha ^i \alpha ^j \rho _r(\Vert x^i-x^j\Vert ), \end{aligned}$$
(2.8)

for all \(i,j \in {\mathbb {N}} \cup \{0\},\) \(x^k \in B_r:=\{x \in E : \Vert x\Vert \le r\}\) and \(\alpha ^k \in (0,1)\) for \(k \in {\mathbb {N}} \cup \{0\}\) with \(\sum \limits _{k=0}^{\infty } \alpha ^k=1,\) where \(\rho _r\) is the gauge of uniform convexity of f.

An operator \(F:E \rightarrow E^*\) is called

  1. (i)

    monotone if \(\langle F(x) - F(y), x- y \rangle \ge 0\) for all \(x,y \in E\);

  2. (ii)

    Bregman inverse strongly monotone if \(\langle F(x) - F(y), \nabla f^*(\nabla f(x) - F(x)) - \nabla f^*(\nabla f(y) - F(y)) \rangle \ge 0\) for all \(x,y \in E;\)

  3. (iii)

    pseudomonotone if \(\langle F(x), y- x \rangle \ge 0 \Rightarrow \langle F(y), y-x \rangle \ge 0\) for all \(x,y \in E.\)

Lemma 2.3

(Cottle and Yao 1992, Lemma 2.1) Consider the VIP (1.1) with C a nonempty, closed and convex subset of a real reflexive Banach space E and \(F: C \rightarrow E^*\) a pseudomonotone and continuous mapping. Then, \(p \in C\) is a solution of (1.1) if and only if

$$\begin{aligned} \langle Fq,q-p \rangle \ge 0, ~~\forall ~ q \in C. \end{aligned}$$

Lemma 2.4

Saejung and Yotkaew (2012) Let \(\{a^k\}\) be a sequence of nonnegative real numbers, \(\{\alpha ^k\}\) be a sequence of real numbers in (0, 1) with \(\sum \limits _{k=1}^{\infty }\alpha ^k=\infty \) and \(\{b^k \}\) be a sequence of real numbers. Suppose that

$$\begin{aligned} a^{k+1}\le (1-\alpha ^k)a^k+\alpha ^k b^k, ~~\forall ~ k \ge 1.\end{aligned}$$

If \(\limsup \limits _{j \rightarrow \infty } b^{k_j} \le 0\) for every subsequence \(\{a^{k_j}\}\) of \(\{a^k\}\) satisfying

$$\begin{aligned} \liminf \limits _{j \rightarrow \infty }(a^{k_j+1}-a^{k_j}) \ge 0, \end{aligned}$$

then \(a^k \rightarrow 0\) as \(k \rightarrow \infty .\)

3 Main result

In this section, we introduce a strongly convergent algorithm for approximating a solution of the CSVIP and then present its convergence analysis. First, we make the following assumptions:

Assumption A

  1. (A1)

    For each \(i=1,2, \dots , N,\) the mapping \(F^i : E \rightarrow E^*\) is pseudomonotone and \(L_i\)-Lipschitz continuous (hence uniformly continuous). However, note that the execution of our method does not require the Lipschitz constants to be known.

  2. (A2)

    For any sequence \(\{x^k\} \subset C\) such that \(x^k \rightharpoonup p \in E\) and \(\liminf \limits _{k \rightarrow \infty }\Vert F^ix^k\Vert =0,\) we have \(F^ip=0.\)

  3. (A3)

    The solution set \(\Gamma \) is nonempty.

  4. (A4)

    Let \(f: E \rightarrow {\mathbb {R}} \cup \{+\infty \}\) be a function which is Legendre, uniformly Fréchet differentiable, \(\sigma \)-strongly convex, and bounded on bounded subsets of E.

Assumption B

  1. (B1)

    \(\beta ^k \in (0,1)\) such that \(\sum \limits _{k=1}^{\infty }\beta ^k=\infty \) and \(\lim \limits _{k \rightarrow \infty }\beta ^k=0;\)

  2. (B2)

    \(\frac{1}{k^2}=o(\beta ^k),\) i.e., \(\lim \limits _{k \rightarrow \infty }\frac{1}{k^2\beta ^k}=0;\)

  3. (B3)

    \(\liminf \limits _{k \rightarrow \infty }\delta ^{k,0}\delta ^{k,i} > 0,~~i=1,2,\cdots ,N.\)

Algorithm 3.1 [displayed as a figure in the original; its steps — the inertial step (3.2), the projection steps defining \(y^{k,i}\) and \(z^{k,i}\) (3.3), the convex combination (3.4), the anchor step (3.5) and the self-adaptive stepsize update (3.6) — are referenced throughout the analysis below.]
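Since Algorithm 3.1 is displayed only as a figure, the following Python sketch may help orient the reader: it implements one plausible iteration in the Euclidean case \(f(x)=\frac{1}{2}\Vert x\Vert ^2\) (so \(\nabla f=\nabla f^*\) is the identity and \(\Pi _C\) is the metric projection), reconstructed from how steps (3.2)–(3.6) are used in the proofs below. All names, defaults and the exact update order are our assumptions, not the authors' statement of the method.

```python
import numpy as np

def proj_halfspace(u, a, y):
    """Euclidean projection of u onto the halfspace {x : <a, x - y> <= 0}."""
    t = a @ (u - y)
    return u if t <= 0 else u - (t / (a @ a)) * a

def ibpsem_step(x_prev, x, x0, F, proj_C, alpha, beta, theta, delta, mu):
    """One hypothetical iteration: F is a list of N cost operators, delta is
    the weight vector (delta_0, ..., delta_N), alpha the current stepsize."""
    w = x + theta * (x_prev - x)                 # inertial step, cf. (3.2)
    zs, alpha_next = [], alpha
    for Fi in F:
        Fw = Fi(w)
        y = proj_C(w - alpha * Fw)               # first projection onto C
        a = w - alpha * Fw - y                   # normal of the halfspace T^{k,i}
        u = w - alpha * Fi(y)
        zs.append(proj_halfspace(u, a, y) if a @ a > 0 else u)   # cf. (3.3)
        den = np.linalg.norm(Fw - Fi(y))
        if den > 0:                              # self-adaptive rule, cf. (3.6)
            alpha_next = min(alpha_next, mu * np.linalg.norm(w - y) / den)
    v = delta[0] * w + sum(d * z for d, z in zip(delta[1:], zs))  # cf. (3.4)
    x_next = beta * x0 + (1 - beta) * v          # anchor (Halpern) step, cf. (3.5)
    return x_next, alpha_next
```

With \(\beta ^k=\frac{1}{10(k+1)},\) a small constant \(\theta \) and the unit-ball projection \(u \mapsto u/\max (1,\Vert u\Vert ),\) iterating this step mimics the setting of Example 5.1 below.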

Remark 3.2

We observe from Algorithm 3.1 that the method is self-adaptive with a monotone non-increasing stepsize sequence \(\{\alpha ^k\};\) thus the dependence of the method on the Lipschitz constants of the cost operators is dispensed with. The convergence of the method is accelerated by the incorporation of the inertial term, and finally, the technique is based on the Bregman projection and distance, which opens the method to more applications.

Remark 3.3

Note that the property (A2) is strictly weaker than the weakly sequentially continuous condition which has recently been used for solving the pseudomonotone VIP; see Jolaoso et al. (2021); Hieu (2017); Kitisak et al. (2020); Thong et al. (2021); Jolaoso et al. (2020). For instance, let \(E = \ell _2,\) and \(F: \ell _2 \rightarrow \ell _2\) be defined by \(F(x) = \Vert x\Vert x.\) Let \(\{x^k\} \subset \ell _2\) be such that \(x^k \rightharpoonup p\) and \(\liminf _{k\rightarrow \infty }\Vert F(x^k) \Vert = 0.\) Indeed, by the weak lower semicontinuity of the squared norm, we have \(\Vert p\Vert ^2 \le \liminf _{k\rightarrow \infty }\Vert x^k\Vert ^2.\) Hence

$$\begin{aligned} \Vert F(p)\Vert = \Vert p\Vert ^2 \le \liminf _{k\rightarrow \infty }\Vert x^k\Vert ^2 = \liminf _{k\rightarrow \infty }\Vert F(x^k)\Vert = 0. \end{aligned}$$

This implies that \(\Vert F(p)\Vert = 0,\) i.e., \(F(p)=0.\) However, to see that F is not weakly sequentially continuous, choose \(x^k = e^k + e^1,\) where \(\{e^k\}\) is the standard basis of \(\ell _2,\) i.e., \(e^k = (0,0,\dots ,1,\dots )\) with 1 at the k-th position. It is clear that \(x^k \rightharpoonup e^1\) and \(F(x^k) = F(e^1 + e^k) = \Vert e^1+e^k\Vert (e^1 + e^k) \rightharpoonup \sqrt{2}e^1.\) But \(F(e^1) = \Vert e^1\Vert e^1 = e^1.\) Hence F is not weakly sequentially continuous.
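The failure of weak sequential continuity in this example can be checked coordinate-wise, since weak convergence in \(\ell _2\) amounts to convergence of \(\langle \cdot ,e^j \rangle \) for each j. A small numerical sketch of this (ours, on a finite truncation of \(\ell _2\)):

```python
import numpy as np

def x_k(k, dim):
    """x^k = e^k + e^1 in a dim-dimensional truncation (1-based indices)."""
    v = np.zeros(dim)
    v[0] += 1.0       # e^1
    v[k - 1] += 1.0   # e^k
    return v

F = lambda v: np.linalg.norm(v) * v   # F(x) = ||x|| x

for k in (10, 100, 999):
    print(k, F(x_k(k, 1000))[0])      # <F(x^k), e^1> = sqrt(2) for all k >= 2
# The weak limit of F(x^k) therefore has first coordinate sqrt(2), while
# F(e^1) = e^1 has first coordinate 1: F is not weakly sequentially
# continuous along x^k converging weakly to e^1.
```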

Remark 3.4

The sequence \(\{\alpha ^k\}\) given by (3.6) is monotonically non-increasing and

$$\begin{aligned} \lim \limits _{k \rightarrow \infty }\alpha ^k=\alpha \ge \min \left\{ \alpha ^0,\min \limits _{1 \le i \le N}\left\{ \frac{\mu }{L_i}\right\} \right\} . \end{aligned}$$

Indeed, it is easy to see that \(\{\alpha ^k \}\) is monotonically non-increasing. Now, from the Lipschitz continuity of \(F^i\) for all \(i=1,2, \dots , N,\) we have

$$\begin{aligned} \frac{\mu \Vert w^k-y^{k,i}\Vert }{\Vert F^i w^k-F^iy^{k,i}\Vert } \ge \frac{\mu \Vert w^k-y^{k,i}\Vert }{L_i\Vert w^k-y^{k,i}\Vert }=\frac{\mu }{L_i}. \end{aligned}$$

Hence, the sequence \(\{\alpha ^k \}\) has the lower bound \(\min \left\{ \alpha ^0, \min \limits _{1 \le i \le N} \left\{ \frac{\mu }{L_i}\right\} \right\} .\) Thus, the limit \(\lim \limits _{k \rightarrow \infty }\alpha ^k=\alpha \) exists and \(\alpha \ge \min \left\{ \alpha ^0,\min \limits _{1 \le i \le N}\left\{ \frac{\mu }{L_i}\right\} \right\} .\)
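Remark 3.4 is easy to check numerically for a linear operator \(F(x)=Mx,\) whose Lipschitz constant is \(\Vert M\Vert :\) the adaptive stepsize never falls below \(\min \{\alpha ^0, \mu /\Vert M\Vert \}.\) A minimal sketch (ours; the update of w is an arbitrary stand-in for the algorithm's iterates):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
L = np.linalg.norm(M, 2)               # Lipschitz constant of F(x) = Mx
F = lambda x: M @ x

mu, alpha = 0.25, 1.0                  # mu and alpha^0
w = rng.standard_normal(5)
for _ in range(50):
    y = w - alpha * F(w)               # stand-in for the projected point y^{k,i}
    den = np.linalg.norm(F(w) - F(y))
    if den > 0:                        # rule (3.6): alpha never increases
        alpha = min(alpha, mu * np.linalg.norm(w - y) / den)
    w = y

print(alpha >= min(1.0, mu / L))       # True: the lower bound of Remark 3.4
```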

Lemma 3.5

Let \(p \in \Gamma \) and let \(\{x^k\}\) be the sequence given by Algorithm 3.1, then the following inequality holds

$$\begin{aligned} D_f(p,z^{k,i})\le D_f(p,w^k)-\left( 1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}} \right) [D_f(y^{k,i},w^k)+D_f(z^{k,i},y^{k,i})]. \end{aligned}$$

Proof

From \(p \in \Gamma \) and (3.3), we have

$$\begin{aligned} D_f(p,z^{k,i})&\le D_f(p, \nabla f^*(\nabla f(w^k)-\alpha ^kF^iy^{k,i}))-D_f(z^{k,i},\nabla f^*(\nabla f(w^k)-\alpha ^kF^iy^{k,i}))\nonumber \\&=V_f(p,\nabla f(w^k)-\alpha ^kF^iy^{k,i})-V_f(z^{k,i},\nabla f(w^k)-\alpha ^kF^iy^{k,i})\nonumber \\&=f(p)-\langle p,\nabla f(w^k) \rangle +\alpha ^k \langle p,F^iy^{k,i}\rangle -f(z^{k,i})+\langle z^{k,i},\nabla f(w^k)\rangle \nonumber \\&\quad -\alpha ^k \langle z^{k,i},F^iy^{k,i} \rangle \nonumber \\&=D_f(p,w^k)-D_f(z^{k,i},w^k)+\alpha ^k \langle p-z^{k,i},F^iy^{k,i} \rangle . \end{aligned}$$
(3.7)

From \(p \in \Gamma ,\) we have that \(\langle F^ip,y-p \rangle \ge 0\) for all \(y \in C.\) Since \(F^i\) is pseudomonotone for each i,  we have \(\langle F^iy,y-p \rangle \ge 0\) for all \(y \in C,\) thus \(\langle F^iy^{k,i},y^{k,i}-p \rangle \ge 0.\) Therefore, we have

$$\begin{aligned} \langle F^iy^{k,i},p-z^{k,i} \rangle&= \langle F^iy^{k,i},p-y^{k,i} \rangle + \langle F^iy^{k,i},y^{k,i}-z^{k,i} \rangle \\&\le \langle F^iy^{k,i},y^{k,i}-z^{k,i} \rangle . \end{aligned}$$

Using this in (3.7), we obtain

$$\begin{aligned} D_f(p,z^{k,i})&\le D_f(p,w^k)-D_f(z^{k,i},w^k)+\alpha ^k \langle y^{k,i}-z^{k,i},F^iy^{k,i} \rangle , \end{aligned}$$

which implies by (2.2), that

$$\begin{aligned} D_f(p,z^{k,i})&\le D_f(p,w^k)-D_f(z^{k,i},y^{k,i})-D_f(y^{k,i},w^k)+\langle \nabla f(w^k)\nonumber \\&\quad -\nabla f(y^{k,i}),z^{k,i}-y^{k,i}\rangle +\alpha ^k \langle y^{k,i}-z^{k,i},F^iy^{k,i} \rangle \nonumber \\&=D_f(p,w^k)-D_f(z^{k,i},y^{k,i})-D_f(y^{k,i},w^k)+\langle \nabla f(w^k)\nonumber \\&\quad -\alpha ^k F^i y^{k,i}-\nabla f(y^{k,i}),z^{k,i}-y^{k,i} \rangle . \end{aligned}$$
(3.8)

Observe from \(z^{k,i} \in T^{k,i}\) for all i, that

$$\begin{aligned} \langle z^{k,i}-y^{k,i},\nabla f(w^k)-\alpha ^kF^iw^k-\nabla f(y^{k,i})\rangle \le 0, \end{aligned}$$

thus

$$\begin{aligned}&\langle \nabla f(w^k)-\alpha ^kF^iy^{k,i}-\nabla f(y^{k,i}),z^{k,i}-y^{k,i}\rangle \nonumber \\&\quad =\langle \nabla f(w^k)- \alpha ^k F^iw^k-\nabla f(y^{k,i}),z^{k,i}-y^{k,i} \rangle + \alpha ^k \langle F^iw^k- F^iy^{k,i},z^{k,i}-y^{k,i} \rangle \nonumber \\&\quad \le \alpha ^k \langle F^iw^k- F^iy^{k,i},z^{k,i}-y^{k,i} \rangle \nonumber \\&\quad \le \dfrac{\mu \alpha ^k}{\alpha ^{k+1}}\Vert w^k-y^{k,i}\Vert \cdot \Vert z^{k,i}-y^{k,i}\Vert . \end{aligned}$$
(3.9)

Substituting (3.9) into (3.8), we get

$$\begin{aligned} D_f(p,z^{k,i})&\le D_f(p,w^k)-D_f(z^{k,i},y^{k,i})-D_f(y^{k,i},w^k)\\&\quad +\dfrac{\mu \alpha ^k}{2\alpha ^{k+1}}[\Vert w^k-y^{k,i}\Vert ^2+ \Vert z^{k,i}-y^{k,i}\Vert ^2]. \end{aligned}$$

We thus have from (2.3), that

$$\begin{aligned} D_f(p,z^{k,i})&\le D_f(p,w^k)-\left( 1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}} \right) (D_f(y^{k,i},w^k)+D_f(z^{k,i},y^{k,i})). \end{aligned}$$
(3.10)

\(\square \)

Lemma 3.6

The sequence \(\{x^k\}\) defined iteratively by Algorithm 3.1 is bounded. Consequently, the sequences \(\{y^{k,i}\}\) and \(\{z^{k,i}\}\) are bounded.

Proof

Let \(p \in \Gamma .\) Since \(\mu \in (0,\sigma )\) and \(\lim \limits _{k \rightarrow \infty }\frac{\alpha ^k}{\alpha ^{k+1}}=1,\) we have

$$\begin{aligned} \lim \limits _{k \rightarrow \infty }\left( 1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}}\right) =1-\frac{\mu }{\sigma } > 0. \end{aligned}$$

Hence, there exists \(N_1 \ge 0,\) such that for all \(k \ge N_1,\) we get \(1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}} > 0.\) Thus, we get from Lemma 3.5 that

$$\begin{aligned} D_f(p,z^{k,i})\le D_f(p,w^k). \end{aligned}$$

It follows from (3.2), that

$$\begin{aligned} D_f(p,w^k)&=D_f(p,\nabla f^*(\nabla f(x^k)+\theta ^k(\nabla f(x^{k-1})-\nabla f(x^k)) ))\nonumber \\&\le (1-\theta ^k)D_f(p,x^k)+\theta ^kD_f(p,x^{k-1}). \end{aligned}$$
(3.11)

Now, we see from (3.4) and Lemma 2.2, that

$$\begin{aligned} D_f(p,v^k)&=D_f(p,\nabla f^* (\delta ^{k,0}\nabla f(w^k)+\sum \limits _{i=1}^{N}\delta ^{k,i}\nabla f(z^{k,i})))\nonumber \\&\le f(p)-\delta ^{k,0}\langle p,\nabla f(w^k) \rangle -\sum \limits _{i=1}^{N}\delta ^{k,i} \langle p,\nabla f(z^{k,i})\rangle +\delta ^{k,0} f^*(\nabla f(w^k))\nonumber \\&\quad +\sum \limits _{i=1}^{N}\delta ^{k,i} f^*(\nabla f(z^{k,i}))-\sum \limits _{i=1}^{N}\delta ^{k,0}\delta ^{k,i}\rho _{s}^{*}(\Vert \nabla f(w^k)-\nabla f(z^{k,i})\Vert )\nonumber \\&=\delta ^{k,0}V_f(p,\nabla f(w^k))+\sum \limits _{i=1}^{N}\delta ^{k,i}V_f(p,\nabla f(z^{k,i}))\nonumber \\&\quad -\sum \limits _{i=1}^{N}\delta ^{k,0}\delta ^{k,i}\rho _{s}^{*}(\Vert \nabla f(w^k)-\nabla f(z^{k,i})\Vert )\nonumber \\&\le D_f(p,w^k)-\sum \limits _{i=1}^{N}\delta ^{k,0}\delta ^{k,i}\rho _{s}^{*}(\Vert \nabla f(w^k)-\nabla f(z^{k,i})\Vert )\nonumber \\&\le D_f(p,w^k), \end{aligned}$$

where \(\rho _{s}^{*}\) denotes the gauge of uniform convexity of \(f^*\) and the first inequality follows by applying Lemma 2.2 to \(f^*.\)
(3.12)

Finally, we have from (3.5), that

$$\begin{aligned} D_f(p,x^{k+1})&=D_f(p,\nabla f^* (\beta ^k \nabla f(x^0)+ (1-\beta ^k)\nabla f(v^k)))\\&\le \beta ^k D_f(p,x^0) + (1-\beta ^k)[(1-\theta ^k)D_f(p,x^k)+\theta ^k D_f(p,x^{k-1})]\\&\le \max \left\{ D_f(p,x^0), \max \{D_f(p,x^{k}),D_f(p,x^{k-1}) \} \right\} \\&\vdots \\&\le \max \{D_f(p,x^0),D_f(p,x^1) \} < \infty . \end{aligned}$$

This implies that \(\{D_f(p,x^k)\}\) is bounded which implies \(\{x^k\}\) is bounded. As a consequence of this, the sequence \(\{w^k\}\) is bounded. We obtain from the properties of \(\Pi _C\) and continuity of \(\nabla f,\) that \(\{y^{k,i}\}\) is bounded. It follows easily also that the sequences \(\{z^{k,i}\}\) and \(\{v^k\}\) are bounded. \(\square \)

Lemma 3.7

Let \(\{w^{k}\}\) and \(\{y^{k,i}\}\) be the sequences given in Algorithm 3.1 such that \(\Vert w^k-y^{k,i}\Vert \rightarrow 0\) as \(k \rightarrow \infty .\) Let \(q \in C\) be the weak limit of the subsequence \(\{w^{k_j}\}\) of \(\{w^k\}\) for \(j \in {\mathbb {N}}.\) Then, \(q \in \Gamma .\)

Proof

From the definition of \(y^{k_j,i}\) and (2.4), we get

$$\begin{aligned} \langle \nabla f(w^{k_j})-\alpha ^{k_j}F^iw^{k_j}-\nabla f(y^{k_j,i}),x-y^{k_j,i}\rangle \le 0, ~~\forall ~ x \in C. \end{aligned}$$

This implies

$$\begin{aligned} \frac{1}{\alpha ^{k_j}}\langle \nabla f(w^{k_j})-\nabla f(y^{k_j,i}),x-y^{k_j,i} \rangle \le \langle F^iw^{k_j},x-y^{k_j,i}\rangle , ~~ \forall ~x \in C. \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{\alpha ^{k_j}}\langle \nabla f(w^{k_j})-\nabla f(y^{k_j,i}),x-y^{k_j,i} \rangle + \langle F^iw^{k_j},y^{k_j,i} -w^{k_j}\rangle \le \langle F^iw^{k_j},x-w^{k_j}\rangle . \end{aligned}$$
(3.13)

Since \(\Vert w^{k_j}-y^{k_j,i}\Vert \rightarrow 0\) as \(j \rightarrow \infty ,\) we obtain by the uniform Fréchet differentiability of f, that

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert \nabla f(w^{k_j})-\nabla f(y^{k_j,i})\Vert =0. \end{aligned}$$

Thus, we have from this, (3.13) and \(\lim \limits _{j \rightarrow \infty }\alpha ^{k_j} >0\), that

$$\begin{aligned} \liminf \limits _{j \rightarrow \infty }\langle F^iw^{k_j},x-w^{k_j} \rangle \ge 0,~~ \forall ~x \in C. \end{aligned}$$
(3.14)

We consider the following possible cases to show that \(q \in \Gamma .\)

Case i: Let \(\liminf \limits _{j \rightarrow \infty }\Vert F^iw^{k_j}\Vert =0.\) Since \(w^{k_j} \rightharpoonup q,\) we obtain by condition (A2), that \(F^iq=0.\) Hence \(q \in \Gamma .\)

Case ii: Suppose \(\liminf \limits _{j \rightarrow \infty }\Vert F^iw^{k_j}\Vert >0.\) Let \(\{\epsilon ^j\}\) be a sequence of positive real numbers decreasing to zero as \(j \rightarrow \infty .\) By (3.14), for each \(\epsilon ^j\) there exists a smallest integer \(N_j\) such that

$$\begin{aligned} \langle F^iw^{k_n},x-w^{k_n} \rangle + \epsilon ^j \ge 0, ~~\forall ~n \ge N_j. \end{aligned}$$
(3.15)

Since \(\{\epsilon ^j\}\) is decreasing, the sequence \(\{N_j\}\) is increasing. Writing \(w^{N_j}\) for \(w^{k_{N_j}},\) set \(u^{N_j}=\frac{F^iw^{N_j}}{\Vert F^iw^{N_j}\Vert ^2},\) so that \(\langle F^iw^{N_j},\epsilon ^ju^{N_j} \rangle =\epsilon ^j\) (note that \(F^iw^{N_j}\ne 0\) for all sufficiently large j, since \(\liminf \limits _{j \rightarrow \infty }\Vert F^iw^{k_j}\Vert >0\)).

Therefore,

$$\begin{aligned} \langle F^iw^{N_j},x+\epsilon ^j u^{N_j}-w^{N_j} \rangle \ge 0. \end{aligned}$$

Since \(F^i\) is pseudomonotone, we obtain

$$\begin{aligned} \langle F^i(x+\epsilon ^ju^{N_j}),x+\epsilon ^ju^{N_j}-w^{N_j}\rangle \ge 0,\end{aligned}$$

which implies

$$\begin{aligned} \langle F^ix,x-w^{N_j}\rangle \ge \langle F^ix-F^i(x+\epsilon ^j u^{N_j}),x+\epsilon ^j u^{N_j}-w^{N_j}\rangle -\epsilon ^j \langle F^ix,u^{N_j} \rangle . \end{aligned}$$
(3.16)

Next, we show that \(\lim \limits _{j \rightarrow \infty }\epsilon ^j u^{N_j}=0.\) To see this, since \(w^{k_j} \rightharpoonup q\) and \(\Vert w^{k_j}-y^{k_j,i}\Vert \rightarrow 0\) as \(j \rightarrow \infty ,\) we obtain \(y^{N_j,i} \rightharpoonup q\) as \(j \rightarrow \infty .\) By \(\{y^{k,i}\} \subset C,\) we obtain \(q \in C.\) Furthermore, since we are in Case ii,

$$\begin{aligned} 0 < \liminf \limits _{j \rightarrow \infty }\Vert F^iw^{k_j}\Vert . \end{aligned}$$

Since \(\{w^{N_j}\} \subset \{w^{k_j}\}\) and \(\epsilon ^j \rightarrow 0\) as \(j \rightarrow \infty ,\) we obtain

$$\begin{aligned} 0 \le \limsup \limits _{j \rightarrow \infty }\Vert \epsilon ^j u^{N_j}\Vert =\limsup \limits _{j \rightarrow \infty }\left( \frac{\epsilon ^j}{\Vert F^iw^{N_j}\Vert }\right) \le \frac{\limsup \limits _{j \rightarrow \infty } \epsilon ^j}{\liminf \limits _{j \rightarrow \infty }\Vert F^iw^{k_j}\Vert } =0, \end{aligned}$$

which implies that \(\lim \limits _{j \rightarrow \infty }\epsilon ^ju^{N_j}=0.\)

Letting \(j \rightarrow \infty ,\) since \(F^i\) is Lipschitz continuous for each i, the sequences \(\{w^{N_j}\}\) and \(\{u^{N_j}\}\) are bounded, and \(\lim \limits _{j \rightarrow \infty } \epsilon ^j u^{N_j}=0,\) the right-hand side of (3.16) tends to zero. Thus, we obtain

$$\begin{aligned} \liminf \limits _{j \rightarrow \infty } \langle F^i x,x-w^{N_j} \rangle \ge 0. \end{aligned}$$

Hence, for all \(x \in C,\) we have

$$\begin{aligned} \langle F^ix,x-q \rangle = \lim \limits _{j \rightarrow \infty }\langle F^ix,x-w^{N_j} \rangle =\liminf \limits _{j \rightarrow \infty }\langle F^ix,x-w^{N_j} \rangle \ge 0. \end{aligned}$$

By Lemma 2.3 applied to each \(F^i,\) \(q \in \Gamma .\) \(\square \)

Theorem 3.8

Let Assumptions A and B hold. Then the sequence \(\{x^k\}\) defined iteratively by Algorithm 3.1 converges strongly to a point \(p \in \Gamma ,\) where \(p=\Pi _{\Gamma }(x^0).\)

Proof

Let \(p =\Pi _{\Gamma }(x^0) \in \Gamma .\) We have from (3.5), that

$$\begin{aligned} D_f(p,x^{k+1})&=D_f(p,\nabla f^*(\beta ^k \nabla f(x^0)+(1-\beta ^k)\nabla f(v^k)))\nonumber \\&\le \beta ^k D_f(p,p)+(1-\beta ^k)D_f(p,v^k)+ \beta ^k \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p\rangle \end{aligned}$$
(3.17)
$$\begin{aligned}&\le (1-\beta ^k)D_f(p,w^k)-(1-\beta ^k)\sum \limits _{i=1}^{N}\delta ^{k,0}\delta ^{k,i}\rho _{s}^*(\Vert \nabla f(w^k)\nonumber \\&\quad -\nabla f(z^{k,i})\Vert )+ \beta ^k \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p\rangle \end{aligned}$$
(3.18)
$$\begin{aligned}&\le (1-\beta ^k)D_f(p,x^k)+\beta ^k \left[ \frac{\theta ^k}{\beta ^k}(D_f(p,x^{k-1})-D_f(p,x^k))\nonumber \right. \\&\left. \quad + \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p\rangle \right] \nonumber \\&=(1-\beta ^k)D_f(p,x^k)+\beta ^k \Upsilon ^k, \end{aligned}$$
(3.19)

where

$$\begin{aligned}\Upsilon ^k: =\frac{\theta ^k}{\beta ^k}(D_f(p,x^{k-1})-D_f(p,x^k))+ \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p\rangle . \end{aligned}$$

It follows from (3.18) and (3.19), that

$$\begin{aligned} (1-\beta ^k)\sum \limits _{i=1}^{N}\delta ^{k,0}\delta ^{k,i}\rho _{s}^*(\Vert \nabla f(w^k)-\nabla f(z^{k,i})\Vert )\le D_f(p,x^k)-D_f(p,x^{k+1})+\beta ^k M_1, \end{aligned}$$
(3.20)

with \(M_1=\sup \{\Upsilon ^k: k \in {\mathbb {N}} \}.\) We now show that \(\{x^k\}\) converges strongly to p. To achieve this, set \(a^k:=D_f(p,x^k);\) then it follows from (3.19), that

$$\begin{aligned} a^{k+1}\le (1-\beta ^k)a^k+\beta ^k \Upsilon ^k.\end{aligned}$$

We will then apply Lemma 2.4. Indeed, we only need to show that \(\limsup \limits _{j \rightarrow \infty }\Upsilon ^{k_j} \le 0\) whenever a subsequence \(\{a^{k_j}\}\) of \(\{a^k\}\) satisfies

$$\begin{aligned} \liminf \limits _{j \rightarrow \infty }(a^{k_j+1}-a^{k_j}) \ge 0. \end{aligned}$$
(3.21)

Now, let \(\{a^{k_j}\}\) be a subsequence of \(\{a^k\}\) satisfying (3.21). Then by (3.20) and Assumption (B1), we obtain

$$\begin{aligned} \limsup \limits _{j \rightarrow \infty }\sum \limits _{i=1}^{N}\delta ^{k_j,0}\delta ^{k_j,i}\rho _{s}^{*}(\Vert \nabla f(w^{k_j})-\nabla f(z^{k_j,i})\Vert )&\le \limsup \limits _{j \rightarrow \infty }(a^{k_j}-a^{k_j+1})+M_1\lim \limits _{j \rightarrow \infty }\beta ^{k_j}\\&=-\liminf \limits _{j \rightarrow \infty }(a^{k_j +1}-a^{k_j})\\&\le 0, \end{aligned}$$

which implies

$$\begin{aligned} \limsup \limits _{j \rightarrow \infty }\sum \limits _{i=1}^{N}\delta ^{k_j,0}\delta ^{k_j,i}\rho _{s}^{*}(\Vert \nabla f(w^{k_j})-\nabla f(z^{k_j,i})\Vert )=0. \end{aligned}$$

From Assumption (B3), we obtain

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert \nabla f(w^{k_j})-\nabla f(z^{k_j,i})\Vert =0, \end{aligned}$$
(3.22)

thus

$$\begin{aligned} \Vert w^{k_j}-z^{k_j,i}\Vert \rightarrow 0~~\text {as}~~j \rightarrow \infty . \end{aligned}$$
(3.23)

Furthermore, we see from Lemma 3.5, that

$$\begin{aligned} D_f(p,x^{k+1})&\le (1-\beta ^k)D_f(p,v^k)+\beta ^k \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p \rangle \nonumber \\&\le (1-\beta ^k)\left[ \delta ^{k,0}D_f(p,w^k)+\sum \limits _{i=1}^{N}\delta ^{k,i}D_f(p,w^k)\nonumber \right. \\&\left. \quad -\sum \limits _{i=1}^{N}\delta ^{k,i}\left( 1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}} \right) [D_f(y^{k,i},w^k)+D_f(z^{k,i},y^{k,i})] \right] \nonumber \\&\quad +\beta ^k \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p \rangle \nonumber \\&\le (1-\beta ^k)[(1-\theta ^k)D_f(p,x^k)+\theta ^kD_f(p,x^{k-1})]\nonumber \\&\quad +\beta ^k \langle \nabla f(x^0)-\nabla f(p),x^{k+1}-p \rangle \nonumber \\&\quad -(1-\beta ^k)\sum \limits _{i=1}^{N}\delta ^{k,i}\left( 1-\frac{\mu \alpha ^k}{\sigma \alpha ^{k+1}} \right) [D_f(y^{k,i},w^k)+D_f(z^{k,i},y^{k,i})] \nonumber \\&\le (1-\beta ^k)D_f(p,x^k)+\beta ^k M_1, \end{aligned}$$
(3.24)

where \(M_1\) is as before. Similarly, we obtain from (3.24), that

$$\begin{aligned}&\limsup \limits _{j \rightarrow \infty }(1-\beta ^{k_j})\sum \limits _{i=1}^{N}\delta ^{{k_j},i}\left( 1-\frac{\mu \alpha ^{k_j}}{\sigma \alpha ^{k_j+1}} \right) [D_f(y^{k_j,i},w^{k_j})+D_f(z^{k_j,i},y^{k_j,i})]\\&\quad \le \limsup \limits _{j \rightarrow \infty }(a^{k_j}-a^{k_j+1})+M_1\lim \limits _{j \rightarrow \infty }\beta ^{k_j}\\&\quad =-\liminf \limits _{j \rightarrow \infty }(a^{k_j +1}-a^{k_j})\\&\quad \le 0, \end{aligned}$$

which implies

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }D_f(y^{k_j,i},w^{k_j})=\lim \limits _{j \rightarrow \infty }D_f(z^{k_j,i},y^{k_j,i})=0, ~~i=1,2, \dots , N. \end{aligned}$$

It follows from (2.3), that

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert y^{k_j,i}-w^{k_j}\Vert =\lim \limits _{j \rightarrow \infty }\Vert z^{k_j,i}-y^{k_j,i}\Vert =0, ~~i=1,2, \dots , N. \end{aligned}$$
(3.25)

Observe from (3.2), that

$$\begin{aligned} \Vert \nabla f(w^{k_j})-\nabla f(x^{k_j})\Vert&=\theta ^{k_j}\Vert \nabla f(x^{k_j})-\nabla f(x^{k_j -1})\Vert \nonumber \\&=\beta ^{k_j}\cdot \frac{\theta ^{k_j}}{\beta ^{k_j}}\Vert \nabla f(x^{k_j})-\nabla f(x^{k_j -1})\Vert \rightarrow 0~~ \text {as}~~j \rightarrow \infty , \end{aligned}$$
(3.26)

thus

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert w^{k_j}-x^{k_j}\Vert =0. \end{aligned}$$
(3.27)

The following limits are easily obtained from the previously established ones:

$$\begin{aligned} \Vert z^{k_j,i}-w^{k_j}\Vert \rightarrow 0,\quad \Vert z^{k_j,i}-x^{k_j}\Vert \rightarrow 0,\quad \Vert y^{k_j,i}-x^{k_j}\Vert \rightarrow 0, ~~\text{ as }~~ j \rightarrow \infty , ~~\forall ~i=1,2, \dots , N. \end{aligned}$$

Using (3.28), the boundedness of \(\nabla f\) on bounded sets and the uniform continuity of f on bounded subsets of E, we have

$$\begin{aligned} D_f(z^{k_j,i},w^{k_j})=f(z^{k_j,i})-f(w^{k_j})-\langle \nabla f(w^{k_j}),z^{k_j,i}-w^{k_j} \rangle \rightarrow 0 ~~\text{ as }~~j \rightarrow \infty , \end{aligned}$$
(3.29)

thus, we obtain

$$\begin{aligned} D_f(z^{k_j,i},v^{k_j})&=D_f(z^{k_j,i},\nabla f^*(\delta ^{k_j,0}\nabla f(w^{k_j})+\sum \limits _{l=1}^{N}\delta ^{k_j,l}\nabla f(z^{k_j,l})))\\&\le \delta ^{k_j,0}D_f(z^{k_j,i},w^{k_j})+\sum \limits _{l=1}^{N}\delta ^{k_j,l}D_f(z^{k_j,i},z^{k_j,l}), \end{aligned}$$

where each term \(D_f(z^{k_j,i},z^{k_j,l})\rightarrow 0\) as in (3.29), since \(\Vert z^{k_j,i}-z^{k_j,l}\Vert \rightarrow 0\) by (3.28).

This implies \(D_f(z^{k_j,i},v^{k_j}) \rightarrow 0\) as \(j \rightarrow \infty \) for each \(i=1,2,\dots , N.\) Hence

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert z^{k_j,i}-v^{k_j}\Vert =0,~~i=1,2,\cdots , N. \end{aligned}$$
(3.30)

It also follows from Assumption (B1) and (3.5), that

$$\begin{aligned} D_f(v^{k_j},x^{k_j+1}) \le \beta ^{k_j}D_f(v^{k_j},x^0)+(1-\beta ^{k_j})D_f(v^{k_j},v^{k_j})~~\rightarrow 0~~\text{ as } ~j\rightarrow \infty , \end{aligned}$$

which implies

$$\begin{aligned} \lim \limits _{j \rightarrow \infty }\Vert v^{k_j}-x^{k_j+1}\Vert =0. \end{aligned}$$
(3.31)

Therefore, we obtain

$$\begin{aligned} \Vert x^{k_j+1}-x^{k_j}\Vert \le \Vert x^{k_j+1}-v^{k_j}\Vert +\Vert v^{k_j}-z^{k_j,i}\Vert +\Vert z^{k_j,i}-x^{k_j}\Vert \rightarrow 0~~\text{ as }~~j \rightarrow \infty . \end{aligned}$$
(3.32)

We can now show that \(\limsup \limits _{j \rightarrow \infty } \Upsilon ^{k_j}\le 0.\) Indeed, we only need to show that \(\limsup \limits _{j \rightarrow \infty }\langle \nabla f(x^0)-\nabla f(p),x^{k_j+1}-p \rangle \le 0.\) Now, let \(\{x^{k_{j_l}}\}\) be a subsequence of \(\{x^{k_j}\}\) satisfying

$$\begin{aligned} \limsup \limits _{j \rightarrow \infty }\langle \nabla f(x^0)-\nabla f(p),x^{k_j+1}-p \rangle =\lim \limits _{l \rightarrow \infty }\langle \nabla f(x^0)-\nabla f(p),x^{k_{j_l}+1}-p \rangle .\end{aligned}$$

Since \(\{x^{k_j}\}\) is bounded, there exists a subsequence \(\{x^{k_{j_l}}\}\) of \(\{x^{k_j}\}\) such that \(x^{k_{j_l}} \rightharpoonup q \in E.\) The weak convergence of the subsequences \(\{w^{k_{j_l}}\},\{y^{k_{j_l}}\}, \{z^{k_{j_l}}\}\) and \(\{v^{k_{j_l}}\}\) to q follows from this and the previously established limits. We therefore get from (3.25) and Lemma 3.7, that \(q \in \Gamma .\) Also, since \(p=\Pi _{\Gamma }x^0,\) we have by the characterization (2.4) of the Bregman projection, that

$$\begin{aligned} \limsup \limits _{j \rightarrow \infty }\langle \nabla f(x^0)-\nabla f(p),x^{k_j}-p\rangle =\langle \nabla f(x^0)-\nabla f(p),q-p \rangle \le 0. \end{aligned}$$
(3.33)

Hence, by this and (3.32), we get

$$\begin{aligned} \limsup \limits _{j \rightarrow \infty } \langle \nabla f(x^0)-\nabla f(p), x^{k_j+1}-p \rangle \le 0, \end{aligned}$$
(3.34)

which implies \(\limsup \limits _{j \rightarrow \infty } \Upsilon ^{k_j}\le 0.\) We therefore conclude by Lemma 2.4 and (3.21), that the sequence \(\{x^{k}\}\) converges strongly to p. \(\square \)

The following are some consequences of our main theorem.

  1. (1)

    Let \(N=1;\) then we obtain a result for approximating a solution of the VIP (1.1). Thus the results of Thong et al. (2021) are corollaries of our main result in this paper.

  2. (2)

    Suppose \(E=H\) is a real Hilbert space; then the result obtained in Jolaoso et al. (2021b) becomes a corollary of our main theorem.

4 Application to equilibrium problem

Let E be a real reflexive Banach space and C be a nonempty, closed and convex subset of E. Let \(G:C \times C \rightarrow {\mathbb {R}}\) be a bifunction. The Equilibrium Problem (shortly, EP) with respect to G on C is defined as:

$$\begin{aligned} \text {Find} \quad x^* \in C \quad \text {such that}\quad G(x^*,y) \ge 0, \quad \forall ~y \in C. \end{aligned}$$
(4.1)

We denote the solution set of the EP (4.1) by EP(C, G). The EP was earlier referred to as the Ky Fan inequality due to the contribution of Fan (1972) to the development of the field. However, it gained more attention after being revisited by Muu and Oettli (1992) and Blum and Oettli (1994) in the 1990s. Many important optimization problems, such as Kakutani fixed point problems, minimax problems, Nash equilibrium problems, variational inequality problems, and minimization problems, can be reformulated as (4.1). The EP has also served as an important tool in the study of a wide range of obstacle and unilateral problems arising in various branches of pure and applied sciences; see, for instance, Brézis et al. (1972), Lyashko and Semenov (2016). Moreover, several algorithms have been introduced for solving the EP (4.1) in Hilbert and Banach spaces (see for instance Jolaoso 2021 and references therein).

Definition 4.1

Let \(C \subseteq E\) and \(G: C \times C \rightarrow {\mathbb {R}}\) be a bifunction. Then G is said to be

  • monotone on C if \(G(x,y) + G(y,x) \le 0 \quad \forall ~~x,y \in C;\)

  • pseudo-monotone on C if \(G(x,y) \ge 0 \Rightarrow G(y,x) \le 0 \quad \forall ~~x,y \in C.\)

Remark 4.2

Jolaoso (2021) Every monotone bifunction on C is pseudo-monotone but the converse is not true. A mapping \(A: C \rightarrow E^{*}\) is pseudo-monotone if and only if the bifunction \(G(x,y) = \langle Ax, y-x \rangle \) is pseudo-monotone on C.

For solving the EP, we assume that the bifunction G satisfies the following:

Assumption 4.3

  1. (A1)

    G is weakly continuous on \(C \times C\),

  2. (A2)

    \(G(x,\cdot )\) is convex lower semicontinuous and subdifferentiable on C for every fixed \(x \in C\),

  3. (A3)

    for each \(x,y,z \in C\), \(\limsup _{t \downarrow 0}G(tx + (1-t)y,z) \le G(y,z).\)

Proposition 4.4

Jolaoso et al. (2020) Let E be a real reflexive Banach space and C be a nonempty, closed and convex subset of E. Let \(G:C \times C \rightarrow {\mathbb {R}}\) be a bifunction such that \(G(x,x) = 0\) and \(f:E \rightarrow {\mathbb {R}}\) be a Legendre and totally coercive function. Then a point \(x^* \in EP(C,G)\) if and only if \(x^*\) solves the following minimization problem:

$$\begin{aligned} \min \bigg \{\lambda G(x,y) + D_f(y,x) : y \in C\bigg \}, \end{aligned}$$

where \(x\in C\) and \( \lambda >0\).
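In computations, Proposition 4.4 turns one step of an EP method into a small convex program. The following sketch (ours) illustrates this with the Euclidean Bregman function \(f=\frac{1}{2}\Vert \cdot \Vert ^2,\) a box-shaped C and a toy monotone bifunction; the helper name and the use of scipy are our choices, not part of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def ep_prox_step(G, x, lam, lb, ub):
    """Solve min_y { lam*G(x,y) + 0.5*||y - x||^2 } over the box [lb, ub],
    i.e. the minimization of Proposition 4.4 with f = ||.||^2/2."""
    obj = lambda y: lam * G(x, y) + 0.5 * np.sum((y - x) ** 2)
    return minimize(obj, x, bounds=list(zip(lb, ub))).x

# toy monotone bifunction G(x, y) = <Ax, y - x> with A positive definite
A = np.array([[2.0, 1.0], [1.0, 2.0]])
G = lambda x, y: (A @ x) @ (y - x)
print(ep_prox_step(G, np.array([0.8, -0.3]), lam=0.5, lb=[-1, -1], ub=[1, 1]))
```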

Proposition 4.5

Jolaoso et al. (2020) Let C be a nonempty closed convex subset of a real reflexive Banach space E and \(f:E \rightarrow {\mathbb {R}}\) be a Legendre and totally coercive function. Let \(F: C \rightarrow E^*\) be a nonlinear mapping such that \(x \in VI(C,F)\). Then x is the unique solution of the minimization problem

$$\begin{aligned} \min \bigg \{ \lambda \langle Fu,y-u \rangle + D_f (y,u) : y \in C\bigg \}, \end{aligned}$$

where \(u \in C\) and \(\lambda >0\).

Thus, with the aid of Propositions 4.4 and 4.5, setting \(G(x,y) = \langle Fx, y-x \rangle \) for all \(x,y \in C\) in Algorithm 3.1, we obtain the following result for solving a finite family of pseudomonotone EPs in reflexive Banach spaces.

Theorem 4.6

Let E be a real reflexive Banach space and C be a nonempty closed and convex subset of E. For \(i=1,\dots ,N\), let \(G^i:C \times C \rightarrow {\mathbb {R}}\) be a finite family of bifunctions such that \(G^i(x,x) = 0\) for all \(x \in C,\) satisfying Assumption 4.3. Let \(f:E \rightarrow {\mathbb {R}}\cup \{+\infty \}\) be a function which is Legendre, uniformly Fréchet differentiable, strongly coercive and bounded on bounded subsets of E. Suppose \(\Gamma = \bigcap _{i=1}^N EP(C,G^i)\) is nonempty and Assumption B is satisfied. Let the sequence \(\{x^k\}\) be generated by the following algorithm:

[The algorithm is displayed as a figure in the original.]

Then \(\{x^k\}\) converges strongly to a point \(u = \Pi _\Gamma (x^0).\)

5 Numerical example

In this section, we present some numerical illustrations which demonstrate the performance of our proposed algorithm. In the first example, we consider several of the Bregman functions f given in Example 2.1 and compare the performance of the algorithm for each choice. In particular, for \(x \in {\mathbb {R}}^m,\) it is clear that

  1. (i)

    \(\nabla f^{KL}(x) = ( 1 + \ln (x_1), \dots , 1 + \ln (x_m))^\top \) and \((\nabla f^{KL})^{-1}(x) = (\exp (x_1 - 1), \dots , \exp (x_m -1))^\top \);

  2. (ii)

    \(\nabla f^{IS}(x) = - \left( \frac{1}{x_1}, \dots , \frac{1}{x_m} \right) ^\top \) and \((\nabla f^{IS})^{-1}(x) =- \left( \frac{1}{x_1}, \dots , \frac{1}{x_m} \right) ^\top \);

  3. (iii)

    \(\nabla f^{SE}(x) = x\) and \((\nabla f^{SE})^{-1}(x) = x.\)
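These gradient pairs are straightforward to verify numerically; a quick consistency check (ours) that \((\nabla f)^{-1}\circ \nabla f\) is the identity for each choice:

```python
import numpy as np

pairs = {
    "KL": (lambda x: 1 + np.log(x), lambda u: np.exp(u - 1)),
    "IS": (lambda x: -1.0 / x,      lambda u: -1.0 / u),
    "SE": (lambda x: x,             lambda u: u),
}
x = np.array([0.4, 1.3, 2.5])
for name, (g, g_inv) in pairs.items():
    assert np.allclose(g_inv(g(x)), x), name
print("all gradient/inverse pairs consistent")
```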

Example 5.1

Let \(E= {\mathbb {R}}^m\) and let the operators \(F^i\) be defined by \(F^i(x) = M^i x + q\) for \(i=1,\dots ,N,\) where \(M^i := B^i{B^i}^\top + S^i + D^i,\) with \(B^i, S^i, D^i \in {\mathbb {R}}^{m \times m}\) randomly generated matrices such that \(S^i\) is skew-symmetric, \(D^i\) is a positive definite diagonal matrix, and \(q=0.\) Hence the CSVIP has a unique solution. The feasible set C is defined by

$$\begin{aligned} C:= \left\{ x = (x_1, \dots ,x_m) \in {\mathbb {R}}^m : \Vert x \Vert \le 1, \quad x_i > 1/a \right\} \end{aligned}$$

for \(a =2.\) Hence the unique solution of the CSVIP is \(x^*=0.\) The projection onto C is calculated explicitly. We use the following parameters for our computation: \(\mu = 0.25, \theta = 0.0125, \alpha ^0 = 0.01,\) \(\beta ^k = \frac{1}{10(k+1)}, \delta ^{k,i} = \frac{8}{9^i} + \frac{1}{N9^N}\) for \(i=1,\dots ,N\) and \(\delta ^{k,0} =0.\) The initial points \(x^0, x^1 \in {\mathbb {R}}^m\) are generated randomly. We use the stopping criterion \(E_k = \Vert x^k \Vert ^2 < \varepsilon ,\) where \(\varepsilon = 10^{-3},\) and test the algorithm for the following values of m and N (a sketch of the problem generator is given after the list below):

  • Case 1: \(m = 5\) and \( N =10,\)

  • Case 2: \(m=20\) and \(N=20,\)

  • Case 3: \(m=25\) and \(N=20.\)
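A possible generator for the random test operators of Example 5.1 (our sketch; the seed and the range of the diagonal entries are arbitrary choices):

```python
import numpy as np

def make_operator(m, rng):
    """F(x) = Mx with M = B B^T + S + D: B B^T is positive semidefinite,
    S skew-symmetric, D positive definite diagonal, so M is positive
    definite and F is monotone."""
    B = rng.standard_normal((m, m))
    S = rng.standard_normal((m, m))
    S = S - S.T                               # skew-symmetric part
    D = np.diag(rng.uniform(0.1, 1.0, m))     # positive diagonal
    M = B @ B.T + S + D
    return lambda x: M @ x

rng = np.random.default_rng(1)
F = [make_operator(5, rng) for _ in range(10)]   # Case 1: m = 5, N = 10
```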

We present the corresponding numerical results, in terms of the number of iterations and the CPU time taken, in Table 1 and Figure 1.

Table 1 Computation result for Example 5.1
Fig. 1

Example 5.1, Left: Case 1; Middle: Case 2; Right: Case 3

Example 5.2

In this example, we consider the same problem as in Example 5.1 with \(f(x) = \frac{1}{2}\Vert x\Vert ^2.\) The feasible set is described by the linear inequality constraint \(Bx \le b\) for some random matrix \(B \in {\mathbb {R}}^{k \times m}\) and a random vector \(b \in {\mathbb {R}}^k\) with nonnegative entries. The projection onto C is computed using the MATLAB solver "quadprog". In this case, we compare the performance of Algorithm 3.1 (namely, IBPSEM) with (Hieu 2017, Algorithm 3.1) (namely, PHEM) and (Kitisak et al. 2020, Algorithm 3.1) (namely, MPHSEM). For Algorithm 3.1, we take \(\mu = 0.09, \theta = 0.36, \beta ^k = \frac{1}{k+1}, \delta ^{k,i} = \frac{1}{N+1}\) for \(i=0,1,\dots ,N\); for PHEM, we take \(\lambda = \frac{1}{2L}\) with \(L = \underset{1 \le i \le N}{\max }\Vert M_i\Vert ,\) and for MPHSEM, we take \(\mu = 0.25, \rho = 0.9.\) We test the algorithms for the same values of m and N as in Example 5.1, with the stopping criterion \(E_k = \Vert x^{k+1}-x^k \Vert < 10^{-4}.\) The numerical results are shown in Table 2 and Figure 2.
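For readers working outside MATLAB, the quadprog call amounts to the quadratic program \(\min _x \frac{1}{2}\Vert x-u\Vert ^2\) subject to \(Bx \le b;\) a sketch (ours) of the same projection using scipy:

```python
import numpy as np
from scipy.optimize import minimize

def proj_polyhedron(u, B, b):
    """Euclidean projection onto C = {x : Bx <= b} (the role played by
    MATLAB's quadprog in Example 5.2)."""
    res = minimize(lambda x: 0.5 * np.sum((x - u) ** 2), u,
                   constraints={"type": "ineq", "fun": lambda x: b - B @ x})
    return res.x

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 6))
b = np.abs(rng.standard_normal(4))
x = proj_polyhedron(rng.standard_normal(6), B, b)
print(np.all(B @ x <= b + 1e-8))   # feasibility check
```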

Table 2 Computation result for Example 5.2
Fig. 2

Example 5.2, Left: Case 1; Middle: Case 2; Right: Case 3

Example 5.3

Let \(E=L^2([0,2\pi ])\) and \(C=\{x \in L^2([0,2\pi ]):\Vert x(t)-\sin t\Vert ^2 \le \sqrt{t} \}\) for \(t \in [0,2\pi ].\) For \(i=1,2,\) let \(F^i(x)=\frac{x}{i}.\) The solution set \(\Gamma \) is nonempty since \(\Gamma =\{0\}.\) In this example, we compare our Algorithm 3.1 with PHEM and MPHSEM. We choose the following parameters: \(\delta ^{k,0} = \frac{2}{100k+1}, \delta ^{k,1} = \frac{5}{500k+7}, \delta ^{k,2} = 1-\delta ^{k,0} - \delta ^{k,1}, \alpha ^0= \theta = 0.25, \beta ^k = \frac{1}{5k+1}.\) We test the algorithms using the following initial points:

  • Case I: \(x^0 = \frac{3t}{4}\) and \(x^1 = t-2\),

  • Case II: \(x^0 = t^2 -3t\) and \(x^1 = 2t +1,\)

  • Case III: \(x^0 = 2t +3\) and \(x^1 = \frac{t}{4}.\)

We use \(E_k = \Vert x^{k+1} - x^k\Vert < 10^{-4}\) as the stopping criterion. The numerical results are shown in Table 3 and Figure 3.
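For completeness, experiments over \(L^2([0,2\pi ])\) are typically run on a grid discretization; a sketch (ours) of the discretized norm and the operators \(F^i(x)=\frac{x}{i}\) entering the stopping criterion:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 1001)
dt = t[1] - t[0]

def l2_norm(x):
    """Trapezoidal-rule approximation of the L^2([0, 2*pi]) norm."""
    return np.sqrt(dt * (np.sum(x ** 2) - 0.5 * (x[0] ** 2 + x[-1] ** 2)))

F = [lambda x, i=i: x / i for i in (1, 2)]    # F^1, F^2

x0 = 3 * t / 4            # Case I initial points
x1 = t - 2
print(l2_norm(x1 - x0))   # the quantity entering E_k = ||x^{k+1} - x^k||
```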

Table 3 Computation result for Example 5.3
Fig. 3

Example 5.3, Left: Case 1; Middle: Case 2; Right: Case 3